Sql-server – Configure Lucene.Net with SQL Server

lucene.netsql server

Has anyone used Lucene.NET rather than using the full text search that comes with sql server?

If so I would be interested on how you implemented it.

Did you for example write a windows service that queried the database every hour then saved the results to the lucene.net index?

Best Answer

Yes, I've used it for exactly what you are describing. We had two services - one for read, and one for write, but only because we had multiple readers. I'm sure we could have done it with just one service (the writer) and embedded the reader in the web app and services.

I've used lucene.net as a general database indexer, so what I got back was basically DB id's (to indexed email messages), and I've also use it to get back enough info to populate search results or such without touching the database. It's worked great in both cases, tho the SQL can get a little slow, as you pretty much have to get an ID, select an ID etc. We got around this by making a temp table (with just the ID row in it) and bulk-inserting from a file (which was the output from lucene) then joining to the message table. Was a lot quicker.

Lucene isn't perfect, and you do have to think a little outside the relational database box, because it TOTALLY isn't one, but it's very very good at what it does. Worth a look, and, I'm told, doesn't have the "oops, sorry, you need to rebuild your index again" problems that MS SQL's FTI does.

BTW, we were dealing with 20-50million emails (and around 1 million unique attachments), totaling about 20GB of lucene index I think, and 250+GB of SQL database + attachments.

Performance was fantastic, to say the least - just make sure you think about, and tweak, your merge factors (when it merges index segments). There is no issue in having more than one segment, but there can be a BIG problem if you try to merge two segments which have 1mil items in each, and you have a watcher thread which kills the process if it takes too long..... (yes, that kicked our arse for a while). So keep the max number of documents per thinggie LOW (ie, dont set it to maxint like we did!)

EDIT Corey Trager documented how to use Lucene.NET in BugTracker.NET here.