
Things I Learned Today - our daily eureka-moments


Full-text searching with Lucene.NET - Carr, 16.11.2010

Data search is a recurring topic in software development.

In most of my .NET web applications, there has been a need for doing some kind of full-text searching in a database. For the longest time, I figured there were 4 options:


1) Your database is nice and small, and wildcard SQL queries perform well enough (... WHERE content LIKE '%query%').
2) Your database is big, so you start fiddling with things like SQL Server's crappy built-in full-text indexing, and your app is suddenly dependent on specific database features.
3) Your database is huge, so you need to buy an expensive black-box solution (FAST ESP, Intellisearch, etc.) and feed it your data. Or grab an open source solution like Solr and spend lots of time figuring out a platform you're not familiar with.
4) Let Google handle it.


When it finally occurred to me to check for other options, I discovered Lucene.NET. Lucene is an information retrieval library that several search platforms are built on, but the kicker in my case was that it came packaged as a single .NET library.
One DLL. Free. For my evil Microsoft-platform.

My project contained a user forum with hundreds of thousands of posts, and this suddenly let me do full text searching across them in milliseconds. Did I mention it's easy to use?



To do a full search index rebuild, first initialize a Lucene instance:


Dim dir As Store.Directory = Store.FSDirectory.Open(New DirectoryInfo(Path_to_index_files))
Dim an As New Analysis.Standard.StandardAnalyzer(Util.Version.LUCENE_29)
Dim iw As New Index.IndexWriter(dir, an, True, Index.IndexWriter.MaxFieldLength.UNLIMITED)



Feed it each of your objects, known as documents. In my case, each document is a forum post:


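' One Lucene document per forum post; field values must be strings.
Dim doc As New Documents.Document()
' The id is stored as-is (not analyzed) so it can be matched exactly later.
doc.Add(New Documents.Field("id", p.ID.ToString(), Documents.Field.Store.YES, Documents.Field.Index.NOT_ANALYZED, Documents.Field.TermVector.YES))
doc.Add(New Documents.Field("title", p.Name, Documents.Field.Store.YES, Documents.Field.Index.ANALYZED, Documents.Field.TermVector.YES))
doc.Add(New Documents.Field("data", p.Message, Documents.Field.Store.YES, Documents.Field.Index.ANALYZED, Documents.Field.TermVector.YES))
iw.AddDocument(doc)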



All done? Digest the data and shut down:


iw.Optimize()
iw.Close()




There. You now have a complete searchable index which can also be queried and tested with standalone tools like Luke.

Now, let's search all post titles and contents for a given query. Grab your Lucene instance from above, build your query, grab 25 results, and handle them:


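' The same query text is matched against both fields; SHOULD means either field may match.
Dim queries() As String = {YourQuery, YourQuery}
Dim fields() As String = {"title", "data"}
Dim flags() As Search.BooleanClause.Occur = {Search.BooleanClause.Occur.SHOULD, Search.BooleanClause.Occur.SHOULD}

' Parse is a shared (static) method, so no parser instance is needed.
Dim query As Search.Query = QueryParsers.MultiFieldQueryParser.Parse(Util.Version.LUCENE_29, queries, fields, flags, an)

' Open a read-only searcher on the index directory and fetch the top 25 hits.
Dim searcher As New Search.IndexSearcher(dir, True)
Dim results As Search.TopDocs = searcher.Search(query, 25)

For Each hit As Search.ScoreDoc In results.scoreDocs
	' Do something with your result
Next
searcher.Close()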
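What "do something with your result" can look like: each hit only carries Lucene's internal document number and a score, so you ask the searcher for the stored fields. This is a minimal sketch, assuming Lucene.NET 2.9 as in the rest of this post, an IndexSearcher opened on the index directory, and the "id" and "title" fields stored as above. The HitReader module and DescribeHits function are names made up for this illustration.

```vbnet
Imports System.Collections.Generic
Imports Lucene.Net

' Hypothetical helper, not part of Lucene.NET.
Public Module HitReader
    ' Returns one "id: title" line per hit, best match first.
    Public Function DescribeHits(ByVal searcher As Search.IndexSearcher, ByVal results As Search.TopDocs) As List(Of String)
        Dim lines As New List(Of String)
        For Each hit As Search.ScoreDoc In results.scoreDocs
            ' hit.doc is Lucene's internal document number, not the forum post id.
            Dim stored As Documents.Document = searcher.Doc(hit.doc)
            ' Get() only returns values for fields indexed with Field.Store.YES.
            lines.Add(stored.Get("id") & ": " & stored.Get("title"))
        Next
        Return lines
    End Function
End Module
```

In a web app you would more likely take the stored id and load the full post from your database, keeping the index as a lookup rather than the source of truth.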





This is just the tip of the iceberg. There's support for all kinds of advanced weighting, sorting, stemming, range queries over various data types, realtime index updates, and much more. It's also not limited to databases, and can be fed data from any source. In some cases, Lucene can replace your database entirely.
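To make the realtime index updates a bit more concrete, here is a rough sketch of updating a single post without rebuilding the whole index. It assumes Lucene.NET 2.9 and the same "id", "title" and "data" fields as in the rebuild above. ForumPost, IndexUpdater, ToDocument and UpdatePost are hypothetical names standing in for whatever your own project uses.

```vbnet
Imports System.IO
Imports Lucene.Net

' Hypothetical stand-in for the forum post objects used in this post.
Public Class ForumPost
    Public ID As Integer
    Public Name As String
    Public Message As String
End Class

' Hypothetical helper, not part of Lucene.NET.
Public Module IndexUpdater
    ' Builds the same three-field document as the full rebuild does.
    Public Function ToDocument(ByVal p As ForumPost) As Documents.Document
        Dim doc As New Documents.Document()
        doc.Add(New Documents.Field("id", p.ID.ToString(), Documents.Field.Store.YES, Documents.Field.Index.NOT_ANALYZED))
        doc.Add(New Documents.Field("title", p.Name, Documents.Field.Store.YES, Documents.Field.Index.ANALYZED))
        doc.Add(New Documents.Field("data", p.Message, Documents.Field.Store.YES, Documents.Field.Index.ANALYZED))
        Return doc
    End Function

    ' Replaces (or adds) one post in an existing index.
    Public Sub UpdatePost(ByVal indexPath As String, ByVal p As ForumPost)
        Dim dir As Store.Directory = Store.FSDirectory.Open(New DirectoryInfo(indexPath))
        Dim an As New Analysis.Standard.StandardAnalyzer(Util.Version.LUCENE_29)
        ' False = open the existing index instead of wiping and recreating it.
        Dim iw As New Index.IndexWriter(dir, an, False, Index.IndexWriter.MaxFieldLength.UNLIMITED)
        Try
            ' Deletes any document whose "id" term matches, then adds the new version.
            iw.UpdateDocument(New Index.Term("id", p.ID.ToString()), ToDocument(p))
        Finally
            ' Close() writes the pending change to disk and releases the index lock.
            iw.Close()
        End Try
    End Sub
End Module
```

This only works because "id" is indexed as NOT_ANALYZED, so the term matches the stored value exactly. A searcher that was opened before the update will not see the change until it is reopened.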

Apart from dropping in that DLL, your only external dependency is a folder on the server containing the index.

Magic.
Tags: .NET Programming Search Web Lucene