Node.js – mongodb approximate string matching


I am trying to implement a search engine for my recipes-website using mongo db.
I am trying to display the search suggestions in type-ahead widget box to the users.

I am even trying to support mis-spelled queries(levenshtein distance).

For example: whenever users type 'pza', type-ahead should display 'pizza' as one of the suggestion.

How can I implement such functionality using mongodb?

Please note, the search should be instantaneous, since the search result will be fetched by type-ahead widget. The collections over which I would run search queries have at-most 1 million entries.

I thought of implementing levenshtein distance algorithm, but this would slow down performance, as collection is huge.

I read FTS(Full Text Search) in mongo 2.6 is quite stable now, but my requirement is Approximate match, not FTS. FTS won't return 'pza' for 'pizza'.

Please recommend me the efficient way.

I am using node js mongodb native driver.

Best Solution

The text search feature in MongoDB (as at 2.6) does not have any built-in features for fuzzy/partial string matching. As you've noted, the use case currently focuses on language & stemming support with basic boolean operators and word/phrase matching.

There are several possible approaches to consider for fuzzy matching depending on your requirements and how you want to qualify "efficient" (speed, storage, developer time, infrastructure required, etc):

  • Implement support for fuzzy/partial matching in your application logic using some of the readily available soundalike and similarity algorithms. Benefits of this approach include not having to add any extra infrastructure and being able to closely tune matching to your requirements.

    For some more detailed examples, see: Efficient Techniques for Fuzzy and Partial matching in MongoDB.

  • Integrate with an external search tool that provides more advanced search features. This adds some complexity to your deployment and is likely overkill just for typeahead, but you may find other search features you would like to incorporate elsewhere in your application (e.g. "like this", word proximity, faceted search, ..).

    For example see: How to Perform Fuzzy-Matching with Mongo Connector and Elastic Search. Note: ElasticSearch's fuzzy query is based on Levenshtein distance.

  • Use an autocomplete library like Twitter's open source typeahead.js, which includes a suggestion engine and query/caching API. Typeahead is actually complementary to any of the other backend approaches, and its (optional) suggestion engine Bloodhound supports prefetching as well as caching data in local storage.