PHP – Lucene – Zend_Search_Lucene – how to build an index for “tagged” content

PHP, zend-framework, zend-search-lucene

I have the following problem: I need to build a Lucene index for articles that are tagged.

Here is a simplified data structure and the proposed Lucene field types:

article_id -> UnIndexed
article_title -> UnStored
article_content -> UnStored
article_tags -> ????? (here is the problem)

An article can have multiple tags. Let's say article A has the tags T1, T2 and T3. The problem is that T1, T2 and T3 are represented by numeric IDs. I can't store their text representation in the index, because a tag's text can change (I would then have to rebuild the index: find all articles with the changed tag, remove them, and add them again). I then need to search for articles that have both the T1 and T2 tags. The number of tags assigned to an article is unlimited (a 1-n relation). Is there any way to search for articles with certain tags (tag IDs)?

I hope this is clear. Does anybody have an efficient solution for this problem?

Thanks in advance.

Best Answer

You can do this with Lucene. One way is to create a document for each tag-article pair, and search for the tags using AND.
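To make the tag-article pair idea concrete, here is a minimal, library-free sketch in Python rather than Zend_Search_Lucene code. It assumes each pair becomes one small "document" keyed by the numeric tag ID, and that the AND is resolved by intersecting the article IDs found for each tag. The sample data and the names `ARTICLE_TAGS`, `build_pair_index` and `articles_with_all_tags` are hypothetical, chosen only for illustration.

```python
from collections import defaultdict

# Hypothetical sample data: article id -> set of numeric tag ids.
ARTICLE_TAGS = {
    "A": {1, 2, 3},
    "B": {1, 3},
    "C": {2},
}


def build_pair_index(article_tags):
    """Treat every (tag_id, article_id) pair as one document, indexed by tag id."""
    index = defaultdict(set)
    for article_id, tag_ids in article_tags.items():
        for tag_id in tag_ids:
            index[tag_id].add(article_id)
    return index


def articles_with_all_tags(index, tag_ids):
    """Return the ids of articles that carry every requested tag (logical AND)."""
    tag_ids = list(tag_ids)
    if not tag_ids:
        return set()
    # Start from the articles of the first tag, then intersect with the rest.
    result = set(index.get(tag_ids[0], set()))
    for tag_id in tag_ids[1:]:
        result &= index.get(tag_id, set())
    return result


index = build_pair_index(ARTICLE_TAGS)
print(sorted(articles_with_all_tags(index, [1, 2])))  # prints ['A']
```

Because only numeric tag IDs are indexed, renaming a tag does not touch the index. Only adding a tag to an article or removing one does.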

Should you use Lucene? I am unsure. In your description you do not use any full-text search capability. Why not use a database? I suggest you read Search Engine versus DBMS and choose according to the criteria defined there.
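For comparison, the database route could look roughly like the sketch below. It uses Python's built-in `sqlite3` module with an in-memory database. The join table `article_tag`, its columns, the sample rows and the helper `find_articles_with_tags` are assumptions made for illustration, not part of the original question.

```python
import sqlite3

# In-memory database with a hypothetical article/tag join table.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE article_tag (
        article_id INTEGER NOT NULL,
        tag_id     INTEGER NOT NULL,
        PRIMARY KEY (article_id, tag_id)
    );
    INSERT INTO article_tag (article_id, tag_id) VALUES
        (1, 1), (1, 2), (1, 3),
        (2, 1), (2, 3),
        (3, 2);
""")


def find_articles_with_tags(conn, tag_ids):
    """Return ids of articles that have all of the given tag ids."""
    tag_ids = sorted(set(tag_ids))
    if not tag_ids:
        return []
    placeholders = ", ".join("?" for _ in tag_ids)
    # Keep rows for the requested tags, then require that an article
    # matched as many distinct tags as were requested.
    sql = (
        "SELECT article_id FROM article_tag "
        f"WHERE tag_id IN ({placeholders}) "
        "GROUP BY article_id "
        "HAVING COUNT(DISTINCT tag_id) = ? "
        "ORDER BY article_id"
    )
    rows = conn.execute(sql, (*tag_ids, len(tag_ids))).fetchall()
    return [row[0] for row in rows]


print(find_articles_with_tags(conn, [1, 2]))  # prints [1]
```

The `GROUP BY ... HAVING COUNT(DISTINCT tag_id)` pattern scales to any number of required tags without changing the shape of the query.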
