Implementing Search Suggest with Apache Solr Part 2

Oct 05, 2009
Oscar Bernal

In part 1 of this blog I showed you how to build a Lucene query for Solr to implement search suggest on your website. This part will show you why you may be having problems getting such a query to return the results you expect. The reason will most likely have to do with "stopwords" and the stop filter that is applied in Solr's default index analyzer configuration.

Stopwords in Solr are words (such as prepositions) that are configured to be ignored by the index and query analyzers. The words are listed in a file called stopwords.txt. This is a useful feature, since it mimics how other search engines (like Google) behave when performing searches. For example, when you search Google for "peter pan and captain hook", the word "and" is ignored in order to return more relevant results; otherwise any document containing the word "and" would match the search as well. When the stop filter is applied in Solr, any search will show the same behavior. However, when implementing search suggest, this behavior becomes undesirable. "Pirates of the Caribbean" is indexed in Solr as NOT containing the words "of" and "the", so when you run the following query, for which you would expect results:

title:Pirates AND (title:O* OR title:O)

nothing is returned by Solr. To get your results, the stop filter has to be removed from the field used for the search suggest query. For this case, the best solution is to create a new field, used only for search suggest, whose analyzer does NOT use the stop filter. Here's the final solution in schema.xml; with it, search suggest works as expected.

<fieldType name="text_no_stop" class="solr.TextField" positionIncrementGap="100">
    <analyzer type="index">
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <!--filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/-->
        <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.EnglishPorterFilterFactory" protected="protwords.txt"/>
        <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
    </analyzer>
    <analyzer type="query">
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
        <!--filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/-->
        <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="0" catenateNumbers="0" catenateAll="0"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.EnglishPorterFilterFactory" protected="protwords.txt"/>
        <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
    </analyzer>
</fieldType>


*Notice that the stop filter is commented out in both the index and the query analyzer.

<fields>
<field name="titlesuggest" type="text_no_stop" indexed="true" stored="true"/>
</fields>
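
The snippets above do not show how titlesuggest gets its content. One possible way, sketched here under the assumption that your existing title field is named "title" (as in the query earlier; your field name may differ), is to have Solr copy it at index time with a copyField declaration placed in schema.xml outside the <fields> block:

```xml
<!-- Assumption: "title" is the name of your existing title field; adjust as needed -->
<copyField source="title" dest="titlesuggest"/>
```

With something like this in place, the suggest query would target the new field instead, for example titlesuggest:Pirates AND (titlesuggest:O* OR titlesuggest:O). Existing documents have to be reindexed before the new field contains anything.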


You now have a fully functional search suggest feature working on your site!
I hope these blog entries are useful to anyone out there trying to implement a similar feature. Please keep in mind that everything discussed here was done with Solr 1.2, which was the latest version available at the time. Solr is currently at version 1.3, and I know it's full of new features, so if anyone has come across an easier way to do this with Solr 1.3 (or even 1.2), please feel free to share by adding a comment!