Implementing Search Suggest with Apache Solr Part 2

Oct 05, 2009
Oscar Bernal


Stopwords in Solr are words (such as prepositions) that the index and query analyzers are configured to ignore. They are listed in a file called stopwords.txt. This is a useful feature because it mimics the behavior of other search engines such as Google. For example, when searching Google for "peter pan and captain hook", the word "and" is ignored in order to return more relevant results; otherwise any document containing the word "and" would match the search as well. When the stop filter is applied in Solr, searches behave the same way.

However, for a search suggest feature this behavior is undesirable. Because of the stop filter, "Pirates of the Caribbean" is indexed WITHOUT the words "of" and "the". So when a user types "Pirates O", you would expect the following search suggest query to return that title:

title:Pirates AND (title:O* OR title:O)

Solr returns nothing. To get the expected results, remove the stop filter from the analyzer of the field used by the search suggest query. Removing it from the title field itself would also change the behavior of regular searches, so the better solution is to create a separate field for search suggest whose analyzer does NOT use the stop filter. Here is the final solution in schema.xml, with which search suggest works as expected:

<fieldType name="text_no_stop" class="solr.TextField" positionIncrementGap="100">
    <analyzer type="index">
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <!--filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/-->
        <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.EnglishPorterFilterFactory" protected="protwords.txt"/>
        <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
    </analyzer>
    <analyzer type="query">
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
        <!--filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/-->
        <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="0" catenateNumbers="0" catenateAll="0"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.EnglishPorterFilterFactory" protected="protwords.txt"/>
        <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
    </analyzer>
</fieldType>

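Notice that the stop filter is commented out in both the index and the query analyzer. Then define the search suggest field using the new type: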

<fields>
    <field name="titlesuggest" type="text_no_stop" indexed="true" stored="true"/>
</fields>
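Declaring the field does not by itself fill titlesuggest with data. One way to populate it, assuming the titles live in the title field used in the query above, is a copyField directive in schema.xml. This is a sketch under that assumption; if your indexing code already sends a titlesuggest value with each document, it is not needed.

```xml
<!-- Assumption: the existing field holding the titles is named "title",
     as in the suggest query above. copyField elements go after </fields>,
     inside the <schema> element of schema.xml. -->
<copyField source="title" dest="titlesuggest"/>
```

copyField copies the original input value before analysis, so title keeps its stop filter for regular searches while titlesuggest, analyzed with text_no_stop, keeps words like "of" and "the". Documents that were indexed before this change need to be reindexed for titlesuggest to be filled.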

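Once titlesuggest is populated with the titles and your search suggest query targets it instead of title (for example, titlesuggest:Pirates AND (titlesuggest:O* OR titlesuggest:O)), you have a fully functional search suggest feature working on your site!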

I hope these blog entries are useful to anyone out there trying to implement a similar feature. Please keep in mind that everything discussed here was done with Solr 1.2, which was the latest version available at the time. Solr is now at version 1.3, which is full of new features, so if anyone has found an easier way to do this with Solr 1.3 (or even 1.2), please share it by adding a comment!

Latest insights