Jahia, like other powerful Java-based CMSs such as Alfresco, uses the industry-standard open source Lucene indexing engine. Lucene is a Java-based search library for applications that require full-text search. With it, Jahia can perform basic and complex searches over the content stored in the system.
If you are new to template development in Jahia and to implementing basic searches, I strongly recommend reading my earlier Jahia posts, “Jahia WCM Quick Review: Maven, Templates and Navigation” and “Jahia Search in the Enterprise”.
The metadata and keywords defined in the Jahia system can be indexed by Lucene so that they can be used in complex queries. In a Jahia installation, the Lucene indexes are stored in the folder /WEB-INF/var/search_indexes/
and can be explored with a Java tool named Luke. Luke can also be used to run test searches over the indexed data. The following picture shows what the tool looks like: the indexed fields are listed in the left column, and their values in the right column.
With this tool, we can look up the exact field names to use in a search query.
Jahia has an administration tool for managing the search engine. From there you can re-index your site if needed. As the following picture shows, you just need to click the “Next step” button to run a full site re-index.
Metadata can be used to filter information in queries, and you can assign metadata values to pages or containers from edit mode. Notice in the following picture that this screen lets you set values for metadata such as Keywords, Categories and Description. The other metadata fields cannot be modified because they are read-only.
Note: If you want to know the names of the indexed metadata fields, you can use Luke.
Suppose that you have already implemented the search form described in my Jahia post “Jahia Search in the Enterprise”. Now the idea is to create a custom query for a given requirement: a weighted query in which a page scores higher when the searched terms appear in its title. So a page that has the searched terms in its title will score higher than a page that has them only in its content.
If the term “Andromeda Galaxy” is searched, we need to transform it into the following query:
jahia.title:"Andromeda Galaxy"^9 OR (jahia.title:Andromeda^5) OR (jahia.title:Galaxy^5) OR (jahia.containerfield_my_templates_generictext_inserttext:Andromeda^1) OR (jahia.containerfield_my_templates_generictext_inserttext:Galaxy^1)
The query uses two indexed fields. The field jahia.title contains the title of each page, and jahia.containerfield_my_templates_generictext_inserttext is a specific field, defined in the .cnd file, that contains the text of the pages. The query assigns a weight of 9 when the complete phrase appears in the title, a weight of 5 when a single search term appears in the title, and a weight of 1 when a term appears in the body.
This string manipulation can be done in a Java function in the template set project, so you can build custom queries to suit your needs. For more information about the available query operators, you can check here.
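As a minimal sketch of that string manipulation, the following helper builds the weighted query shown above from the raw search input. The class name WeightedQueryBuilder and the method build are hypothetical names chosen for this illustration and are not part of Jahia or Lucene; the field names and weights are the ones used in this post. It only strips double quotes from the input and does not escape other Lucene special characters, so treat it as a starting point rather than a finished implementation.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration; not part of the Jahia or Lucene APIs.
class WeightedQueryBuilder {

    // Field names taken from the example query in this post.
    static final String TITLE_FIELD = "jahia.title";
    static final String BODY_FIELD =
            "jahia.containerfield_my_templates_generictext_inserttext";

    // Builds: phrase in title ^9, each word in title ^5, each word in body ^1.
    static String build(String searchTerms) {
        if (searchTerms == null) {
            return "";
        }
        // Drop double quotes so user input cannot break the phrase clause.
        // Assumption: other Lucene special characters are not handled here.
        String cleaned = searchTerms.replace("\"", " ").trim();
        if (cleaned.isEmpty()) {
            return "";
        }
        String[] words = cleaned.split("\\s+");
        String phrase = String.join(" ", words);

        List<String> clauses = new ArrayList<>();
        // Highest weight: the complete phrase appears in the title.
        clauses.add(TITLE_FIELD + ":\"" + phrase + "\"^9");
        // Medium weight: an individual word appears in the title.
        for (String word : words) {
            clauses.add("(" + TITLE_FIELD + ":" + word + "^5)");
        }
        // Lowest weight: an individual word appears in the page body.
        for (String word : words) {
            clauses.add("(" + BODY_FIELD + ":" + word + "^1)");
        }
        return String.join(" OR ", clauses);
    }
}
```

Calling WeightedQueryBuilder.build("Andromeda Galaxy") returns the query string shown earlier, which can then be passed to the search as the query text.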
The following screen shows the results for this query. The first hit has the phrase “Andromeda Galaxy” in its title, the second hit has the word “Andromeda” in its title, and the rest have the word “Galaxy” or “Andromeda” in their body.