Liferay has extensive web content management capabilities.  It can easily scale to present millions of pieces of content via the web.  But how do your users find what they want?

Enterprise search can be a mystery to the uninitiated.  Often it becomes a guessing game, a “hit or miss” exercise.  And when users search site content, they rarely look past the first screen of search results.

Let’s do a quick rundown of the indexing process and how it works.  Liferay works with multiple search products, but the one that we at XTIVIA favor is Apache Solr.  When administrators, content creators or users create content, Apache Solr indexes that content.

For content created in the English language, the process works as follows.  Very common words, such as the articles “a”, “an” and “the”, are stripped from the content text.  In search, these are called “stop words”.  Solr then looks at the remaining words.  These are called “keywords”.
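The stripping step can be sketched in a few lines of Python.  This is only an illustration of the idea, not Solr’s own analyzer: `STOP_WORDS` and `extract_keywords` are names made up for this example, and the list holds just the three articles mentioned above.

```python
import re

# Assumption: a tiny stop word list with only the three English articles.
STOP_WORDS = {"a", "an", "the"}


def extract_keywords(text):
    """Lowercase the text, split it into words, and drop the stop words."""
    words = re.findall(r"[a-z']+", text.lower())
    return [word for word in words if word not in STOP_WORDS]


print(extract_keywords("Peel the potatoes and cut them into a thin strip."))
# → ['peel', 'potatoes', 'and', 'cut', 'them', 'into', 'thin', 'strip']
```

A real stop word list is usually longer and is something a search administrator can tune.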

It counts the occurrences of each keyword.  It also finds variations of a word that share a common root, such as “fry”, “fries” and “fried”.  In search, this is called “stemming”.
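To make stemming concrete, here is a deliberately naive sketch in Python.  It is not the stemmer Solr ships with; the `SUFFIXES` tuple and the `naive_stem` function are hypothetical and handle only a handful of English endings, which is enough to show how different forms of a word collapse to one stem.

```python
# Assumption: a toy suffix list; real stemmers use many more rules.
SUFFIXES = ("ing", "ed", "es", "s")


def naive_stem(word):
    """Strip one common suffix, then normalize a trailing 'y' to 'i'."""
    word = word.lower()
    for suffix in SUFFIXES:
        # Keep at least three letters so short words are left alone.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            word = word[: -len(suffix)]
            break
    if word.endswith("y"):
        word = word[:-1] + "i"
    return word


for variant in ("fry", "fries", "fried", "frying"):
    print(variant, "->", naive_stem(variant))  # every variant maps to "fri"
```

The stem does not need to be a real word.  It only needs to be the same for every variation, so that a search for one form can match the others.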

It also looks at how close together the occurrences of the keywords and their stems appear in the content.  Are they in the same sentence?  Are they in the same paragraph?  Are they in the same piece of content?  This is called the keyword “proximity”.
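One simple way to picture proximity is to count how many word positions separate two keywords.  The Python sketch below assumes the content has already been split into words; `min_distance` is a made-up helper for illustration and not part of any search product.

```python
def min_distance(tokens, term_a, term_b):
    """Return the smallest gap, in word positions, between two terms.

    Returns None when either term is missing from the tokens.
    """
    positions_a = [i for i, token in enumerate(tokens) if token == term_a]
    positions_b = [i for i, token in enumerate(tokens) if token == term_b]
    if not positions_a or not positions_b:
        return None
    return min(abs(a - b) for a in positions_a for b in positions_b)


close = "crispy french fries need hot oil".split()
far = "french souffle with a side of fries".split()

print(min_distance(close, "french", "fries"))  # → 1
print(min_distance(far, "french", "fries"))    # → 6
```

The smaller the number, the closer the keywords sit, and the more likely it is that the content is really about both of them together.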

When a user enters a search term, Solr queries this index.  Let’s say someone is looking for a recipe for preparing and cooking homemade french fries.  The user enters “french fries” in a search field.

Solr splits this query into its two search terms.  It then searches its index for the content pieces with the most occurrences of the keywords “french” and “fries”.
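The splitting and counting can be sketched as follows.  This Python example assumes plain whitespace splitting and counts words directly in the text, whereas a real search engine looks the counts up in its prebuilt index; `term_counts` is a hypothetical name used only here.

```python
from collections import Counter


def term_counts(query, document):
    """Split the query into terms and count each term in the document."""
    terms = query.lower().split()
    counts = Counter(document.lower().split())
    return {term: counts[term] for term in terms}


recipe = "french fries recipe cut potatoes for fries and fry the fries twice"
print(term_counts("french fries", recipe))
# → {'french': 1, 'fries': 3}
```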

Pieces of content that contain the keywords or their stems often, and in close proximity, are considered the strongest matches.  This is called “weighting”.  The highest-weighted content is presented first.
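Here is a minimal Python sketch of weighting, combining how often the keywords occur with how close together they are.  The formula is invented for this illustration and is not the scoring Solr actually uses; the `score` function and the sample content in `docs` are assumptions as well.

```python
def score(query, text):
    """Toy weight: keyword frequency plus a bonus for close proximity."""
    terms = query.lower().split()
    tokens = text.lower().split()
    frequency = sum(tokens.count(term) for term in terms)
    positions = [
        [i for i, token in enumerate(tokens) if token == term] for term in terms
    ]
    # No proximity bonus unless the first two terms are both present.
    if len(terms) < 2 or not positions[0] or not positions[1]:
        return float(frequency)
    gap = min(abs(a - b) for a in positions[0] for b in positions[1])
    return frequency + 1.0 / max(gap, 1)


# Hypothetical content pieces, already stripped of punctuation.
docs = {
    "fries recipe": "french fries recipe fry the french fries twice",
    "souffle": "a french souffle recipe",
    "turkey": "deep frying a turkey with fries on the side",
}

ranked = sorted(docs, key=lambda name: score("french fries", docs[name]), reverse=True)
print(ranked[0])  # → fries recipe
```

The recipe wins because it uses both keywords more often than the other two pieces and places them right next to each other.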

Let’s put together what we have covered and see whether our user gets the recipe they are looking for.  For the recipe to appear high in their search results, two things must happen.  The keywords and their variations must appear frequently.  They must also appear in close proximity to one another.

Let’s explain this in plain English.  If your recipe mentions “french” and “fries” more often and closer together than other content pieces, such as one about making a French soufflé and another about deep-frying a turkey for Thanksgiving, then your recipe is the one the user will see first.