Search Engine Deployment and Configuration

The Content Store's search functionality is provided by the Java search engine Apache Solr, which runs as a web application. The standard Content Store installation includes a solr web application and an associated indexer web application that indexes the content of all CUE publications. These two applications are deployed along with the Content Store by the ece script's deploy action.

The result of following the basic installation procedure described in Installation Procedure, therefore, is that a solr instance and an indexer web application are deployed on every engine host in your installation, all with identical configurations.

This setup will work, but it is relatively inefficient and is unlikely to work well in a production environment. The main reasons for this are as follows:

Solr memory usage

The solr webapp and the indexer can at times consume large amounts of memory and trigger large garbage collection operations in the JVM, which can severely affect Content Store performance. On production systems, therefore, neither of them should run in the same JVM as the Content Store. solr already runs in its own webapp container (and therefore in a different JVM), but the indexer is deployed to the same Tomcat instance as the Content Store. The simplest way to separate them on a single-host installation is to move the indexer webapp to a separate Tomcat instance. For more about this, see Isolating The Search Engine.

Solr stemming

In the default solr configuration, English stemming is enabled. This means that searching non-English content might give unexpected results.

If your content is in a language other than English, you should either disable stemming or modify the configuration to suit your language.

To disable stemming, remove the following line from schema.xml:

<filter class="solr.SnowballPorterFilterFactory" protected="protwords.txt" language="English"/>

Disabling stemming will improve solr's performance.

For information about how to configure stemming for other languages, see the Solr documentation at http://lucene.apache.org/solr/.
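
As a rough illustration, assuming your schema.xml uses the Snowball filter shown above, switching the stemmer to another language can be as small a change as altering the language attribute. The line below uses German purely as an example. Check the Solr documentation for the language names supported by your Solr version, and note that the entries in protwords.txt may also need revising for the new language.

<filter class="solr.SnowballPorterFilterFactory" protected="protwords.txt" language="German"/>

After a change of this kind the index normally needs to be rebuilt, since the existing entries were stemmed using the old rules.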

Solr optimization issues

The default solr configuration is optimized for editorial purposes: it indexes all the fields needed to support the search functionality provided by CUE, resulting in very large indexes. This is acceptable in the editorial context, since the number of concurrent CUE users, even in a very large organisation, is not likely to be very large. The presentation hosts in a large CUE installation, however, can be required to serve many thousands of concurrent users, and the default solr configuration may perform poorly in this context.

The default configuration, therefore, is fine for the editorial hosts in a production system, but for the presentation hosts you are recommended to create a custom indexer configuration that only indexes the fields actually needed to support the kinds of search required in your publications.

To do this, open /var/lib/escenic/solr-core/schema.xml for editing on each of your presentation hosts, and modify the index schema to meet your requirements. Editing this file is outside the scope of this manual. In order to tune the search engine you need to take account of the contents of your publications, your users' needs with regard to search, and the limitations imposed by your particular hardware configuration. For further information and advice on tuning, see the Solr documentation at http://lucene.apache.org/solr/.
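
The fragment below is a minimal sketch of the kind of change involved, not a recommended configuration. The field names (title, body, internal_notes) and the field type name (text) are hypothetical and are not taken from the schema shipped with the Content Store. Only the indexed and stored attributes are standard Solr field attributes. Always start from the field definitions actually present in your own schema.xml.

<!-- Hypothetical fields, for illustration only -->
<!-- Searched and returned in search results -->
<field name="title" type="text" indexed="true" stored="true"/>
<!-- Searched, but not stored, which keeps the index smaller -->
<field name="body" type="text" indexed="true" stored="false"/>
<!-- Not needed for presentation searches: neither indexed nor stored -->
<field name="internal_notes" type="text" indexed="false" stored="false"/>

Whether a given field can safely be reduced in this way depends on which searches your publications' templates actually perform, so check each change against them before rebuilding the index.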

There are many more changes you can make to your search engine setup in order to optimize it for your particular needs. For a discussion of the general principles involved, see Search Engine Configuration and Management.