Artirix News


Solr Schema-less Indexing

December 2, 2011

Use the schema.xml, Luke!

Indexing into Solr needn’t be limited to fields defined up front in schema.xml. Underneath Solr, Lucene is schemaless, and fields can be added on a per-document basis. Some may argue it beat NoSQL to flexible schemas by some years!

Using a lesser-known feature, ‘dynamicFields’, you can make use of this flexibility from Solr without having to drop down to the Lucene library layer.

The trick is to name your fields with suffixes that tell Solr at indexing time how to treat each field.

By way of example, instead of defining a few conventional fields, such as:

<field name="first_name" stored="true" type="string" multiValued="false" indexed="true"/>

<field name="last_name" stored="true" type="string" multiValued="false" indexed="true"/>

You could have the single definition:

<dynamicField name="*_ss" stored="true" type="string" multiValued="false" indexed="true"/>

Then, at index time, you index the data with the fields named ‘first_name_ss’ and ‘last_name_ss’, and Solr treats them exactly as if you had defined each field explicitly with those attributes.
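As a minimal sketch of what this looks like from a client, the Python below renames plain field names to their ‘_ss’ form and builds a JSON body for one document. The helper name to_dynamic_doc, the ‘id’ field and the sample values are all hypothetical, and the body is only printed here; actually sending it to Solr depends on your version and client, so that step is left out.

```python
import json

SUFFIX = "_ss"  # matches the *_ss dynamicField defined above


def to_dynamic_doc(record, suffix=SUFFIX, keep=("id",)):
    """Return a copy of record with the suffix appended to each field name.

    Fields listed in `keep` are assumed to be defined explicitly in
    schema.xml (for example a unique key) and are left unchanged.
    """
    return {
        (name if name in keep else name + suffix): value
        for name, value in record.items()
    }


# Hypothetical sample record; the values are made up for illustration.
record = {"id": "1", "first_name": "Sam", "last_name": "Smith"}
doc = to_dynamic_doc(record)

# A JSON array of documents, ready to be posted by whatever client you use.
payload = json.dumps([doc])
print(payload)
```

Keeping the renaming in one small function means the rest of your indexing code can keep using the plain field names.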

You need to restart Solr (or reload just that core) once after adding the dynamicField definition. After that, you can introduce new field names that match the pattern without restarting Solr again.

Another caveat is that the suffixes are part of the field names in the index, so they must also be used at query time (e.g. q=field_name_ss:Sam). This slight inelegance could be handled by wrapping the Solr API to ‘unmap’ these suffixes, if necessary.
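One possible shape for such a wrapper is sketched below, assuming the ‘_s’ and ‘_ss’ suffix convention used in this post. The function names map_query and unmap_doc are hypothetical, and the query rewriting is a plain regular-expression substitution rather than a real query parser, so it only suits simple field:value queries.

```python
import re

# Suffixes used by the dynamicField definitions (an assumed convention).
SUFFIXES = ("_ss", "_s")


def unmap_doc(doc, suffixes=SUFFIXES):
    """Strip a known suffix from each field name in a result document."""
    clean = {}
    for name, value in doc.items():
        for suffix in suffixes:
            if name.endswith(suffix) and len(name) > len(suffix):
                name = name[: -len(suffix)]
                break
        clean[name] = value
    return clean


def map_query(query, fields, suffix="_ss"):
    """Rewrite `field:` prefixes in a simple query to their suffixed names.

    Only the field names listed in `fields` are rewritten.
    """
    if not fields:
        return query
    pattern = re.compile(r"\b(" + "|".join(map(re.escape, fields)) + r"):")
    return pattern.sub(lambda m: m.group(1) + suffix + ":", query)


print(map_query("first_name:Sam AND last_name:Smith", ["first_name", "last_name"]))
print(unmap_doc({"id": "1", "first_name_ss": "Sam"}))
```

Callers then query and read results using the plain names, and only the wrapper knows about the suffixes.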

You can get smarter and define, say, _s as a string with stored="false" and indexed="true", and _ss as a string with both stored="true" and indexed="true". Whether a field is stored is then decided at index time, simply by the suffix you choose.
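A small sketch of making that choice in client code follows. It assumes the two definitions just described (‘_s’ indexed only, ‘_ss’ indexed and stored); the helper name dynamic_name and the sample fields are hypothetical.

```python
# Assumed convention from the paragraph above:
#   *_s  -> string, indexed but not stored
#   *_ss -> string, indexed and stored
def dynamic_name(name, stored):
    """Pick the suffix that selects the wanted dynamicField."""
    return name + ("_ss" if stored else "_s")


# Hypothetical document: the name is stored, the notes are indexed only.
doc = {
    dynamic_name("first_name", stored=True): "Sam",
    dynamic_name("notes", stored=False): "prefers email",
}
print(doc)
```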

 

Happy indexing!