Remote Software Engineer at Stripe and cellist based out of Ontario. Previously at GitLab. Fascinated with building usable, delightful software.
June 26, 2017 | 6 minutes to read
tl;dr: I created a custom Solr filter that allows for natural date searching. Here’s the source.
Out of the box, Solr comes with some pretty powerful date-searching capabilities. For example, say you wanted to find records from the beginning of time until August 19, 1976. Easy:
my_date_field:[* TO 1976-08-20T00:00:00Z}
Or maybe you only want records from last week?
my_date_field:[NOW-7DAY/DAY TO NOW]
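The sketch below is a rough illustration of what the lower bound of that date-math expression evaluates to. It is plain Java using java.time. It assumes Solr's default UTC time zone and treats `/DAY` as "round down to midnight". The class and method names are made up for this example and are not part of Solr.

```java
import java.time.Instant;
import java.time.temporal.ChronoUnit;

// Hypothetical helper: mirrors what Solr's NOW-7DAY/DAY evaluates to,
// assuming the default UTC time zone. Not Solr code.
final class SolrDateMathSketch {
    static Instant sevenDaysAgoStartOfDay(Instant now) {
        // "-7DAY": step back seven days
        // "/DAY": round down to the start of that day (midnight UTC)
        return now.minus(7, ChronoUnit.DAYS).truncatedTo(ChronoUnit.DAYS);
    }

    public static void main(String[] args) {
        Instant now = Instant.parse("2017-06-26T15:30:00Z");
        // prints 2017-06-19T00:00:00Z
        System.out.println(sevenDaysAgoStartOfDay(now));
    }
}
```

Solr applies the math operations left to right, which is why the subtraction happens before the rounding here.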
But what if you wanted to find all records with a date in the month of March? Or on a Tuesday? Or every July 4th?
Solr’s default date-searching abilities can’t handle specific queries like this. Fortunately, there are a couple of ways around this.
Option #1: add more fields. There’s no reason you can’t create more fields in your Solr core that present duplicate data in a friendlier format. After all, denormalization is the whole point of Solr!
Say you have a field like this:
<!-- stores a person's date of birth (DOB) -->
<field name="dob" type="pdate" indexed="true" stored="true" />
By adding a new field for each “chunk” of data we want to search:
<!-- stores a person's date of birth (DOB) -->
<field name="dob" type="pdate" indexed="true" stored="true" />
<field name="dob_day" type="pint" indexed="true" stored="true" />
<field name="dob_month" type="pint" indexed="true" stored="true" />
<field name="dob_year" type="pint" indexed="true" stored="true" />
<field name="dob_day_of_week" type="string" indexed="true" stored="true" />
…you’ll end up with a Solr core that can answer some scarily specific questions:
Fetch all people born on Christmas, when Christmas fell on a Sunday:
dob_month:12 AND dob_day:25 AND dob_day_of_week:sunday
Find everyone born on a Tuesday in July during the ’70s:
dob:[1970-01-01T00:00:00Z TO 1980-01-01T00:00:00Z} AND
dob_day_of_week:tuesday AND
dob_month:7
With raw querying power like this, it’s important to remember: it’s not whether or not you should, it’s whether or not you can.
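With option #1, something outside Solr has to compute those extra values before each document is indexed. Here is a minimal Java sketch of that preprocessing step. It makes two assumptions. First, dates arrive as java.time.LocalDate values. Second, the day of the week is stored in lowercase so that it matches queries like dob_day_of_week:sunday, because string fields are matched exactly. The DobFields class is hypothetical. The field names come from the schema above.

```java
import java.time.LocalDate;
import java.time.ZoneOffset;
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical preprocessing helper for option #1: derives the extra
// dob_* field values for one record before it is sent to Solr.
final class DobFields {
    static Map<String, Object> of(LocalDate dob) {
        Map<String, Object> fields = new LinkedHashMap<>();
        // pdate fields expect ISO-8601 UTC timestamps, e.g. 1977-12-25T00:00:00Z
        fields.put("dob", dob.atStartOfDay(ZoneOffset.UTC).toInstant().toString());
        fields.put("dob_day", dob.getDayOfMonth());
        fields.put("dob_month", dob.getMonthValue());
        fields.put("dob_year", dob.getYear());
        // string fields are case-sensitive, so store lowercase names ("sunday")
        fields.put("dob_day_of_week", dob.getDayOfWeek().name().toLowerCase(Locale.ROOT));
        return fields;
    }

    public static void main(String[] args) {
        // prints {dob=1977-12-25T00:00:00Z, dob_day=25, dob_month=12, dob_year=1977, dob_day_of_week=sunday}
        System.out.println(of(LocalDate.of(1977, 12, 25)));
    }
}
```

Christmas 1977 fell on a Sunday, so a record built this way would be returned by the Christmas query.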
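This solution has a couple of drawbacks. Every date has to be broken down into its day, month, year, and day-of-the-week components before it is sent to Solr. Every query also has to target the right derived field, so a query like
dob:tuesday
won’t work. This brings us to option #2: writing a custom Solr filter.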
If you’re new to Solr, this suggestion may seem a bit extreme, but Solr has robust support for customization. I won’t say it’s easy (there are a lot of moving pieces, and familiarity with Java development is required), but the process is sane once you’ve climbed the learning curve.
The upside of this solution is that it provides nearly limitless flexibility. Custom filters let you intercept the indexing (or query analysis) process, giving you fine-grained control over how Solr breaks your input down into tokens.
Practically speaking, this means we can create a Solr filter that does all the preprocessing required in option #1 (breaking down dates into day, month, year, and day-of-the-week components) and includes this logic in Solr’s own indexing process!
Several months ago, I took the dive and created a custom Solr filter that does just that. Dates indexed using this filter can be searched using queries like dob:june or dob:06. Without further ado, here’s the custom filter’s source code.
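The core of a filter like this is an expansion step: one date string goes in, and several searchable tokens come out. The sketch below shows only that step, in plain Java, without Lucene's TokenFilter plumbing. It is not the NfDateFilter source. The class name and the exact token set are assumptions chosen for illustration. The token set is the year, the zero-padded month and day, and the full and three-letter month and weekday names.

```java
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical sketch of a date-expansion step; not the actual NfDateFilter code.
final class DateTokenExpander {
    static List<String> expand(String isoDate) {
        // LocalDate.parse only accepts yyyy-MM-dd; anything else throws DateTimeParseException
        LocalDate date = LocalDate.parse(isoDate);
        String month = date.getMonth().name().toLowerCase(Locale.ROOT);       // e.g. "june"
        String weekday = date.getDayOfWeek().name().toLowerCase(Locale.ROOT); // e.g. "tuesday"

        List<String> tokens = new ArrayList<>();
        tokens.add(String.valueOf(date.getYear()));                            // year, e.g. "2018"
        tokens.add(String.format(Locale.ROOT, "%02d", date.getMonthValue())); // month number, e.g. "06"
        tokens.add(month);                                                     // full month name
        tokens.add(month.substring(0, 3));                                     // abbreviation, e.g. "jun"
        tokens.add(String.format(Locale.ROOT, "%02d", date.getDayOfMonth())); // day of month, e.g. "26"
        tokens.add(weekday);                                                   // full weekday name
        tokens.add(weekday.substring(0, 3));                                   // abbreviation, e.g. "tue"
        return tokens;
    }

    public static void main(String[] args) {
        // prints [2018, 06, june, jun, 26, tuesday, tue]
        System.out.println(expand("2018-06-26"));
    }
}
```

In a real Lucene filter, each of these strings would be emitted as a separate token at the same position as the original date. That way, a query for any one of them matches the document.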
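Here’s the general idea: when a field that uses my filter (NfDateFilter) is indexed, Solr passes the string representation of the date (like “2018-06-26”) to my filter. The filter parses that string into a Date object and breaks it down into day, month, year, and day-of-the-week tokens.
To use this filter, I add a reference to my custom filter’s .jar file in my core’s solrconfig.xml: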
<config>
<lib path="${solr.install.dir:../../../..}/server/solr/cores/NfDateFilter.jar" />
</config>
…and define a field type in my core’s managed-schema that uses this filter at index time:
<fieldType name="text_date" class="solr.TextField">
<analyzer type="index">
<tokenizer class="solr.WhitespaceTokenizerFactory"/>
<filter class="io.nathanfriend.solr.NfDateFilterFactory"/>
</analyzer>
<analyzer type="query">
<tokenizer class="solr.StandardTokenizerFactory"/>
<filter class="solr.LowerCaseFilterFactory"/>
</analyzer>
</fieldType>
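The two analyzers only need to agree on the final tokens. Assume the index-time filter emits lowercase tokens. In that case, the LowerCaseFilterFactory in the query analyzer lets input like June or JUN still match. The simplified Java model below shows this matching. The indexed token set is hypothetical. A regex split stands in for StandardTokenizer, whose real behavior follows Unicode text-segmentation rules.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;
import java.util.stream.Collectors;

// Simplified model of the text_date field type; not Solr code.
final class TextDateMatchSketch {
    // Hypothetical tokens an index-time date filter might emit for 2018-06-26
    static final Set<String> INDEXED_TOKENS = new HashSet<>(Arrays.asList(
            "2018", "06", "june", "jun", "26", "tuesday", "tue"));

    // Rough stand-in for StandardTokenizer + LowerCaseFilter:
    // split on runs of non-letter/non-digit characters, then lowercase each term.
    static List<String> analyzeQuery(String text) {
        return Arrays.stream(text.split("[^\\p{L}\\p{N}]+"))
                .filter(term -> !term.isEmpty())
                .map(term -> term.toLowerCase(Locale.ROOT))
                .collect(Collectors.toList());
    }

    // In this sketch, a document matches if any analyzed query term was indexed.
    static boolean matches(String queryText) {
        return analyzeQuery(queryText).stream().anyMatch(INDEXED_TOKENS::contains);
    }

    public static void main(String[] args) {
        System.out.println(matches("June")); // true
        System.out.println(matches("JUN"));  // true
        System.out.println(matches("jan"));  // false
    }
}
```

Without the query-side lowercasing, "June" would be looked up as-is and would miss the lowercase "june" token.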
Now any field that uses the text_date type can be searched in a natural, human way:
my_date_field:jan
One caveat: the filter expects dates in the format yyyy-MM-dd. If it encounters a date string that deviates from this format, it throws an exception.
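Because of that exception, it can be worth checking values before they are indexed. The sketch below assumes you control the code that sends documents to Solr. Java's LocalDate.parse uses the strict ISO yyyy-MM-dd format. It therefore rejects other layouts, impossible dates, and full timestamps like the ones pdate fields store. The DateFormatCheck class is hypothetical and is not part of Solr or the filter.

```java
import java.time.LocalDate;
import java.time.format.DateTimeParseException;

// Hypothetical pre-indexing check; not part of Solr or the filter itself.
final class DateFormatCheck {
    // true only for strict yyyy-MM-dd values that name a real calendar date
    static boolean isIndexable(String value) {
        try {
            LocalDate.parse(value); // ISO_LOCAL_DATE with a strict resolver
            return true;
        } catch (DateTimeParseException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(isIndexable("2018-06-26"));           // true
        System.out.println(isIndexable("06/26/2018"));           // false: wrong layout
        System.out.println(isIndexable("2018-02-30"));           // false: no such date
        System.out.println(isIndexable("2018-06-26T00:00:00Z")); // false: full timestamp
    }
}
```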