Apache Solr and Drupal - Part II: How to set up Drupal and Solr to search in attachments
What do you do when you need to search inside files as well? For a recent project, I had to let users search the content of attached files, mainly PDFs. Apache Solr with Tika seemed to be a good solution.
This guide is based on Drupal 7 with Search API 7.x-1.1.3, Solr search 7.x-1.6, Search views 7.x-1.13, Search API attachments 7.x-1.4 and Views 7.x-3.8.
- Install Solr
If you haven't installed Solr yet, check our blog post on how to easily set up a basic Solr service on your *nix system, or read the official installation instructions.
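Before moving on, it can help to confirm that Solr actually answers requests. The Python sketch below is not part of the original setup. It assumes Solr listens on localhost:8983 under /solr and that a core named drupal exists (both hypothetical, so adjust them to your installation), and it calls that core's ping handler.

```python
import json
import urllib.error
import urllib.request


def solr_base_url(host: str = "localhost", port: int = 8983, path: str = "/solr") -> str:
    # Roughly the host, port and path you will later enter on the Search API server form.
    return f"http://{host}:{port}/{path.strip('/')}"


def ping_url(base_url: str, core: str) -> str:
    # The ping handler answers with status "OK" when the core is healthy.
    return f"{base_url}/{core}/admin/ping?wt=json"


def solr_is_up(url: str, opener=urllib.request.urlopen, timeout: float = 5.0) -> bool:
    """Return True if the ping handler reports status OK, False on any error."""
    try:
        with opener(url, timeout=timeout) as response:
            payload = json.loads(response.read().decode("utf-8"))
    except (urllib.error.URLError, OSError, ValueError):
        return False
    return isinstance(payload, dict) and payload.get("status") == "OK"


if __name__ == "__main__":
    # "drupal" is a hypothetical core name.
    url = ping_url(solr_base_url(), "drupal")
    print(url, "->", "up" if solr_is_up(url) else "not reachable")
```

The `opener` parameter only exists so the HTTP call can be replaced in tests; in normal use you can ignore it.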
- Install Drupal modules
Install and enable the following modules:
- Download Tika
Download the Tika app .jar file (tika-app-1.6.jar at the time of writing) and copy it to
, or, if you build your site as an install profile, copy it to
. Make sure you have the Java JDK installed. If you use Ubuntu, as I do, you can read the "Installing default JRE/JDK" section here for further info. Important: once you have downloaded the .jar file, you may need to adjust its permissions.
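If you want to double-check these prerequisites from a script, here is a minimal Python sketch. The jar path is a hypothetical placeholder, and making the file world-readable is only one plausible permissions fix (the web server user needs to be able to read the jar); the post does not prescribe exact permissions.

```python
import os
import shutil
import stat
import subprocess

# Hypothetical placeholder: point this at wherever you copied the jar.
TIKA_JAR = "/path/to/tika-app-1.6.jar"


def java_available() -> bool:
    """True if a `java` executable is found on PATH."""
    return shutil.which("java") is not None


def make_readable(jar_path: str) -> int:
    """Add read permission for owner, group and others (like `chmod a+r`).

    Returns the resulting permission bits, e.g. 0o644 for a file that was 0o600.
    """
    mode = os.stat(jar_path).st_mode
    new_mode = mode | stat.S_IRUSR | stat.S_IRGRP | stat.S_IROTH
    os.chmod(jar_path, new_mode)
    return stat.S_IMODE(new_mode)


def tika_text_command(jar_path: str, document: str) -> list[str]:
    # tika-app's --text option writes the extracted plain text to stdout.
    return ["java", "-jar", jar_path, "--text", document]


if __name__ == "__main__":
    if not java_available():
        print("java not found on PATH: install a JRE/JDK first")
    elif not os.path.isfile(TIKA_JAR):
        print(f"Tika jar not found at {TIKA_JAR}")
    else:
        print(f"permissions now {oct(make_readable(TIKA_JAR))}")
        # sample.pdf is a hypothetical test document.
        result = subprocess.run(tika_text_command(TIKA_JAR, "sample.pdf"),
                                capture_output=True, text=True)
        print(result.stdout[:500])
```

If the last step prints readable text from one of your PDFs, Java and Tika are working together.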
- Set up Search API
Once you are done, go to
- Add a new server.
- Add an index to the newly created server.
- On the Filters tab, enable File attachments.
- On the Fields tab, select the fields you want to index.
- Open the Search API Attachments tab, select the Tika Extraction method and fill in the Tika Extraction Settings section. Save the configuration when you are done.
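Conceptually, the Tika extraction method runs the Tika app on each attached file and feeds the resulting plain text into the index. The sketch below illustrates that idea only; it is not the module's code. The settings dict and its keys, the placeholder directory and the excluded-extensions list are all hypothetical, and the `runner` parameter exists so the subprocess call can be swapped out for testing.

```python
import os
import subprocess

# Hypothetical settings, roughly mirroring the Tika directory and jar file name
# asked for in the Tika Extraction Settings section.
SETTINGS = {
    "tika_dir": "/path/to/tika",  # hypothetical placeholder
    "tika_jar": "tika-app-1.6.jar",
    "excluded_extensions": {"png", "jpg", "gif", "mp3", "avi"},  # hypothetical list
}


def should_extract(filename: str, excluded: set[str]) -> bool:
    # Compare the lower-cased extension, without its leading dot.
    ext = os.path.splitext(filename)[1].lower().lstrip(".")
    return ext not in excluded


def extract_text(filename: str, settings: dict = SETTINGS, runner=subprocess.run) -> str:
    """Return the plain text Tika extracts from `filename`, or "" for skipped files."""
    if not should_extract(filename, settings["excluded_extensions"]):
        return ""
    jar = os.path.join(settings["tika_dir"], settings["tika_jar"])
    # --text makes tika-app print the extracted plain text to stdout.
    result = runner(["java", "-jar", jar, "--text", filename],
                    capture_output=True, text=True, check=True)
    return result.stdout
```

Skipping binary media up front avoids spending a Java process on files that contain no searchable text.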
- Create a view
Go to
and add a new view.
- From the Show list, select the index you created earlier.
- For the display format, you can select Rendered entity.
- Click Continue & edit.
- In the Filter Criteria section, set up your filter.
- Check "Expose this filter to visitors" and select the fields to search in the Searched fields list.
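The exposed filter passes the visitor's keywords to Solr. To check, independently of Views, that attachment text actually made it into the index, you can query Solr's select handler directly. In this sketch the core name and the field name are hypothetical: Search API Solr generates its own field names, so look up the real one in your index before using it.

```python
import json
import urllib.parse
import urllib.request


def select_url(base_url: str, core: str, field: str, term: str, rows: int = 10) -> str:
    # Quote the term as a phrase, escaping backslashes and double quotes.
    escaped = term.replace("\\", "\\\\").replace('"', '\\"')
    # Standard select parameters: q (query), rows (page size), wt (response format).
    params = {"q": f'{field}:"{escaped}"', "rows": rows, "wt": "json"}
    return f"{base_url}/{core}/select?{urllib.parse.urlencode(params)}"


def count_hits(url: str, opener=urllib.request.urlopen) -> int:
    """Return numFound from a Solr JSON response."""
    with opener(url, timeout=5) as response:
        data = json.loads(response.read().decode("utf-8"))
    return data["response"]["numFound"]
```

For example, `count_hits(select_url("http://localhost:8983/solr", "drupal", "some_attachment_field", "invoice"))` returns the number of matching documents. A result of 0 for a word you know is in a PDF may mean that extraction or indexing has not run yet, or that the field name is wrong.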
And that's basically it. Congratulations, you have just set up full-text search for attached files.