Install django-haystack
This part is easy – django-haystack is just a Python package.
pip install django-haystack
Install OpenJDK and Solr
Solr is a Java program – we need the Java Runtime Environment to run our Solr server.
I’ll be using OpenJDK and solr-jetty on Ubuntu.
apt-get install openjdk-6-jre jetty solr-jetty
Find and configure solr
$ find / -name "solr"
/var/lib/solr
/var/lib/jetty/webapps/solr
/usr/share/solr
/etc/solr
Now that we know our solr config files live in /etc/solr/conf, have django-haystack generate the solr schema.xml file.
python manage.py build_solr_schema > /etc/solr/conf/schema.xml
Tweak solrconfig.xml
I’ll need to enable the MoreLikeThis handler by adding a requestHandler entry for it (class solr.MoreLikeThisHandler) to the solrconfig.xml file.
Modify Jetty to run on port 8983 – default solr port
My jetty defaulted to port 8080, a port I commonly use for other projects (nginx -> apache). My development environment also uses the default solr port, 8983, so we’ll update jetty to listen on 8983 instead.
I found the config file (/etc/jetty/jetty.xml) by searching for the jetty folder. Modify the “port” line to match whatever port you have set in HAYSTACK_SOLR_URL in your django settings.
<Set name="port"><SystemProperty name="jetty.port" default="8983"/></Set>
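For reference, the matching Django side might look like the sketch below. It assumes the Haystack 1.x style settings (the ones that use HAYSTACK_SOLR_URL), that Solr is on the same machine, and that the solr-jetty webapp is served under /solr. Adjust the host and path if your setup differs.

```python
# settings.py (fragment) -- a minimal sketch, assuming Haystack 1.x settings.
# The host (127.0.0.1) and the /solr path are assumptions about a local
# solr-jetty install; the port must match the one set in jetty.xml.
HAYSTACK_SEARCH_ENGINE = 'solr'
HAYSTACK_SOLR_URL = 'http://127.0.0.1:8983/solr'
```

If the port here and the port in jetty.xml disagree, haystack will not be able to reach solr.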
Start jetty
Navigate to your jetty folder and run start.jar.
$ java -jar /usr/share/jetty/start.jar
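Before moving on, it can help to confirm that jetty is really listening on the new port. The helper below is a small sketch of my own, not part of jetty or haystack: it only checks that a TCP connection to the port succeeds, and the host and port are assumed to be the local defaults used in this post.

```python
import socket


def port_is_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection raises OSError on refusal or timeout.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    # Assumed values: local machine, port 8983 as configured in jetty.xml.
    print("Jetty reachable:", port_is_open("127.0.0.1", 8983))
```

An open port only shows that something is listening there. It does not prove that the solr webapp itself loaded correctly.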
Build your solr indexes
Now we need some data to search against.
Assuming you’ve set up django-haystack, run the rebuild_index management command.
$ python manage.py rebuild_index
Set up cron job to rebuild indexes however often you need
$ crontab -e
0 0 * * * python manage.py update_index --age=24 --remove
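To make the two flags concrete: --age=24 limits the update to objects changed within the last 24 hours, and --remove drops index entries whose objects no longer exist in the database. The sketch below only illustrates that logic with plain Python data. It is not haystack's code, and plan_update and its arguments are hypothetical names.

```python
from datetime import datetime, timedelta


def plan_update(db_rows, indexed_ids, now, age_hours=24, remove=True):
    """Illustrative only. db_rows maps object id -> last-modified datetime."""
    cutoff = now - timedelta(hours=age_hours)
    # --age: only objects modified since the cutoff get reindexed.
    to_reindex = sorted(pk for pk, modified in db_rows.items() if modified >= cutoff)
    # --remove: ids still in the index but gone from the database are dropped.
    to_remove = sorted(set(indexed_ids) - set(db_rows)) if remove else []
    return to_reindex, to_remove


now = datetime(2012, 1, 2, 0, 0)
db_rows = {1: datetime(2012, 1, 1, 12, 0), 2: datetime(2011, 12, 25, 9, 0)}
indexed_ids = [1, 2, 3]
print(plan_update(db_rows, indexed_ids, now))  # -> ([1], [3])
```

Here object 1 was edited within the window, so it is reindexed. Object 2 is untouched, and object 3 was deleted from the database, so it is removed from the index.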
Hi Yuji, thank you very much for the wonderful post.
It is helping so many of us.
I have a doubt regarding the last command that you posted: “python manage.py update_index --age=24 --remove”
1) Do we have to restart solr every 24 hours because the index is updated every 24 hours?
2) What is the use of --remove?