.. Revision metadata: author Brian Neal <bgneal@gmail.com>; date Sat, 01 Feb 2014 13:58:51 -0600

Haystack Search Quoting Issue
#############################

:date: 2013-09-21 11:45
:tags: Django, Haystack
:slug: haystack-search-quoting-issue
:author: Brian Neal
:summary: I use the awesome Haystack_ search framework for my Django_ powered website.  I have found Haystack to be a huge win. It is easy to set up, configure, and customize when you have to. As someone who doesn't know very much about the world of searching, I'm grateful to have a powerful tool that just works without me having to get too involved in arcane details.

The case of the missing forum topic
===================================

I use the awesome Haystack_ search framework for my Django_ powered website.
I have found Haystack to be a huge win. It is easy to set up, configure, and
customize when you have to. As someone who doesn't know very much about the
world of searching, I'm grateful to have a powerful tool that just works
without me having to get too involved in arcane details.

One day, one of our users noticed that he could not find a forum topic with the
title ``"Hawaiian" sounding chords``. Notice the word *Hawaiian* is in quotes. The
topic would turn up if you searched for *sounding* or *chords*. But no
search for *Hawaiian*, with or without quotes, would uncover this topic.

I should mention I am using the `Xapian backend`_. I know the backend tries to
remove punctuation and special characters to create uniform searches. But
I could not figure out where the word was getting dropped. After a bit of
searching online, I found a few hints which led to the solution.

Safety versus correctness
=========================

As suggested in the documentation, I am using templates to build the document
used for the search engine. My template for forum topics looked like this::

   {{ object.name }}
   {{ object.user.username }}
   {{ object.user.get_full_name }}

A mailing list post from another user pointed to the problem. By default, Django
HTML-escapes variable output in templates. Thus the forum topic title::

   "Hawaiian" sounding chords

was being turned into this by the Django templating engine::

   &quot;Hawaiian&quot; sounding chords

Now what Haystack and/or the Xapian backend were doing with
``&quot;Hawaiian&quot;`` I have no idea. I tried searching for this unusual
term but it did not turn up any results. Apparently it is just getting dropped.

The solution was to modify the template to this::
   
   {{ object.name|safe }}
   {{ object.user.username|safe }}
   {{ object.user.get_full_name|safe }}
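
The difference between the two templates can be reproduced outside Django. The
sketch below uses Python's standard ``html.escape`` as a stand-in for the
template engine's autoescaping. That is an assumption for illustration: Django's
own escaping may differ in detail, but both turn a double quote into ``&quot;``.
The helper ``build_document`` is hypothetical and not part of Django or Haystack.

```python
import html

TITLE = '"Hawaiian" sounding chords'


def build_document(title, autoescape=True):
    """Mimic rendering ``{{ object.name }}`` with or without the ``safe`` filter.

    Hypothetical helper for illustration; not part of Django or Haystack.
    """
    # With autoescaping on, HTML-special characters become entities.
    return html.escape(title) if autoescape else title


escaped = build_document(TITLE)                 # like {{ object.name }}
raw = build_document(TITLE, autoescape=False)   # like {{ object.name|safe }}

print(escaped)  # &quot;Hawaiian&quot; sounding chords
print(raw)      # "Hawaiian" sounding chords
```

Only the second form hands the search backend the title exactly as the user
typed it.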

But is it safe?
===============

After changing my template and rebuilding the index, the troublesome topic was
found. Hooray! But have I just opened myself up to an XSS_ attack? Can
user-supplied content now show up unescaped in the search results? Well, I can't
answer this authoritatively, but I did spend a fair amount of time experimenting
with it. I'm using Haystack's ``highlight`` template tag, and my users' input
is done in Markdown_, and I could not inject malicious text into the search
results. You should test this yourself on your site.
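
One way to reason about the risk is that what matters is whether text is escaped
when the results page is rendered, not how it is stored in the index. The sketch
below illustrates that idea under stated assumptions: ``render_result`` is
a hypothetical stand-in for a search results template, and ``html.escape`` stands
in for Django's autoescaping. It does not show how Haystack's ``highlight`` tag
works internally.

```python
import html


def render_result(indexed_text):
    """Render one search hit as an HTML list item, escaping at output time.

    Hypothetical stand-in for a search results template.
    """
    return "<li>{}</li>".format(html.escape(indexed_text))


# Text stored unescaped in the index, as it would be after using ``|safe``
# in the indexing template.
malicious = '<script>alert("xss")</script>'
rendered = render_result(malicious)

print(rendered)
```

If the page that displays results escapes its output like this, the markup
arrives in the browser as inert text. A quick test along these lines against
your own results page is a reasonable starting point.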

Conclusion
==========

This turned out to be a simple fix and I hope it helps someone else. I will
make enquiries to see if this should be added to the Haystack documentation.

.. _Haystack: http://haystacksearch.org/
.. _Django: https://www.djangoproject.com/
.. _Xapian backend: https://github.com/notanumber/xapian-haystack
.. _XSS: http://en.wikipedia.org/wiki/Cross-site_scripting
.. _Markdown: http://daringfireball.net/projects/markdown/