diff content/Coding/026-haystack-safe-tip.rst @ 4:7ce6393e6d30

Adding converted blog posts from old blog.
author Brian Neal <bgneal@gmail.com>
date Thu, 30 Jan 2014 21:45:03 -0600
parents
children 49bebfa6f9d3
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/content/Coding/026-haystack-safe-tip.rst	Thu Jan 30 21:45:03 2014 -0600
@@ -0,0 +1,78 @@
+Haystack Search Quoting Issue
+#############################
+
+:date: 2013-09-21 11:45
+:tags: Django, Haystack
+:slug: haystack-search-quoting-issue
+:author: Brian Neal
+
+The case of the missing forum topic
+===================================
+
+I use the awesome Haystack_ search framework for my Django_ powered website.
+I have found Haystack to be a huge win. It is easy to set up, configure, and
+customize when you have to. As someone who doesn't know very much about the
+world of searching, I'm grateful to have a powerful tool that just works
+without me having to get too involved in arcane details.
+
+One day one of our users noticed that he could not find a forum topic with the
+title ``"Hawaiian" sounding chords``. Notice the word *Hawaiian* is in quotes. The
+topic would turn up if you searched for *sounding* or *chords*. But no
+combination of *Hawaiian*, with or without quotes, would uncover this topic.
+
+I should mention I am using the `Xapian backend`_. I know the backend tries to
+remove punctuation and special characters to create uniform searches. But
+I could not figure out where the word was getting dropped. After a bit of
+searching online, I found a few hints which led to the solution.
+
+Safety versus correctness
+=========================
+
+As suggested in the documentation, I am using templates to build the document
+used for the search engine. My template for forum topics looked like this::
+
+   {{ object.name }}
+   {{ object.user.username }}
+   {{ object.user.get_full_name }}
+
+A mailing list post from another user pointed to the problem. Django by
+default HTML-escapes variables in templates. Thus the forum topic title::
+
+   "Hawaiian" sounding chords
+
+was being turned into this by the Django templating engine::
+
+   &quot;Hawaiian&quot; sounding chords
+
+Now what Haystack and/or the Xapian backend were doing with
+``&quot;Hawaiian&quot;`` I have no idea. I tried searching for this unusual
+term but it did not turn up any results. Apparently it is just getting dropped.
+
+The solution was to modify the template to this::
+
+   {{ object.name|safe }}
+   {{ object.user.username|safe }}
+   {{ object.user.get_full_name|safe }}
+
+But is it safe?
+===============
+
+After changing my template and rebuilding the index, the troublesome topic was
+then found. Hooray! But have I just opened myself up to an XSS_ attack? Can
+user-supplied content now show up unescaped in the search results? Well,
+I can't answer this authoritatively, but I did spend a fair amount of time
+experimenting with this. I'm using Haystack's ``highlight`` template tag, and
+my users' input is done in Markdown_, and I could not inject malicious text
+into the search results. You should test this yourself on your site.
+
+Conclusion
+==========
+
+This turned out to be a simple fix and I hope it helps someone else. I will
+make enquiries to see if this should be added to the Haystack documentation.
+
+.. _Haystack: http://haystacksearch.org/
+.. _Django: https://www.djangoproject.com/
+.. _Xapian backend: https://github.com/notanumber/xapian-haystack
+.. _XSS: http://en.wikipedia.org/wiki/Cross-site_scripting
+.. _Markdown: http://daringfireball.net/projects/markdown/