diff content/Coding/026-haystack-safe-tip.rst @ 4:7ce6393e6d30
Adding converted blog posts from old blog.
author | Brian Neal <bgneal@gmail.com> |
---|---|
date | Thu, 30 Jan 2014 21:45:03 -0600 |
parents | |
children | 49bebfa6f9d3 |
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/content/Coding/026-haystack-safe-tip.rst	Thu Jan 30 21:45:03 2014 -0600
@@ -0,0 +1,78 @@
+Haystack Search Quoting Issue
+#############################
+
+:date: 2013-09-21 11:45
+:tags: Django, Haystack
+:slug: haystack-search-quoting-issue
+:author: Brian Neal
+
+The case of the missing forum topic
+===================================
+
+I use the awesome Haystack_ search framework for my Django_ powered website.
+I have found Haystack to be a huge win. It is easy to set up, configure, and
+customize when you have to. As someone who doesn't know very much about the
+world of searching, I'm grateful to have a powerful tool that just works
+without me having to get too involved in arcane details.
+
+One day one of our users noticed that he could not find a forum topic with the
+title ``"Hawaiian" sounding chords``. Notice the word *Hawaiian* is in quotes. The
+topic would turn up if you searched for *sounding* or *chords*. But no
+combination of *Hawaiian*, with or without quotes, would uncover this topic.
+
+I should mention I am using the `Xapian backend`_. I know the backend tries to
+remove punctuation and special characters to create uniform searches. But
+I could not figure out where the word was getting dropped. After a bit of
+searching online, I found a few hints which led to the solution.
+
+Safety versus correctness
+=========================
+
+As suggested in the documentation, I am using templates to build the document
+used for the search engine. My template for forum topics looked like this::
+
+   {{ object.name }}
+   {{ object.user.username }}
+   {{ object.user.get_full_name }}
+
+A mailing list post from another user suggested the problem. Django by default
+escapes text in templates. Thus the forum topic title::
+
+   "Hawaiian" sounding chords
+
+was being turned into this by the Django templating engine::
+
+   &quot;Hawaiian&quot; sounding chords
+
+Now what Haystack and/or the Xapian backend were doing with
+``&quot;Hawaiian&quot;`` I have no idea. I tried searching for this unusual
+term but it did not turn up any results. Apparently it is just getting dropped.
+
+The solution was to modify the template to this::
+
+   {{ object.name|safe }}
+   {{ object.user.username|safe }}
+   {{ object.user.get_full_name|safe }}
+
+But is it safe?
+===============
+
+After changing my template and rebuilding the index, the troublesome topic was
+then found. Hooray! But have I just opened myself up to an XSS_ attack?
+Can user-supplied content now show up unescaped in the search results? Well, I can't
+answer this authoritatively but I did spend a fair amount of time experimenting
+with this. I'm using Haystack's ``highlight`` template tag, and my users' input
+is done in Markdown_, and I could not inject malicious text into the search
+results. You should test this yourself on your site.
+
+Conclusion
+==========
+
+This turned out to be a simple fix and I hope it helps someone else. I will
+make enquiries to see if this should be added to the Haystack documentation.
+
+.. _Haystack: http://haystacksearch.org/
+.. _Django: https://www.djangoproject.com/
+.. _Xapian backend: https://github.com/notanumber/xapian-haystack
+.. _XSS: http://en.wikipedia.org/wiki/Cross-site_scripting
+.. _Markdown: http://daringfireball.net/projects/markdown/
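
The escaping step described in the post can be reproduced outside Django. The sketch below is only an illustration: it uses Python's standard ``html.escape`` as a stand-in for the template engine's autoescaping (both turn ``"`` into ``&quot;``), and the ``render_indexed_text`` helper is hypothetical, not part of Django or Haystack. It shows what text ends up in the search document with and without the ``safe`` filter.

```python
import html


def render_indexed_text(title: str, autoescape: bool = True) -> str:
    """Hypothetical stand-in for rendering ``{{ object.name }}``.

    With autoescape on, the text is HTML-escaped, roughly what a Django
    template does by default. With it off, the text passes through
    unchanged, which is the effect of ``{{ object.name|safe }}``.
    """
    return html.escape(title) if autoescape else title


title = '"Hawaiian" sounding chords'

# Default behaviour: the quotes become entities before indexing.
escaped = render_indexed_text(title)

# Equivalent of adding |safe: the original characters reach the index.
raw = render_indexed_text(title, autoescape=False)

print(escaped)
print(raw)
```

In the escaped form the word *Hawaiian* is fused with the entity text on both sides, which is consistent with the post's observation that the term could no longer be found. What the backend does with that fused token is not something this sketch tries to model.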