author Brian Neal <bgneal@gmail.com>
date Fri, 31 Jan 2014 19:12:50 -0600
Haystack Search Quoting Issue
#############################

:date: 2013-09-21 11:45
:tags: Django, Haystack
:slug: haystack-search-quoting-issue
:author: Brian Neal

The case of the missing forum topic
===================================

I use the awesome Haystack_ search framework for my Django_-powered website.
I have found Haystack to be a huge win. It is easy to set up, configure, and
customize when you have to. As someone who doesn't know very much about the
world of searching, I'm grateful to have a powerful tool that just works
without me having to get too involved in arcane details.

One day one of our users noticed that he could not find a forum topic with the
title ``"Hawaiian" sounding chords``. Notice the word *Hawaiian* is in quotes. The
topic would turn up if you searched for *sounding* or *chords*. But no
search for *Hawaiian*, with or without quotes, would uncover this topic.

I should mention I am using the `Xapian backend`_. I know the backend tries to
remove punctuation and special characters to create uniform searches. But
I could not figure out where the word was getting dropped. After a bit of
searching online, I found a few hints which led to the solution.

Safety versus correctness
=========================

As suggested in the documentation, I am using templates to build the document
used for the search engine. My template for forum topics looked like this::

    {{ object.name }}
    {{ object.user.username }}
    {{ object.user.get_full_name }}

A mailing list post from another user pointed to the problem. Django by default
escapes text in templates. Thus the forum topic title::

    "Hawaiian" sounding chords

was being turned into this by the Django templating engine::

    &quot;Hawaiian&quot; sounding chords

Now what Haystack and/or the Xapian backend were doing with
``&quot;Hawaiian&quot;`` I have no idea. I tried searching for this unusual
term but it did not turn up any results. Apparently it is just getting dropped.
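
The escaping step itself is easy to reproduce outside of Django. The snippet
below is only a sketch: it uses ``html.escape`` from Python's standard library
as a stand-in for the template engine's autoescaping, and the variable names
are made up for illustration.

.. code-block:: python

    import html

    # The forum topic title exactly as the user typed it.
    title = '"Hawaiian" sounding chords'

    # html.escape replaces the HTML special characters, including the
    # double quote, with character entities.
    escaped = html.escape(title)

    print(escaped)

    # The quoted word no longer appears in its original form...
    print('"Hawaiian"' in escaped)

    # ...but unescaping restores the original title.
    print(html.unescape(escaped) == title)

The first line printed is the same ``&quot;Hawaiian&quot; sounding chords``
text shown above, which is what ended up in the search document.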

The solution was to modify the template to this::

    {{ object.name|safe }}
    {{ object.user.username|safe }}
    {{ object.user.get_full_name|safe }}

But is it safe?
===============

After changing my template and rebuilding the index, the troublesome topic was
then found. Hooray! But have I just opened myself up to an XSS_ attack? Can
user-supplied content now show up unescaped in the search results? Well, I can't
answer this authoritatively, but I did spend a fair amount of time experimenting
with this. I'm using Haystack's ``highlight`` template tag, and my users' input
is done in Markdown_, and I could not inject malicious text into the search
results. You should test this yourself on your site.
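
One crude way to automate part of that testing is to look for a known payload
in the rendered search results. The following is a minimal sketch using only
the standard library. The helper name, the payload, and the two HTML fragments
are all hypothetical; on a real site the HTML would come from fetching your
own search results page.

.. code-block:: python

    import html


    def appears_unescaped(payload, rendered_html):
        """Return True if the raw payload occurs verbatim in the HTML."""
        return payload in rendered_html


    # A hypothetical malicious topic title.
    payload = '<script>alert("xss")</script>'

    # Stand-ins for a results page that escapes the title and one that
    # does not.
    escaped_page = "<li>" + html.escape(payload) + "</li>"
    raw_page = "<li>" + payload + "</li>"

    print(appears_unescaped(payload, escaped_page))  # False
    print(appears_unescaped(payload, raw_page))      # True

A check like this only catches a payload that survives verbatim, so it is no
substitute for poking at the live site by hand.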

Conclusion
==========

This turned out to be a simple fix and I hope it helps someone else. I will
make enquiries to see if this should be added to the Haystack documentation.

.. _Haystack: http://haystacksearch.org/
.. _Django: https://www.djangoproject.com/
.. _Xapian backend: https://github.com/notanumber/xapian-haystack
.. _XSS: http://en.wikipedia.org/wiki/Cross-site_scripting
.. _Markdown: http://daringfireball.net/projects/markdown/