Haystack Search Quoting Issue
#############################

:date: 2013-09-21 11:45
:tags: Django, Haystack
:slug: haystack-search-quoting-issue
:author: Brian Neal
:summary: I use the awesome Haystack_ search framework for my Django_ powered website. I have found Haystack to be a huge win. It is easy to set up, configure, and customize when you have to. As someone who doesn't know very much about the world of searching, I'm grateful to have a powerful tool that just works without me having to get too involved in arcane details.

The case of the missing forum topic
===================================

I use the awesome Haystack_ search framework for my Django_ powered website.
I have found Haystack to be a huge win. It is easy to set up, configure, and
customize when you have to. As someone who doesn't know very much about the
world of searching, I'm grateful to have a powerful tool that just works
without me having to get too involved in arcane details.

One day one of our users noticed that he could not find a forum topic with the
title ``"Hawaiian" sounding chords``. Notice the word *Hawaiian* is in quotes. The
topic would turn up if you searched for *sounding* or *chords*. But no
search that included *Hawaiian*, with or without quotes, would uncover this topic.

I should mention I am using the `Xapian backend`_. I know the backend tries to
remove punctuation and special characters to create uniform searches. But
I could not figure out where this word was getting dropped. After a bit of
searching online, I found a few hints which led to the solution.

Safety versus correctness
=========================

As suggested in the documentation, I am using templates to build the document
used for the search engine. My template for forum topics looked like this::

    {{ object.name }}
    {{ object.user.username }}
    {{ object.user.get_full_name }}

A mailing list post from another user pointed to the problem. Django by default
escapes text in templates. Thus the forum topic title::

    "Hawaiian" sounding chords

was being turned into this by the Django templating engine::

    &quot;Hawaiian&quot; sounding chords

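
The effect is easy to reproduce outside of Django. The sketch below uses
Python's standard ``html.escape`` as a stand-in for template autoescaping. That
substitution is an assumption made for illustration: it is not the function
Django calls internally, but it replaces double quotes with the same ``&quot;``
entity.

```python
import html

title = '"Hawaiian" sounding chords'

# html.escape stands in for template autoescaping here (an assumption for
# illustration); like Django's escaping, it turns " into &quot;.
escaped = html.escape(title)
print(escaped)  # &quot;Hawaiian&quot; sounding chords

# Reversing the escaping recovers the title the user actually typed.
assert html.unescape(escaped) == title
```

The escaped string is what ends up in the document handed to the search
backend, so the backend never sees the plain word in quotes.
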
What Haystack or the Xapian backend did with ``&quot;Hawaiian&quot;`` I have
no idea. I tried searching for this unusual term, but it did not turn up any
results. Apparently it was just being dropped.

The solution was to mark each value with the ``safe`` filter, which tells
Django not to escape it, so the template became this::

    {{ object.name|safe }}
    {{ object.user.username|safe }}
    {{ object.user.get_full_name|safe }}

But is it safe?
===============

After changing my template and rebuilding the index, the troublesome topic was
found. Hooray! But have I just opened myself up to an XSS_ attack? Can
user-supplied content now show up unescaped in the search results? Well, I can't
answer this authoritatively, but I did spend a fair amount of time experimenting
with this. I'm using Haystack's ``highlight`` template tag, and my users' input
is done in Markdown_, and I could not inject malicious text into the search
results. You should test this yourself on your site.

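
One way to think about such a test, as a rough sketch only: the index holds the
raw text, and escaping has to happen when a result is written into a page. The
code below uses only the Python standard library. ``render_result_title`` is
a hypothetical helper invented for this example; it is not part of Haystack or
Django, and it does not reproduce what the ``highlight`` tag does.

```python
import html


def render_result_title(raw_title):
    """Hypothetical helper: escape an indexed title just before display."""
    return "<li>" + html.escape(raw_title) + "</li>"


# Titles as they would be stored in the index, unescaped.
payloads = [
    '"Hawaiian" sounding chords',
    '<script>alert("xss")</script>',
]

rendered = [render_result_title(p) for p in payloads]
for line in rendered:
    print(line)

# The script tag must not survive into the rendered output.
assert "<script>" not in rendered[1]
```

The same kind of check can be made by hand on a real site: create a topic with
a title like the second payload, search for it, and view the page source of the
results.
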
Conclusion
==========

This turned out to be a simple fix and I hope it helps someone else. I will
make enquiries to see if this should be added to the Haystack documentation.

.. _Haystack: http://haystacksearch.org/
.. _Django: https://www.djangoproject.com/
.. _Xapian backend: https://github.com/notanumber/xapian-haystack
.. _XSS: http://en.wikipedia.org/wiki/Cross-site_scripting
.. _Markdown: http://daringfireball.net/projects/markdown/