Added summary lines for those posts that had back to back headings.
Those posts didn't look so great on Pelican's index.
author | Brian Neal <bgneal@gmail.com> |
---|---|
date | Fri, 31 Jan 2014 20:32:36 -0600 |
Haystack Search Quoting Issue
#############################

:date: 2013-09-21 11:45
:tags: Django, Haystack
:slug: haystack-search-quoting-issue
:author: Brian Neal
:summary: I use the awesome Haystack_ search framework for my Django_ powered
   website. I have found Haystack to be a huge win. It is easy to set up,
   configure, and customize when you have to. As someone who doesn't know very
   much about the world of searching, I'm grateful to have a powerful tool that
   just works without me having to get too involved in arcane details.

The case of the missing forum topic
===================================

I use the awesome Haystack_ search framework for my Django_ powered website. I
have found Haystack to be a huge win. It is easy to set up, configure, and
customize when you have to. As someone who doesn't know very much about the
world of searching, I'm grateful to have a powerful tool that just works without
me having to get too involved in arcane details.

One day one of our users noticed that he could not find a forum topic with the
title ``"Hawaiian" sounding chords``. Notice the word *Hawaiian* is in quotes.
The topic would turn up if you searched for *sounding* or *chords*. But no
combination of *Hawaiian*, with or without quotes, would uncover this topic.

I should mention I am using the `Xapian backend`_. I know the backend tries to
remove punctuation and special characters to create uniform searches. But I
could not figure out where the word was getting dropped. After a bit of
searching online, I found a few hints which led to the solution.

Safety versus correctness
=========================

As suggested in the documentation, I am using templates to build the document
used for the search engine. My template for forum topics looked like this::

   {{ object.name }}
   {{ object.user.username }}
   {{ object.user.get_full_name }}

A mailing list post from another user suggested the problem. Django by default
escapes text in templates.
Thus the forum topic title::

   "Hawaiian" sounding chords

was being turned into this by the Django templating engine::

   &quot;Hawaiian&quot; sounding chords

Now what Haystack and/or the Xapian backend were doing with
``&quot;Hawaiian&quot;`` I have no idea. I tried searching for this unusual term
but it did not turn up any results. Apparently it is just getting dropped.

The solution was to modify the template to this::

   {{ object.name|safe }}
   {{ object.user.username|safe }}
   {{ object.user.get_full_name|safe }}

But is it safe?
===============

After changing my template and rebuilding the index, the troublesome topic was
then found. Hooray! But have I just opened myself up to an XSS_ attack? Can
user-supplied content now show up unescaped in the search results?

Well, I can't answer this authoritatively, but I did spend a fair amount of time
experimenting with this. I'm using Haystack's ``highlight`` template tag, and my
users' input is done in Markdown_, and I could not inject malicious text into
the search results. You should test this yourself on your site.

Conclusion
==========

This turned out to be a simple fix and I hope it helps someone else. I will make
enquiries to see if this should be added to the Haystack documentation.

.. _Haystack: http://haystacksearch.org/
.. _Django: https://www.djangoproject.com/
.. _Xapian backend: https://github.com/notanumber/xapian-haystack
.. _XSS: http://en.wikipedia.org/wiki/Cross-site_scripting
.. _Markdown: http://daringfireball.net/projects/markdown/
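
As a footnote to the escaping problem described above, here is a minimal sketch
of what autoescaping does to the topic title. It is only an illustration: it
uses ``html.escape`` from Python's standard library as a stand-in, on the
assumption that it treats double quotes the same way Django's autoescaping
does, so the snippet runs without a Django project. The ``escape_for_template``
helper is a hypothetical name, not part of Django or Haystack.

.. code-block:: python

    import html


    def escape_for_template(value):
        """Approximate what autoescaping does to a value rendered in a template.

        Assumption: html.escape stands in for Django's own escaping here.
        """
        return html.escape(value, quote=True)


    title = '"Hawaiian" sounding chords'

    # The text that would end up in the search document without the safe filter.
    indexed_text = escape_for_template(title)
    print(indexed_text)  # &quot;Hawaiian&quot; sounding chords

    # Text without special characters passes through unchanged.
    print(escape_for_template("sounding chords"))  # sounding chords

The second call hints at why searches for *sounding* and *chords* kept working:
only the quoted word was altered before it reached the index.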