Haystack Search Quoting Issue
#############################

:date: 2013-09-21 11:45
:tags: Django, Haystack
:slug: haystack-search-quoting-issue
:author: Brian Neal

The case of the missing forum topic
===================================

I use the awesome Haystack_ search framework for my Django_-powered website,
and it has been a huge win. It is easy to set up, configure, and customize
when you have to. As someone who doesn't know very much about the world of
searching, I'm grateful to have a powerful tool that just works without my
having to get too involved in arcane details.

One day, one of our users noticed that he could not find a forum topic titled
``"Hawaiian" sounding chords``. Notice that the word *Hawaiian* is in quotes.
The topic would turn up if you searched for *sounding* or *chords*, but no
combination of *Hawaiian*, with or without quotes, would uncover it.

I should mention that I am using the `Xapian backend`_. I know the backend
tries to remove punctuation and special characters to create uniform searches,
but I could not figure out where *Hawaiian* was getting dropped. After a bit of
searching online, I found a few hints that led to the solution.

Safety versus correctness
=========================

As the Haystack documentation suggests, I am using templates to build the
document that gets indexed by the search engine. My template for forum topics
looked like this::

    {{ object.name }}
    {{ object.user.username }}
    {{ object.user.get_full_name }}

A mailing list post from another user suggested the problem: by default,
Django's template engine HTML-escapes variable output, which turns characters
such as double quotes into HTML entities.
Thus the forum topic title:: bgneal@4: bgneal@4: "Hawaiian" sounding chords bgneal@4: bgneal@4: was being turned into this by the Django templating engine:: bgneal@4: bgneal@4: "Hawaiian" sounding chords bgneal@4: bgneal@4: Now what Haystack and/or the Xapian backend were doing with bgneal@4: ``"Hawaiian"`` I have no idea. I tried searching for this unusual bgneal@4: term but it did not turn up any results. Apparently it is just getting dropped. bgneal@4: bgneal@4: The solution was to modify the template to this:: bgneal@4: bgneal@4: {{ object.name|safe }} bgneal@4: {{ object.user.username|safe }} bgneal@4: {{ object.user.get_full_name|safe }} bgneal@4: bgneal@4: But is it safe? bgneal@4: =============== bgneal@4: bgneal@4: After changing my template and rebuilding the index, the troublesome topic was bgneal@4: then found. Hooray! But have I just opened myself up to a XSS_ attack? Can user bgneal@4: supplied content now show up unescaped in the search results? Well I can't bgneal@4: answer this authoritatively but I did spend a fair amount of time experimenting bgneal@4: with this. I'm using Haystack's ``highlight`` template tag, and my users' input bgneal@4: is done in Markdown_, and I could not inject malicious text into the search bgneal@4: results. You should test this yourself on your site. bgneal@4: bgneal@4: Conclusion bgneal@4: ========== bgneal@4: bgneal@4: This turned out to be a simple fix and I hope it helps someone else. I will bgneal@4: make enquiries to see if this should be added to the Haystack documentation. bgneal@4: bgneal@4: .. _Haystack: http://haystacksearch.org/ bgneal@4: .. _Django: https://www.djangoproject.com/ bgneal@4: .. _Xapian backend: https://github.com/notanumber/xapian-haystack bgneal@4: .. _XSS: http://en.wikipedia.org/wiki/Cross-site_scripting bgneal@4: .. _Markdown: http://daringfireball.net/projects/markdown/