Haystack Search Quoting Issue
#############################

:date: 2013-09-21 11:45
:tags: Django, Haystack
:slug: haystack-search-quoting-issue
:author: Brian Neal
:summary: I use the awesome Haystack_ search framework for my Django_ powered website. I have found Haystack to be a huge win. It is easy to set up, configure, and customize when you have to. As someone who doesn't know very much about the world of searching, I'm grateful to have a powerful tool that just works without me having to get too involved in arcane details.

The case of the missing forum topic
===================================

I use the awesome Haystack_ search framework for my Django_ powered website.
I have found Haystack to be a huge win. It is easy to set up, configure, and
customize when you have to. As someone who doesn't know very much about the
world of searching, I'm grateful to have a powerful tool that just works
without me having to get too involved in arcane details.

One day one of our users noticed that he could not find a forum topic with the
title ``"Hawaiian" sounding chords``. Notice the word *Hawaiian* is in quotes. The
topic would turn up if you searched for *sounding* or *chords*. But no
combination of *Hawaiian*, with or without quotes, would uncover this topic.

I should mention I am using the `Xapian backend`_. I know the backend tries to
remove punctuation and special characters to create uniform searches. But
I could not figure out where the word was getting dropped. After a bit of
searching online, I found a few hints which led to the solution.

Safety versus correctness
=========================

As suggested in the documentation, I am using templates to build the document
used for the search engine. My template for forum topics looked like this::

   {{ object.name }}
   {{ object.user.username }}
   {{ object.user.get_full_name }}

A mailing list post from another user suggested the problem. Django by default
escapes text in templates. Thus the forum topic title::

   "Hawaiian" sounding chords

was being turned into this by the Django templating engine::

   &quot;Hawaiian&quot; sounding chords

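The effect of that escaping can be reproduced outside of Django. This is only a
minimal sketch: it uses the Python standard library's ``html.escape`` as a
stand-in for Django's template autoescaping, on the assumption that both treat
double quotes the same way. It does not involve Haystack or Xapian at all.

```python
from html import escape

# The forum topic title exactly as the user typed it.
title = '"Hawaiian" sounding chords'

# Stand-in for what template autoescaping does to the title before the
# text is handed to the search backend (assumption: html.escape mirrors
# Django's handling of double quotes).
escaped = escape(title)

# The first "word" of the indexed document is no longer plain Hawaiian.
first_word = escaped.split()[0]

print(escaped)
print(first_word)
```

The point of the sketch is that the text sent to the indexer carries HTML
entities around the first word, while *sounding* and *chords* pass through
untouched, which matches the search behavior described above.
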
Now what Haystack and/or the Xapian backend were doing with
``&quot;Hawaiian&quot;`` I have no idea. I tried searching for this unusual
term but it did not turn up any results. Apparently it is just getting dropped.

The solution was to modify the template to this::

   {{ object.name|safe }}
   {{ object.user.username|safe }}
   {{ object.user.get_full_name|safe }}

But is it safe?
===============

After changing my template and rebuilding the index, the troublesome topic was
found. Hooray! But have I just opened myself up to an XSS_ attack? Can
user-supplied content now show up unescaped in the search results? Well, I can't
answer this authoritatively, but I did spend a fair amount of time experimenting
with it. I'm using Haystack's ``highlight`` template tag, and my users' input
is done in Markdown_, and I could not inject malicious text into the search
results. You should test this yourself on your site.

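One way to reason about such a test is to keep indexing and display separate:
the indexed document holds the raw text, and escaping happens when a result is
rendered. The sketch below only illustrates that idea. The function names
``build_index_text`` and ``render_result`` are hypothetical, and
``html.escape`` again stands in for whatever escaping your result templates
perform. It is not how Haystack's ``highlight`` tag is implemented.

```python
from html import escape


def build_index_text(title):
    """Hypothetical indexing step: store the title unescaped, as |safe does."""
    return title


def render_result(title):
    """Hypothetical display step: escape the stored text at output time."""
    return "<li>{}</li>".format(escape(title))


# A deliberately hostile topic title to probe the display path.
malicious = '<script>alert("x")</script> chords'

indexed = build_index_text(malicious)
rendered = render_result(indexed)

print(rendered)
```

If a hostile title like this ever appears in your real search results with its
angle brackets intact, the display path is not escaping and needs attention.
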
Conclusion
==========

This turned out to be a simple fix and I hope it helps someone else. I will
make enquiries to see if this should be added to the Haystack documentation.

.. _Haystack: http://haystacksearch.org/
.. _Django: https://www.djangoproject.com/
.. _Xapian backend: https://github.com/notanumber/xapian-haystack
.. _XSS: http://en.wikipedia.org/wiki/Cross-site_scripting
.. _Markdown: http://daringfireball.net/projects/markdown/