Haystack Search Quoting Issue
#############################

:date: 2013-09-21 11:45
:tags: Django, Haystack
:slug: haystack-search-quoting-issue
:author: Brian Neal

The case of the missing forum topic
===================================

I use the awesome Haystack_ search framework for my Django_ powered website.
I have found Haystack to be a huge win. It is easy to set up, configure, and
customize when you have to. As someone who doesn't know very much about the
world of searching, I'm grateful to have a powerful tool that just works
without me having to get too involved in arcane details.

One day one of our users noticed that he could not find a forum topic with the
title ``"Hawaiian" sounding chords``. Notice the word *Hawaiian* is in quotes. The
topic would turn up if you searched for *sounding* or *chords*. But no
search for *Hawaiian*, with or without quotes, would uncover this topic.

I should mention I am using the `Xapian backend`_. I know the backend tries to
remove punctuation and special characters to create uniform searches. But
I could not figure out where the word was getting dropped. After a bit of
searching online, I found a few hints which led to the solution.

Safety versus correctness
=========================

As suggested in the documentation, I am using templates to build the document
used for the search engine. My template for forum topics looked like this::

   {{ object.name }}
   {{ object.user.username }}
   {{ object.user.get_full_name }}

A mailing list post from another user pointed to the problem. Django by default
escapes text in templates. Thus the forum topic title::

   "Hawaiian" sounding chords

was being turned into this by the Django templating engine::

   &quot;Hawaiian&quot; sounding chords

Now what Haystack and/or the Xapian backend were doing with
``&quot;Hawaiian&quot;`` I have no idea. I tried searching for this unusual
term but it did not turn up any results. Apparently it is just getting dropped.

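The escaping step itself can be reproduced outside of Django. The sketch below
is only an illustration: it uses ``html.escape`` from the Python standard
library as a stand-in for Django's template autoescaping (an assumption; the
two agree on how a double quote is escaped), and it says nothing about what the
search backend does with the result.

```python
import html

# The forum topic title exactly as the user typed it.
title = '"Hawaiian" sounding chords'

# Stand-in for template autoescaping (assumption: html.escape, not Django).
# Each double quote becomes the HTML entity &quot;.
escaped = html.escape(title)
print(escaped)

# Unescaping recovers the original title, so nothing is lost by the
# escaping itself; the text handed to the indexer is simply different.
restored = html.unescape(escaped)
print(restored == title)
```

The point is that the indexer never sees the bare word *Hawaiian* surrounded by
quotes. It sees the word wrapped in entity text.
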
The solution was to modify the template to this::

   {{ object.name|safe }}
   {{ object.user.username|safe }}
   {{ object.user.get_full_name|safe }}

But is it safe?
===============

After changing my template and rebuilding the index, the troublesome topic was
then found. Hooray! But have I just opened myself up to an XSS_ attack? Can
user-supplied content now show up unescaped in the search results? Well, I can't
answer this authoritatively, but I did spend a fair amount of time experimenting
with this. I'm using Haystack's ``highlight`` template tag, and my users' input
is done in Markdown_, and I could not inject malicious text into the search
results. You should test this yourself on your site.

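One way to frame such a test: what matters is whether the page that displays
the results escapes the text, not whether the indexed document was escaped. The
following is a minimal sketch under stated assumptions. ``render_title`` is a
hypothetical helper, not part of Haystack, and ``html.escape`` stands in for
whatever escaping your own results template applies.

```python
import html


def render_title(raw_title: str) -> str:
    """Hypothetical display helper: escape a raw title for an HTML list item."""
    return "<li>{}</li>".format(html.escape(raw_title))


# A hostile title stored unescaped, as it might sit in the search index.
hostile = '<script>alert("xss")</script> chords'

rendered = render_title(hostile)
print(rendered)

# The markup is neutralized at display time, even though the stored
# text was never escaped.
print("<script>" in rendered)
```

If a check like this fails against your real results page, the ``safe`` filter
in the index template is not the place to fix it. The display template is.
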
Conclusion
==========

This turned out to be a simple fix and I hope it helps someone else. I will
make enquiries to see if this should be added to the Haystack documentation.

.. _Haystack: http://haystacksearch.org/
.. _Django: https://www.djangoproject.com/
.. _Xapian backend: https://github.com/notanumber/xapian-haystack
.. _XSS: http://en.wikipedia.org/wiki/Cross-site_scripting
.. _Markdown: http://daringfireball.net/projects/markdown/