Adding converted blog posts from old blog.
author Brian Neal <bgneal@gmail.com>
date Thu, 30 Jan 2014 21:45:03 -0600
A better "Who's Online" with Redis & Python
###########################################

:date: 2011-04-25 12:00
:tags: Redis, Python
:slug: a-better-who-s-online-with-redis-python
:author: Brian Neal

**Updated on December 17, 2011:** I found a better solution. Head on over to
the `new post`_ to check it out.


Who's What?
-----------

My website, like many others, has a "who's online" feature. It displays the
names of authenticated users that have been seen over the course of the last ten
minutes or so. It may seem a minor feature at first, but I find it really does a lot to
"humanize" the site and make it seem more like a community gathering place.

My first implementation of this feature used the MySQL database to update a
per-user timestamp whenever a request from an authenticated user arrived.
Actually, this seemed excessive to me, so I used a strategy involving an "online"
cookie that has a five minute expiration time. Whenever I see an authenticated
user without the online cookie I update their timestamp and then hand them back
a cookie that will expire in five minutes. In this way I don't have to hit the
database on every single request.

This approach worked fine, but some aspects of it didn't sit right with me:

* It seems like overkill to use the database to store temporary, trivial information like
  this. It doesn't feel like a good use of a full-featured relational database
  management system (RDBMS).
* I am writing to the database during a GET request. Ideally, all GET requests should
  be idempotent. Of course if this is strictly followed, it would be
  impossible to create a "who's online" feature in the first place. You'd have
  to require the user to POST data periodically. However, writing to an RDBMS
  during a GET request is something I feel guilty about and try to avoid when I
  can.


Redis
-----

Enter Redis_. I discovered Redis recently, and it is pure, white-hot
awesomeness. What is Redis? It's one of those projects that gets slapped with
the "NoSQL" label. And while I'm still trying to figure that buzzword out, Redis makes
sense to me when described as a lightweight data structure server.
Memcached_ can store key-value pairs very fast, where the value is always a string.
Redis goes one step further and stores not only strings, but data
structures like lists, sets, and hashes. For a great overview of what Redis is
and what you can do with it, check out `Simon Willison's Redis tutorial`_.

Another reason why I like Redis is that it is easy to install and deploy.
It is straight C code without any dependencies. Thus you can build it from
source just about anywhere. Your Linux distro may have a package for it, but it
is just as easy to grab the latest tarball and build it yourself.

I've really come to appreciate Redis for being such a small and lightweight
tool. At the same time, it is very powerful and effective for handling those
tasks that a traditional RDBMS is not good at.

For working with Redis in Python, you'll need to grab Andy McCurdy's redis-py_
client library. It can be installed with a single command:

.. sourcecode:: sh

    $ sudo pip install redis


Who's Online with Redis
-----------------------

Now that we are going to use Redis, how do we implement a "who's online"
feature? The first step is to get familiar with the `Redis API`_.

One approach to the "who's online" problem is to add a user name to a set
whenever we see a request from that user. That's fine but how do we know when
they have stopped browsing the site? We have to periodically clean out the
set in order to time people out. A cron job, for example, could delete the
set every five minutes.

A small problem with deleting the set is that people will abruptly disappear
from the site every five minutes. In order to give more gradual behavior we
could utilize two sets, a "current" set and an "old" set. As users are seen, we
add their names to the current set. Every five minutes or so (season to taste),
we simply overwrite the old set with the contents of the current set, then clear
out the current set. At any given time, the set of who's online is the union
of these two sets.
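
The two-set idea can be tried out without Redis at all. The following is a
minimal sketch using plain Python sets; the ``TwoSetTracker`` class and its
method names are made up for illustration and are not part of the module
shown later.

.. sourcecode:: python

    class TwoSetTracker(object):
        """In-memory illustration of the current/old set rotation."""

        def __init__(self):
            self.current = set()
            self.old = set()

        def report(self, username):
            # A user was seen: remember them in the current set.
            self.current.add(username)

        def tick(self):
            # Age out: the current set becomes the old set, and a fresh,
            # empty current set takes its place.
            self.old = self.current
            self.current = set()

        def online(self):
            # Who's online is the union of both sets.
            return self.current | self.old


    tracker = TwoSetTracker()
    tracker.report("alice")
    tracker.tick()
    tracker.report("bob")
    print(sorted(tracker.online()))   # ['alice', 'bob']
    tracker.tick()
    print(sorted(tracker.online()))   # ['bob']
    tracker.tick()
    print(sorted(tracker.online()))   # []

A user seen just before a tick survives that tick in the old set and only
drops out at the following one, which is what gives the more gradual behavior.
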

This approach doesn't give exact results of course: with a five minute period,
a user stays listed for somewhere between five and ten minutes after they were
last seen. That is perfectly fine for my site.

Looking over the Redis API, we see that we'll be making use of the following
commands:

* SADD_ for adding members to the current set.
* RENAME_ for moving the current set onto the old one, which replaces the old
  set and removes the current set all in one step.
* SUNION_ for performing a union on the current and old sets to produce the set
  of who's online.

And that's it! With these three primitives we have everything we need. This is
because of the following useful Redis behaviors:

* Performing a ``SADD`` against a set that doesn't exist creates the set and is
  not an error.
* Performing a ``SUNION`` with sets that don't exist is fine; they are simply
  treated as empty sets.

The one caveat involves the ``RENAME`` command. If the key you wish to rename
does not exist, Redis reports an error, and the Python Redis client raises it as
an exception.

Experimenting with algorithms and ideas is quite easy with Redis. You can either
use the Python Redis client in a Python interactive interpreter shell, or you can
use the command-line client that comes with Redis. Either way you can quickly
try out commands and refine your approach.


Implementation
--------------

My website is powered by Django_, but I am not going to show any Django specific
code here. Instead I'll show just the pure Python parts, and hopefully you can
adapt it to whatever framework, if any, you are using.

I created a Python module to hold this functionality:
``whos_online.py``. Throughout this module I use a lot of exception handling,
mainly because if the Redis server has crashed (or if I forgot to start it, say
in development) I don't want my website to be unusable. If Redis is unavailable,
I simply log an error and drive on. Note that in my limited experience Redis is
very stable and has not crashed on me once, but it is good to be defensive.

The first important function used throughout this module is a function to obtain
a connection to the Redis server:

.. sourcecode:: python

    import logging
    import redis

    logger = logging.getLogger(__name__)

    # HOST, PORT, and DB are configuration constants defined elsewhere.

    def _get_connection():
        """
        Create and return a Redis connection. Returns None on failure.
        """
        try:
            conn = redis.Redis(host=HOST, port=PORT, db=DB)
            return conn
        except redis.RedisError as e:
            logger.error(e)

        return None

The ``HOST``, ``PORT``, and ``DB`` constants can come from a
configuration file or they could be module-level constants. In my case they are set in my
Django ``settings.py`` file. Once we have this connection object, we are free to
use the Redis API exposed via the Python Redis client.

To update the current set whenever we see a user, I call this function:

.. sourcecode:: python

    # Redis key names:
    USER_CURRENT_KEY = "wo_user_current"
    USER_OLD_KEY = "wo_user_old"

    def report_user(username):
        """
        Call this function when a user has been seen. The username will be added to
        the current set.
        """
        conn = _get_connection()
        if conn:
            try:
                conn.sadd(USER_CURRENT_KEY, username)
            except redis.RedisError as e:
                logger.error(e)

If you are using Django, a good spot to call this function is from a piece
of `custom middleware`_. I kept my "5 minute cookie" algorithm to avoid doing this on
every request, although it is probably unnecessary on my low traffic site.
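
The cookie throttle itself is independent of any framework. Here is a rough
sketch of the decision it makes; the ``note_request`` function, the cookie
name, and the idea of passing the cookies in as a plain dictionary are all
assumptions made for illustration, not code from my site.

.. sourcecode:: python

    ONLINE_COOKIE = "wo_online"     # hypothetical cookie name
    ONLINE_COOKIE_AGE = 5 * 60      # five minutes, in seconds

    def note_request(username, request_cookies, report):
        """
        Report the user via the ``report`` callable unless the throttle cookie
        is already present. Returns a (name, value, max_age) tuple describing
        the cookie to set on the response, or None if nothing needs to be set.
        """
        # Anonymous requests and requests that still carry the cookie are
        # skipped, so Redis is touched at most once per cookie lifetime.
        if username is None or ONLINE_COOKIE in request_cookies:
            return None

        report(username)
        return (ONLINE_COOKIE, "1", ONLINE_COOKIE_AGE)

In a real application ``report`` would be ``report_user``, and the returned
tuple would be turned into a cookie on the outgoing response by whatever means
your framework provides.
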

Periodically you need to "age out" the sets by destroying the old set, moving
the current set to the old set, and then emptying the current set.

.. sourcecode:: python

    def tick():
        """
        Call this function to "age out" the old set by renaming the current set
        to the old.
        """
        conn = _get_connection()
        if conn:
            # An exception may be raised if the current key doesn't exist; if that
            # happens we have to delete the old set because no one is online.
            try:
                conn.rename(USER_CURRENT_KEY, USER_OLD_KEY)
            except redis.ResponseError:
                try:
                    del conn[USER_OLD_KEY]
                except redis.RedisError as e:
                    logger.error(e)
            except redis.RedisError as e:
                logger.error(e)

As mentioned previously, if no one is on your site, eventually your current set
will cease to exist as it is renamed and not populated further. If you attempt to
rename a non-existent key, the Python Redis client raises a ``ResponseError`` exception.
If this occurs we just manually delete the old set. In a bit of Pythonic cleverness,
the Python Redis client lets you use the ``del`` statement for this operation.

The ``tick()`` function can be called periodically by a cron job, for example. If you are using Django,
you could create a `custom management command`_ that calls ``tick()`` and schedule cron
to execute it. Alternatively, you could use something like Celery_ to schedule a
job to do the same. (As an aside, Redis can be used as a back-end for Celery, something that I hope
to explore in the near future.)

Finally, you need a way to obtain the current "who's online" set, which again is
a union of the current and old sets.

.. sourcecode:: python

    def get_users_online():
        """
        Returns a set of user names which is the union of the current and old
        sets.
        """
        conn = _get_connection()
        if conn:
            try:
                # Note that keys that do not exist are considered empty sets
                return conn.sunion([USER_CURRENT_KEY, USER_OLD_KEY])
            except redis.RedisError as e:
                logger.error(e)

        return set()

In my Django application, I call this function from a `custom inclusion template tag`_.


Conclusion
----------

I hope this blog post gives you some idea of the usefulness of Redis. I expanded
on this example to also keep track of non-authenticated "guest" users. I simply added
another pair of sets to track IP addresses.
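
As a rough idea of what that second pair of sets might look like, here is a
minimal sketch. The key names and the ``report_guest`` and
``get_guests_online`` functions are assumptions made for illustration. Unlike
the module above, the connection is passed in as a parameter and error
handling is left out to keep the example short.

.. sourcecode:: python

    # Hypothetical Redis key names for the guest sets:
    GUEST_CURRENT_KEY = "wo_guest_current"
    GUEST_OLD_KEY = "wo_guest_old"

    def report_guest(conn, ip_address):
        """
        Call this when an anonymous visitor has been seen. The IP address is
        added to the current guest set; repeat visits are absorbed by the set.
        """
        conn.sadd(GUEST_CURRENT_KEY, ip_address)

    def get_guests_online(conn):
        """
        Returns the union of the current and old guest sets as a Python set.
        """
        return set(conn.sunion([GUEST_CURRENT_KEY, GUEST_OLD_KEY]))

For this to time guests out, the periodic job would also have to rename the
current guest set onto the old one, in the same way ``tick()`` does for users.
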

If you are like me, you are probably already thinking about shifting some functions that you
awkwardly jammed onto a traditional database to Redis and other "NoSQL"
technologies.

.. _Redis: http://redis.io/
.. _Memcached: http://memcached.org/
.. _Simon Willison's Redis tutorial: http://simonwillison.net/static/2010/redis-tutorial/
.. _redis-py: https://github.com/andymccurdy/redis-py
.. _Django: http://djangoproject.com
.. _Redis API: http://redis.io/commands
.. _SADD: http://redis.io/commands/sadd
.. _RENAME: http://redis.io/commands/rename
.. _SUNION: http://redis.io/commands/sunion
.. _custom middleware: http://docs.djangoproject.com/en/1.3/topics/http/middleware/
.. _custom management command: http://docs.djangoproject.com/en/1.3/howto/custom-management-commands/
.. _Celery: http://celeryproject.org/
.. _custom inclusion template tag: http://docs.djangoproject.com/en/1.3/howto/custom-template-tags/#inclusion-tags
.. _new post: http://deathofagremmie.com/2011/12/17/who-s-online-with-redis-python-a-slight-return/