annotate content/Coding/002-redis-whos-online.rst @ 10:6c03ca07a16d

Renamed my tools directory to "tools". I named it __bgn because I was worried it would clash with a future Pelican update. But it seems like this would only happen if I re-ran the quickstart script. "tools" is a better name. :)
author Brian Neal <bgneal@gmail.com>
date Sun, 02 Feb 2014 11:32:13 -0600
parents 49bebfa6f9d3
children
rev   line source
bgneal@4 1 A better "Who's Online" with Redis & Python
bgneal@4 2 ###########################################
bgneal@4 3
bgneal@4 4 :date: 2011-04-25 12:00
bgneal@4 5 :tags: Redis, Python
bgneal@4 6 :slug: a-better-who-s-online-with-redis-python
bgneal@4 7 :author: Brian Neal
bgneal@7 8 :summary: Still trying to find a better "who's online" function. I ran with this method for a while, but later found a way to improve upon it.
bgneal@4 9
bgneal@4 10 **Updated on December 17, 2011:** I found a better solution. Head on over to
bgneal@4 11 the `new post`_ to check it out.
bgneal@4 12
bgneal@4 13
bgneal@4 14 Who's What?
bgneal@4 15 -----------
bgneal@4 16
bgneal@4 17 My website, like many others, has a "who's online" feature. It displays the
bgneal@4 18 names of authenticated users that have been seen over the course of the last ten
bgneal@4 19 minutes or so. It may seem a minor feature at first, but I find it really does a lot to
bgneal@4 20 "humanize" the site and make it seem more like a community gathering place.
bgneal@4 21
bgneal@4 22 My first implementation of this feature used the MySQL database to update a
bgneal@4 23 per-user timestamp whenever a request from an authenticated user arrived.
bgneal@4 24 Actually, this seemed excessive to me, so I used a strategy involving an "online"
bgneal@4 25 cookie that has a five minute expiration time. Whenever I see an authenticated
bgneal@4 26 user without the online cookie I update their timestamp and then hand them back
bgneal@4 27 a cookie that will expire in five minutes. In this way I don't have to hit the
bgneal@4 28 database on every single request.
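The cookie check described above might look roughly like the following. This is only a framework-neutral sketch: the cookie name ``online``, the helper ``check_online_cookie``, and the plain dict of cookies are assumptions made for illustration, not the code my site actually runs.

```python
# Hypothetical names; the real cookie name and wiring depend on your framework.
ONLINE_COOKIE = "online"
ONLINE_COOKIE_AGE = 5 * 60  # five minutes, in seconds


def check_online_cookie(cookies):
    """
    Decide whether the user's timestamp needs refreshing.

    `cookies` is a dict of the cookies sent with the request. Returns a
    (needs_update, cookie_to_set) pair, where cookie_to_set is either None
    or a (name, value, max_age) tuple to attach to the response.
    """
    if ONLINE_COOKIE in cookies:
        # Seen within the last five minutes: skip the database write.
        return False, None
    # No cookie: record the visit and hand back a fresh five minute cookie.
    return True, (ONLINE_COOKIE, "1", ONLINE_COOKIE_AGE)
```

The caller would update the timestamp only when the first element is true, and set the returned cookie on the response.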
bgneal@4 29
bgneal@4 30 This approach worked fine, but some aspects of it didn't sit right with me:
bgneal@4 31
bgneal@4 32 * It seems like overkill to use the database to store temporary, trivial information like
bgneal@4 33 this. It doesn't feel like a good use of a full-featured relational database
bgneal@4 34 management system (RDBMS).
bgneal@4 35 * I am writing to the database during a GET request. Ideally, all GET requests should
bgneal@4 36 be safe, that is, free of side effects. Of course if this is strictly followed, it would be
bgneal@4 37 impossible to create a "who's online" feature in the first place. You'd have
bgneal@4 38 to require the user to POST data periodically. However, writing to an RDBMS
bgneal@4 39 during a GET request is something I feel guilty about and try to avoid when I
bgneal@4 40 can.
bgneal@4 41
bgneal@4 42
bgneal@4 43 Redis
bgneal@4 44 -----
bgneal@4 45
bgneal@4 46 Enter Redis_. I discovered Redis recently, and it is pure, white-hot
bgneal@4 47 awesomeness. What is Redis? It's one of those projects that gets slapped with
bgneal@4 48 the "NoSQL" label. And while I'm still trying to figure that buzzword out, Redis makes
bgneal@4 49 sense to me when described as a lightweight data structure server.
bgneal@4 50 Memcached_ can store key-value pairs very fast, where the value is always a string.
bgneal@4 51 Redis goes one step further and stores not only strings, but data
bgneal@4 52 structures like lists, sets, and hashes. For a great overview of what Redis is
bgneal@4 53 and what you can do with it, check out `Simon Willison's Redis tutorial`_.
bgneal@4 54
bgneal@4 55 Another reason why I like Redis is that it is easy to install and deploy.
bgneal@4 56 It is straight C code without any dependencies. Thus you can build it from
bgneal@4 57 source just about anywhere. Your Linux distro may have a package for it, but it
bgneal@4 58 is just as easy to grab the latest tarball and build it yourself.
bgneal@4 59
bgneal@4 60 I've really come to appreciate Redis for being such a small and lightweight
bgneal@4 61 tool. At the same time, it is very powerful and effective at handling those
bgneal@4 62 tasks that a traditional RDBMS is not good at.
bgneal@4 63
bgneal@4 64 For working with Redis in Python, you'll need to grab Andy McCurdy's redis-py_
bgneal@4 65 client library. It can be installed with a simple
bgneal@4 66
.. sourcecode:: sh

    $ sudo pip install redis
bgneal@4 70
bgneal@4 71
bgneal@4 72 Who's Online with Redis
bgneal@4 73 -----------------------
bgneal@4 74
bgneal@4 75 Now that we are going to use Redis, how do we implement a "who's online"
bgneal@4 76 feature? The first step is to get familiar with the `Redis API`_.
bgneal@4 77
bgneal@4 78 One approach to the "who's online" problem is to add a user name to a set
bgneal@4 79 whenever we see a request from that user. That's fine but how do we know when
bgneal@4 80 they have stopped browsing the site? We have to periodically clean out the
bgneal@4 81 set in order to time people out. A cron job, for example, could delete the
bgneal@4 82 set every five minutes.
bgneal@4 83
bgneal@4 84 A small problem with deleting the set is that people will abruptly disappear
bgneal@4 85 from the site every five minutes. In order to give more gradual behavior we
bgneal@4 86 could utilize two sets, a "current" set and an "old" set. As users are seen, we
bgneal@4 87 add their names to the current set. Every five minutes or so (season to taste),
bgneal@4 88 we simply overwrite the old set with the contents of the current set, then clear
bgneal@4 89 out the current set. At any given time, the set of who's online is the union
bgneal@4 90 of these two sets.
bgneal@4 91
bgneal@4 92 This approach doesn't give exact results of course, but it is perfectly fine for my site.
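To make the two-set idea concrete before bringing Redis into it, here is a small model that uses ordinary Python sets. The class name ``TwoSetTracker`` and its methods are made up for this illustration; it is a sketch of the algorithm only, not the Redis-backed code.

```python
class TwoSetTracker:
    """In-memory model of the "current" and "old" sets."""

    def __init__(self):
        self.current = set()
        self.old = set()

    def report(self, name):
        # A request from this user was seen.
        self.current.add(name)

    def tick(self):
        # Overwrite the old set with the current set, then start a
        # fresh, empty current set.
        self.old = self.current
        self.current = set()

    def online(self):
        # Who's online is the union of the two sets.
        return self.current | self.old


tracker = TwoSetTracker()
tracker.report("alice")
tracker.tick()
tracker.report("bob")
after_one_tick = tracker.online()     # alice (old) and bob (current)
tracker.tick()
after_two_ticks = tracker.online()    # alice has aged out, bob remains
tracker.tick()
after_three_ticks = tracker.online()  # nobody has been seen lately
```

With a five minute tick, a user stays listed for somewhere between five and ten minutes after their last request, depending on where in the cycle that request landed.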
bgneal@4 93
bgneal@4 94 Looking over the Redis API, we see that we'll be making use of the following
bgneal@4 95 commands:
bgneal@4 96
bgneal@4 97 * SADD_ for adding members to the current set.
bgneal@4 98 * RENAME_ for moving the current set onto the old, which overwrites the old set and
bgneal@4 99 removes the current set all in one step.
bgneal@4 100 * SUNION_ for performing a union on the current and old sets to produce the set
bgneal@4 101 of who's online.
bgneal@4 102
bgneal@4 103 And that's it! With these three primitives we have everything we need. This is
bgneal@4 104 because of the following useful Redis behaviors:
bgneal@4 105
bgneal@4 106 * Performing a ``SADD`` against a set that doesn't exist creates the set and is
bgneal@4 107 not an error.
bgneal@4 108 * Performing a ``SUNION`` with sets that don't exist is fine; they are simply
bgneal@4 109 treated as empty sets.
bgneal@4 110
bgneal@4 111 The one caveat involves the ``RENAME`` command. If the key you wish to rename
bgneal@4 112 does not exist, Redis returns an error, which the Python Redis client raises as
bgneal@4 113 an exception.
bgneal@4 114
bgneal@4 115 Experimenting with algorithms and ideas is quite easy with Redis. You can either
bgneal@4 116 use the Python Redis client in a Python interactive interpreter shell, or you can
bgneal@4 117 use the command-line client that comes with Redis. Either way you can quickly
bgneal@4 118 try out commands and refine your approach.
bgneal@4 119
bgneal@4 120
bgneal@4 121 Implementation
bgneal@4 122 --------------
bgneal@4 123
bgneal@4 124 My website is powered by Django_, but I am not going to show any Django-specific
bgneal@4 125 code here. Instead I'll show just the pure Python parts, and hopefully you can
bgneal@4 126 adapt it to whatever framework, if any, you are using.
bgneal@4 127
bgneal@4 128 I created a Python module to hold this functionality:
bgneal@4 129 ``whos_online.py``. Throughout this module I use a lot of exception handling,
bgneal@4 130 mainly because if the Redis server has crashed (or if I forgot to start it, say
bgneal@4 131 in development) I don't want my website to be unusable. If Redis is unavailable,
bgneal@4 132 I simply log an error and drive on. Note that in my limited experience Redis is
bgneal@4 133 very stable and has not crashed on me once, but it is good to be defensive.
bgneal@4 134
bgneal@4 135 The first important function used throughout this module is a function to obtain
bgneal@4 136 a connection to the Redis server:
bgneal@4 137
.. sourcecode:: python

    import logging
    import redis

    logger = logging.getLogger(__name__)

    # HOST, PORT, and DB are configuration constants defined elsewhere.

    def _get_connection():
        """
        Create and return a Redis connection. Returns None on failure.
        """
        try:
            conn = redis.Redis(host=HOST, port=PORT, db=DB)
            return conn
        except redis.RedisError as e:
            logger.error(e)

        return None
bgneal@4 156
bgneal@4 157 The ``HOST``, ``PORT``, and ``DB`` constants can come from a
bgneal@4 158 configuration file or they could be module-level constants. In my case they are set in my
bgneal@4 159 Django ``settings.py`` file. Once we have this connection object, we are free to
bgneal@4 160 use the Redis API exposed via the Python Redis client.
bgneal@4 161
bgneal@4 162 To update the current set whenever we see a user, I call this function:
bgneal@4 163
.. sourcecode:: python

    # Redis key names:
    USER_CURRENT_KEY = "wo_user_current"
    USER_OLD_KEY = "wo_user_old"

    def report_user(username):
        """
        Call this function when a user has been seen. The username will be added to
        the current set.
        """
        conn = _get_connection()
        if conn:
            try:
                conn.sadd(USER_CURRENT_KEY, username)
            except redis.RedisError as e:
                logger.error(e)
bgneal@4 181
bgneal@4 182 If you are using Django, a good spot to call this function is from a piece
bgneal@4 183 of `custom middleware`_. I kept my "5 minute cookie" algorithm to avoid doing this on
bgneal@4 184 every request, although it is probably unnecessary on my low-traffic site.
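If you are not using Django, the same hook can be sketched as plain WSGI middleware. This swaps Django's middleware API for a generic WSGI wrapper, and it assumes that some earlier authentication layer has put the user name in the standard ``REMOTE_USER`` environ key. The factory name ``make_whos_online_middleware`` is hypothetical.

```python
def make_whos_online_middleware(app, report_user):
    """
    Wrap a WSGI application so that every request from an authenticated
    user is passed to `report_user` (for example, the function shown above).
    """
    def middleware(environ, start_response):
        # Assumption: an upstream auth layer sets REMOTE_USER when the
        # request comes from a logged-in user.
        username = environ.get("REMOTE_USER")
        if username:
            report_user(username)
        return app(environ, start_response)

    return middleware
```

Passing ``report_user`` in as an argument keeps the wrapper independent of Redis, which also makes it easy to try out with a stand-in function.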
bgneal@4 185
bgneal@4 186 Periodically you need to "age out" the sets by destroying the old set, moving
bgneal@4 187 the current set to the old set, and then emptying the current set.
bgneal@4 188
.. sourcecode:: python

    def tick():
        """
        Call this function to "age out" the old set by renaming the current set
        to the old.
        """
        conn = _get_connection()
        if conn:
            # An exception may be raised if the current key doesn't exist; if that
            # happens we have to delete the old set because no one is online.
            try:
                conn.rename(USER_CURRENT_KEY, USER_OLD_KEY)
            except redis.ResponseError:
                try:
                    del conn[USER_OLD_KEY]
                except redis.RedisError as e:
                    logger.error(e)
            except redis.RedisError as e:
                logger.error(e)
bgneal@4 209
bgneal@4 210 As mentioned previously, if no one is on your site, eventually your current set
bgneal@4 211 will cease to exist as it is renamed and not populated further. If you attempt to
bgneal@4 212 rename a non-existent key, Redis reports an error and the Python Redis client raises a
bgneal@4 213 ``ResponseError`` exception. If this occurs we just manually delete the old set. In a bit of
bgneal@4 214 Pythonic cleverness, the Python Redis client supports the ``del`` syntax for this operation.
bgneal@4 215
bgneal@4 216 The ``tick()`` function can be called periodically by a cron job, for example. If you are using Django,
bgneal@4 217 you could create a `custom management command`_ that calls ``tick()`` and schedule cron
bgneal@4 218 to execute it. Alternatively, you could use something like Celery_ to schedule a
bgneal@4 219 job to do the same. (As an aside, Redis can be used as a back-end for Celery, something that I hope
bgneal@4 220 to explore in the near future.)
bgneal@4 221
bgneal@4 222 Finally, you need a way to obtain the current "who's online" set, which again is
bgneal@4 223 a union of the current and old sets.
bgneal@4 224
.. sourcecode:: python

    def get_users_online():
        """
        Returns a set of user names which is the union of the current and old
        sets.
        """
        conn = _get_connection()
        if conn:
            try:
                # Note that keys that do not exist are considered empty sets
                return conn.sunion([USER_CURRENT_KEY, USER_OLD_KEY])
            except redis.RedisError as e:
                logger.error(e)

        return set()
bgneal@4 241
bgneal@4 242 In my Django application, I call this function from a
bgneal@4 243 `custom inclusion template tag`_.
bgneal@4 244
bgneal@4 245
bgneal@4 246 Conclusion
bgneal@4 247 ----------
bgneal@4 248
bgneal@4 249 I hope this blog post gives you some idea of the usefulness of Redis. I expanded
bgneal@4 250 on this example to also keep track of non-authenticated "guest" users. I simply added
bgneal@4 251 another pair of sets to track IP addresses.
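One way to add a second pair of sets without duplicating every function is to derive the key names from the kind of visitor. The sketch below is an assumption about how that could be organized, not the code I actually deployed. The helper names are invented, ``conn`` is expected to be a connection from the Python Redis client, and error handling is left out for brevity.

```python
def key_names(kind):
    """
    Return the (current, old) Redis key names for one kind of visitor,
    e.g. "user" for authenticated users or "guest" for IP addresses.
    """
    return ("wo_%s_current" % kind, "wo_%s_old" % kind)


def report_visitor(conn, kind, member):
    """Add a user name or IP address to the current set for its kind."""
    current_key, _ = key_names(kind)
    conn.sadd(current_key, member)


def get_visitors_online(conn, kind):
    """Return the union of the current and old sets for one kind."""
    return conn.sunion(list(key_names(kind)))
```

With ``kind="user"`` this produces the same ``wo_user_current`` and ``wo_user_old`` keys used earlier, so guests simply live alongside them under ``wo_guest_current`` and ``wo_guest_old``.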
bgneal@4 252
bgneal@4 253 If you are like me, you are probably already thinking about shifting some functions that you
bgneal@4 254 awkwardly jammed onto a traditional database to Redis and other "NoSQL"
bgneal@4 255 technologies.
bgneal@4 256
bgneal@4 257 .. _Redis: http://redis.io/
bgneal@4 258 .. _Memcached: http://memcached.org/
bgneal@4 259 .. _Simon Willison's Redis tutorial: http://simonwillison.net/static/2010/redis-tutorial/
bgneal@4 260 .. _redis-py: https://github.com/andymccurdy/redis-py
bgneal@4 261 .. _Django: http://djangoproject.com
bgneal@4 262 .. _Redis API: http://redis.io/commands
bgneal@4 263 .. _SADD: http://redis.io/commands/sadd
bgneal@4 264 .. _RENAME: http://redis.io/commands/rename
bgneal@4 265 .. _SUNION: http://redis.io/commands/sunion
bgneal@4 266 .. _custom middleware: http://docs.djangoproject.com/en/1.3/topics/http/middleware/
bgneal@4 267 .. _custom management command: http://docs.djangoproject.com/en/1.3/howto/custom-management-commands/
bgneal@4 268 .. _Celery: http://celeryproject.org/
bgneal@4 269 .. _custom inclusion template tag: http://docs.djangoproject.com/en/1.3/howto/custom-template-tags/#inclusion-tags
bgneal@4 270 .. _new post: http://deathofagremmie.com/2011/12/17/who-s-online-with-redis-python-a-slight-return/