diff content/Coding/002-redis-whos-online.rst @ 4:7ce6393e6d30
Adding converted blog posts from old blog.
author | Brian Neal <bgneal@gmail.com> |
---|---|
date | Thu, 30 Jan 2014 21:45:03 -0600 |
parents | |
children | 49bebfa6f9d3 |
A better "Who's Online" with Redis & Python
###########################################

:date: 2011-04-25 12:00
:tags: Redis, Python
:slug: a-better-who-s-online-with-redis-python
:author: Brian Neal

**Updated on December 17, 2011:** I found a better solution. Head on over to
the `new post`_ to check it out.


Who's What?
-----------

My website, like many others, has a "who's online" feature. It displays the
names of authenticated users that have been seen over the course of the last ten
minutes or so. It may seem a minor feature at first, but I find it really does a
lot to "humanize" the site and make it seem more like a community gathering
place.

My first implementation of this feature used the MySQL database to update a
per-user timestamp whenever a request from an authenticated user arrived.
Actually, this seemed excessive to me, so I used a strategy involving an "online"
cookie that has a five minute expiration time. Whenever I see an authenticated
user without the online cookie I update their timestamp and then hand them back
a cookie that will expire in five minutes. In this way I don't have to hit the
database on every single request.

This approach worked fine, but some aspects of it didn't sit right with me:

* It seems like overkill to use the database to store temporary, trivial
  information like this. It doesn't feel like a good use of a full-featured
  relational database management system (RDBMS).
* I am writing to the database during a GET request. Ideally, GET requests
  should be safe and idempotent, with no side effects on the server. Of course
  if this is strictly followed, it would be impossible to create a "who's
  online" feature in the first place. You'd have
  to require the user to POST data periodically.
  However, writing to an RDBMS during a GET request is something I feel
  guilty about and try to avoid when I can.


Redis
-----

Enter Redis_. I discovered Redis recently, and it is pure, white-hot
awesomeness. What is Redis? It's one of those projects that gets slapped with
the "NoSQL" label. And while I'm still trying to figure that buzzword out, Redis
makes sense to me when described as a lightweight data structure server.
Memcached_ can store key-value pairs very fast, where the value is always a
string. Redis goes one step further and stores not only strings, but data
structures like lists, sets, and hashes. For a great overview of what Redis is
and what you can do with it, check out `Simon Willison's Redis tutorial`_.

Another reason why I like Redis is that it is easy to install and deploy.
It is straight C code without any dependencies. Thus you can build it from
source just about anywhere. Your Linux distro may have a package for it, but it
is just as easy to grab the latest tarball and build it yourself.

I've really come to appreciate Redis for being such a small and lightweight
tool. At the same time, it is very powerful and effective for handling those
tasks that a traditional RDBMS is not good at.

For working with Redis in Python, you'll need to grab Andy McCurdy's redis-py_
client library. It can be installed with a simple command:

.. sourcecode:: sh

   $ sudo pip install redis


Who's Online with Redis
-----------------------

Now that we are going to use Redis, how do we implement a "who's online"
feature? The first step is to get familiar with the `Redis API`_.

One approach to the "who's online" problem is to add a user name to a set
whenever we see a request from that user. That's fine, but how do we know when
they have stopped browsing the site? We have to periodically clean out the
set in order to time people out. A cron job, for example, could delete the
set every five minutes.
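
A rough sketch of this single-set idea might look like the following. It
assumes ``conn`` is a connected redis-py client; the key name ``wo_online`` and
the three function names are made up purely for illustration.

.. sourcecode:: python

   # Hypothetical key name, used only for this illustration.
   ONLINE_KEY = "wo_online"

   def mark_seen(conn, username):
       """Add a user to the single "online" set (SADD)."""
       conn.sadd(ONLINE_KEY, username)

   def clear_online(conn):
       """Forget everyone at once, e.g. from a cron job (DEL)."""
       conn.delete(ONLINE_KEY)

   def who_is_online(conn):
       """Return the members of the "online" set (SMEMBERS)."""
       return conn.smembers(ONLINE_KEY)

Everything hinges on how often ``clear_online()`` runs, which is exactly where
this simple version falls short.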

A small problem with deleting the set is that people will abruptly disappear
from the site every five minutes. In order to give more gradual behavior we
could utilize two sets, a "current" set and an "old" set. As users are seen, we
add their names to the current set. Every five minutes or so (season to taste),
we simply overwrite the old set with the contents of the current set, then clear
out the current set. At any given time, the set of who's online is the union
of these two sets.

This approach doesn't give exact results, of course. With a five minute
interval, a user stays listed for somewhere between five and ten minutes after
they were last reported. That is perfectly fine for my site.

Looking over the Redis API, we see that we'll be making use of the following
commands:

* SADD_ for adding members to the current set.
* RENAME_ for moving the current set to the old, which replaces the old set and
  removes the current set all in one step.
* SUNION_ for performing a union on the current and old sets to produce the set
  of who's online.

And that's it! With these three primitives we have everything we need. This is
because of the following useful Redis behaviors:

* Performing a ``SADD`` against a set that doesn't exist creates the set and is
  not an error.
* Performing a ``SUNION`` with sets that don't exist is fine; they are simply
  treated as empty sets.

The one caveat involves the ``RENAME`` command. If the key you wish to rename
does not exist, Redis reports an error, which the Python Redis client raises as
an exception.

Experimenting with algorithms and ideas is quite easy with Redis. You can either
use the Python Redis client in a Python interactive interpreter shell, or you can
use the command-line client that comes with Redis. Either way you can quickly
try out commands and refine your approach.


Implementation
--------------

My website is powered by Django_, but I am not going to show any Django-specific
code here. Instead I'll show just the pure Python parts, and hopefully you can
adapt it to whatever framework, if any, you are using.
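
Before bringing Redis into it, the two-set rotation described in the previous
section can be sketched with plain Python sets. This is only an in-memory
illustration of the idea; the ``TwoSetTracker`` class and the user names are
invented for the example, and the comments note which Redis command each step
stands in for.

.. sourcecode:: python

   class TwoSetTracker(object):
       """In-memory stand-in for the "current" and "old" sets."""

       def __init__(self):
           self.current = set()
           self.old = set()

       def report_user(self, username):
           # Plays the role of SADD on the current set.
           self.current.add(username)

       def tick(self):
           # Plays the role of RENAME current -> old: the old set is
           # replaced and the current set starts out empty again.
           self.old = self.current
           self.current = set()

       def get_users_online(self):
           # Plays the role of SUNION on the two sets.
           return self.current | self.old


   tracker = TwoSetTracker()
   tracker.report_user("alice")
   tracker.tick()                    # alice moves to the old set
   tracker.report_user("bob")
   online_after_one_tick = tracker.get_users_online()
   tracker.tick()                    # alice ages out, bob moves to old
   online_after_two_ticks = tracker.get_users_online()

After the first tick both users are still reported as online; after the second
tick only ``bob`` remains, since ``alice`` has not been seen for two intervals.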

I created a Python module to hold this functionality:
``whos_online.py``. Throughout this module I use a lot of exception handling,
mainly because if the Redis server has crashed (or if I forgot to start it, say
in development) I don't want my website to be unusable. If Redis is unavailable,
I simply log an error and drive on. Note that in my limited experience Redis is
very stable and has not crashed on me once, but it is good to be defensive.

The first important function used throughout this module is a function to obtain
a connection to the Redis server:

.. sourcecode:: python

   import logging
   import redis

   logger = logging.getLogger(__name__)

   def _get_connection():
       """
       Create and return a Redis connection. Returns None on failure.
       """
       try:
           # HOST, PORT, and DB are defined elsewhere (see below).
           conn = redis.Redis(host=HOST, port=PORT, db=DB)
           return conn
       except redis.RedisError as e:
           logger.error(e)

       return None

The ``HOST``, ``PORT``, and ``DB`` constants can come from a configuration file
or they could be module-level constants. In my case they are set in my
Django ``settings.py`` file. Once we have this connection object, we are free to
use the Redis API exposed via the Python Redis client.

To update the current set whenever we see a user, I call this function:

.. sourcecode:: python

   # Redis key names:
   USER_CURRENT_KEY = "wo_user_current"
   USER_OLD_KEY = "wo_user_old"

   def report_user(username):
       """
       Call this function when a user has been seen. The username will be added to
       the current set.
       """
       conn = _get_connection()
       if conn:
           try:
               conn.sadd(USER_CURRENT_KEY, username)
           except redis.RedisError as e:
               logger.error(e)

If you are using Django, a good spot to call this function is from a piece
of `custom middleware`_. I kept my "5 minute cookie" algorithm to avoid doing
this on every request, although it is probably unnecessary on my low traffic
site.
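
The cookie throttle can be sketched without any framework code. The helper
below is only an illustration under assumed names: the cookie name ``online``,
the ``maybe_report()`` function, and the ``report`` callback parameter are all
hypothetical. The caller passes in the request's cookies as a dictionary and is
responsible for actually setting the cookie on the response.

.. sourcecode:: python

   # Hypothetical cookie name and lifetime (five minutes, in seconds).
   ONLINE_COOKIE = "online"
   ONLINE_COOKIE_AGE = 5 * 60

   def maybe_report(username, cookies, report):
       """
       Call report(username) unless the throttle cookie is already present.

       Returns the max-age (in seconds) of the cookie the caller should set,
       or None if the user was reported recently and nothing needs doing.
       """
       if ONLINE_COOKIE in cookies:
           return None
       report(username)
       return ONLINE_COOKIE_AGE


   seen = []
   first = maybe_report("alice", {}, seen.append)
   second = maybe_report("alice", {ONLINE_COOKIE: "1"}, seen.append)

In a Django middleware, ``cookies`` would be ``request.COOKIES``, ``report``
would be ``report_user``, and a non-``None`` return value would be passed as
``max_age`` to ``response.set_cookie()``.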

Periodically you need to "age out" the sets by destroying the old set, moving
the current set to the old set, and then emptying the current set.

.. sourcecode:: python

   def tick():
       """
       Call this function to "age out" the old set by renaming the current set
       to the old.
       """
       conn = _get_connection()
       if conn:
           # An exception may be raised if the current key doesn't exist; if that
           # happens we have to delete the old set because no one is online.
           try:
               conn.rename(USER_CURRENT_KEY, USER_OLD_KEY)
           except redis.ResponseError:
               try:
                   del conn[USER_OLD_KEY]
               except redis.RedisError as e:
                   logger.error(e)
           except redis.RedisError as e:
               logger.error(e)

As mentioned previously, if no one is on your site, eventually your current set
will cease to exist as it is renamed and not populated further. If you attempt to
rename a non-existent key, the Python Redis client raises a ``ResponseError``
exception. If this occurs we just manually delete the old set. In a bit of
Pythonic cleverness, the Python Redis client supports the ``del`` syntax for
this operation.

The ``tick()`` function can be called periodically by a cron job, for example.
If you are using Django, you could create a `custom management command`_ that
calls ``tick()`` and schedule cron to execute it. Alternatively, you could use
something like Celery_ to schedule a job to do the same. (As an aside, Redis can
be used as a back-end for Celery, something that I hope to explore in the near
future.)

Finally, you need a way to obtain the current "who's online" set, which again is
a union of the current and old sets.

.. sourcecode:: python

   def get_users_online():
       """
       Returns a set of user names which is the union of the current and old
       sets.
       """
       conn = _get_connection()
       if conn:
           try:
               # Note that keys that do not exist are considered empty sets
               return conn.sunion([USER_CURRENT_KEY, USER_OLD_KEY])
           except redis.RedisError as e:
               logger.error(e)

       return set()

In my Django application, I call this function from a
`custom inclusion template tag`_.


Conclusion
----------

I hope this blog post gives you some idea of the usefulness of Redis. I expanded
on this example to also keep track of non-authenticated "guest" users. I simply
added another pair of sets to track IP addresses.

If you are like me, you are probably already thinking about moving some of the
functions that you awkwardly jammed into a traditional database over to Redis
and other "NoSQL" technologies.

.. _Redis: http://redis.io/
.. _Memcached: http://memcached.org/
.. _Simon Willison's Redis tutorial: http://simonwillison.net/static/2010/redis-tutorial/
.. _redis-py: https://github.com/andymccurdy/redis-py
.. _Django: http://djangoproject.com
.. _Redis API: http://redis.io/commands
.. _SADD: http://redis.io/commands/sadd
.. _RENAME: http://redis.io/commands/rename
.. _SUNION: http://redis.io/commands/sunion
.. _custom middleware: http://docs.djangoproject.com/en/1.3/topics/http/middleware/
.. _custom management command: http://docs.djangoproject.com/en/1.3/howto/custom-management-commands/
.. _Celery: http://celeryproject.org/
.. _custom inclusion template tag: http://docs.djangoproject.com/en/1.3/howto/custom-template-tags/#inclusion-tags
.. _new post: http://deathofagremmie.com/2011/12/17/who-s-online-with-redis-python-a-slight-return/