A better "Who's Online" with Redis & Python
###########################################

:date: 2011-04-25 12:00
:tags: Redis, Python
:slug: a-better-who-s-online-with-redis-python
:author: Brian Neal

**Updated on December 17, 2011:** I found a better solution. Head on over to
the `new post`_ to check it out.


Who's What?
-----------

My website, like many others, has a "who's online" feature. It displays the
names of authenticated users that have been seen over the course of the last ten
minutes or so. It may seem a minor feature at first, but I find it really does a lot to
"humanize" the site and make it seem more like a community gathering place.

My first implementation of this feature used the MySQL database to update a
per-user timestamp whenever a request from an authenticated user arrived.
Actually, this seemed excessive to me, so I used a strategy involving an "online"
cookie that has a five minute expiration time. Whenever I see an authenticated
user without the online cookie I update their timestamp and then hand them back
a cookie that will expire in five minutes. In this way I don't have to hit the
database on every single request.

This approach worked fine but it has some aspects that didn't sit right with me:

* It seems like overkill to use the database to store temporary, trivial information like
  this. It doesn't feel like a good use of a full-featured relational database
  management system (RDBMS).
* I am writing to the database during a GET request. Ideally, all GET requests should
  be idempotent. Of course if this is strictly followed, it would be
  impossible to create a "who's online" feature in the first place. You'd have
  to require the user to POST data periodically. However, writing to a RDBMS
  during a GET request is something I feel guilty about and try to avoid when I
  can.


Redis
-----

Enter Redis_. I discovered Redis recently, and it is pure, white-hot
awesomeness. What is Redis? It's one of those projects that gets slapped with
the "NoSQL" label. And while I'm still trying to figure that buzzword out, Redis makes
sense to me when described as a lightweight data structure server.
Memcached_ can store key-value pairs very fast, where the value is always a string.
Redis goes one step further and stores not only strings, but data
structures like lists, sets, and hashes. For a great overview of what Redis is
and what you can do with it, check out `Simon Willison's Redis tutorial`_.

Another reason why I like Redis is that it is easy to install and deploy.
It is straight C code without any dependencies. Thus you can build it from
source just about anywhere. Your Linux distro may have a package for it, but it
is just as easy to grab the latest tarball and build it yourself.

I've really come to appreciate Redis for being such a small and lightweight
tool. At the same time, it is very powerful and effective for filling those
tasks that a traditional RDBMS is not good at.

For working with Redis in Python, you'll need to grab Andy McCurdy's redis-py_
client library. It can be installed with a simple

.. sourcecode:: sh

   $ sudo pip install redis


Who's Online with Redis
-----------------------

Now that we are going to use Redis, how do we implement a "who's online"
feature? The first step is to get familiar with the `Redis API`_.

One approach to the "who's online" problem is to add a user name to a set
whenever we see a request from that user. That's fine but how do we know when
they have stopped browsing the site? We have to periodically clean out the
set in order to time people out. A cron job, for example, could delete the
set every five minutes.

A small problem with deleting the set is that people will abruptly disappear
from the site every five minutes. In order to give more gradual behavior we
could utilize two sets, a "current" set and an "old" set. As users are seen, we
add their names to the current set. Every five minutes or so (season to taste),
we simply overwrite the old set with the contents of the current set, then clear
out the current set. At any given time, the set of who's online is the union
of these two sets.

This approach doesn't give exact results of course, but it is perfectly fine for my site.

Looking over the Redis API, we see that we'll be making use of the following
commands:

* SADD_ for adding members to the current set.
* RENAME_ for copying the current set to the old, as well as destroying the
  current set all in one step.
* SUNION_ for performing a union on the current and old sets to produce the set
  of who's online.

And that's it! With these three primitives we have everything we need.
This is because of the following useful Redis behaviors:

* Performing a ``SADD`` against a set that doesn't exist creates the set and is
  not an error.
* Performing a ``SUNION`` with sets that don't exist is fine; they are simply
  treated as empty sets.

The one caveat involves the ``RENAME`` command. If the key you wish to rename
does not exist, Redis reports an error, and the Python Redis client raises it
as an exception.

Experimenting with algorithms and ideas is quite easy with Redis. You can either
use the Python Redis client in a Python interactive interpreter shell, or you can
use the command-line client that comes with Redis. Either way you can quickly
try out commands and refine your approach.


Implementation
--------------

My website is powered by Django_, but I am not going to show any Django-specific
code here. Instead I'll show just the pure Python parts, and hopefully you can
adapt it to whatever framework, if any, you are using.

I created a Python module to hold this functionality:
``whos_online.py``. Throughout this module I use a lot of exception handling,
mainly because if the Redis server has crashed (or if I forgot to start it, say
in development) I don't want my website to be unusable. If Redis is unavailable,
I simply log an error and drive on. Note that in my limited experience Redis is
very stable and has not crashed on me once, but it is good to be defensive.

The first important function used throughout this module is a function to obtain
a connection to the Redis server:

.. sourcecode:: python

   import logging
   import redis

   logger = logging.getLogger(__name__)

   def _get_connection():
       """
       Create and return a Redis connection. Returns None on failure.
       """
       try:
           conn = redis.Redis(host=HOST, port=PORT, db=DB)
           return conn
       except redis.RedisError as e:
           logger.error(e)

       return None

The ``HOST``, ``PORT``, and ``DB`` constants can come from a
configuration file or they could be module-level constants. In my case they are set in my
Django ``settings.py`` file. Once we have this connection object, we are free to
use the Redis API exposed via the Python Redis client.

To update the current set whenever we see a user, I call this function:

.. sourcecode:: python

   # Redis key names:
   USER_CURRENT_KEY = "wo_user_current"
   USER_OLD_KEY = "wo_user_old"

   def report_user(username):
       """
       Call this function when a user has been seen. The username will be added to
       the current set.
       """
       conn = _get_connection()
       if conn:
           try:
               conn.sadd(USER_CURRENT_KEY, username)
           except redis.RedisError as e:
               logger.error(e)

If you are using Django, a good spot to call this function is from a piece
of `custom middleware`_. I kept my "5 minute cookie" algorithm to avoid doing this on
every request, although it is probably unnecessary on my low-traffic site.

Periodically you need to "age out" the sets by destroying the old set, moving
the current set to the old set, and then emptying the current set.

.. sourcecode:: python

   def tick():
       """
       Call this function to "age out" the old set by renaming the current set
       to the old.
       """
       conn = _get_connection()
       if conn:
           # An exception may be raised if the current key doesn't exist; if that
           # happens we have to delete the old set because no one is online.
           try:
               conn.rename(USER_CURRENT_KEY, USER_OLD_KEY)
           except redis.ResponseError:
               try:
                   del conn[USER_OLD_KEY]
               except redis.RedisError as e:
                   logger.error(e)
           except redis.RedisError as e:
               logger.error(e)

As mentioned previously, if no one is on your site, eventually your current set
will cease to exist as it is renamed and not populated further. If you attempt to
rename a non-existent key, the Python Redis client raises a ``ResponseError`` exception.
If this occurs we just manually delete the old set. In a bit of Pythonic cleverness,
the Python Redis client lets you use the ``del`` statement for this operation.

The ``tick()`` function can be called periodically by a cron job, for example. If you are using Django,
you could create a `custom management command`_ that calls ``tick()`` and schedule cron
to execute it. Alternatively, you could use something like Celery_ to schedule a
job to do the same. (As an aside, Redis can be used as a back-end for Celery, something that I hope
to explore in the near future.)

Finally, you need a way to obtain the current "who's online" set, which again is
a union of the current and old sets.

.. sourcecode:: python

   def get_users_online():
       """
       Returns a set of user names which is the union of the current and old
       sets.
       """
       conn = _get_connection()
       if conn:
           try:
               # Note that keys that do not exist are considered empty sets
               return conn.sunion([USER_CURRENT_KEY, USER_OLD_KEY])
           except redis.RedisError as e:
               logger.error(e)

       return set()

In my Django application, I call this function from a `custom inclusion template tag`_.


Conclusion
----------

I hope this blog post gives you some idea of the usefulness of Redis. I expanded
on this example to also keep track of non-authenticated "guest" users. I simply added
another pair of sets to track IP addresses.

If you are like me, you are probably already thinking about shifting some functions that you
awkwardly jammed onto a traditional database to Redis and other "NoSQL"
technologies.

.. _Redis: http://redis.io/
.. _Memcached: http://memcached.org/
.. _Simon Willison's Redis tutorial: http://simonwillison.net/static/2010/redis-tutorial/
.. _redis-py: https://github.com/andymccurdy/redis-py
.. _Django: http://djangoproject.com
.. _Redis API: http://redis.io/commands
.. _SADD: http://redis.io/commands/sadd
.. _RENAME: http://redis.io/commands/rename
.. _SUNION: http://redis.io/commands/sunion
.. _custom middleware: http://docs.djangoproject.com/en/1.3/topics/http/middleware/
.. _custom management command: http://docs.djangoproject.com/en/1.3/howto/custom-management-commands/
.. _Celery: http://celeryproject.org/
.. _custom inclusion template tag: http://docs.djangoproject.com/en/1.3/howto/custom-template-tags/#inclusion-tags
.. _new post: http://deathofagremmie.com/2011/12/17/who-s-online-with-redis-python-a-slight-return/
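
As a postscript, the two-set scheme from the "Who's Online with Redis" section
can be tried out without a Redis server at all. The sketch below is only an
in-memory model using plain Python sets; the ``TwoSetPresence`` class and its
attribute names are made up for this illustration and are not part of my
``whos_online.py`` module or of the Python Redis client. Each method is meant to
mirror one of the three Redis commands (``SADD``, ``RENAME``, ``SUNION``).

```python
class TwoSetPresence:
    """Hypothetical in-memory model of the "current"/"old" two-set scheme."""

    def __init__(self):
        self.current = set()
        self.old = set()

    def report_user(self, username):
        # Plays the role of SADD on the current set.
        self.current.add(username)

    def tick(self):
        # Plays the role of RENAME current -> old: the old set is replaced
        # and the current set starts out empty again.
        self.old = self.current
        self.current = set()

    def get_users_online(self):
        # Plays the role of SUNION over the current and old sets.
        return self.current | self.old


presence = TwoSetPresence()
presence.report_user("alice")
presence.tick()                    # alice moves to the old set
presence.report_user("bob")
online_after_one_tick = presence.get_users_online()

presence.tick()                    # alice ages out, bob moves to the old set
online_after_two_ticks = presence.get_users_online()

presence.tick()                    # nobody was reported, so both sets are empty
online_after_three_ticks = presence.get_users_online()
```

In this model a user stays listed for somewhere between one and two tick
intervals after they were last reported, which is the "gradual" behavior the
two sets are meant to provide.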