diff content/Coding/002-redis-whos-online.rst @ 4:7ce6393e6d30

Adding converted blog posts from old blog.
author Brian Neal <bgneal@gmail.com>
date Thu, 30 Jan 2014 21:45:03 -0600
parents
children 49bebfa6f9d3
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/content/Coding/002-redis-whos-online.rst	Thu Jan 30 21:45:03 2014 -0600
@@ -0,0 +1,269 @@
+A better "Who's Online" with Redis & Python
+###########################################
+
+:date: 2011-04-25 12:00
+:tags: Redis, Python
+:slug: a-better-who-s-online-with-redis-python
+:author: Brian Neal
+
+**Updated on December 17, 2011:** I found a better solution. Head on over to
+the `new post`_ to check it out.
+
+
+Who's What?
+-----------
+
+My website, like many others, has a "who's online" feature. It displays the
+names of authenticated users that have been seen over the course of the last ten
+minutes or so. It may seem a minor feature at first, but I find it really does a lot to
+"humanize" the site and make it seem more like a community gathering place.
+
+My first implementation of this feature used the MySQL database to update a
+per-user timestamp whenever a request from an authenticated user arrived.
+Actually, this seemed excessive to me, so I used a strategy involving an "online"
+cookie that has a five minute expiration time. Whenever I see an authenticated
+user without the online cookie I update their timestamp and then hand them back
+a cookie that will expire in five minutes. In this way I don't have to hit the
+database on every single request.
+
+This approach worked fine, but some aspects of it didn't sit right with me:
+
+* It seems like overkill to use the database to store temporary, trivial information like
+  this. It doesn't feel like a good use of a full-featured relational database
+  management system (RDBMS).
+* I am writing to the database during a GET request. Ideally, a GET request
+  should be safe (free of side effects) as well as idempotent. Of course if
+  this is strictly followed, it would be impossible to create a "who's online"
+  feature in the first place. You'd have to require the user to POST data
+  periodically. However, writing to an RDBMS during a GET request is something
+  I feel guilty about and try to avoid when I can.
+
+
+Redis
+-----
+
+Enter Redis_. I discovered Redis recently, and it is pure, white-hot
+awesomeness. What is Redis? It's one of those projects that gets slapped with
+the "NoSQL" label. And while I'm still trying to figure that buzzword out, Redis makes
+sense to me when described as a lightweight data structure server.
+Memcached_ can store key-value pairs very fast, where the value is always a string.
+Redis goes one step further and stores not only strings, but data
+structures like lists, sets, and hashes. For a great overview of what Redis is
+and what you can do with it, check out `Simon Willison's Redis tutorial`_.
+
+Another reason why I like Redis is that it is easy to install and deploy.
+It is straight C code without any dependencies. Thus you can build it from
+source just about anywhere. Your Linux distro may have a package for it, but it
+is just as easy to grab the latest tarball and build it yourself.
+
+I've really come to appreciate Redis for being such a small and lightweight
+tool. At the same time, it is very powerful and effective at handling those
+tasks that a traditional RDBMS is not good at.
+
+For working with Redis in Python, you'll need to grab Andy McCurdy's redis-py_
+client library. It can be installed with a simple
+
+.. sourcecode:: sh
+
+   $ sudo pip install redis
+
+
+Who's Online with Redis
+-----------------------
+
+Now that we are going to use Redis, how do we implement a "who's online"
+feature? The first step is to get familiar with the `Redis API`_.
+
+One approach to the "who's online" problem is to add a user name to a set
+whenever we see a request from that user. That's fine but how do we know when
+they have stopped browsing the site? We have to periodically clean out the
+set in order to time people out. A cron job, for example, could delete the
+set every five minutes.
+
+A small problem with deleting the set is that people will abruptly disappear
+from the site every five minutes. In order to give more gradual behavior we
+could utilize two sets, a "current" set and an "old" set. As users are seen, we
+add their names to the current set. Every five minutes or so (season to taste),
+we simply overwrite the old set with the contents of the current set, then clear
+out the current set. At any given time, the set of who's online is the union
+of these two sets.
+
+This approach doesn't give exact results of course, but it is perfectly fine for my site.
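+
+To make the rotation concrete, here is a toy in-memory model of the two-set
+scheme. It is only a sketch: plain Python sets stand in for the Redis sets, and
+the ``TwoSetTracker`` class and its method names are made up for illustration.
+
```python
# Toy model of the two-set scheme. Plain Python sets stand in for the
# Redis sets; TwoSetTracker and its method names are illustrative only.
class TwoSetTracker:
    def __init__(self):
        self.current = set()
        self.old = set()

    def report(self, username):
        """Record that a user was just seen."""
        self.current.add(username)

    def tick(self):
        """Age out: current becomes old, and a fresh current set starts."""
        self.old = self.current
        self.current = set()

    def online(self):
        """Who's online is the union of the two sets."""
        return self.current | self.old


tracker = TwoSetTracker()
tracker.report("alice")
tracker.tick()
tracker.report("bob")
after_one_tick = tracker.online()     # alice (old) and bob (current)
tracker.tick()
after_two_ticks = tracker.online()    # alice has aged out, bob remains
tracker.tick()
after_three_ticks = tracker.online()  # nobody has been seen recently
```
+
+With a five minute tick, a user stays listed for somewhere between five and
+ten minutes after their last reported request, depending on where in the
+interval that request fell.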
+
+Looking over the Redis API, we see that we'll be making use of the following
+commands:
+
+* SADD_ for adding members to the current set.
+* RENAME_ for moving the current set onto the old one, which replaces the old
+  set and removes the current key all in one step.
+* SUNION_ for performing a union on the current and old sets to produce the set
+  of who's online.
+
+And that's it! With these three primitives we have everything we need. This is
+because of the following useful Redis behaviors:
+
+* Performing a ``SADD`` against a set that doesn't exist creates the set and is
+  not an error.
+* Performing a ``SUNION`` with sets that don't exist is fine; they are simply
+  treated as empty sets.
+
+The one caveat involves the ``RENAME`` command. If the key you wish to rename
+does not exist, Redis replies with an error, which the Python Redis client
+raises as an exception.
+
+Experimenting with algorithms and ideas is quite easy with Redis. You can either
+use the Python Redis client in a Python interactive interpreter shell, or you can
+use the command-line client that comes with Redis. Either way you can quickly
+try out commands and refine your approach.
+
+
+Implementation
+--------------
+
+My website is powered by Django_, but I am not going to show any Django specific
+code here. Instead I'll show just the pure Python parts, and hopefully you can
+adapt it to whatever framework, if any, you are using.
+
+I created a Python module to hold this functionality:
+``whos_online.py``. Throughout this module I use a lot of exception handling,
+mainly because if the Redis server has crashed (or if I forgot to start it, say
+in development) I don't want my website to be unusable. If Redis is unavailable,
+I simply log an error and drive on. Note that in my limited experience Redis is
+very stable and has not crashed on me once, but it is good to be defensive.
+
+The first important function used throughout this module is a function to obtain
+a connection to the Redis server:
+
+.. sourcecode:: python
+
+   import logging
+   import redis
+
+   logger = logging.getLogger(__name__)
+
+   def _get_connection():
+       """
+       Create and return a Redis connection. Returns None on failure.
+       """
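+       try:
+           # redis-py connects lazily, so a server that is down usually shows
+           # up as an error on the first command rather than here.
+           conn = redis.Redis(host=HOST, port=PORT, db=DB)
+           return conn
+       except redis.RedisError as e:
+           logger.error(e)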
+
+       return None
+
+The ``HOST``, ``PORT``, and ``DB`` constants can come from a
+configuration file or they could be module-level constants. In my case they are set in my
+Django ``settings.py`` file. Once we have this connection object, we are free to
+use the Redis API exposed via the Python Redis client.
+
+To update the current set whenever we see a user, I call this function:
+
+.. sourcecode:: python
+
+   # Redis key names:
+   USER_CURRENT_KEY = "wo_user_current"
+   USER_OLD_KEY = "wo_user_old"
+
+   def report_user(username):
+       """
+       Call this function when a user has been seen. The username will be added to
+       the current set.
+       """
+       conn = _get_connection()
+       if conn:
+           try:
+               conn.sadd(USER_CURRENT_KEY, username)
+           except redis.RedisError as e:
+               logger.error(e)
+
+If you are using Django, a good spot to call this function is from a piece
+of `custom middleware`_. I kept my "5 minute cookie" algorithm to avoid doing this on
+every request although it is probably unnecessary on my low traffic site.
+
+Periodically you need to "age out" the sets by destroying the old set, moving
+the current set to the old set, and then emptying the current set. 
+
+.. sourcecode:: python
+
+   def tick():
+       """
+       Call this function to "age out" the old set by renaming the current set
+       to the old.
+       """
+       conn = _get_connection()
+       if conn:
+           # RENAME fails if the current key doesn't exist; if that happens we
+           # have to delete the old set because no one is online.
+           try:
+               conn.rename(USER_CURRENT_KEY, USER_OLD_KEY)
+           except redis.ResponseError:
+               try:
+                   del conn[USER_OLD_KEY]
+               except redis.RedisError as e:
+                   logger.error(e)
+           except redis.RedisError as e:
+               logger.error(e)
+
+As mentioned previously, if no one is on your site, eventually your current set
+will cease to exist as it is renamed and not populated further. If you attempt to
+rename a non-existent key, the Python Redis client raises a ``ResponseError`` exception.
+If this occurs we just manually delete the old set. In a bit of Pythonic cleverness,
+the Python Redis client supports the ``del`` syntax for deleting a key.
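+
+If you want to exercise this fallback without a running Redis server, one
+option is a small in-memory stand-in. The sketch below is not part of my module:
+``FakeConn`` and the local ``ResponseError`` class are hypothetical substitutes
+that imitate only the calls ``tick()`` relies on, and this ``tick()`` takes the
+connection as an argument to keep the example self-contained.
+
```python
class ResponseError(Exception):
    """Stand-in for redis.ResponseError in this sketch."""


class FakeConn:
    """Hypothetical in-memory stand-in; implements only what the sketch needs."""

    def __init__(self):
        self.data = {}

    def sadd(self, key, member):
        # Adding to a missing set creates it, as SADD does.
        self.data.setdefault(key, set()).add(member)

    def rename(self, src, dst):
        # Renaming a missing key is an error, as with RENAME.
        if src not in self.data:
            raise ResponseError("no such key")
        self.data[dst] = self.data.pop(src)

    def __delitem__(self, key):
        self.data.pop(key, None)

    def sunion(self, keys):
        # Missing keys count as empty sets, as with SUNION.
        result = set()
        for key in keys:
            result |= self.data.get(key, set())
        return result


USER_CURRENT_KEY = "wo_user_current"
USER_OLD_KEY = "wo_user_old"


def tick(conn):
    """Same aging logic as above, minus logging, with the connection passed in."""
    try:
        conn.rename(USER_CURRENT_KEY, USER_OLD_KEY)
    except ResponseError:
        del conn[USER_OLD_KEY]


conn = FakeConn()
conn.sadd(USER_CURRENT_KEY, "alice")
tick(conn)  # current is renamed to old
online_after_first_tick = conn.sunion([USER_CURRENT_KEY, USER_OLD_KEY])
tick(conn)  # current no longer exists, so the old set is deleted instead
online_after_second_tick = conn.sunion([USER_CURRENT_KEY, USER_OLD_KEY])
```
+
+The second call is the interesting one: nobody was reported in between, so the
+rename fails and the old set is removed, leaving an empty "who's online" list.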
+
+The ``tick()`` function can be called periodically by a cron job, for example. If you are using Django,
+you could create a `custom management command`_ that calls ``tick()`` and schedule cron 
+to execute it. Alternatively, you could use something like Celery_ to schedule a
+job to do the same. (As an aside, Redis can be used as a back-end for Celery, something that I hope
+to explore in the near future.)
+
+Finally, you need a way to obtain the current "who's online" set, which again is
+a union of the current and old sets.
+
+.. sourcecode:: python
+
+   def get_users_online():
+       """
+       Returns a set of user names which is the union of the current and old
+       sets.
+       """
+       conn = _get_connection()
+       if conn:
+           try:
+               # Note that keys that do not exist are considered empty sets
+               return conn.sunion([USER_CURRENT_KEY, USER_OLD_KEY])
+               except redis.RedisError as e:
+               logger.error(e)
+
+       return set()
+
+In my Django application, I call this function from a
+`custom inclusion template tag`_.
+
+
+Conclusion
+----------
+
+I hope this blog post gives you some idea of the usefulness of Redis. I expanded
+on this example to also keep track of non-authenticated "guest" users. I simply added
+another pair of sets to track IP addresses.
+
+If you are like me, you are probably already thinking about shifting some functions that you
+awkwardly jammed onto a traditional database to Redis and other "NoSQL"
+technologies.
+
+.. _Redis: http://redis.io/
+.. _Memcached: http://memcached.org/
+.. _Simon Willison's Redis tutorial: http://simonwillison.net/static/2010/redis-tutorial/
+.. _redis-py: https://github.com/andymccurdy/redis-py
+.. _Django: http://djangoproject.com
+.. _Redis API: http://redis.io/commands
+.. _SADD: http://redis.io/commands/sadd
+.. _RENAME: http://redis.io/commands/rename
+.. _SUNION: http://redis.io/commands/sunion
+.. _custom middleware: http://docs.djangoproject.com/en/1.3/topics/http/middleware/
+.. _custom management command: http://docs.djangoproject.com/en/1.3/howto/custom-management-commands/
+.. _Celery: http://celeryproject.org/
+.. _custom inclusion template tag: http://docs.djangoproject.com/en/1.3/howto/custom-template-tags/#inclusion-tags
+.. _new post: http://deathofagremmie.com/2011/12/17/who-s-online-with-redis-python-a-slight-return/