view content/Coding/022-python-red-black-tree.rst @ 9:271bed1181df

New blog post on rebooting blog with Pelican.
author Brian Neal <bgneal@gmail.com>
date Sat, 01 Feb 2014 14:29:54 -0600
parents 7ce6393e6d30
children
line wrap: on
line source
A Red-Black Tree in Python
##########################

:date: 2012-12-28 11:10
:tags: Python, datastructures
:slug: a-red-black-tree-in-python
:author: Brian Neal

I've been working my way through Allen Downey's `Think Complexity`_ book. I'm
not very far in, but so far it's a great way to brush up on algorithms and
datastructures, and learn some new stuff about complexity science. Plus, all of
the code is written in Python! I've been doing the exercises, and most of them
take at most fifteen or twenty minutes. But at the end of his explanation on
hash tables he casually lobs this one out (3.4, exercise 5):

   A drawback of hashtables is that the elements have to be hashable, which
   usually means they have to be immutable. That’s why, in Python, you can use
   tuples but not lists as keys in a dictionary. An alternative is to use
   a tree-based map.
   
   Write an implementation of the map interface called TreeMap that uses
   a red-black tree to perform add and get in log time.

I've never researched red-black trees before, but as a C++ programmer I know
they are the datastructure that powers ``std::set`` and ``std::map``. So
I decided to take a look. I quickly realized this was not going to be a simple
exercise, as red-black trees are quite complicated to understand and implement.
They are basically binary search trees that do a lot of work to keep themselves
approximately balanced.

I spent a few nights reading up on red-black trees. A good explanation can be
found in Wikipedia_. There are even a handful of Python implementations
floating about, of varying quality. But finally I found a detailed explanation
that really clicked with me at `Eternally Confuzzled`_. Julienne Walker derives
a unique algorithm based upon the rules for red-black trees, and the
implementation code is non-recursive and top-down. Most of the other
implementations I found on the web seem to be based on the textbook
`Introduction To Algorithms`_, and often involve the use of parent pointers
and using dummy nodes to represent the nil leaves of the tree. Julienne's
solution avoided these things and seemed a bit less complex. However the best
reason to study the tutorial was the explanation was very coherent and
detailed. The other sources on the web seemed to be fragmented, missing
details, and lacking in explanation.

So to complete my `Think Complexity`_ exercise, I decided to port Julienne's
red-black tree algorithm from C to Python, and hopefully learn something along
the way. After a couple nights of work, and one `very embarrassing bug`_, I've
completed it. I can't say I quite understand every bit of the algorithm, but
I certainly learned a lot. You can view the `source code at Bitbucket`_, or
clone my `Think Complexity repository`_.

Many thanks to Julienne Walker for the `great tutorial`_! And I highly recommend
`Think Complexity`_. Check them both out.


.. _Think Complexity: http://www.greenteapress.com/compmod/
.. _Wikipedia: http://en.wikipedia.org/wiki/Red%E2%80%93black_tree
.. _Eternally Confuzzled: http://www.eternallyconfuzzled.com/tuts/datastructures/jsw_tut_rbtree.aspx
.. _great tutorial: http://www.eternallyconfuzzled.com/tuts/datastructures/jsw_tut_rbtree.aspx
.. _Introduction to Algorithms: http://mitpress.mit.edu/books/introduction-algorithms
.. _very embarrassing bug: http://deathofagremmie.com/2012/12/27/a-c-python-chained-assignment-gotcha/
.. _source code at Bitbucket: https://bitbucket.org/bgneal/think_complexity/src/0326803882adc4a598d890ee4d7d39d93cb64af7/redblacktree.py?at=default
.. _Think Complexity repository: https://bitbucket.org/bgneal/think_complexity