Mercurial > public > pelican-blog
diff content/Coding/019-enigma-challenge.rst @ 4:7ce6393e6d30
Adding converted blog posts from old blog.
author | Brian Neal <bgneal@gmail.com> |
---|---|
date | Thu, 30 Jan 2014 21:45:03 -0600 |
parents | |
children | 6e0d4799796d |
line wrap: on
line diff
--- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/content/Coding/019-enigma-challenge.rst Thu Jan 30 21:45:03 2014 -0600 @@ -0,0 +1,223 @@ +Completing the Enigma Challenge +############################### + +:date: 2012-07-22 13:30 +:tags: Enigma, Py-Enigma, Cpp-Enigma, C++ +:slug: completing-the-enigma-challenge +:author: Brian Neal + +Since my last `blog post +<http://deathofagremmie.com/2012/06/06/introducing-py-enigma/>`_, I have spent +almost all my free time working on `Dirk Rijmenants' Enigma Challenge +<http://users.telenet.be/d.rijmenants/en/challenge.htm>`_. I'm very happy to +report that I cracked the last message on the evening of July 10. I thought it +might be interesting to summarize my experiences working on this challenge in a +blog post. + +The first nine +-------------- + +I had a lot of fun building Py-Enigma_, an Enigma machine simulator library +written in Python_ based on the information I found on Dirk's `Technical Details +of the Enigma Machine`_ page. After I had built it and played with it for a +while, I discovered Dirk also had an `Enigma Challenge`_. I decided this would +be a good test for Py-Enigma_, and it may give me some ideas on how to improve +the library for breaking messages. + +The first nine messages were pretty easy to solve using Py-Enigma_. Dirk gives +you most of the key information for each problem, and one can then write a small +program to brute force the missing information. I was able to solve the first +nine messages in a weekend. This was very enjoyable, as it did require some +thinking about how to simulate the problem and then executing it. I also learned +about `Python's itertools <http://docs.python.org/library/itertools.html>`_ +library, which contains some very handy functions for generating permutations +and combinations for brute forcing a solution. + +Dirk did a great job on the messages. They referenced actual events from World +War II and I ended up spending a fair amount of time on Wikipedia reading about +them. Translating the messages from German to English was a bit difficult, but I +relied heavily on `Google Translate <http://translate.google.com>`_. + +Message 10 +---------- + +But then I came to the last message. Here, Dirk doesn't give you any key +information, just ciphertext. Gulp. If you include 2 possible reflectors, the +Enigma machine's key space for message 10 is somewhere around 2 x 10\ +:sup:`23` keys, so it is impossible to brute force every possible key, unless +you can afford to wait around for the heat death of the universe. + +I then did some research, and learned about a technique for brute-forcing only +the rotor and message settings, and then performing a heuristic technique called +"hill-climbing" on the plugboard settings. I am deeply indepted to Frode Weierud +and Geoff Sullivan, who have published high level descriptions of this +technique. See Frode's `The Enigma Collection +<http://cryptocellar.org/Enigma/>`_ page and Geoff's `Crypto Barn +<http://www.hut-six.co.uk/>`_ for more information. + +Python is too slow, or is it? +----------------------------- + +After a week or so of studying and tinkering, I came up with a hill-climbing +algorithm in Python. I then started an attempt to break message 10, but after +running the program for a little over 24 hours, I realized it was too slow. I +did a "back of the envelope" calculation and determined I would need several +hundred years to hill-climb every rotor and message setting. This was a bit +disappointing, but I knew it was a possibility going in. + +I then set out, somewhat reluctantly, to port Py-Enigma to C++. These days I +don't do any C++ at home, only at work. However I got much happier when I +decided this was an opportunity to try out some new C++11 features (the embedded +compiler we use at work has no support for C++11). Okay, things are fun again! I +have to say that C++11 is really quite enjoyable, and I made good use of the new +``auto`` style variable declarations, range-based for-loops, and brace-style +initialization for containers like ``vector`` and ``map``. + +C++ is too slow, or is it? +-------------------------- + +It took a fair amount of time to re-tool my code to C++11. I ensured all my unit +tests originally written in Python could pass in C++, and that I could solve +some of the previous messages. I then re-implemented my hill-climbing +algorithm in C++ and made another attempt at cracking message 10. + +My C++ solution was indeed faster, but only by a factor of 8 or 9. I calculated +I would need about 35 years to hill-climb all rotor and message settings, and I +didn't really want to see if I could out-live my program. My morale was very low +at this point. I wasn't sure if I could ever solve this self-imposed problem. + +Algorithms & Data Structures are key +------------------------------------ + +I had wanted to test myself with this problem and had avoided looking at anyone +else's source code. However, at this juncture, I needed some help. I turned to +Stephan Krah's `M4 Message Breaking Project +<http://www.bytereef.org/m4_project.html>`_. Stephan had started a distributed +computing attack on several 4-rotor naval Enigma machine messages. He had +graciously made his source code available. Perhaps I could get some hints by +looking at his code. + +Indeed, Stephen's code provided the break I needed. I discovered a very cool +trick that Stephen was doing right before he started his hill-climb. He +pre-computed every path through the machine for every letter of the alphabet. +Since hill-climbing involves running the message through the machine roughly +600 to 1000 times, this proved to be a huge time saver. After borrowing this +idea, my hill-climbing times for one fixed rotor & message setting went down +from 250 milliseconds to 2! + +I immediately stopped looking at Stephen's code at this point; I didn't bother +to compare our hill-climbing algorithms. I believe both of us are influenced by +Weierud & Sullivan here. I was now convinced I had the speed-up I needed to make +a serious attempt at message 10 again. + +This showed me how critically important it is to have the right data structure +and algorithm for a problem. However there were two other lessons I needed to +learn before I was successful. + +Utilize all your computing resources +------------------------------------ + +Up to this point my cracking program was linear. It started at the first rotor +and message setting, then hill-climbed and noted the score I got. Then it +proceeded to the next message setting and hill-climbed, noting the score of that +attempt, and so on. There were two problems with this approach. + +I realized if I could do some sort of multi-processing, I could search the key +space faster. I thought about forking off multiple processes or maybe using +threads, but that was over-thinking it. In the end, I added command-line +arguments to the program to tell it where in the key space to start looking. I +could then simply run multiple instances of my program. My development laptop, +which runs Ubuntu, is from 2008, and it has 2 cores in it. My desktop PC (for +gaming), has 4 cores! I next spent a couple nights installing MinGW_ and getting +my application to build in that environment. I could now run 6 instances of my +cracking program. Hopefully this would pay off and shorten the time further. + +(I didn't even attempt to commandeer my wife's and kids' computers. That +wouldn't have gone down so well.) + +How do you know when you are done? +---------------------------------- + +The final problem that I had was determining when my cracking program had in +fact cracked the message. The hill-climbing attempts produced scores for each +key setting. Up to this point, based on running my program on messages 1 through +9, I had noticed that when I got a score divided by the message length of around +10 I had cracked the message. But my tests on messages 1 through 9 were +hill-climbing using the known ring settings. My message cracker was using a +shortcut: I was only searching ring settings on the "fast" rotor to save time. +This would let me get the first part of a message, but if the middle or +left-most rotor stepped at the wrong time the rest of the message would be +unintelligible. + +To verify my message cracking program, I ran it on some of the previous +messages. I was shooting for a score divided by message length of 10. Using this +heuristic, I was in fact able to crack some of the previous messages, but not +others. It took me a while to realize that not searching the middle rotor's ring +setting was causing this. The light bulb came on, and I realized that my +heuristic of 10 is only valid when I search all the ring settings. Instead I +should just keep track of the highest scores. When you get close enough, the +score will be significantly higher than average score. Thus I may not know when +I am done until I see a large spike in the score. I would then have to write +another program to refine the solution to search for the middle and left-most +ring setting. + +Success at last +--------------- + +It took a little over a month of evenings and weekends to get to this point. +I even had a false start where I ran my program on 6 cores for 1.5 days only to +realize I had a bug in my code and I was only searching ring setting 0 on the +fast ring. But once I had that worked out, I started a search on 6 cores on a +Sunday night. I checked the logs off and on Monday, but no significant spike in +the scores were noted. Before leaving for work on Tuesday I checked again, and +noticed that one instance of my program had found a score twice as high as any +other. Could that be it? I really wanted to stay and find out, but I had to go +to work! Needless to say I was a bit distracted at work that day. + +After work, I ran the settings through my Python application and my heart +pounded as I recognized a few German words at the beginning of the message. In +fact I had gotten around 75% of the message, then some garbage, but then the +last few words were recognizable German. I then wrote another application to +search the middle ring setting space. Using the highest score from that program +allowed me to finally see the entire message. I cannot tell you how happy and +relieved I was at the same time! + +Final thoughts +-------------- + +For a while I didn't think I was going to be smart enough or just plain up to +the task of solving message 10. Looking back on it, I see a nice progression of +trying things, experimenting, setbacks, and a few "a-ha" moments that lead to the +breakthrough. I am very grateful to Dirk Rijmenants for creating the challenge, +which was a lot of fun. And I also wish to thank Frode Weierud and Geoff +Sullivan for their hill-climbing algorithm description. Thanks again to Stephan +Krah for the key speedup that made my attempt possible. + +For now I have decided not to publish my cracking program, since it is a tiny +bit specific to Dirk's challenge. I don't want to ruin his Enigma Challenge by +giving anything away. I don't believe my broad descriptions here have revealed +too much. Other people need to experience the challenge for themselves to fully +appreciate the thrill and hard work that go into code breaking. + +I should also note that there are other ways of solving Dirk's challenge. You +could, with a bit of patience, solve messages 1 through 9 with one of the many +Enigma Machine GUI simulators (Dirk has a fine one). I'm not certain how +you could break message 10 using a GUI simulator and trial and error, however. + +It is my understanding that this style of Enigma message breaking was not the +technique used by the allied code-breakers during the war. I believe they used +knowledge of certain key words, or "cribs", in messages received in weather +reports and other regular messages to find the key for the day. I look forward +to reading more about their efforts now. + +Finally, I have published the C++11 port of Py-Enigma_, which I am calling +(rather unimaginatively) Cpp-Enigma_ on bitbucket. I hope Cpp-Enigma_ and +Py-Enigma_ are useful for other people who want to explore the fascinating world +of Enigma codes. + +.. _Py-Enigma: https://bitbucket.org/bgneal/enigma +.. _Python: http://www.python.org +.. _Technical Details of the Enigma Machine: http://users.telenet.be/d.rijmenants/en/enigmatech.htm +.. _Enigma Challenge: http://users.telenet.be/d.rijmenants/en/challenge.htm +.. _Cpp-Enigma: https://bitbucket.org/bgneal/cpp-enigma +.. _MinGW: http://www.mingw.org/