bgneal@4: Completing the Enigma Challenge
bgneal@4: ###############################
bgneal@4: 
bgneal@4: :date: 2012-07-22 13:30
bgneal@4: :tags: Enigma, Py-Enigma, Cpp-Enigma, C++
bgneal@4: :slug: completing-the-enigma-challenge
bgneal@4: :author: Brian Neal
bgneal@4: 
bgneal@4: Since my last `blog post
bgneal@4: <http://deathofagremmie.com/2012/06/06/introducing-py-enigma/>`_, I have spent
bgneal@4: almost all my free time working on `Dirk Rijmenants' Enigma Challenge
bgneal@4: <http://users.telenet.be/d.rijmenants/en/challenge.htm>`_. I'm very happy to
bgneal@4: report that I cracked the last message on the evening of July 10. I thought it
bgneal@4: might be interesting to summarize my experiences working on this challenge in a
bgneal@4: blog post.
bgneal@4: 
bgneal@4: The first nine
bgneal@4: --------------
bgneal@4: 
bgneal@4: I had a lot of fun building Py-Enigma_, an Enigma machine simulator library
bgneal@4: written in Python_ based on the information I found on Dirk's `Technical Details
bgneal@4: of the Enigma Machine`_ page. After I had built it and played with it for a
bgneal@4: while, I discovered Dirk also had an `Enigma Challenge`_. I decided this would
bgneal@4: be a good test for Py-Enigma_, and it may give me some ideas on how to improve
bgneal@4: the library for breaking messages.
bgneal@4: 
bgneal@4: The first nine messages were pretty easy to solve using Py-Enigma_. Dirk gives
bgneal@4: you most of the key information for each problem, and one can then write a small
bgneal@4: program to brute force the missing information. I was able to solve the first
bgneal@4: nine messages in a weekend. This was very enjoyable, as it did require some
bgneal@4: thinking about how to simulate the problem and then executing it. I also learned
bgneal@4: about `Python's itertools <http://docs.python.org/library/itertools.html>`_
bgneal@4: library, which contains some very handy functions for generating permutations
bgneal@4: and combinations for brute forcing a solution.
bgneal@4: 
bgneal@4: Dirk did a great job on the messages. They referenced actual events from World
bgneal@4: War II and I ended up spending a fair amount of time on Wikipedia reading about
bgneal@4: them. Translating the messages from German to English was a bit difficult, but I
bgneal@4: relied heavily on `Google Translate <http://translate.google.com>`_.
bgneal@4: 
bgneal@4: Message 10
bgneal@4: ----------
bgneal@4: 
bgneal@4: But then I came to the last message. Here, Dirk doesn't give you any key
bgneal@4: information, just ciphertext. Gulp. If you include 2 possible reflectors, the
bgneal@4: Enigma machine's key space for message 10 is somewhere around 2 x 10\
bgneal@4: :sup:`23` keys, so it is impossible to brute force every possible key, unless
bgneal@4: you can afford to wait around for the heat death of the universe.
bgneal@4: 
bgneal@4: I then did some research, and learned about a technique for brute-forcing only
bgneal@4: the rotor and message settings, and then performing a heuristic technique called
bgneal@4: "hill-climbing" on the plugboard settings. I am deeply indepted to Frode Weierud
bgneal@4: and Geoff Sullivan, who have published high level descriptions of this
bgneal@4: technique. See Frode's `The Enigma Collection
bgneal@4: <http://cryptocellar.org/Enigma/>`_ page and Geoff's `Crypto Barn
bgneal@4: <http://www.hut-six.co.uk/>`_ for more information.
bgneal@4: 
bgneal@4: Python is too slow, or is it?
bgneal@4: -----------------------------
bgneal@4: 
bgneal@4: After a week or so of studying and tinkering, I came up with a hill-climbing
bgneal@4: algorithm in Python. I then started an attempt to break message 10, but after
bgneal@4: running the program for a little over 24 hours, I realized it was too slow. I
bgneal@4: did a "back of the envelope" calculation and determined I would need several
bgneal@4: hundred years to hill-climb every rotor and message setting. This was a bit
bgneal@4: disappointing, but I knew it was a possibility going in.
bgneal@4: 
bgneal@4: I then set out, somewhat reluctantly, to port Py-Enigma to C++. These days I
bgneal@4: don't do any C++ at home, only at work. However I got much happier when I
bgneal@4: decided this was an opportunity to try out some new C++11 features (the embedded
bgneal@4: compiler we use at work has no support for C++11). Okay, things are fun again! I
bgneal@4: have to say that C++11 is really quite enjoyable, and I made good use of the new
bgneal@4: ``auto`` style variable declarations, range-based for-loops, and brace-style
bgneal@4: initialization for containers like ``vector`` and ``map``. 
bgneal@4: 
bgneal@4: C++ is too slow, or is it?
bgneal@4: --------------------------
bgneal@4: 
bgneal@4: It took a fair amount of time to re-tool my code to C++11. I ensured all my unit
bgneal@4: tests originally written in Python could pass in C++, and that I could solve
bgneal@4: some of the previous messages. I then re-implemented my hill-climbing
bgneal@4: algorithm in C++ and made another attempt at cracking message 10.
bgneal@4: 
bgneal@4: My C++ solution was indeed faster, but only by a factor of 8 or 9. I calculated
bgneal@4: I would need about 35 years to hill-climb all rotor and message settings, and I
bgneal@4: didn't really want to see if I could out-live my program. My morale was very low
bgneal@4: at this point. I wasn't sure if I could ever solve this self-imposed problem.
bgneal@4: 
bgneal@4: Algorithms & Data Structures are key
bgneal@4: ------------------------------------
bgneal@4: 
bgneal@4: I had wanted to test myself with this problem and had avoided looking at anyone
bgneal@4: else's source code. However, at this juncture, I needed some help. I turned to
bgneal@4: Stephan Krah's `M4 Message Breaking Project
bgneal@4: <http://www.bytereef.org/m4_project.html>`_. Stephan had started a distributed
bgneal@4: computing attack on several 4-rotor naval Enigma machine messages. He had
bgneal@4: graciously made his source code available. Perhaps I could get some hints by
bgneal@4: looking at his code.
bgneal@4: 
bgneal@4: Indeed, Stephen's code provided the break I needed. I discovered a very cool
bgneal@4: trick that Stephen was doing right before he started his hill-climb. He
bgneal@4: pre-computed every path through the machine for every letter of the alphabet.
bgneal@4: Since hill-climbing involves running the message through the machine roughly
bgneal@4: 600 to 1000 times, this proved to be a huge time saver. After borrowing this
bgneal@4: idea, my hill-climbing times for one fixed rotor & message setting went down
bgneal@4: from 250 milliseconds to 2! 
bgneal@4: 
bgneal@4: I immediately stopped looking at Stephen's code at this point; I didn't bother
bgneal@4: to compare our hill-climbing algorithms. I believe both of us are influenced by
bgneal@4: Weierud & Sullivan here. I was now convinced I had the speed-up I needed to make
bgneal@4: a serious attempt at message 10 again.
bgneal@4: 
bgneal@4: This showed me how critically important it is to have the right data structure
bgneal@4: and algorithm for a problem. However there were two other lessons I needed to
bgneal@4: learn before I was successful.
bgneal@4: 
bgneal@4: Utilize all your computing resources
bgneal@4: ------------------------------------
bgneal@4: 
bgneal@4: Up to this point my cracking program was linear. It started at the first rotor
bgneal@4: and message setting, then hill-climbed and noted the score I got. Then it
bgneal@4: proceeded to the next message setting and hill-climbed, noting the score of that
bgneal@4: attempt, and so on. There were two problems with this approach. 
bgneal@4: 
bgneal@4: I realized if I could do some sort of multi-processing, I could search the key
bgneal@4: space faster. I thought about forking off multiple processes or maybe using
bgneal@4: threads, but that was over-thinking it. In the end, I added command-line
bgneal@4: arguments to the program to tell it where in the key space to start looking. I
bgneal@4: could then simply run multiple instances of my program. My development laptop,
bgneal@4: which runs Ubuntu, is from 2008, and it has 2 cores in it. My desktop PC (for
bgneal@4: gaming), has 4 cores! I next spent a couple nights installing MinGW_ and getting
bgneal@4: my application to build in that environment. I could now run 6 instances of my
bgneal@4: cracking program. Hopefully this would pay off and shorten the time further.
bgneal@4: 
bgneal@4: (I didn't even attempt to commandeer my wife's and kids' computers. That
bgneal@4: wouldn't have gone down so well.)
bgneal@4: 
bgneal@4: How do you know when you are done?
bgneal@4: ----------------------------------
bgneal@4: 
bgneal@4: The final problem that I had was determining when my cracking program had in
bgneal@4: fact cracked the message. The hill-climbing attempts produced scores for each
bgneal@4: key setting. Up to this point, based on running my program on messages 1 through
bgneal@4: 9, I had noticed that when I got a score divided by the message length of around
bgneal@4: 10 I had cracked the message. But my tests on messages 1 through 9 were
bgneal@4: hill-climbing using the known ring settings. My message cracker was using a
bgneal@4: shortcut: I was only searching ring settings on the "fast" rotor to save time.
bgneal@4: This would let me get the first part of a message, but if the middle or
bgneal@4: left-most rotor stepped at the wrong time the rest of the message would be
bgneal@4: unintelligible. 
bgneal@4: 
bgneal@4: To verify my message cracking program, I ran it on some of the previous
bgneal@4: messages. I was shooting for a score divided by message length of 10. Using this
bgneal@4: heuristic, I was in fact able to crack some of the previous messages, but not
bgneal@4: others. It took me a while to realize that not searching the middle rotor's ring
bgneal@4: setting was causing this. The light bulb came on, and I realized that my
bgneal@4: heuristic of 10 is only valid when I search all the ring settings. Instead I
bgneal@4: should just keep track of the highest scores. When you get close enough, the
bgneal@4: score will be significantly higher than average score. Thus I may not know when
bgneal@4: I am done until I see a large spike in the score. I would then have to write
bgneal@4: another program to refine the solution to search for the middle and left-most
bgneal@4: ring setting. 
bgneal@4: 
bgneal@4: Success at last
bgneal@4: ---------------
bgneal@4: 
bgneal@4: It took a little over a month of evenings and weekends to get to this point.
bgneal@4: I even had a false start where I ran my program on 6 cores for 1.5 days only to
bgneal@4: realize I had a bug in my code and I was only searching ring setting 0 on the
bgneal@4: fast ring. But once I had that worked out, I started a search on 6 cores on a
bgneal@4: Sunday night. I checked the logs off and on Monday, but no significant spike in
bgneal@4: the scores were noted. Before leaving for work on Tuesday I checked again, and
bgneal@4: noticed that one instance of my program had found a score twice as high as any
bgneal@4: other. Could that be it? I really wanted to stay and find out, but I had to go
bgneal@4: to work! Needless to say I was a bit distracted at work that day.
bgneal@4: 
bgneal@4: After work, I ran the settings through my Python application and my heart
bgneal@4: pounded as I recognized a few German words at the beginning of the message. In
bgneal@4: fact I had gotten around 75% of the message, then some garbage, but then the
bgneal@4: last few words were recognizable German. I then wrote another application to
bgneal@4: search the middle ring setting space. Using the highest score from that program
bgneal@4: allowed me to finally see the entire message. I cannot tell you how happy and
bgneal@4: relieved I was at the same time!
bgneal@4: 
bgneal@4: Final thoughts
bgneal@4: --------------
bgneal@4: 
bgneal@4: For a while I didn't think I was going to be smart enough or just plain up to
bgneal@4: the task of solving message 10. Looking back on it, I see a nice progression of
bgneal@4: trying things, experimenting, setbacks, and a few "a-ha" moments that lead to the
bgneal@4: breakthrough. I am very grateful to Dirk Rijmenants for creating the challenge,
bgneal@4: which was a lot of fun. And I also wish to thank Frode Weierud and Geoff
bgneal@4: Sullivan for their hill-climbing algorithm description. Thanks again to Stephan
bgneal@4: Krah for the key speedup that made my attempt possible.
bgneal@4: 
bgneal@4: For now I have decided not to publish my cracking program, since it is a tiny
bgneal@4: bit specific to Dirk's challenge. I don't want to ruin his Enigma Challenge by
bgneal@4: giving anything away. I don't believe my broad descriptions here have revealed
bgneal@4: too much. Other people need to experience the challenge for themselves to fully
bgneal@4: appreciate the thrill and hard work that go into code breaking.
bgneal@4: 
bgneal@4: I should also note that there are other ways of solving Dirk's challenge. You
bgneal@4: could, with a bit of patience, solve messages 1 through 9 with one of the many
bgneal@4: Enigma Machine GUI simulators (Dirk has a fine one). I'm not certain how
bgneal@4: you could break message 10 using a GUI simulator and trial and error, however.
bgneal@4: 
bgneal@4: It is my understanding that this style of Enigma message breaking was not the
bgneal@4: technique used by the allied code-breakers during the war. I believe they used
bgneal@4: knowledge of certain key words, or "cribs", in messages received in weather
bgneal@4: reports and other regular messages to find the key for the day. I look forward
bgneal@4: to reading more about their efforts now.
bgneal@4: 
bgneal@4: Finally, I have published the C++11 port of Py-Enigma_, which I am calling
bgneal@4: (rather unimaginatively) Cpp-Enigma_ on bitbucket. I hope Cpp-Enigma_ and
bgneal@4: Py-Enigma_ are useful for other people who want to explore the fascinating world
bgneal@4: of Enigma codes.
bgneal@4: 
bgneal@4: .. _Py-Enigma: https://bitbucket.org/bgneal/enigma
bgneal@4: .. _Python: http://www.python.org
bgneal@4: .. _Technical Details of the Enigma Machine: http://users.telenet.be/d.rijmenants/en/enigmatech.htm
bgneal@4: .. _Enigma Challenge: http://users.telenet.be/d.rijmenants/en/challenge.htm
bgneal@4: .. _Cpp-Enigma: https://bitbucket.org/bgneal/cpp-enigma
bgneal@4: .. _MinGW: http://www.mingw.org/