annotate content/Coding/019-enigma-challenge.rst @ 4:7ce6393e6d30

Adding converted blog posts from old blog.
author Brian Neal <bgneal@gmail.com>
date Thu, 30 Jan 2014 21:45:03 -0600
parents
children 6e0d4799796d
rev   line source
bgneal@4 1 Completing the Enigma Challenge
bgneal@4 2 ###############################
bgneal@4 3
bgneal@4 4 :date: 2012-07-22 13:30
bgneal@4 5 :tags: Enigma, Py-Enigma, Cpp-Enigma, C++
bgneal@4 6 :slug: completing-the-enigma-challenge
bgneal@4 7 :author: Brian Neal
bgneal@4 8
bgneal@4 9 Since my last `blog post
bgneal@4 10 <http://deathofagremmie.com/2012/06/06/introducing-py-enigma/>`_, I have spent
bgneal@4 11 almost all my free time working on `Dirk Rijmenants' Enigma Challenge
bgneal@4 12 <http://users.telenet.be/d.rijmenants/en/challenge.htm>`_. I'm very happy to
bgneal@4 13 report that I cracked the last message on the evening of July 10. I thought it
bgneal@4 14 might be interesting to summarize my experiences working on this challenge in a
bgneal@4 15 blog post.
bgneal@4 16
bgneal@4 17 The first nine
bgneal@4 18 --------------
bgneal@4 19
bgneal@4 20 I had a lot of fun building Py-Enigma_, an Enigma machine simulator library
bgneal@4 21 written in Python_ based on the information I found on Dirk's `Technical Details
bgneal@4 22 of the Enigma Machine`_ page. After I had built it and played with it for a
bgneal@4 23 while, I discovered Dirk also had an `Enigma Challenge`_. I decided this would
bgneal@4 24 be a good test for Py-Enigma_, and it may give me some ideas on how to improve
bgneal@4 25 the library for breaking messages.
bgneal@4 26
bgneal@4 27 The first nine messages were pretty easy to solve using Py-Enigma_. Dirk gives
bgneal@4 28 you most of the key information for each problem, and one can then write a small
bgneal@4 29 program to brute force the missing information. I was able to solve the first
bgneal@4 30 nine messages in a weekend. This was very enjoyable, as it did require some
bgneal@4 31 thinking about how to simulate the problem and then executing it. I also learned
bgneal@4 32 about `Python's itertools <http://docs.python.org/library/itertools.html>`_
bgneal@4 33 library, which contains some very handy functions for generating permutations
bgneal@4 34 and combinations for brute forcing a solution.
bgneal@4 35
bgneal@4 36 Dirk did a great job on the messages. They referenced actual events from World
bgneal@4 37 War II and I ended up spending a fair amount of time on Wikipedia reading about
bgneal@4 38 them. Translating the messages from German to English was a bit difficult, but I
bgneal@4 39 relied heavily on `Google Translate <http://translate.google.com>`_.
bgneal@4 40
bgneal@4 41 Message 10
bgneal@4 42 ----------
bgneal@4 43
bgneal@4 44 But then I came to the last message. Here, Dirk doesn't give you any key
bgneal@4 45 information, just ciphertext. Gulp. If you include 2 possible reflectors, the
bgneal@4 46 Enigma machine's key space for message 10 is somewhere around 2 x 10\
bgneal@4 47 :sup:`23` keys, so it is impossible to brute force every possible key, unless
bgneal@4 48 you can afford to wait around for the heat death of the universe.
bgneal@4 49
bgneal@4 50 I then did some research, and learned about a technique for brute-forcing only
bgneal@4 51 the rotor and message settings, and then performing a heuristic technique called
bgneal@4 52 "hill-climbing" on the plugboard settings. I am deeply indepted to Frode Weierud
bgneal@4 53 and Geoff Sullivan, who have published high level descriptions of this
bgneal@4 54 technique. See Frode's `The Enigma Collection
bgneal@4 55 <http://cryptocellar.org/Enigma/>`_ page and Geoff's `Crypto Barn
bgneal@4 56 <http://www.hut-six.co.uk/>`_ for more information.
bgneal@4 57
bgneal@4 58 Python is too slow, or is it?
bgneal@4 59 -----------------------------
bgneal@4 60
bgneal@4 61 After a week or so of studying and tinkering, I came up with a hill-climbing
bgneal@4 62 algorithm in Python. I then started an attempt to break message 10, but after
bgneal@4 63 running the program for a little over 24 hours, I realized it was too slow. I
bgneal@4 64 did a "back of the envelope" calculation and determined I would need several
bgneal@4 65 hundred years to hill-climb every rotor and message setting. This was a bit
bgneal@4 66 disappointing, but I knew it was a possibility going in.
bgneal@4 67
bgneal@4 68 I then set out, somewhat reluctantly, to port Py-Enigma to C++. These days I
bgneal@4 69 don't do any C++ at home, only at work. However I got much happier when I
bgneal@4 70 decided this was an opportunity to try out some new C++11 features (the embedded
bgneal@4 71 compiler we use at work has no support for C++11). Okay, things are fun again! I
bgneal@4 72 have to say that C++11 is really quite enjoyable, and I made good use of the new
bgneal@4 73 ``auto`` style variable declarations, range-based for-loops, and brace-style
bgneal@4 74 initialization for containers like ``vector`` and ``map``.
bgneal@4 75
bgneal@4 76 C++ is too slow, or is it?
bgneal@4 77 --------------------------
bgneal@4 78
bgneal@4 79 It took a fair amount of time to re-tool my code to C++11. I ensured all my unit
bgneal@4 80 tests originally written in Python could pass in C++, and that I could solve
bgneal@4 81 some of the previous messages. I then re-implemented my hill-climbing
bgneal@4 82 algorithm in C++ and made another attempt at cracking message 10.
bgneal@4 83
bgneal@4 84 My C++ solution was indeed faster, but only by a factor of 8 or 9. I calculated
bgneal@4 85 I would need about 35 years to hill-climb all rotor and message settings, and I
bgneal@4 86 didn't really want to see if I could out-live my program. My morale was very low
bgneal@4 87 at this point. I wasn't sure if I could ever solve this self-imposed problem.
bgneal@4 88
bgneal@4 89 Algorithms & Data Structures are key
bgneal@4 90 ------------------------------------
bgneal@4 91
bgneal@4 92 I had wanted to test myself with this problem and had avoided looking at anyone
bgneal@4 93 else's source code. However, at this juncture, I needed some help. I turned to
bgneal@4 94 Stephan Krah's `M4 Message Breaking Project
bgneal@4 95 <http://www.bytereef.org/m4_project.html>`_. Stephan had started a distributed
bgneal@4 96 computing attack on several 4-rotor naval Enigma machine messages. He had
bgneal@4 97 graciously made his source code available. Perhaps I could get some hints by
bgneal@4 98 looking at his code.
bgneal@4 99
bgneal@4 100 Indeed, Stephen's code provided the break I needed. I discovered a very cool
bgneal@4 101 trick that Stephen was doing right before he started his hill-climb. He
bgneal@4 102 pre-computed every path through the machine for every letter of the alphabet.
bgneal@4 103 Since hill-climbing involves running the message through the machine roughly
bgneal@4 104 600 to 1000 times, this proved to be a huge time saver. After borrowing this
bgneal@4 105 idea, my hill-climbing times for one fixed rotor & message setting went down
bgneal@4 106 from 250 milliseconds to 2!
bgneal@4 107
bgneal@4 108 I immediately stopped looking at Stephen's code at this point; I didn't bother
bgneal@4 109 to compare our hill-climbing algorithms. I believe both of us are influenced by
bgneal@4 110 Weierud & Sullivan here. I was now convinced I had the speed-up I needed to make
bgneal@4 111 a serious attempt at message 10 again.
bgneal@4 112
bgneal@4 113 This showed me how critically important it is to have the right data structure
bgneal@4 114 and algorithm for a problem. However there were two other lessons I needed to
bgneal@4 115 learn before I was successful.
bgneal@4 116
bgneal@4 117 Utilize all your computing resources
bgneal@4 118 ------------------------------------
bgneal@4 119
bgneal@4 120 Up to this point my cracking program was linear. It started at the first rotor
bgneal@4 121 and message setting, then hill-climbed and noted the score I got. Then it
bgneal@4 122 proceeded to the next message setting and hill-climbed, noting the score of that
bgneal@4 123 attempt, and so on. There were two problems with this approach.
bgneal@4 124
bgneal@4 125 I realized if I could do some sort of multi-processing, I could search the key
bgneal@4 126 space faster. I thought about forking off multiple processes or maybe using
bgneal@4 127 threads, but that was over-thinking it. In the end, I added command-line
bgneal@4 128 arguments to the program to tell it where in the key space to start looking. I
bgneal@4 129 could then simply run multiple instances of my program. My development laptop,
bgneal@4 130 which runs Ubuntu, is from 2008, and it has 2 cores in it. My desktop PC (for
bgneal@4 131 gaming), has 4 cores! I next spent a couple nights installing MinGW_ and getting
bgneal@4 132 my application to build in that environment. I could now run 6 instances of my
bgneal@4 133 cracking program. Hopefully this would pay off and shorten the time further.
bgneal@4 134
bgneal@4 135 (I didn't even attempt to commandeer my wife's and kids' computers. That
bgneal@4 136 wouldn't have gone down so well.)
bgneal@4 137
bgneal@4 138 How do you know when you are done?
bgneal@4 139 ----------------------------------
bgneal@4 140
bgneal@4 141 The final problem that I had was determining when my cracking program had in
bgneal@4 142 fact cracked the message. The hill-climbing attempts produced scores for each
bgneal@4 143 key setting. Up to this point, based on running my program on messages 1 through
bgneal@4 144 9, I had noticed that when I got a score divided by the message length of around
bgneal@4 145 10 I had cracked the message. But my tests on messages 1 through 9 were
bgneal@4 146 hill-climbing using the known ring settings. My message cracker was using a
bgneal@4 147 shortcut: I was only searching ring settings on the "fast" rotor to save time.
bgneal@4 148 This would let me get the first part of a message, but if the middle or
bgneal@4 149 left-most rotor stepped at the wrong time the rest of the message would be
bgneal@4 150 unintelligible.
bgneal@4 151
bgneal@4 152 To verify my message cracking program, I ran it on some of the previous
bgneal@4 153 messages. I was shooting for a score divided by message length of 10. Using this
bgneal@4 154 heuristic, I was in fact able to crack some of the previous messages, but not
bgneal@4 155 others. It took me a while to realize that not searching the middle rotor's ring
bgneal@4 156 setting was causing this. The light bulb came on, and I realized that my
bgneal@4 157 heuristic of 10 is only valid when I search all the ring settings. Instead I
bgneal@4 158 should just keep track of the highest scores. When you get close enough, the
bgneal@4 159 score will be significantly higher than average score. Thus I may not know when
bgneal@4 160 I am done until I see a large spike in the score. I would then have to write
bgneal@4 161 another program to refine the solution to search for the middle and left-most
bgneal@4 162 ring setting.
bgneal@4 163
bgneal@4 164 Success at last
bgneal@4 165 ---------------
bgneal@4 166
bgneal@4 167 It took a little over a month of evenings and weekends to get to this point.
bgneal@4 168 I even had a false start where I ran my program on 6 cores for 1.5 days only to
bgneal@4 169 realize I had a bug in my code and I was only searching ring setting 0 on the
bgneal@4 170 fast ring. But once I had that worked out, I started a search on 6 cores on a
bgneal@4 171 Sunday night. I checked the logs off and on Monday, but no significant spike in
bgneal@4 172 the scores were noted. Before leaving for work on Tuesday I checked again, and
bgneal@4 173 noticed that one instance of my program had found a score twice as high as any
bgneal@4 174 other. Could that be it? I really wanted to stay and find out, but I had to go
bgneal@4 175 to work! Needless to say I was a bit distracted at work that day.
bgneal@4 176
bgneal@4 177 After work, I ran the settings through my Python application and my heart
bgneal@4 178 pounded as I recognized a few German words at the beginning of the message. In
bgneal@4 179 fact I had gotten around 75% of the message, then some garbage, but then the
bgneal@4 180 last few words were recognizable German. I then wrote another application to
bgneal@4 181 search the middle ring setting space. Using the highest score from that program
bgneal@4 182 allowed me to finally see the entire message. I cannot tell you how happy and
bgneal@4 183 relieved I was at the same time!
bgneal@4 184
bgneal@4 185 Final thoughts
bgneal@4 186 --------------
bgneal@4 187
bgneal@4 188 For a while I didn't think I was going to be smart enough or just plain up to
bgneal@4 189 the task of solving message 10. Looking back on it, I see a nice progression of
bgneal@4 190 trying things, experimenting, setbacks, and a few "a-ha" moments that lead to the
bgneal@4 191 breakthrough. I am very grateful to Dirk Rijmenants for creating the challenge,
bgneal@4 192 which was a lot of fun. And I also wish to thank Frode Weierud and Geoff
bgneal@4 193 Sullivan for their hill-climbing algorithm description. Thanks again to Stephan
bgneal@4 194 Krah for the key speedup that made my attempt possible.
bgneal@4 195
bgneal@4 196 For now I have decided not to publish my cracking program, since it is a tiny
bgneal@4 197 bit specific to Dirk's challenge. I don't want to ruin his Enigma Challenge by
bgneal@4 198 giving anything away. I don't believe my broad descriptions here have revealed
bgneal@4 199 too much. Other people need to experience the challenge for themselves to fully
bgneal@4 200 appreciate the thrill and hard work that go into code breaking.
bgneal@4 201
bgneal@4 202 I should also note that there are other ways of solving Dirk's challenge. You
bgneal@4 203 could, with a bit of patience, solve messages 1 through 9 with one of the many
bgneal@4 204 Enigma Machine GUI simulators (Dirk has a fine one). I'm not certain how
bgneal@4 205 you could break message 10 using a GUI simulator and trial and error, however.
bgneal@4 206
bgneal@4 207 It is my understanding that this style of Enigma message breaking was not the
bgneal@4 208 technique used by the allied code-breakers during the war. I believe they used
bgneal@4 209 knowledge of certain key words, or "cribs", in messages received in weather
bgneal@4 210 reports and other regular messages to find the key for the day. I look forward
bgneal@4 211 to reading more about their efforts now.
bgneal@4 212
bgneal@4 213 Finally, I have published the C++11 port of Py-Enigma_, which I am calling
bgneal@4 214 (rather unimaginatively) Cpp-Enigma_ on bitbucket. I hope Cpp-Enigma_ and
bgneal@4 215 Py-Enigma_ are useful for other people who want to explore the fascinating world
bgneal@4 216 of Enigma codes.
bgneal@4 217
bgneal@4 218 .. _Py-Enigma: https://bitbucket.org/bgneal/enigma
bgneal@4 219 .. _Python: http://www.python.org
bgneal@4 220 .. _Technical Details of the Enigma Machine: http://users.telenet.be/d.rijmenants/en/enigmatech.htm
bgneal@4 221 .. _Enigma Challenge: http://users.telenet.be/d.rijmenants/en/challenge.htm
bgneal@4 222 .. _Cpp-Enigma: https://bitbucket.org/bgneal/cpp-enigma
bgneal@4 223 .. _MinGW: http://www.mingw.org/