comparison content/Coding/019-enigma-challenge.rst @ 4:7ce6393e6d30

Adding converted blog posts from old blog.
author Brian Neal <bgneal@gmail.com>
date Thu, 30 Jan 2014 21:45:03 -0600
parents
children 6e0d4799796d
comparison
equal deleted inserted replaced
3:c3115da3ff73 4:7ce6393e6d30
1 Completing the Enigma Challenge
2 ###############################
3
4 :date: 2012-07-22 13:30
5 :tags: Enigma, Py-Enigma, Cpp-Enigma, C++
6 :slug: completing-the-enigma-challenge
7 :author: Brian Neal
8
9 Since my last `blog post
10 <http://deathofagremmie.com/2012/06/06/introducing-py-enigma/>`_, I have spent
11 almost all my free time working on `Dirk Rijmenants' Enigma Challenge
12 <http://users.telenet.be/d.rijmenants/en/challenge.htm>`_. I'm very happy to
13 report that I cracked the last message on the evening of July 10. I thought it
14 might be interesting to summarize my experiences working on this challenge in a
15 blog post.
16
17 The first nine
18 --------------
19
20 I had a lot of fun building Py-Enigma_, an Enigma machine simulator library
21 written in Python_ based on the information I found on Dirk's `Technical Details
22 of the Enigma Machine`_ page. After I had built it and played with it for a
23 while, I discovered Dirk also had an `Enigma Challenge`_. I decided this would
24 be a good test for Py-Enigma_, and it may give me some ideas on how to improve
25 the library for breaking messages.
26
27 The first nine messages were pretty easy to solve using Py-Enigma_. Dirk gives
28 you most of the key information for each problem, and one can then write a small
29 program to brute force the missing information. I was able to solve the first
30 nine messages in a weekend. This was very enjoyable, as it did require some
31 thinking about how to simulate the problem and then executing it. I also learned
32 about `Python's itertools <http://docs.python.org/library/itertools.html>`_
33 library, which contains some very handy functions for generating permutations
34 and combinations for brute forcing a solution.
35
36 Dirk did a great job on the messages. They referenced actual events from World
37 War II and I ended up spending a fair amount of time on Wikipedia reading about
38 them. Translating the messages from German to English was a bit difficult, but I
39 relied heavily on `Google Translate <http://translate.google.com>`_.
40
41 Message 10
42 ----------
43
44 But then I came to the last message. Here, Dirk doesn't give you any key
45 information, just ciphertext. Gulp. If you include 2 possible reflectors, the
46 Enigma machine's key space for message 10 is somewhere around 2 x 10\
47 :sup:`23` keys, so it is impossible to brute force every possible key, unless
48 you can afford to wait around for the heat death of the universe.
49
50 I then did some research, and learned about a technique for brute-forcing only
51 the rotor and message settings, and then performing a heuristic technique called
52 "hill-climbing" on the plugboard settings. I am deeply indepted to Frode Weierud
53 and Geoff Sullivan, who have published high level descriptions of this
54 technique. See Frode's `The Enigma Collection
55 <http://cryptocellar.org/Enigma/>`_ page and Geoff's `Crypto Barn
56 <http://www.hut-six.co.uk/>`_ for more information.
57
58 Python is too slow, or is it?
59 -----------------------------
60
61 After a week or so of studying and tinkering, I came up with a hill-climbing
62 algorithm in Python. I then started an attempt to break message 10, but after
63 running the program for a little over 24 hours, I realized it was too slow. I
64 did a "back of the envelope" calculation and determined I would need several
65 hundred years to hill-climb every rotor and message setting. This was a bit
66 disappointing, but I knew it was a possibility going in.
67
68 I then set out, somewhat reluctantly, to port Py-Enigma to C++. These days I
69 don't do any C++ at home, only at work. However I got much happier when I
70 decided this was an opportunity to try out some new C++11 features (the embedded
71 compiler we use at work has no support for C++11). Okay, things are fun again! I
72 have to say that C++11 is really quite enjoyable, and I made good use of the new
73 ``auto`` style variable declarations, range-based for-loops, and brace-style
74 initialization for containers like ``vector`` and ``map``.
75
76 C++ is too slow, or is it?
77 --------------------------
78
79 It took a fair amount of time to re-tool my code to C++11. I ensured all my unit
80 tests originally written in Python could pass in C++, and that I could solve
81 some of the previous messages. I then re-implemented my hill-climbing
82 algorithm in C++ and made another attempt at cracking message 10.
83
84 My C++ solution was indeed faster, but only by a factor of 8 or 9. I calculated
85 I would need about 35 years to hill-climb all rotor and message settings, and I
86 didn't really want to see if I could out-live my program. My morale was very low
87 at this point. I wasn't sure if I could ever solve this self-imposed problem.
88
89 Algorithms & Data Structures are key
90 ------------------------------------
91
92 I had wanted to test myself with this problem and had avoided looking at anyone
93 else's source code. However, at this juncture, I needed some help. I turned to
94 Stephan Krah's `M4 Message Breaking Project
95 <http://www.bytereef.org/m4_project.html>`_. Stephan had started a distributed
96 computing attack on several 4-rotor naval Enigma machine messages. He had
97 graciously made his source code available. Perhaps I could get some hints by
98 looking at his code.
99
100 Indeed, Stephen's code provided the break I needed. I discovered a very cool
101 trick that Stephen was doing right before he started his hill-climb. He
102 pre-computed every path through the machine for every letter of the alphabet.
103 Since hill-climbing involves running the message through the machine roughly
104 600 to 1000 times, this proved to be a huge time saver. After borrowing this
105 idea, my hill-climbing times for one fixed rotor & message setting went down
106 from 250 milliseconds to 2!
107
108 I immediately stopped looking at Stephen's code at this point; I didn't bother
109 to compare our hill-climbing algorithms. I believe both of us are influenced by
110 Weierud & Sullivan here. I was now convinced I had the speed-up I needed to make
111 a serious attempt at message 10 again.
112
113 This showed me how critically important it is to have the right data structure
114 and algorithm for a problem. However there were two other lessons I needed to
115 learn before I was successful.
116
117 Utilize all your computing resources
118 ------------------------------------
119
120 Up to this point my cracking program was linear. It started at the first rotor
121 and message setting, then hill-climbed and noted the score I got. Then it
122 proceeded to the next message setting and hill-climbed, noting the score of that
123 attempt, and so on. There were two problems with this approach.
124
125 I realized if I could do some sort of multi-processing, I could search the key
126 space faster. I thought about forking off multiple processes or maybe using
127 threads, but that was over-thinking it. In the end, I added command-line
128 arguments to the program to tell it where in the key space to start looking. I
129 could then simply run multiple instances of my program. My development laptop,
130 which runs Ubuntu, is from 2008, and it has 2 cores in it. My desktop PC (for
131 gaming), has 4 cores! I next spent a couple nights installing MinGW_ and getting
132 my application to build in that environment. I could now run 6 instances of my
133 cracking program. Hopefully this would pay off and shorten the time further.
134
135 (I didn't even attempt to commandeer my wife's and kids' computers. That
136 wouldn't have gone down so well.)
137
138 How do you know when you are done?
139 ----------------------------------
140
141 The final problem that I had was determining when my cracking program had in
142 fact cracked the message. The hill-climbing attempts produced scores for each
143 key setting. Up to this point, based on running my program on messages 1 through
144 9, I had noticed that when I got a score divided by the message length of around
145 10 I had cracked the message. But my tests on messages 1 through 9 were
146 hill-climbing using the known ring settings. My message cracker was using a
147 shortcut: I was only searching ring settings on the "fast" rotor to save time.
148 This would let me get the first part of a message, but if the middle or
149 left-most rotor stepped at the wrong time the rest of the message would be
150 unintelligible.
151
152 To verify my message cracking program, I ran it on some of the previous
153 messages. I was shooting for a score divided by message length of 10. Using this
154 heuristic, I was in fact able to crack some of the previous messages, but not
155 others. It took me a while to realize that not searching the middle rotor's ring
156 setting was causing this. The light bulb came on, and I realized that my
157 heuristic of 10 is only valid when I search all the ring settings. Instead I
158 should just keep track of the highest scores. When you get close enough, the
159 score will be significantly higher than average score. Thus I may not know when
160 I am done until I see a large spike in the score. I would then have to write
161 another program to refine the solution to search for the middle and left-most
162 ring setting.
163
164 Success at last
165 ---------------
166
167 It took a little over a month of evenings and weekends to get to this point.
168 I even had a false start where I ran my program on 6 cores for 1.5 days only to
169 realize I had a bug in my code and I was only searching ring setting 0 on the
170 fast ring. But once I had that worked out, I started a search on 6 cores on a
171 Sunday night. I checked the logs off and on Monday, but no significant spike in
172 the scores were noted. Before leaving for work on Tuesday I checked again, and
173 noticed that one instance of my program had found a score twice as high as any
174 other. Could that be it? I really wanted to stay and find out, but I had to go
175 to work! Needless to say I was a bit distracted at work that day.
176
177 After work, I ran the settings through my Python application and my heart
178 pounded as I recognized a few German words at the beginning of the message. In
179 fact I had gotten around 75% of the message, then some garbage, but then the
180 last few words were recognizable German. I then wrote another application to
181 search the middle ring setting space. Using the highest score from that program
182 allowed me to finally see the entire message. I cannot tell you how happy and
183 relieved I was at the same time!
184
185 Final thoughts
186 --------------
187
188 For a while I didn't think I was going to be smart enough or just plain up to
189 the task of solving message 10. Looking back on it, I see a nice progression of
190 trying things, experimenting, setbacks, and a few "a-ha" moments that lead to the
191 breakthrough. I am very grateful to Dirk Rijmenants for creating the challenge,
192 which was a lot of fun. And I also wish to thank Frode Weierud and Geoff
193 Sullivan for their hill-climbing algorithm description. Thanks again to Stephan
194 Krah for the key speedup that made my attempt possible.
195
196 For now I have decided not to publish my cracking program, since it is a tiny
197 bit specific to Dirk's challenge. I don't want to ruin his Enigma Challenge by
198 giving anything away. I don't believe my broad descriptions here have revealed
199 too much. Other people need to experience the challenge for themselves to fully
200 appreciate the thrill and hard work that go into code breaking.
201
202 I should also note that there are other ways of solving Dirk's challenge. You
203 could, with a bit of patience, solve messages 1 through 9 with one of the many
204 Enigma Machine GUI simulators (Dirk has a fine one). I'm not certain how
205 you could break message 10 using a GUI simulator and trial and error, however.
206
207 It is my understanding that this style of Enigma message breaking was not the
208 technique used by the allied code-breakers during the war. I believe they used
209 knowledge of certain key words, or "cribs", in messages received in weather
210 reports and other regular messages to find the key for the day. I look forward
211 to reading more about their efforts now.
212
213 Finally, I have published the C++11 port of Py-Enigma_, which I am calling
214 (rather unimaginatively) Cpp-Enigma_ on bitbucket. I hope Cpp-Enigma_ and
215 Py-Enigma_ are useful for other people who want to explore the fascinating world
216 of Enigma codes.
217
218 .. _Py-Enigma: https://bitbucket.org/bgneal/enigma
219 .. _Python: http://www.python.org
220 .. _Technical Details of the Enigma Machine: http://users.telenet.be/d.rijmenants/en/enigmatech.htm
221 .. _Enigma Challenge: http://users.telenet.be/d.rijmenants/en/challenge.htm
222 .. _Cpp-Enigma: https://bitbucket.org/bgneal/cpp-enigma
223 .. _MinGW: http://www.mingw.org/