pelican-blog: content/Coding/019-enigma-challenge.rst annotate

annotate content/Coding/019-enigma-challenge.rst @ 4:7ce6393e6d30

Adding converted blog posts from old blog.

author	Brian Neal <bgneal@gmail.com>
date	Thu, 30 Jan 2014 21:45:03 -0600
parents
children	6e0d4799796d

rev	line source
bgneal@4	1 Completing the Enigma Challenge
bgneal@4	2 ###############################
bgneal@4	3
bgneal@4	4 :date: 2012-07-22 13:30
bgneal@4	5 :tags: Enigma, Py-Enigma, Cpp-Enigma, C++
bgneal@4	6 :slug: completing-the-enigma-challenge
bgneal@4	7 :author: Brian Neal
bgneal@4	8
bgneal@4	9 Since my last `blog post
bgneal@4	10 <http://deathofagremmie.com/2012/06/06/introducing-py-enigma/>`_, I have spent
bgneal@4	11 almost all my free time working on `Dirk Rijmenants' Enigma Challenge
bgneal@4	12 <http://users.telenet.be/d.rijmenants/en/challenge.htm>`_. I'm very happy to
bgneal@4	13 report that I cracked the last message on the evening of July 10. I thought it
bgneal@4	14 might be interesting to summarize my experiences working on this challenge in a
bgneal@4	15 blog post.
bgneal@4	16
bgneal@4	17 The first nine
bgneal@4	18 --------------
bgneal@4	19
bgneal@4	20 I had a lot of fun building Py-Enigma_, an Enigma machine simulator library
bgneal@4	21 written in Python_ based on the information I found on Dirk's `Technical Details
bgneal@4	22 of the Enigma Machine`_ page. After I had built it and played with it for a
bgneal@4	23 while, I discovered Dirk also had an `Enigma Challenge`_. I decided this would
bgneal@4	24 be a good test for Py-Enigma_, and it may give me some ideas on how to improve
bgneal@4	25 the library for breaking messages.
bgneal@4	26
bgneal@4	27 The first nine messages were pretty easy to solve using Py-Enigma_. Dirk gives
bgneal@4	28 you most of the key information for each problem, and one can then write a small
bgneal@4	29 program to brute force the missing information. I was able to solve the first
bgneal@4	30 nine messages in a weekend. This was very enjoyable, as it did require some
bgneal@4	31 thinking about how to simulate the problem and then executing it. I also learned
bgneal@4	32 about `Python's itertools <http://docs.python.org/library/itertools.html>`_
bgneal@4	33 library, which contains some very handy functions for generating permutations
bgneal@4	34 and combinations for brute forcing a solution.
bgneal@4	35
bgneal@4	36 Dirk did a great job on the messages. They referenced actual events from World
bgneal@4	37 War II and I ended up spending a fair amount of time on Wikipedia reading about
bgneal@4	38 them. Translating the messages from German to English was a bit difficult, but I
bgneal@4	39 relied heavily on `Google Translate <http://translate.google.com>`_.
bgneal@4	40
bgneal@4	41 Message 10
bgneal@4	42 ----------
bgneal@4	43
bgneal@4	44 But then I came to the last message. Here, Dirk doesn't give you any key
bgneal@4	45 information, just ciphertext. Gulp. If you include 2 possible reflectors, the
bgneal@4	46 Enigma machine's key space for message 10 is somewhere around 2 x 10\
bgneal@4	47 :sup:`23` keys, so it is impossible to brute force every possible key, unless
bgneal@4	48 you can afford to wait around for the heat death of the universe.
bgneal@4	49
bgneal@4	50 I then did some research, and learned about a technique for brute-forcing only
bgneal@4	51 the rotor and message settings, and then performing a heuristic technique called
bgneal@4	52 "hill-climbing" on the plugboard settings. I am deeply indepted to Frode Weierud
bgneal@4	53 and Geoff Sullivan, who have published high level descriptions of this
bgneal@4	54 technique. See Frode's `The Enigma Collection
bgneal@4	55 <http://cryptocellar.org/Enigma/>`_ page and Geoff's `Crypto Barn
bgneal@4	56 <http://www.hut-six.co.uk/>`_ for more information.
bgneal@4	57
bgneal@4	58 Python is too slow, or is it?
bgneal@4	59 -----------------------------
bgneal@4	60
bgneal@4	61 After a week or so of studying and tinkering, I came up with a hill-climbing
bgneal@4	62 algorithm in Python. I then started an attempt to break message 10, but after
bgneal@4	63 running the program for a little over 24 hours, I realized it was too slow. I
bgneal@4	64 did a "back of the envelope" calculation and determined I would need several
bgneal@4	65 hundred years to hill-climb every rotor and message setting. This was a bit
bgneal@4	66 disappointing, but I knew it was a possibility going in.
bgneal@4	67
bgneal@4	68 I then set out, somewhat reluctantly, to port Py-Enigma to C++. These days I
bgneal@4	69 don't do any C++ at home, only at work. However I got much happier when I
bgneal@4	70 decided this was an opportunity to try out some new C++11 features (the embedded
bgneal@4	71 compiler we use at work has no support for C++11). Okay, things are fun again! I
bgneal@4	72 have to say that C++11 is really quite enjoyable, and I made good use of the new
bgneal@4	73 ``auto`` style variable declarations, range-based for-loops, and brace-style
bgneal@4	74 initialization for containers like ``vector`` and ``map``.
bgneal@4	75
bgneal@4	76 C++ is too slow, or is it?
bgneal@4	77 --------------------------
bgneal@4	78
bgneal@4	79 It took a fair amount of time to re-tool my code to C++11. I ensured all my unit
bgneal@4	80 tests originally written in Python could pass in C++, and that I could solve
bgneal@4	81 some of the previous messages. I then re-implemented my hill-climbing
bgneal@4	82 algorithm in C++ and made another attempt at cracking message 10.
bgneal@4	83
bgneal@4	84 My C++ solution was indeed faster, but only by a factor of 8 or 9. I calculated
bgneal@4	85 I would need about 35 years to hill-climb all rotor and message settings, and I
bgneal@4	86 didn't really want to see if I could out-live my program. My morale was very low
bgneal@4	87 at this point. I wasn't sure if I could ever solve this self-imposed problem.
bgneal@4	88
bgneal@4	89 Algorithms & Data Structures are key
bgneal@4	90 ------------------------------------
bgneal@4	91
bgneal@4	92 I had wanted to test myself with this problem and had avoided looking at anyone
bgneal@4	93 else's source code. However, at this juncture, I needed some help. I turned to
bgneal@4	94 Stephan Krah's `M4 Message Breaking Project
bgneal@4	95 <http://www.bytereef.org/m4_project.html>`_. Stephan had started a distributed
bgneal@4	96 computing attack on several 4-rotor naval Enigma machine messages. He had
bgneal@4	97 graciously made his source code available. Perhaps I could get some hints by
bgneal@4	98 looking at his code.
bgneal@4	99
bgneal@4	100 Indeed, Stephen's code provided the break I needed. I discovered a very cool
bgneal@4	101 trick that Stephen was doing right before he started his hill-climb. He
bgneal@4	102 pre-computed every path through the machine for every letter of the alphabet.
bgneal@4	103 Since hill-climbing involves running the message through the machine roughly
bgneal@4	104 600 to 1000 times, this proved to be a huge time saver. After borrowing this
bgneal@4	105 idea, my hill-climbing times for one fixed rotor & message setting went down
bgneal@4	106 from 250 milliseconds to 2!
bgneal@4	107
bgneal@4	108 I immediately stopped looking at Stephen's code at this point; I didn't bother
bgneal@4	109 to compare our hill-climbing algorithms. I believe both of us are influenced by
bgneal@4	110 Weierud & Sullivan here. I was now convinced I had the speed-up I needed to make
bgneal@4	111 a serious attempt at message 10 again.
bgneal@4	112
bgneal@4	113 This showed me how critically important it is to have the right data structure
bgneal@4	114 and algorithm for a problem. However there were two other lessons I needed to
bgneal@4	115 learn before I was successful.
bgneal@4	116
bgneal@4	117 Utilize all your computing resources
bgneal@4	118 ------------------------------------
bgneal@4	119
bgneal@4	120 Up to this point my cracking program was linear. It started at the first rotor
bgneal@4	121 and message setting, then hill-climbed and noted the score I got. Then it
bgneal@4	122 proceeded to the next message setting and hill-climbed, noting the score of that
bgneal@4	123 attempt, and so on. There were two problems with this approach.
bgneal@4	124
bgneal@4	125 I realized if I could do some sort of multi-processing, I could search the key
bgneal@4	126 space faster. I thought about forking off multiple processes or maybe using
bgneal@4	127 threads, but that was over-thinking it. In the end, I added command-line
bgneal@4	128 arguments to the program to tell it where in the key space to start looking. I
bgneal@4	129 could then simply run multiple instances of my program. My development laptop,
bgneal@4	130 which runs Ubuntu, is from 2008, and it has 2 cores in it. My desktop PC (for
bgneal@4	131 gaming), has 4 cores! I next spent a couple nights installing MinGW_ and getting
bgneal@4	132 my application to build in that environment. I could now run 6 instances of my
bgneal@4	133 cracking program. Hopefully this would pay off and shorten the time further.
bgneal@4	134
bgneal@4	135 (I didn't even attempt to commandeer my wife's and kids' computers. That
bgneal@4	136 wouldn't have gone down so well.)
bgneal@4	137
bgneal@4	138 How do you know when you are done?
bgneal@4	139 ----------------------------------
bgneal@4	140
bgneal@4	141 The final problem that I had was determining when my cracking program had in
bgneal@4	142 fact cracked the message. The hill-climbing attempts produced scores for each
bgneal@4	143 key setting. Up to this point, based on running my program on messages 1 through
bgneal@4	144 9, I had noticed that when I got a score divided by the message length of around
bgneal@4	145 10 I had cracked the message. But my tests on messages 1 through 9 were
bgneal@4	146 hill-climbing using the known ring settings. My message cracker was using a
bgneal@4	147 shortcut: I was only searching ring settings on the "fast" rotor to save time.
bgneal@4	148 This would let me get the first part of a message, but if the middle or
bgneal@4	149 left-most rotor stepped at the wrong time the rest of the message would be
bgneal@4	150 unintelligible.
bgneal@4	151
bgneal@4	152 To verify my message cracking program, I ran it on some of the previous
bgneal@4	153 messages. I was shooting for a score divided by message length of 10. Using this
bgneal@4	154 heuristic, I was in fact able to crack some of the previous messages, but not
bgneal@4	155 others. It took me a while to realize that not searching the middle rotor's ring
bgneal@4	156 setting was causing this. The light bulb came on, and I realized that my
bgneal@4	157 heuristic of 10 is only valid when I search all the ring settings. Instead I
bgneal@4	158 should just keep track of the highest scores. When you get close enough, the
bgneal@4	159 score will be significantly higher than average score. Thus I may not know when
bgneal@4	160 I am done until I see a large spike in the score. I would then have to write
bgneal@4	161 another program to refine the solution to search for the middle and left-most
bgneal@4	162 ring setting.
bgneal@4	163
bgneal@4	164 Success at last
bgneal@4	165 ---------------
bgneal@4	166
bgneal@4	167 It took a little over a month of evenings and weekends to get to this point.
bgneal@4	168 I even had a false start where I ran my program on 6 cores for 1.5 days only to
bgneal@4	169 realize I had a bug in my code and I was only searching ring setting 0 on the
bgneal@4	170 fast ring. But once I had that worked out, I started a search on 6 cores on a
bgneal@4	171 Sunday night. I checked the logs off and on Monday, but no significant spike in
bgneal@4	172 the scores were noted. Before leaving for work on Tuesday I checked again, and
bgneal@4	173 noticed that one instance of my program had found a score twice as high as any
bgneal@4	174 other. Could that be it? I really wanted to stay and find out, but I had to go
bgneal@4	175 to work! Needless to say I was a bit distracted at work that day.
bgneal@4	176
bgneal@4	177 After work, I ran the settings through my Python application and my heart
bgneal@4	178 pounded as I recognized a few German words at the beginning of the message. In
bgneal@4	179 fact I had gotten around 75% of the message, then some garbage, but then the
bgneal@4	180 last few words were recognizable German. I then wrote another application to
bgneal@4	181 search the middle ring setting space. Using the highest score from that program
bgneal@4	182 allowed me to finally see the entire message. I cannot tell you how happy and
bgneal@4	183 relieved I was at the same time!
bgneal@4	184
bgneal@4	185 Final thoughts
bgneal@4	186 --------------
bgneal@4	187
bgneal@4	188 For a while I didn't think I was going to be smart enough or just plain up to
bgneal@4	189 the task of solving message 10. Looking back on it, I see a nice progression of
bgneal@4	190 trying things, experimenting, setbacks, and a few "a-ha" moments that lead to the
bgneal@4	191 breakthrough. I am very grateful to Dirk Rijmenants for creating the challenge,
bgneal@4	192 which was a lot of fun. And I also wish to thank Frode Weierud and Geoff
bgneal@4	193 Sullivan for their hill-climbing algorithm description. Thanks again to Stephan
bgneal@4	194 Krah for the key speedup that made my attempt possible.
bgneal@4	195
bgneal@4	196 For now I have decided not to publish my cracking program, since it is a tiny
bgneal@4	197 bit specific to Dirk's challenge. I don't want to ruin his Enigma Challenge by
bgneal@4	198 giving anything away. I don't believe my broad descriptions here have revealed
bgneal@4	199 too much. Other people need to experience the challenge for themselves to fully
bgneal@4	200 appreciate the thrill and hard work that go into code breaking.
bgneal@4	201
bgneal@4	202 I should also note that there are other ways of solving Dirk's challenge. You
bgneal@4	203 could, with a bit of patience, solve messages 1 through 9 with one of the many
bgneal@4	204 Enigma Machine GUI simulators (Dirk has a fine one). I'm not certain how
bgneal@4	205 you could break message 10 using a GUI simulator and trial and error, however.
bgneal@4	206
bgneal@4	207 It is my understanding that this style of Enigma message breaking was not the
bgneal@4	208 technique used by the allied code-breakers during the war. I believe they used
bgneal@4	209 knowledge of certain key words, or "cribs", in messages received in weather
bgneal@4	210 reports and other regular messages to find the key for the day. I look forward
bgneal@4	211 to reading more about their efforts now.
bgneal@4	212
bgneal@4	213 Finally, I have published the C++11 port of Py-Enigma_, which I am calling
bgneal@4	214 (rather unimaginatively) Cpp-Enigma_ on bitbucket. I hope Cpp-Enigma_ and
bgneal@4	215 Py-Enigma_ are useful for other people who want to explore the fascinating world
bgneal@4	216 of Enigma codes.
bgneal@4	217
bgneal@4	218 .. _Py-Enigma: https://bitbucket.org/bgneal/enigma
bgneal@4	219 .. _Python: http://www.python.org
bgneal@4	220 .. _Technical Details of the Enigma Machine: http://users.telenet.be/d.rijmenants/en/enigmatech.htm
bgneal@4	221 .. _Enigma Challenge: http://users.telenet.be/d.rijmenants/en/challenge.htm
bgneal@4	222 .. _Cpp-Enigma: https://bitbucket.org/bgneal/cpp-enigma
bgneal@4	223 .. _MinGW: http://www.mingw.org/

Mercurial > public > pelican-blog

annotate content/Coding/019-enigma-challenge.rst @ 4:7ce6393e6d30