The Dorabella Cipher: Intro presented some basic information on the cipher and my introduction to it. This post covers the initial analysis that I undertook, so that you can follow in my footsteps.
Here’s a reminder of what the note contained (we’ve never seen the reverse side but Dora states that it simply read “Miss Penny” so there’s unlikely to be any additional useful information there).
There are some initial observations that even those unskilled in these things (i.e., me) can make.
- Everything is composed of what appear to be semicircles (or the letter “C”). Some of them are badly written and are thus ambiguous.
- Some of the semicircles (eventually I called them “cusps”) are on their own while others are stacked together, so there are three basic types of symbol:
  - Single cusp
  - Double cusp
  - Triple cusp
- Each of the three types can appear rotated in one of eight positions.
- Three types with eight positions yields an “alphabet” of 24 possible symbols.
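The counting argument in the list above can be sketched in a few lines of Python. The `(cusps, rotation)` tuple representation and the choice of which orientation counts as rotation 0 are assumptions made for illustration; the note itself does not fix either.

```python
from itertools import product

CUSP_COUNTS = (1, 2, 3)   # single, double and triple cusp
ROTATIONS = range(8)      # eight orientations; which one is "0" is arbitrary here

# Every symbol is one cusp count combined with one rotation.
symbols = list(product(CUSP_COUNTS, ROTATIONS))

print(len(symbols))   # 24
print(symbols[:3])    # [(1, 0), (1, 1), (1, 2)]
```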
The first thing that sprang to mind was that 24 is not the same as the 26 letters of the English alphabet, but it’s close.
The second thing was that in the Roman alphabet some letters were used interchangeably. “I” and “J” could be represented by “I”, while “U” and “V” could be represented by “V”.
That gave a 24-letter alphabet, and maybe – just maybe – that was significant. Could these symbols represent a reduced alphabet?
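As a rough illustration of the reduced alphabet, the following Python sketch folds "J" into "I" and "U" into "V", as described above. The helper name `to_reduced` and the decision to drop spaces and punctuation are assumptions for the example, not part of any known Elgar convention.

```python
import string

# Letters that are folded into a partner in the reduced alphabet.
MERGED = {"J": "I", "U": "V"}

# The 26 upper-case letters minus J and U leaves 24.
REDUCED_ALPHABET = "".join(c for c in string.ascii_uppercase if c not in MERGED)

def to_reduced(text):
    """Upper-case the text, fold J->I and U->V, and drop anything else."""
    out = []
    for ch in text.upper():
        ch = MERGED.get(ch, ch)
        if ch in REDUCED_ALPHABET:
            out.append(ch)
    return "".join(out)

print(REDUCED_ALPHABET)                   # ABCDEFGHIKLMNOPQRSTVWXYZ
print(len(REDUCED_ALPHABET))              # 24
print(to_reduced("Just a vague guess"))   # IVSTAVAGVEGVESS
```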
Other things that came to mind were:
- Is there an association between the possible Italian connection (through the name Dorabella that Elgar is known to have used later as a nickname for Dora Penny) and the Roman origins of Italy (hence Latin)?
- There’s an odd mark on the bottom line – a dot – between the fifth and sixth symbols. To my mind it looks to have been deliberately placed. Could it act as a separator between different parts of the message, or different encryption methods?
The next step was to visualize the alphabet, but where to start? Since it’s anyone’s guess, I opted to divide the alphabet into three sets of eight, starting with the single cusp (eight rotations), then the double, and finally the triple.
I created a simple table of symbols using Microsoft Word’s Draw facility. I opted to start from a position where the very first symbol would look like a “C” since this letter has some significance in music:
Then I added my reduced alphabet:
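A table of this kind can be approximated in Python. This sketch assumes the letters are assigned in plain alphabetical order across the single, double and triple cusps in turn, and it represents each symbol as a hypothetical `(cusps, rotation)` pair; the direction of rotation is not modelled.

```python
from itertools import product

REDUCED_ALPHABET = "ABCDEFGHIKLMNOPQRSTVWXYZ"  # no J, no U

# Single cusp (8 rotations) first, then double, then triple.
symbols = list(product((1, 2, 3), range(8)))

# Assumed ordering: first symbol -> A, second -> B, and so on.
symbol_to_letter = dict(zip(symbols, REDUCED_ALPHABET))

for cusps in (1, 2, 3):
    row = "".join(symbol_to_letter[(cusps, r)] for r in range(8))
    print(cusps, row)
# 1 ABCDEFGH
# 2 IKLMNOPQ
# 3 RSTVWXYZ
```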
So far, so good. The next step was to analyse the distribution of the symbols as used in the note itself. Did it make use of all 24 symbols?
Clearly not. There are at least four symbols (their corresponding letters underscored and marked in red) that don’t appear in the note itself but which are part of the set of possible symbols. This wouldn’t be entirely unexpected, because of the length (or rather, lack of it) of the message – it uses only 87 symbols or letters, which is going to make it difficult to crack using approaches that rely on the frequency of occurrence of letters in text.
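Checking which members of the alphabet never appear is easy to automate once the note has been transcribed to letters. The sketch below uses a made-up sample string, not a transcription of the Dorabella note, and the function name `unused_letters` is purely illustrative.

```python
from collections import Counter

REDUCED_ALPHABET = "ABCDEFGHIKLMNOPQRSTVWXYZ"  # no J, no U

def unused_letters(transcription, alphabet=REDUCED_ALPHABET):
    """Return the letters of the alphabet that never occur in the transcription."""
    counts = Counter(transcription)
    return [letter for letter in alphabet if counts[letter] == 0]

# Made-up sample text for demonstration only; NOT the Dorabella message.
sample = "THEQVICKBROWNFOX"
print(unused_letters(sample))
# ['A', 'D', 'G', 'L', 'M', 'P', 'S', 'Y', 'Z']
```

With a message as short as 87 symbols, a handful of unused letters is to be expected whatever the underlying text.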
At this point I came across an article in New Scientist by Professor Kevin Jones entitled “Breaking Elgar’s Enigmatic Code”. It’s available through other sites on the ‘Net, although the graphics are usually omitted, and those are actually important, since one of them lists Professor Jones’ interpretation of the symbols as letters using a reduced alphabet.
Professor Jones’ interpretation is this:
Using this information it’s possible to work backwards and derive the table of letters corresponding to symbols that Professor Jones used – and it was different from mine. (As it turns out, that’s not as important as you might think.)
Note that the interpretation is gibberish – no magical revelation here. That’s not surprising – this message has remained undeciphered since 1897, so if it was going to be that easy, I wouldn’t be writing this post…
The gibberish suggests that Elgar encrypted his original message first, using a method that has yet to be determined, and then converted it to a symbolic form before writing it down. He may have performed other processes too, as we shall come to see.
At around the same time I came across another, much more crucial, piece of information. It was a link to an article by Eric Sams, someone who has played a major role in the attempts to crack the cipher (see the previous post for more information).
The reason this article was so significant is that at the very end, in Appendix (d), Eric reproduced a page (or possibly two facing pages – it’s difficult to tell) from an exercise book owned by Elgar, in which he had not only written down the full symbol set but also assigned an alphabet to it using the reduced set, and even written a few test messages. There are other significant items on those pages, and we’ll come to those later.
The exercise book dates from the 1920s, years after the cipher was used in the note to Dora, but the key thing is that it shows at least one version of the mapping of symbols to letters, and the use of the reduced alphabet.
It meant I had been on the right track. It also confirmed Professor Jones’ mapping of letters to symbols (I’m assuming that he had seen the exercise book pages). Here is the re-arrangement of the table, based on the new information:
The symbols that do not appear in the note are now mapped to M, N, O and Z.
It was becoming clear that in order to work effectively with the symbols it was necessary to have a font rather than a set of crude drawings, and so I created a very rough and ready TrueType font and made it freely available.
The use of the font enables other things too. It’s possible to use the symbols in a spreadsheet and perform analyses on the distribution of the letters the symbols represent.
I also subscribed to a Yahoo! group dedicated to research into the cipher. I am presently co-owner and co-moderator of the group, which numbers some 220+ [Update: 430+] individuals from all walks of life and spread through different parts of the world.
We all have our own ideas about what the cipher might represent and how it might be decoded – if it was ever intended to be decoded – and, apart from a few attempts that are less than convincing, no-one has yet come up with a definitive answer that is acceptable to the majority.
Later I developed a tool based on Microsoft Excel, written in VBA, that makes a number of things possible, including experimenting with pseudorandom maps of symbols to letters.
That tool is freely available, in the same location as the TTF file. Some preliminary documentation accompanies the Excel file. (I had intended to create a version based on OpenOffice, but I would have to find another way to visually present certain functions, since some of the features they rely on in Excel’s VBA have no counterpart there, and so far I have not had time.)
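The idea of experimenting with pseudorandom maps can be illustrated outside Excel as well. The following is a minimal Python sketch, not a port of the VBA tool: the `(cusps, rotation)` symbol representation and the function names `random_map` and `decode` are assumptions made for the example.

```python
import random
from itertools import product

REDUCED_ALPHABET = "ABCDEFGHIKLMNOPQRSTVWXYZ"   # no J, no U
SYMBOLS = list(product((1, 2, 3), range(8)))    # 24 (cusps, rotation) pairs

def random_map(seed):
    """Build a reproducible one-to-one map from symbols to letters."""
    rng = random.Random(seed)        # a seeded generator makes each trial repeatable
    letters = list(REDUCED_ALPHABET)
    rng.shuffle(letters)
    return dict(zip(SYMBOLS, letters))

def decode(symbol_sequence, mapping):
    """Translate a sequence of symbols into letters using the given map."""
    return "".join(mapping[s] for s in symbol_sequence)

# Hypothetical symbol sequence for demonstration; NOT taken from the note.
demo = [(1, 0), (3, 4), (2, 7), (1, 0)]
for seed in range(3):
    print(seed, decode(demo, random_map(seed)))
```

Seeding the generator means any interesting trial can be reproduced later simply by noting its seed.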
The next step was to undertake a deeper analysis by looking at the frequency of occurrence of each symbol in the message, and compare that with known frequencies of occurrence of letters of the English alphabet in various texts – the subject of the next post.
I also encountered the work of William Friedman and in particular his Index of Coincidence, which, when used with the understanding that there is virtually no statistical confidence in the results (!), can help guide the choice of approaches to decryption. We’ll look at that in the next post too.
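For orientation, the Index of Coincidence is usually defined as the probability that two letters drawn at random from a text, without replacement, are the same: the sum of n(n-1) over each letter's count n, divided by N(N-1), where N is the length of the text. A minimal Python sketch of that standard, unnormalized form follows; the function name is illustrative.

```python
from collections import Counter

def index_of_coincidence(text):
    """Probability that two randomly chosen positions hold the same letter."""
    counts = Counter(text)
    n = sum(counts.values())
    if n < 2:
        raise ValueError("need at least two letters")
    return sum(c * (c - 1) for c in counts.values()) / (n * (n - 1))

print(index_of_coincidence("AAAA"))   # 1.0
print(index_of_coincidence("ABCD"))   # 0.0
```

As a rough guide, text drawn uniformly at random from 24 symbols tends towards 1/24 (about 0.042), whereas ordinary English is commonly quoted at roughly 0.066. With only 87 symbols, any single value should be read with the caution noted above.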