Saturday, 12 February 2011

Vinegar - refined Vigenère - can you break my cipher?

I'm idly interested in cryptography, the art of scrambling a message so that it can be transmitted securely, and only someone with the magic key can understand the message.

When I was young, I designed a cryptographic algorithm. I thought I was so clever, but just because *I* couldn't break it, that doesn't make it secure.

In this article, I present my naive cryptographic algorithm. It's very flawed, so please don't use it for anything important. Can you find the flaw?

This article will start with some background on substitution ciphers and the Vigenère cipher, which my method was based upon. Then, we'll look at my big idea itself, Vinegar. To keep it interesting, there's a little code breaking challenge as well. Enjoy!


How Etaoin Shrdlu defeated substitution ciphers.

Like most children, my first encounter with cryptography was a substitution cipher. A friend gave me a sheet of paper with each of the 26 letters and wacky squiggle next to each one. This would be our secret code. Replace each letter with it's squiggle and it would just look like a bunch of squiggles.

We thought it was unbreakable, but it wasn't. This sort of code can be cracked by knowing that E is the most common letter in English, so the most common squiggle in the hidden message is probably an E. The next most common squiggle is probably a T. Once you've covered the twelve most common letters in English; E, T, A, O, I, N, S, H, R, D, L and U;
♦ou ♦an easil♦ ♦or♦ out ♦hat the other ♦issin♦ letters ♦ould ♦e.

What we need is a cipher the produces a coded message with an equal mixture of symbols. Enter Vigenère.


The Vigenère cipher

Centuries ago, the choice of people who wanted to communicate in private was Vigenère. Here's how it works.

          A 0
      Z 25    B 1  
    Y 24        C 2
   X 23          D 3
  W 22            E 4 
  V 21            F 5  
 U 20              G 6 
 T 19              H 7 
  S 18            I 8
  R 17            J 9 
   Q 16          K 10
    P 15        L 11 
      O 14    M 12
          N 13 
(This was meant to be circular.)
The key to understanding a lot of cryptography is that the 26 letters of the English language can be used as numbers. On this chart, each letter has been given a number. Now, it's possible to do simple calculations with letters. What's C+C? The answer is E, because 2+2=4.

Don't get excited, but you can do subtraction as well. K-H is D, because 10-7=3.

You may be wondering what should happen if you add past 'Z' or subtract past 'A'. For our purposes, imagine the 26 letters on a clock face in a circle. On a normal clock, when a hand ticks past the 12, it moves onto 1, not 13. Its the same with the Vigenère clock of letters. After the letter 'Z', is the letter 'A'.

Finally, because we've constrained our system to these 26 values, adding 'B' (1) turns out to be the same as subtracting 'Z' (25). Regardless of where you start, performing either +B or -Z will end at the same letter. In fact, all of the 26 possible additions will have an equivalent subtraction. You can find each letter's pair on the chart by looking for the letter across from the other. 'A' and 'N' are self pairing; +A is the same as -A, and +N is the same as -N.

Working within this system, we can use this to encrypt secret messages. Imagine Bob and Carol wish to communicate in private, but the bad guys can read their messages. To stop the bad guys, Bob and Carol meet up in advance and agreed the code they will use future.

Later, Bob wants to send the message "But now you will die by chainsaw" to Carol. (He's a fan of Internet cartoons.) Now we have the system of adding letters together, Bob can perform a simple calculation on each letter of the secret plain text message. He takes the previously agreed keyword, "WILHELM", and adds each letter of the keyword to each letter of the plain text, repeating the keyword as often as needed;

  BUT NOW YOU WILL DIE BY CHAINSAW
+ WIL HEL MWI LHEL MWI LH ELMWILHE
  XCE USH KKC HPPW PEM MF GSMEVDHA

When Carol receives the encoded message, she can get back the plain-text by subtracting the keyword.

  XCE USH KKC HPPW PEM MF GSMEVDHA
- WIL HEL MWI LHEL MWI LH ELMWILHE
  BUT NOW YOU WILL DIE BY CHAINSAW

Vigenère fixes the flaw with substitution ciphers because all the letter Es in the original message will all (mostly) come out as different letters.

Vigenère eventually fell out of use once a new flaw was discovered. Frequency analysis was still hiding there. If you know that the length of the keyword is 7, then you know that every 7th letter was encoded with the same letter from the keyword. So if you circle the 1st, 8th, 15th, etc letter of a long enough hidden message, the most frequent letter of those circled letters is probably an 'E', etc. Repeat for each group of letters 7 spaces apart and you can work out what the plain-text message was.

(How did we know the keyword was 7 letters long? There are ways. Wikipedia have an in-depth description but you don't need to know how for this puzzle.) If you want to experiment, "Sharky" has a rather splendid web-app to perform the encoding and decoding.

Vinegar - Refined Vigenère
This is my improvement to the Vigenère cipher, which I called Vinegar. (Because that's close to how I kept mispronouncing it.)

The problem with Vigenère is that the keyword is repeated, and that repetition exposed the vulnerability. What we need is a keyword that's long without repeating, but small enough to be remembered.

Vinegar takes a 17 letter keyword and expands it to 210 letters. With that long 210 letter keyword, you can use Vigenère without having to repeat the keyword, and it's the repetition in Vigenère that exposes it's vulnerability.

Why 17? Because it's the sum of the first four prime numbers. 2+3+5+7=17. We'll use the keyword "WILHELMVONHACKENS" for this example.

Split the keyword up into groups of 2, 3, 5 and 7 letters. WI, LHE, LMVON and HACKENS.

Repeat each sub-key to make up 210 letters:

  WIWIWIWIWIWIWIWIWIWIWIWIWI...
  LHELHELHELHELHELHELHELHELH...
  LMVONLMVONLMVONLMVONLMVONL...
+ HACKENSHACKENSHACKENSHACKE...

Because each group has a prime number length, the four groups will effectively expand into a 210 letter Vigenère keyword for the price of a 17 letter keyword. (210 is 2x3x5x7.) Add each column together to get the long key...

ZBXRUKLROI YCPVUERRZP DMYCEEZGYZ ULZUIETMYK BQJDPOTUNJ LHIEHSTOTJ WONOQZDOBY VYENRRHOVE VJLSBAOYVM KIVJABGCVG QIGQFLPJFG YXFAWKQBJG SDFLDPAKQQ SLUKNGZLIU SFAKYNEVRB CFIZXXVUST GFCFXICZCC NPCNMHMQBD FTCHSHXXGN OAMHAWWHXM PSQHUCWSER

To avoid the Vigenère vulnerability, we can only use a 17 letter keyword once per 210 letters. So if you are encoding a longer message, use the last 17 letters of each 210 block for a new keyword to use for the next block of 210 characters.

So there's my cipher. It has a flaw. Can you work it out? To try your hand, I came up with a random 17 letter codeword and expanded it to 210 letters. I then used that long key to encrypt my secret message. (My message is a little shorter than 210 letters, so I left some at the end un-used.) The plain-text is perfectly normal English. Spaces and punctuation are retained from the plain text.

Iwy seix zfvdzykjxm moebj dkaavmin vjkehleozp atir sdvwkvm cf hhbd vw gauj ty qzintte av mjbo xr xxnb whuieift, zaed jmioidh xv ts xtt elmv fg zok xgwnlpn vbues mp irmc twpb, yebhaoz rdlnrpbj jgg kzmlkyah vo dvta hn jzfxggaxcq.

Can you decode my message without the key? Post me a comment. Enjoy.

Picture Credits:
I'm lying by Taylor Dawn Fortune
Local Praire Dog Gossip by Art G
With grateful thanks to Richard Heathfield.

No comments:

Post a Comment