Wim Ruitenburg's Spring 2006 MATH025.1001

Marquette University

Department of Mathematics, Statistics and Computer Science

Codes

In class we introduced the idea of a one-letter encoding of text over a 26-letter alphabet by permuting the characters. Someone, the owner, `secretly' made an encryption table, and used it to convert clear text into encrypted text. The class sees only the encrypted texts:
- SBALJPBXXTYHBJBDEBQ
- DLHKYBPPBYJERBHQEPZ
- SEXXUHBLCATVBQWTHWTTV
- DLPNBDLPEAQEQWYJ
- DEVPBHDQLHBATDEJM
- PNBFTCBEQTJZTY
In class we were able to `crack' the code and produce a decryption table. The encryption table and the decryption table both have 26 lines. With the decryption table we were able to recover the clear texts. Next we list some weaknesses we observed in the encryption scheme, and come up with suggestions to avoid them.
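Before turning to those weaknesses, here is a minimal sketch of the kind of counting that a cracking attempt typically starts from. It is not a record of how the class proceeded; the function name `letter_counts` is made up for this illustration, and the only data used are the six encrypted texts listed above.

```python
from collections import Counter

# The six encrypted texts shown above.
CIPHERTEXTS = [
    "SBALJPBXXTYHBJBDEBQ",
    "DLHKYBPPBYJERBHQEPZ",
    "SEXXUHBLCATVBQWTHWTTV",
    "DLPNBDLPEAQEQWYJ",
    "DEVPBHDQLHBATDEJM",
    "PNBFTCBEQTJZTY",
]


def letter_counts(texts):
    """Count how often each letter occurs across all the given texts."""
    counts = Counter()
    for text in texts:
        counts.update(text)
    return counts


if __name__ == "__main__":
    # Print letters from most to least frequent. The frequent ones are
    # candidates for common clear-text letters such as E or T.
    for letter, count in letter_counts(CIPHERTEXTS).most_common():
        print(letter, count)
```

The counts only suggest guesses; with so little text, a guess still has to be checked against the words it produces.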
When spies know a lot about the context of the clear text behind an encrypted message, we run the risk that a spy guesses the decryption of some string of characters. In general, this is unavoidable. Avoid sending unnecessary messages using the same encryption.
In our clear texts we frequently encounter strings like THE or ENT or IN. In a decryption attempt, a spy may search for patterns in the encrypted text which correspond with these frequently occurring strings in the clear text.
- Avoidance strategies involve replacing each commonly occurring string by symbols similar to letters, or even by multiple symbols, all designating the same string. Then during encryption, the owner sometimes uses one symbol to encode THE, and the next time uses another symbol to encode THE. Of course, the encoding and decoding tables will have to be larger.
- Nowadays there are automatic ways to compress files. These compression programs, known by names like COMPRESS or ZIP, reduce the size of text files significantly by cleverly replacing more frequently occurring longer strings by shorter strings. Now the owner should first apply a file compression algorithm similar to ZIP before applying the encoding table. So the conversion from clear text to encrypted text takes two consecutive steps, like putting on your socks, and then putting on your shoes. A decoder then applies the decoding table, and then the decompression program like UNZIP, to restore the original file. So the conversion from encrypted text to clear text also takes two consecutive steps, but in reverse order like taking off your shoes, and then taking off your socks.
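The two-step idea can be sketched in Python as follows. This is an illustration under assumptions, not the owner's actual method: `zlib` stands in for a program like ZIP, and because compressed data consists of bytes rather than letters, the table here permutes the 256 possible byte values instead of 26 letters. The function names and the seed are made up, and a table built from `random` is not suitable for real secrets.

```python
import random
import zlib


def make_byte_tables(seed):
    """Build a random byte-for-byte encoding table and its inverse."""
    encode = list(range(256))
    random.Random(seed).shuffle(encode)  # a permutation of 0..255
    decode = [0] * 256
    for clear, secret in enumerate(encode):
        decode[secret] = clear
    return bytes(encode), bytes(decode)


def encrypt(clear_text, encode_table):
    compressed = zlib.compress(clear_text.encode("utf-8"))  # socks on
    return compressed.translate(encode_table)               # shoes on


def decrypt(secret, decode_table):
    compressed = secret.translate(decode_table)             # shoes off
    return zlib.decompress(compressed).decode("utf-8")      # socks off


if __name__ == "__main__":
    enc_table, dec_table = make_byte_tables(seed=2006)
    secret = encrypt("THE RENT IS DUE IN THE MORNING", enc_table)
    print(decrypt(secret, dec_table))
```

Decoding must undo the steps in the opposite order: applying the decompression program before the decoding table would fail, because the scrambled bytes are not a valid compressed file.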

Despite the significant improvement of combining file compression with an encoding table, the position of the spy also improves. The spy can still (try to) crack the code by adding brute computer force to the repertoire of tools. So far the owner originally used a one-letter encryption table, which may look something like

Line nr	Clear	Encrypt

1	A	J
2	B	Z
3	C	B
...	...	...
25	Y	A
26	Z	I

The owner's original decryption table would then look something like

Line nr	Encrypt	Clear

1	A	Y
2	B	C
...	...	...
9	I	Z
10	J	A
...	...	...
26	Z	B
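The decryption table is just the encryption table with its two columns swapped and the lines sorted again. A minimal Python sketch of this, with made-up function names and a randomly generated table standing in for the owner's secret one:

```python
import random
import string


def make_encryption_table(seed=None):
    """Return a dict mapping each clear letter to an encrypted letter."""
    letters = list(string.ascii_uppercase)
    shuffled = letters[:]
    random.Random(seed).shuffle(shuffled)  # a random permutation of A..Z
    return dict(zip(letters, shuffled))


def invert(table):
    """Swap the two columns of a table: encrypted letter -> clear letter."""
    return {secret: clear for clear, secret in table.items()}


def apply_table(table, text):
    """Translate text one letter at a time using the given table."""
    return "".join(table[ch] for ch in text)


if __name__ == "__main__":
    # The five encryption lines displayed above, inverted.
    shown = {"A": "J", "B": "Z", "C": "B", "Y": "A", "Z": "I"}
    for secret, clear in sorted(invert(shown).items()):
        print(secret, clear)
```

Inverting the five displayed encryption lines gives exactly the five displayed decryption lines (A to Y, B to C, I to Z, J to A, Z to B).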

The owner can improve this by using tables that translate pairs of letters into pairs of letters. Since there are 26 times 26 equals 676 possible pairs of characters, she should have encoding and decoding tables that look something like:

Line nr	Clear	Encrypt

1	AA	RN
2	AB	HH
3	AC	YB
...	...	...
25	AY	IC
26	AZ	UE
27	BA	GZ
28	BB	MW
...	...	...
51	BY	DK
52	BZ	IA
53	CA	MT
54	CB	OS
...	...	...
...	...	...
674	ZX	ID
675	ZY	AK
676	ZZ	KG
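A table like this is tedious to write by hand but easy to generate. The following Python sketch builds a random 676-line pair table in the same line order as above (AA, AB, ..., ZZ). The function names are made up, and padding an odd-length text with the letter X is an assumption of this sketch, not something the scheme above prescribes.

```python
import itertools
import random
import string


def make_pair_table(seed=None):
    """Return a dict mapping each clear pair (AA..ZZ) to an encrypted pair."""
    pairs = ["".join(p)
             for p in itertools.product(string.ascii_uppercase, repeat=2)]
    shuffled = pairs[:]
    random.Random(seed).shuffle(shuffled)  # a random permutation of the pairs
    return dict(zip(pairs, shuffled))


def encrypt_pairs(table, text):
    """Encrypt text two letters at a time."""
    if len(text) % 2:
        text += "X"  # assumption: pad odd-length text with X
    return "".join(table[text[i:i + 2]] for i in range(0, len(text), 2))


if __name__ == "__main__":
    table = make_pair_table(seed=25)
    print(len(table))                    # number of lines in the table
    print(encrypt_pairs(table, "CODES"))
```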

To save some space on this web page, we don't display the matching 676-line decryption table. These tables, combined with file compression, are again a lot better. There is still the risk that with big computer power, this code will be cracked by a spy. So the next natural step for the owner is using an encryption table with triples of letters. There are 26 times 26 times 26 equals 17576 triples. The encryption table will look something like

Line nr	Clear	Encrypt

1	AAA	QMI
2	AAB	OKT
...	...	...
17575	ZZY	WWJ
17576	ZZZ	KQS

and the decryption table will be just as long. Note that we only display the first two lines, and the last two lines. This is just to save space. We write triple dots for the missing 17572 lines. File compression, combined with these long tables, is again a good improvement. But is it good enough? Maybe the owner had better use tables that are 4 characters wide, from AAAA to ZZZZ. Unfortunately, we run into the problem that such tables become huge; in the case of 4 characters, both tables will have 26 times 26 times 26 times 26, equals 26 to the power 4, equals 456976 lines. And 4 characters wide may still be insufficient. The table method becomes more and more unwieldy.
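The growth of the tables is easy to check with a few lines of Python. The function name `table_lines` is made up for this illustration.

```python
def table_lines(width, alphabet_size=26):
    """Number of lines in a table that translates blocks of `width` letters."""
    return alphabet_size ** width


if __name__ == "__main__":
    for width in range(1, 5):
        print(width, table_lines(width))
    # 1 26
    # 2 676
    # 3 17576
    # 4 456976
```

Each extra character of width multiplies the table length by 26.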

Instead of using encoding and decoding tables from 4 characters to 4 characters, or worse, one uses encoding and decoding algorithms. What follows is a sketch of how this works. In this example, we assume that we use huge numbers n whose factorizations into primes are not feasible with any known general algorithm.

Kate and I are going to communicate using public key cryptography. Both she and I privately pick two huge prime numbers. There are lots of prime numbers, and detecting prime numbers is easy. So both Kate and I easily find our own two private huge primes. We each keep our pair of prime numbers secret, but we publish the products to the world. For illustration purposes, I use small prime numbers below; just pretend that they are big. So we have a situation like this:

Everybody knows	Only I know	Only Kate knows	Only the spy knows
161 from me, 221 from Kate, 209 from the spy	7 and 23	13 and 17	11 and 19
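With primes this small, anybody can recover the secret pairs from the public products, which is why we have to pretend that they are big. The Python sketch below uses plain trial division, with made-up function names. It is only meant for the toy numbers above: for huge products the loop would take impossibly long, and real prime detection uses much faster tests than this one.

```python
def smallest_factor(n):
    """Return the smallest prime factor of n (n >= 2) by trial division."""
    d = 2
    while d * d <= n:
        if n % d == 0:
            return d
        d += 1
    return n  # no divisor found, so n itself is prime


def is_prime(n):
    return n >= 2 and smallest_factor(n) == n


def factor_product(n):
    """Split a product of two primes into its two factors (small n only)."""
    p = smallest_factor(n)
    return p, n // p


if __name__ == "__main__":
    for product in (161, 221, 209):
        print(product, factor_product(product))
    # 161 (7, 23)
    # 221 (13, 17)
    # 209 (11, 19)
```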

Now there are two machines, called P and S, which work as follows. Machine P takes as input a text and a product of two primes, say a secret text ABC from me and Kate's product 221, and as output produces an `encrypted' string, say XYZ, which I make public. Machine S takes as input a text and two primes, say XYZ and Kate's (13,17), and as output produces a `decrypted' string, in this case ABC. In pictures:

ABC in, 221 in -> P -> out XYZ

At this moment, only I know ABC. Kate can now use her private prime pair:

XYZ in, (13,17) in -> S -> out ABC

And so Kate also knows ABC. Whenever the product and the prime factors match, like 221 and (13,17) above, the machines P and S reverse one another. The spy may also try to figure out what XYZ stands for, but will fail:

XYZ in, (11,19) in -> S -> out UVW

where UVW looks like garbage. If Kate wants to return a confidential reply to me, then she will proceed as follows:

DEF in, 161 in -> P -> out RST

Only I can decipher the message using my private key:

RST in, (7,23) in -> S -> out DEF
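The text above does not say what happens inside P and S. One well-known scheme that fits the description is RSA, and the Python sketch below assumes it. Everything beyond the numbers 161, 221, 209 and their prime pairs is an assumption of this sketch: the public exponent E = 7 (in RSA it is published along with the product), the function names, and the choice to turn each letter into its character code and encrypt letter by letter. Encrypting single letters is of course no better than a one-letter table; it just keeps the toy numbers small. The call `pow(E, -1, ...)` needs Python 3.8 or later.

```python
E = 7  # public exponent; an assumption of this sketch, not from the text


def machine_p(numbers, product):
    """Machine P: needs only the public product of the two primes."""
    return [pow(m, E, product) for m in numbers]


def machine_s(numbers, primes):
    """Machine S: needs the two secret primes themselves."""
    p, q = primes
    d = pow(E, -1, (p - 1) * (q - 1))  # private exponent derived from p and q
    return [pow(c, d, p * q) for c in numbers]


def to_numbers(text):
    """Letters A..Z become 65..90, all smaller than 161, 209 and 221."""
    return [ord(ch) for ch in text]


def to_text(numbers):
    return "".join(chr(n) for n in numbers)


if __name__ == "__main__":
    xyz = machine_p(to_numbers("ABC"), 221)   # I encrypt for Kate
    print(to_text(machine_s(xyz, (13, 17))))  # Kate recovers ABC
    print(machine_s(xyz, (11, 19)))           # the spy gets other numbers
    rst = machine_p(to_numbers("DEF"), 161)   # Kate replies to me
    print(to_text(machine_s(rst, (7, 23))))   # I recover DEF
```

Machine S computes its private exponent from p - 1 and q - 1, which is why knowing the product alone is not enough: one has to factor it first.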

Imagine that I am at headquarters in Milwaukee, and Kate is in the wilds of Iowa. Although Kate is the only one who can decrypt my message XYZ, she may not know for sure that I am the sender. The following is an alternative way to create an encrypted message, to assure that she will be the only one who can decrypt the message, as well as know that I am the only one who could have sent it. The method uses the fact that both P and S scramble messages, and that with matching product and primes they reverse one another in either order.

ACE in, (7,23) in -> S -> out ZXV

ZXV in, 221 in -> P -> out YWU

So I used two encoding steps in a row, like putting on socks, and then putting on shoes. I only publish YWU to the world. Now Kate must do two decoding steps in a row, like taking off shoes and then taking off socks:

YWU in, (13,17) in -> S -> out ZXV

ZXV in, 161 in -> P -> out ACE

Only Kate can perform the first step, because it requires her private key (13,17). And I must be the author of ACE, because I am the only one who could have created ZXV from ACE using the private key that belongs to 161.
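The socks-and-shoes exchange can be sketched in Python under the same kind of assumptions as before: P and S are taken to be RSA with a made-up public exponent E = 7, letters are turned into their character codes, and all function names are hypothetical. The sketch also relies on my product 161 being smaller than Kate's product 221, so that every number coming out of my machine S still fits into Kate's machine P; with the sizes the other way around, extra care would be needed.

```python
E = 7  # public exponent; an assumption of this sketch, not from the text

MY_PRIMES, MY_PRODUCT = (7, 23), 161
KATE_PRIMES, KATE_PRODUCT = (13, 17), 221


def machine_p(numbers, product):
    """Machine P: needs only a public product."""
    return [pow(m, E, product) for m in numbers]


def machine_s(numbers, primes):
    """Machine S: needs the two secret primes (Python 3.8+ for pow)."""
    p, q = primes
    d = pow(E, -1, (p - 1) * (q - 1))
    return [pow(c, d, p * q) for c in numbers]


def send_signed(text):
    """What I do: S with my primes, then P with Kate's product."""
    numbers = [ord(ch) for ch in text]          # A..Z become 65..90
    zxv = machine_s(numbers, MY_PRIMES)         # socks: only I can do this
    return machine_p(zxv, KATE_PRODUCT)         # shoes: only Kate can undo this


def receive_signed(numbers):
    """What Kate does: S with her primes, then P with my product."""
    zxv = machine_s(numbers, KATE_PRIMES)       # shoes off
    clear = machine_p(zxv, MY_PRODUCT)          # socks off
    return "".join(chr(n) for n in clear)


if __name__ == "__main__":
    ywu = send_signed("ACE")
    print(receive_signed(ywu))
```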

Last updated: February 2008
Comments & suggestions: wimr@mscs.mu.edu