Department of Mathematics, Statistics and Computer Science
Wim Ruitenburg's Spring 2004 MATH025.1001
Codes
In class we introduced the idea of a one-letter encoding of text over the
26-letter alphabet, obtained by permuting the characters.
Mary Lou `secretly' made an encryption table, and used it to convert her
clear text into encrypted text.
The class sees only the encrypted texts:
SBALJPBXXTYHBJBDEBQ
DLHKYBPPBYJERBHQEPZ
SEXXUHBLCATVBQWTHWTTV
DLPNBDLPEAQEQWYJ
DEVPBHDQLHBATDEJM
PNBFTCBEQTJZTY
In class we were able to `crack' the code and produce a decryption table.
The encryption table and the decryption table both have 26 lines.
With the decryption table we were able to recover the clear texts.
Next we list some weaknesses we observed in the encryption scheme, and come
up with suggestions to avoid them.
When we send encrypted text and spies know a lot about the context of the
clear text, we run the risk that a spy guesses the decryption of some string
of characters.
In general, this is unavoidable.
Avoid sending unnecessary messages using the same encryption.
In our clear texts we frequently encounter strings like THE or ENT or
IS.
In a decryption attempt, a spy may search for patterns in the encrypted text
which correspond with these frequently occurring strings in the clear text.
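The spy's pattern search can begin with plain counting. The Python sketch below counts single letters and adjacent pairs of letters in the six encrypted texts from class. The helper name `count_runs` is our own invention.

```python
from collections import Counter

# The six encrypted texts the class received.
CIPHERTEXTS = [
    "SBALJPBXXTYHBJBDEBQ",
    "DLHKYBPPBYJERBHQEPZ",
    "SEXXUHBLCATVBQWTHWTTV",
    "DLPNBDLPEAQEQWYJ",
    "DEVPBHDQLHBATDEJM",
    "PNBFTCBEQTJZTY",
]


def count_runs(texts, width):
    """Count every run of `width` consecutive letters, within each text separately."""
    counts = Counter()
    for text in texts:
        for i in range(len(text) - width + 1):
            counts[text[i:i + width]] += 1
    return counts


letters = count_runs(CIPHERTEXTS, 1)
pairs = count_runs(CIPHERTEXTS, 2)
print(letters.most_common(5))
print(pairs.most_common(5))
```

Here the encrypted letter B occurs 15 times, far more often than any other letter. Since E is usually the most common letter in English, B is a natural first guess for E.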
One avoidance strategy is to replace each commonly occurring string by a new
symbol that behaves like an extra letter, or even by several such symbols, all
designating the same string.
Then during encryption, Mary Lou sometimes uses one symbol to encode THE, and
the next time uses another symbol to encode THE.
Of course, the encoding and decoding tables will have to be larger.
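The multiple-symbol idea can be sketched in Python. Several assumptions here are not in the text. The symbols are numbers 0 to 28. The clear text uses only the letters A to Z. Only THE gets extra symbols (three of them), and every single letter gets one. The shuffled table is invented purely for illustration.

```python
import random
import string

rng = random.Random(2004)

# Hypothetical tables: 29 symbols (numbers 0-28) are shuffled, one for each
# letter A-Z and three interchangeable ones for the common string THE.
symbols = list(range(29))
rng.shuffle(symbols)
ENCODE = {letter: [symbols[i]] for i, letter in enumerate(string.ascii_uppercase)}
ENCODE["THE"] = symbols[26:]
# Each symbol designates exactly one string, so decoding is a plain lookup.
DECODE = {s: clear for clear, options in ENCODE.items() for s in options}


def encrypt(clear):
    out, i = [], 0
    while i < len(clear):
        chunk = "THE" if clear.startswith("THE", i) else clear[i]
        out.append(rng.choice(ENCODE[chunk]))  # random pick among the symbols for chunk
        i += len(chunk)
    return out


def decrypt(symbol_list):
    return "".join(DECODE[s] for s in symbol_list)


print(encrypt("THEOTHERTHEME"))
print(decrypt(encrypt("THEOTHERTHEME")))
```

Encryption picks among the three THE symbols at random. Encrypting the same text twice therefore usually gives different results, but decryption always recovers the original.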
Nowadays there are automatic ways to compress files.
These compression programs, known by names like COMPRESS or ZIP, reduce the
size of text files significantly by cleverly replacing more frequently
occurring longer strings by shorter strings.
Now Mary Lou should first apply a file compression algorithm similar to ZIP
before applying her encoding table.
So the conversion from clear text to encrypted text takes two consecutive
steps, like putting on your socks, and then putting on your shoes.
A decoder then applies the decoding table, and then the decompression
program like UNZIP, to restore the original file.
So the conversion from encrypted text to clear text also takes two consecutive
steps, but in reverse order like taking off your shoes, and then taking off
your socks.
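The socks-and-shoes order can be sketched with Python's standard `zlib` module standing in for ZIP. One assumption is not in the text: compressed output is a string of bytes rather than letters. The hypothetical encoding table here therefore permutes all 256 byte values instead of the 26 letters.

```python
import random
import zlib

# Hypothetical encoding table: a random permutation of the 256 byte values.
rng = random.Random(42)
ENCODE = list(range(256))
rng.shuffle(ENCODE)
DECODE = [0] * 256
for clear_byte, code_byte in enumerate(ENCODE):
    DECODE[code_byte] = clear_byte


def encrypt(clear_text):
    packed = zlib.compress(clear_text.encode("ascii"))  # socks: compress first
    return bytes(ENCODE[b] for b in packed)             # shoes: then apply the table


def decrypt(encrypted):
    packed = bytes(DECODE[b] for b in encrypted)        # shoes off: undo the table first
    return zlib.decompress(packed).decode("ascii")      # socks off: then decompress


message = "THE THEORY THAT THE THEME " * 20
secret = encrypt(message)
print(len(message), len(secret))
print(decrypt(secret) == message)
```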
Combining file compression with an encoding table is a significant
improvement, but the position of the spy also improves: the spy can still
(try to) crack the code by adding brute computer force to the repertoire of
tools.
So far Mary Lou has used a one-letter encryption table, which may look
something like
Line nr   Clear   Encrypt
1         A       J
2         B       Z
3         C       B
...       ...     ...
25        Y       A
26        Z       I
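A one-letter encryption table is just a permutation of the alphabet. Here is a minimal Python sketch, assuming the table is made by randomly shuffling the letters. It will not reproduce the example lines above, and the function names are our own.

```python
import random
import string

ALPHABET = string.ascii_uppercase


def make_table(rng):
    """Return a random one-letter encryption table as a dict {clear: encrypted}."""
    shuffled = list(ALPHABET)
    rng.shuffle(shuffled)
    return dict(zip(ALPHABET, shuffled))


def apply_table(table, text):
    """Replace every letter of `text` by its entry in `table`."""
    return "".join(table[ch] for ch in text)


table = make_table(random.Random(1))
print(apply_table(table, "MARYLOU"))
```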
Mary Lou's original decryption table would then look something like
Line nr   Encrypt   Clear
1         A         Y
2         B         C
...       ...       ...
9         I         Z
10        J         A
...       ...       ...
26        Z         B
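The decryption table is the encryption table read backwards: swap the two columns and sort by the encrypted letter. The Python sketch below applies this to the five encryption lines displayed above. It reproduces exactly the decryption lines displayed for A, B, I, J and Z. The helper name `invert` is our own.

```python
def invert(table):
    """Build the decryption table from a one-to-one encryption table."""
    decryption = {code: clear for clear, code in table.items()}
    if len(decryption) != len(table):
        raise ValueError("two clear letters share an encrypted letter; not invertible")
    return decryption


# Only the five encryption lines displayed above; the other 21 are not shown.
partial_encryption = {"A": "J", "B": "Z", "C": "B", "Y": "A", "Z": "I"}
partial_decryption = invert(partial_encryption)
for code in sorted(partial_decryption):
    print(code, partial_decryption[code])
```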
Mary Lou can improve this by using tables that translate pairs of letters into
pairs of letters.
Since there are 26 times 26 equals 676 possible pairs of letters, her
encryption and decryption tables each have 676 lines; the encryption table
looks something like:
Line nr   Clear   Encrypt
1         AA      RN
2         AB      HH
3         AC      YB
...       ...     ...
25        AY      IC
26        AZ      UE
27        BA      GZ
28        BB      MW
...       ...     ...
51        BY      DK
52        BZ      IA
53        CA      MT
54        CB      OS
...       ...     ...
674       ZX      ID
675       ZY      AK
676       ZZ      KG
To save some space on this web page, we don't display the matching 676-line
decryption table.
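A pair table can be built the same way as the one-letter table, by shuffling all 676 pairs. In this sketch, line number k corresponds to `PAIRS[k - 1]`. Two assumptions are not in the text. First, the table is random, so it will not match the lines above. Second, a clear text of odd length is padded with a hypothetical filler letter X.

```python
import itertools
import random
import string

# All 676 pairs AA, AB, ..., ZZ in the order of the "Line nr" column.
PAIRS = ["".join(p) for p in itertools.product(string.ascii_uppercase, repeat=2)]


def make_pair_tables(rng):
    shuffled = PAIRS[:]
    rng.shuffle(shuffled)
    encryption = dict(zip(PAIRS, shuffled))
    decryption = {code: clear for clear, code in encryption.items()}
    return encryption, decryption


def apply_pairs(table, text, filler="X"):
    if len(text) % 2:
        text += filler  # hypothetical padding so the text splits into pairs
    return "".join(table[text[i:i + 2]] for i in range(0, len(text), 2))


encryption, decryption = make_pair_tables(random.Random(7))
secret = apply_pairs(encryption, "MARYLOU")
print(secret, apply_pairs(decryption, secret))  # the second word is MARYLOUX
```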
These tables, combined with file compression, are again a lot better.
There is still the risk that a spy with enough computer power will crack this
code.
So the next natural step for Mary Lou is to use an encryption table with
triples of letters.
There are 26 times 26 times 26 equals 17576 triples.
The encryption table will look something like
Line nr   Clear   Encrypt
1         AAA     QMI
2         AAB     OKT
...       ...     ...
17575     ZZY     WWJ
17576     ZZZ     KQS
and the decryption table will be just as long.
Note that we display only the first two lines and the last two lines, just to
save space.
We write triple dots for the missing 17572 lines.
File compression, combined with these long tables, is again a good improvement.
But is it good enough?
Maybe Mary Lou had better use tables that are 4 characters wide, from AAAA
to ZZZZ.
Unfortunately, such tables become huge: with 4 characters, each table has
26 times 26 times 26 times 26, that is, 26 to the power 4, equals 456976
lines.
And 4 characters wide may still be insufficient.
The table method becomes more and more unwieldy.
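The growth is easy to tabulate: a table for blocks of k letters needs 26 to the power k lines. This short Python loop reproduces the counts in the text (26, 676, 17576, 456976) and continues a little further. At 5 letters the count passes 11 million, and at 6 letters it passes 300 million.

```python
# Number of lines in an encryption table for blocks of k letters.
sizes = {k: 26 ** k for k in range(1, 7)}
for k, lines in sizes.items():
    print(f"k = {k}: {lines:,} lines")
```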
Instead of using encoding and decoding tables from 4 characters to 4
characters, or worse, one uses encoding and decoding algorithms.
What follows is a sketch of how this works.
In this example, we use huge numbers n whose factorizations into primes are,
in practice, impossible to compute.
Mary Lou and I are going to communicate using public key cryptography.
Each of us privately picks two huge prime numbers.
There are lots of prime numbers, and testing whether a number is prime is easy.
So both Mary Lou and I easily find our own two private huge primes.
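Why is it easy to find primes? The notes do not name a method. One standard choice is the Miller-Rabin test, which needs only a few modular exponentiations even for numbers with hundreds of digits. Below is a minimal Python sketch. The test is probabilistic: each round wrongly accepts a composite number with probability at most 1/4. The choice of 20 rounds is our assumption.

```python
import random


def is_probable_prime(n, rounds=20):
    """Miller-Rabin test: False means n is certainly composite,
    True means n is prime with overwhelming probability."""
    if n < 2:
        return False
    for p in (2, 3, 5, 7, 11, 13, 17, 19, 23, 29):
        if n % p == 0:
            return n == p
    # Write n - 1 as d * 2**s with d odd.
    d, s = n - 1, 0
    while d % 2 == 0:
        d //= 2
        s += 1
    for _ in range(rounds):
        a = random.randrange(2, n - 1)
        x = pow(a, d, n)
        if x in (1, n - 1):
            continue
        for _ in range(s - 1):
            x = pow(x, 2, n)
            if x == n - 1:
                break
        else:
            return False  # a is a witness that n is composite
    return True


def random_prime(bits):
    """Try random odd numbers with exactly `bits` bits until one passes the test."""
    while True:
        candidate = random.getrandbits(bits) | (1 << (bits - 1)) | 1
        if is_probable_prime(candidate):
            return candidate


print(random_prime(256))
```

The test never factors anything. This gap between easy primality testing and hard factoring is what the scheme below relies on.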
We each keep our pair of prime numbers secret, but we publish the products to
the world.
For illustration purposes, I use small prime numbers below; just pretend that
they are big.
So we have a situation like this:
Everybody knows:      161 from me, 221 from Mary Lou, 209 from the spy
Only I know:          7 and 23
Only Mary Lou knows:  13 and 17
Only the spy knows:   11 and 19
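The pretending matters. With numbers this small, anyone can recover the private primes from the public products by trial division, as in this Python sketch. For this naive method, the number of divisions grows roughly like the square root of n. That is one reason real keys use numbers hundreds of digits long.

```python
def factor_by_trial_division(n):
    """Return the smallest factor pair (p, q) with p * q == n and p > 1."""
    p = 2
    while p * p <= n:
        if n % p == 0:
            return p, n // p
        p += 1
    raise ValueError(f"{n} is prime")


for public in (161, 221, 209):
    print(public, factor_by_trial_division(public))
# 161 (7, 23), 221 (13, 17), 209 (11, 19)
```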
Now there are two machines, called P and S, which work as follows:
Machine P takes as input a text and a product of two primes, say a secret text
ABC from me and Mary Lou's 221, and as output produces an `encrypted' string,
say XYZ, which I make public.
Machine S takes as input a text and two primes, say XYZ and Mary Lou's
(13,17), and as output produces a `decrypted' string, in this case ABC.
In pictures:
ABC in, 221 in  -->  P  -->  out XYZ
At this moment, only I know ABC.
Mary Lou can now use her private prime pair:
XYZ in, (13,17) in  -->  S  -->  out ABC
And so Mary Lou also knows ABC.
Whenever the product and the prime factors match, like 221 and (13,17) above,
the machines P and S reverse one another.
Someone else may also try to figure out what XYZ stands for, but will fail:
XYZ in, (11,19) in  -->  S  -->  out UVW
where UVW looks like garbage.
If Mary Lou wants to return a confidential reply to me, then she will proceed
as follows:
DEF in, 161 in  -->  P  -->  out RST
Only I can decipher the message, using my private key:
RST in, (7,23) in  -->  S  -->  out DEF
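The notes do not say how P and S work inside. One well-known construction that behaves like them is textbook RSA, and the Python sketch below assumes it. Further assumptions are not in the text:

- a public exponent 47, shared by everybody (an arbitrary choice that works with all three products);
- each letter handled separately as its ASCII code, 65 to 90, which is smaller than every product;
- output written as numbers rather than letters like XYZ.

Machine S needs the exponent d, and computing d requires p and q. A spy who knows only n = p*q would first have to factor n. Handling single letters makes this toy no stronger than a one-letter table, and with such small numbers some letters may even encrypt to themselves. The sketch only shows how P and S undo each other.

```python
from math import gcd

EXPONENT = 47  # assumed public exponent, shared by everybody


def to_numbers(text):
    """Hypothetical letter coding: each letter becomes its ASCII code (65 to 90)."""
    return [ord(ch) for ch in text]


def to_text(numbers):
    return "".join(chr(m) for m in numbers)


def machine_p(numbers, product):
    """Machine P: needs only the public product n; sends m to m**EXPONENT mod n."""
    return [pow(m, EXPONENT, product) for m in numbers]


def machine_s(numbers, p, q):
    """Machine S: needs the private primes; sends c to c**d mod p*q,
    where d is the inverse of EXPONENT modulo (p - 1) * (q - 1)."""
    phi = (p - 1) * (q - 1)
    if gcd(EXPONENT, phi) != 1:
        raise ValueError("EXPONENT has no inverse modulo (p - 1) * (q - 1)")
    d = pow(EXPONENT, -1, phi)  # modular inverse; needs Python 3.8 or later
    return [pow(c, d, p * q) for c in numbers]


xyz = machine_p(to_numbers("ABC"), 221)   # I encrypt for Mary Lou
print(to_text(machine_s(xyz, 13, 17)))    # Mary Lou recovers ABC
print(machine_s(xyz, 11, 19))             # the spy's primes give unrelated numbers
rst = machine_p(to_numbers("DEF"), 161)   # Mary Lou replies to me
print(to_text(machine_s(rst, 7, 23)))     # I recover DEF
```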
Imagine that I am at headquarters in Milwaukee, and Mary Lou is in the
wilds of Iowa.
Although Mary Lou is the only one who can decrypt my message XYZ, she cannot
know for sure that I am the sender, since anyone can feed a text and her
public number 221 into machine P.
The following is an alternative way to create an encrypted message that
ensures she is the only one who can decrypt it, and that she knows I am the
only one who could have sent it.
The method uses the fact that both P and S scramble messages, and that a
matching P and S reverse one another in either order.
ACE in, (7,23) in  -->  S  -->  out ZXV
ZXV in, 221 in  -->  P  -->  out YWU
So I used two encoding steps in a row, like putting on socks, and then
putting on shoes.
I only publish YWU to the world.
Now Mary Lou must do two decoding steps in a row, like taking off shoes and
then taking off socks:
YWU in, (13,17) in  -->  S  -->  out ZXV
ZXV in, 161 in  -->  P  -->  out ACE
Only Mary Lou can perform the first step, because it requires her private
key (13,17).
And I must be the author of ACE, because I am the only one who could have
created ZXV.
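The two-step scheme can be checked with the same textbook-RSA reading of P and S. As before, this is an assumption: public exponent 47 and letters as ASCII codes. In this toy version the order of steps works because my product 161 is smaller than Mary Lou's 221. Every number in ZXV is therefore a valid input for P with 221.

```python
from math import gcd

EXPONENT = 47  # assumed public exponent, as in a textbook-RSA reading of P and S


def machine_p(numbers, product):
    return [pow(m, EXPONENT, product) for m in numbers]


def machine_s(numbers, p, q):
    phi = (p - 1) * (q - 1)
    if gcd(EXPONENT, phi) != 1:
        raise ValueError("EXPONENT has no inverse modulo (p - 1) * (q - 1)")
    return [pow(c, pow(EXPONENT, -1, phi), p * q) for c in numbers]


ace = [ord(ch) for ch in "ACE"]  # hypothetical letter coding: ASCII codes

# Me: socks (S with my primes 7 and 23), then shoes (P with Mary Lou's 221).
zxv = machine_s(ace, 7, 23)
ywu = machine_p(zxv, 221)

# Mary Lou: shoes off (S with her primes 13 and 17), then socks off (P with my 161).
recovered = machine_p(machine_s(ywu, 13, 17), 161)
print("".join(chr(m) for m in recovered))  # prints ACE
```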
Last updated: March 2004
Comments & suggestions:
wimr@mscs.mu.edu