Department of Mathematics, Statistics
and Computer Science
Wim Ruitenburg's Spring 2006 MATH025.1001
Codes
In class we introduced the idea of a one-letter encoding of text over a
26-letter alphabet by permuting the characters.
Someone, the owner, `secretly' made an encryption table, and used it to
convert clear text into encrypted text.
The class only sees the encrypted texts
SBALJPBXXTYHBJBDEBQ
DLHKYBPPBYJERBHQEPZ
SEXXUHBLCATVBQWTHWTTV
DLPNBDLPEAQEQWYJ
DEVPBHDQLHBATDEJM
PNBFTCBEQTJZTY
In class we were able to `crack' the code and produce a decryption table.
The encryption table and the decryption table both have 26 lines.
With the decryption table we were able to recover the clear texts.
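A minimal sketch of such a pair of tables in Python follows. The owner's actual table is not given in these notes, so the sketch draws a random permutation; the seed, the function names and the sample message are all our own assumptions for illustration.

```python
import random
import string

ALPHABET = string.ascii_uppercase  # the 26 clear-text letters


def make_tables(seed):
    """Return (encrypt, decrypt) tables built from a random permutation."""
    rng = random.Random(seed)  # the seed stands in for the owner's secret choice
    shuffled = list(ALPHABET)
    rng.shuffle(shuffled)
    encrypt = dict(zip(ALPHABET, shuffled))
    # The decryption table is the encryption table read backwards.
    decrypt = {enc: clear for clear, enc in encrypt.items()}
    return encrypt, decrypt


def apply_table(table, text):
    """Translate text one letter at a time using the given table."""
    return "".join(table[ch] for ch in text)


encrypt, decrypt = make_tables(seed=2006)
secret = apply_table(encrypt, "MEETMEATNOON")
print(secret)
print(apply_table(decrypt, secret))  # recovers the clear text
```

Both tables have 26 lines, and each is obtained from the other by swapping the two columns.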
Next we list some weaknesses we observed in the encryption scheme, and come
up with suggestions to avoid them.
When spies know a lot about the context of the clear text behind an
encrypted message, we run the risk of the spy guessing the decryption of
some string of characters.
In general, this is unavoidable.
Avoid sending unnecessary messages using the same encryption.
In our clear texts we frequently encounter strings like THE or ENT or
IN.
In a decryption attempt, a spy may search for patterns in the encrypted text
which correspond with these frequently occurring strings in the clear text.
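The pattern search can be sketched in a few lines of Python. The six encrypted texts are the ones from class; the helper name `ngram_counts` is our own, and the sketch only counts strings, it does not crack anything by itself.

```python
from collections import Counter

CIPHERTEXTS = [
    "SBALJPBXXTYHBJBDEBQ",
    "DLHKYBPPBYJERBHQEPZ",
    "SEXXUHBLCATVBQWTHWTTV",
    "DLPNBDLPEAQEQWYJ",
    "DEVPBHDQLHBATDEJM",
    "PNBFTCBEQTJZTY",
]


def ngram_counts(texts, n):
    """Count every string of n consecutive characters in the texts."""
    counts = Counter()
    for text in texts:
        for i in range(len(text) - n + 1):
            counts[text[i:i + n]] += 1
    return counts


letters = ngram_counts(CIPHERTEXTS, 1)
triples = ngram_counts(CIPHERTEXTS, 3)

# The most frequent encrypted letters are candidates for E, T, A, ...
print(letters.most_common(5))
# Encrypted triples that repeat are candidates for THE, ENT, ...
print([t for t, c in triples.most_common() if c > 1])
```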
Avoidance strategies involve replacing each commonly occurring string
by symbols similar to letters, or even by multiple symbols, all designating the
same string.
Then during encryption, the owner sometimes uses one symbol to encode
THE, and the next time uses another symbol to encode THE.
Of course, the encoding and decoding tables will have to be larger.
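As a rough sketch of this avoidance strategy, the Python below replaces THE, ENT and IN by one of several extra symbols chosen at random. The digits used as extra symbols, and the names `HOMOPHONES`, `substitute` and `restore`, are hypothetical choices of ours; the resulting text would then still go through the (now larger) encryption table.

```python
import random

# Each common string gets several interchangeable symbols (our own choice).
HOMOPHONES = {
    "THE": ["1", "2", "3"],
    "ENT": ["4", "5"],
    "IN": ["6", "7"],
}
SYMBOL_TO_STRING = {sym: s for s, syms in HOMOPHONES.items() for sym in syms}


def substitute(text, rng):
    """Replace each common string by a randomly chosen symbol for it."""
    out = []
    i = 0
    while i < len(text):
        for common, symbols in HOMOPHONES.items():
            if text.startswith(common, i):
                out.append(rng.choice(symbols))
                i += len(common)
                break
        else:
            # No common string starts here: keep the letter as it is.
            out.append(text[i])
            i += 1
    return "".join(out)


def restore(text):
    """Replace every extra symbol by the string it stands for."""
    return "".join(SYMBOL_TO_STRING.get(ch, ch) for ch in text)


rng = random.Random(0)
hidden = substitute("THENTHEENTRYINTHEINN", rng)
print(hidden)
print(restore(hidden))
```

With 26 letters and 7 extra symbols, the encoding and decoding tables in this sketch would need 33 lines instead of 26.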
Nowadays there are automatic ways to compress files.
These compression programs, known by names like COMPRESS or ZIP, reduce the
size of text files significantly by cleverly replacing more frequently
occurring longer strings by shorter strings.
Now the owner should first apply a file compression algorithm similar to
ZIP before applying the encoding table.
So the conversion from clear text to encrypted text takes two consecutive
steps, like putting on your socks, and then putting on your shoes.
A decoder then applies the decoding table, and then the decompression
program like UNZIP, to restore the original file.
So the conversion from encrypted text to clear text also takes two consecutive
steps, but in reverse order like taking off your shoes, and then taking off
your socks.
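The socks-and-shoes order can be sketched with Python's standard `zlib` module. One assumption is ours: a compressed file consists of bytes rather than letters, so the table in this sketch permutes all 256 byte values instead of 26 letters.

```python
import random
import zlib


def make_byte_tables(seed):
    """Return (encrypt, decrypt) tables that permute the 256 byte values."""
    rng = random.Random(seed)  # the seed stands in for the owner's secret
    perm = list(range(256))
    rng.shuffle(perm)
    inverse = [0] * 256
    for clear, enc in enumerate(perm):
        inverse[enc] = clear
    return bytes(perm), bytes(inverse)


def encode(clear_text, encrypt_table):
    """Socks first (compress), then shoes (encoding table)."""
    compressed = zlib.compress(clear_text.encode("ascii"))
    return compressed.translate(encrypt_table)


def decode(encrypted, decrypt_table):
    """Shoes off first (decoding table), then socks (decompress)."""
    compressed = encrypted.translate(decrypt_table)
    return zlib.decompress(compressed).decode("ascii")


enc_table, dec_table = make_byte_tables(seed=42)
message = "THE ENTIRE TENT IS IN THE INN " * 4
encrypted = encode(message, enc_table)
print(len(message), len(encrypted))
print(decode(encrypted, dec_table) == message)
```

Swapping the order of the two decoding steps fails, because the decompression program does not recognize the still-encrypted bytes as a compressed file.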
Combining file compression with an encoding table is a significant
improvement, but the position of the spy also improves.
The spy can still (try to) crack the code by adding brute computer force to
the repertoire of tools.
So far the owner originally used a one-letter encryption table, which
may look something like
Line nr   Clear   Encrypt
1         A       J
2         B       Z
3         C       B
...       ...     ...
25        Y       A
26        Z       I
The owner's original decryption table would then look something like
Line nr   Encrypt   Clear
1         A         Y
2         B         C
...       ...       ...
9         I         Z
10        J         A
...       ...       ...
26        Z         B
The owner can improve this by using tables that translate pairs of
letters into pairs of letters.
Since there are 26 times 26 equals 676 possible pairs of characters, she should
have encoding and decoding tables that look something like:
Line nr   Clear   Encrypt
1         AA      RN
2         AB      HH
3         AC      YB
...       ...     ...
25        AY      IC
26        AZ      UE
27        BA      GZ
28        BB      MW
...       ...     ...
51        BY      DK
52        BZ      IA
53        CA      MT
54        CB      OS
...       ...     ...
...       ...     ...
674       ZX      ID
675       ZY      AK
676       ZZ      KG
To save some space on this web page, we don't display the matching 676-line
decryption table.
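Nobody writes 676 lines by hand. A sketch of how such a pair of tables could be generated and used in Python follows; the random permutation, the seed and the function names are our own assumptions, not the table displayed above.

```python
import itertools
import random
import string

# All 676 pairs in the order AA, AB, ..., AZ, BA, ..., ZZ.
PAIRS = ["".join(p) for p in itertools.product(string.ascii_uppercase, repeat=2)]


def make_pair_tables(seed):
    """Return (encrypt, decrypt) tables on pairs of letters."""
    rng = random.Random(seed)  # the seed stands in for the owner's secret
    shuffled = PAIRS[:]
    rng.shuffle(shuffled)
    encrypt = dict(zip(PAIRS, shuffled))
    decrypt = {enc: clear for clear, enc in encrypt.items()}
    return encrypt, decrypt


def apply_pairs(table, text):
    """Translate text two letters at a time; the length must be even."""
    if len(text) % 2:
        raise ValueError("text length must be even")
    return "".join(table[text[i:i + 2]] for i in range(0, len(text), 2))


encrypt, decrypt = make_pair_tables(seed=676)
secret = apply_pairs(encrypt, "MEETMEATNOON")
print(secret)
print(apply_pairs(decrypt, secret))
```

A clear text of odd length needs one extra padding letter before it can be encoded; how to pad is a choice the sketch leaves open.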
These tables, combined with file compression, are again a lot better.
There is still the risk that with big computer power, this code will be
cracked by a spy.
So the next natural step for the owner is using an encryption table with
triples of letters.
There are 26 times 26 times 26 equals 17576 triples.
The encryption table will look something like
Line nr   Clear   Encrypt
1         AAA     QMI
2         AAB     OKT
...       ...     ...
17575     ZZY     WWJ
17576     ZZZ     KQS
and the decryption table will be just as long.
Note that we only display the first two lines, and the last two lines.
This is just to save space.
We write triple dots for the missing 17572 lines.
File compression, combined with these long tables, is again a good improvement.
But is it good enough?
Maybe the owner had better use tables that are 4 characters wide, from
AAAA to ZZZZ.
Unfortunately, we run into the problem that such tables become huge; in the
case of 4 characters, both tables will have 26 times 26 times 26 times 26,
equals 26 to the power 4, equals 456976 lines.
And 4 characters wide may still be insufficient.
The table method becomes more and more unwieldy.
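The growth is easy to check with a few lines of Python; the function name `table_lines` is our own.

```python
def table_lines(width):
    """Number of lines in a table on strings of `width` letters."""
    return 26 ** width


for width in range(1, 7):
    print(width, table_lines(width))
```

Each extra character of width multiplies the length of both tables by 26.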
Instead of using encoding and decoding tables from 4 characters to 4
characters, or worse, one uses encoding and decoding algorithms.
What follows is a sketch of how this works.
In this example, we assume that we use huge numbers n, whose factorizations
into primes are not feasible in practice by any known general algorithm.
Kate and I are going to communicate using public key cryptography.
Both she and I privately pick two huge prime numbers.
There are lots of prime numbers, and detecting prime numbers is easy.
So both Kate and I easily find our own two private huge primes.
We each keep our pair of prime numbers secret, but we publish the products to
the world.
For illustration purposes, I use small prime numbers below; just pretend that
they are big.
So we have a situation like this:
Everybody knows     Only I know   Only Kate knows   Only the spy knows
161 from me         7 and 23
221 from Kate                     13 and 17
209 from the spy                                    11 and 19
Now there are two machines, called P and
S, which work as follows:
Machine P takes as input a text and a product of two primes,
say a secret text ABC from me, and 221, and as output produces an `encrypted'
string, say XYZ which I make public.
Machine S takes as input a text and two primes, say XYZ and
Kate's (13,17), and as output produces a `decrypted' string, in this case ABC.
In pictures:
ABC in, 221 in  -->  P  -->  out XYZ
At this moment, only I know ABC.
Kate can now use her private prime pair:
XYZ in, (13,17) in  -->  S  -->  out ABC
And so Kate also knows ABC.
Whenever the product and the prime factors match, like 221 and (13,17) above,
the machines P and S reverse one another.
The spy may also try to figure out what XYZ stands for, but will fail:
XYZ in, (11,19) in  -->  S  -->  out UVW
where UVW looks like garbage.
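These notes do not say what is inside the machines P and S. One well-known pair of algorithms that behaves this way is RSA, and the Python sketch below assumes RSA. The public exponent E = 7 and the encoding of each letter by its ASCII code are our own choices for illustration; the encrypted `text' is then a list of numbers rather than letters like XYZ. The call `pow(E, -1, phi)` needs Python 3.8 or later.

```python
E = 7  # public exponent: our assumption, the notes do not specify one


def machine_p(blocks, product):
    """Machine P: scramble using only the public product of two primes."""
    return [pow(m, E, product) for m in blocks]


def machine_s(blocks, primes):
    """Machine S: scramble using the two private primes."""
    p, q = primes
    phi = (p - 1) * (q - 1)
    d = pow(E, -1, phi)  # private exponent; computing it requires p and q
    return [pow(c, d, p * q) for c in blocks]


clear = [ord(ch) for ch in "ABC"]  # every block is smaller than 221

encrypted = machine_p(clear, 221)            # what I make public
by_kate = machine_s(encrypted, (13, 17))     # Kate's matching primes
by_spy = machine_s(encrypted, (11, 19))      # the spy's primes do not match

print("".join(chr(b) for b in by_kate))      # Kate recovers the clear text
print(by_spy == clear)                       # the spy does not
```

With primes this small the spy can of course factor 221 by hand and then run machine S with (13,17); the protection comes entirely from the primes being huge.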
If Kate wants to return a confidential reply to me, then she will proceed as follows:
DEF in, 161 in  -->  P  -->  out RST
Only I can decipher the message using my private key:
RST in, (7,23) in  -->  S  -->  out DEF
Imagine that I am at headquarters in Milwaukee, and Kate is in the wilds
of Iowa.
Although Kate is the only one who can decrypt my message XYZ, she may not know
for sure that I am the sender.
The following is an alternative way to create an encrypted message, to assure
that she will be the only one who can decrypt the message, as well as know that
I am the only one who could have sent it.
The method relies on the fact that both P and S scramble
messages.
ACE in, (7,23) in  -->  S  -->  out ZXV
ZXV in, 221 in  -->  P  -->  out YWU
So I used two encoding steps in a row, like putting on socks, and then
putting on shoes.
I only publish YWU to the world.
Now Kate must do two decoding steps in a row, like taking off shoes and then
taking off socks:
YWU in, (13,17) in  -->  S  -->  out ZXV
ZXV in, 161 in  -->  P  -->  out ACE
Only Kate can perform the first step, because it requires her private key
(13,17).
And I must be the author of ACE, because I am the only one who could have
created ZXV from ACE using the private key that belongs to 161.
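The two-step scheme can be sketched in Python under the same assumptions as before: we take P and S to be RSA with public exponent E = 7, and encode each letter by its ASCII code. Neither choice comes from these notes. The call `pow(E, -1, phi)` needs Python 3.8 or later.

```python
E = 7  # public exponent: our assumption, the notes do not specify one


def machine_p(blocks, product):
    """Machine P: scramble using only the public product of two primes."""
    return [pow(m, E, product) for m in blocks]


def machine_s(blocks, primes):
    """Machine S: scramble using the two private primes."""
    p, q = primes
    phi = (p - 1) * (q - 1)
    d = pow(E, -1, phi)  # private exponent; computing it requires p and q
    return [pow(c, d, p * q) for c in blocks]


MY_PRIMES, MY_PRODUCT = (7, 23), 161
KATE_PRIMES, KATE_PRODUCT = (13, 17), 221

clear = [ord(ch) for ch in "ACE"]  # every block is smaller than 161

# My two encoding steps: socks (my private primes), then shoes (Kate's product).
signed = machine_s(clear, MY_PRIMES)           # plays the role of ZXV
published = machine_p(signed, KATE_PRODUCT)    # plays the role of YWU

# Kate's two decoding steps, in reverse order.
unwrapped = machine_s(published, KATE_PRIMES)  # only Kate can do this
recovered = machine_p(unwrapped, MY_PRODUCT)   # works only if I made ZXV

print("".join(chr(b) for b in recovered))
```

The sketch works in this order because my product 161 is smaller than Kate's 221, so every block produced with my primes still fits below her product; with the sizes the other way around, the blocks would need extra care.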
Last updated: February 2008
Comments & suggestions:
wimr@mscs.mu.edu