Internet Security Professional Reference: Encryption Overview

Internet Security Professional Reference, Second Edition
(Publisher: Macmillan Computer Publishing)
Author(s): Authors Multiple
ISBN: 156205760x
Publication Date: 07/16/97

Monoalphabetic Substitutions

Monoalphabetic substitutions, or ciphers, are more difficult to break than their Caesarean counterparts. Here, each character can stand for any other, including itself, and no rule determines which character replaces which. Monoalphabetic substitutions are often found in newspaper leisure sections under the name Cryptoquotes, or something similar.

In the following example, the code used is:

Plaintext:  A B C D E F G H I J K L M N O P Q R S T U V W X Y Z
Ciphertext: P R O D U C E L A T M I F Y S V X J Z B W N Q H K G

Thus, the message:

DIAL THE NUMBER OF THE NEW PARTY AND WAIT FOR AN ANSWER

becomes:

DAPI BLU YWFRUJ SC BLU YUQ VPJBK PYD QPAB CSJ PY PYZQUJ
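
The substitution above can be reproduced with a short Python sketch. The names CIPHER_ALPHABET, encrypt, and decrypt are chosen here for illustration and are not part of the original text; the cipher alphabet is simply the right-hand column of the code table read from A to Z.

```python
import string

# Cipher alphabet from the code table: position i holds the substitute
# for the i-th plaintext letter (A=P, B=R, C=O, ... Z=G).
CIPHER_ALPHABET = "PRODUCELATMIFYSVXJZBWNQHKG"

ENCRYPT = str.maketrans(string.ascii_uppercase, CIPHER_ALPHABET)
DECRYPT = str.maketrans(CIPHER_ALPHABET, string.ascii_uppercase)

def encrypt(plaintext: str) -> str:
    """Substitute each letter; spaces and punctuation pass through unchanged."""
    return plaintext.upper().translate(ENCRYPT)

def decrypt(ciphertext: str) -> str:
    """Reverse the substitution using the inverted table."""
    return ciphertext.upper().translate(DECRYPT)

message = "DIAL THE NUMBER OF THE NEW PARTY AND WAIT FOR AN ANSWER"
print(encrypt(message))
print(decrypt(encrypt(message)))
```

Because the table is a one-to-one rearrangement of the alphabet, swapping the two arguments of str.maketrans is enough to obtain the decryption table.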

This code is much more difficult to break, as each character now has 26 possibilities rather than one shift shared by the whole alphabet. A program can be written that will try every possibility for each character and print the results, but because the choices for the individual characters multiply together, the number of complete keys is far too large to read through. In practice, you apply some rules to the sentence to narrow the search.
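
To put a number on the size of the search, the sketch below counts the possible keys. It assumes every key is a rearrangement of the 26-letter alphabet, which gives 26 factorial keys; the rate of one billion candidates per second is a hypothetical figure used only for scale.

```python
import math

# Each key is one ordering of the 26 letters, so there are 26! keys.
keys = math.factorial(26)
print(f"{keys:,} possible keys (about {keys:.2e})")

# Hypothetical testing rate, chosen only to give a sense of scale.
CANDIDATES_PER_SECOND = 1e9
seconds = keys / CANDIDATES_PER_SECOND
years = seconds / (365 * 24 * 3600)
print(f"about {years:.1e} years at {CANDIDATES_PER_SECOND:.0e} keys per second")
```

This is why the word-based and frequency-based shortcuts described in this section matter more than raw enumeration.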

One such method is looking for small words and trying substitutions on them. It is reasonable to guess that one of the three-letter words in the sentence is “THE.” The sentence contains four distinct three-letter words (BLU, YUQ, PYD, and CSJ), and thus four substitution trials:

DAPI BLU YWFRUJ SC BLU YUQ VPJBK PYD QPAB CSJ PY PYZQUJ
     THE     E     THE  E     T         T            E

DAPI BLU YWFRUJ SC BLU YUQ VPJBK PYD QPAB CSJ PY PYZQUJ
       H T   H       H THE        T  E         T  T EH

DAPI BLU YWFRUJ SC BLU YUQ VPJBK PYD QPAB CSJ PY PYZQUJ
E T      H             H    T    THE  T       TH TH

DAPI BLU YWFRUJ SC BLU YUQ VPJBK PYD QPAB CSJ PY PYZQUJ
              E HT           E            THE         E

The third scenario is immediately dismissed, as there is no two-letter word TH. The fourth scenario is immediately dismissed as well, for there is no two-letter word HT. That leaves the first and second scenarios as possibilities. Of the two, the first is far and away the more complete and the one upon which a decryption expert would focus.
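
The four trials and the two dismissals can be sketched in Python as follows. This is an illustration rather than a complete solver: the helper names are invented here, unknown letters are shown as dots instead of blanks so that word boundaries stay visible, and TWO_LETTER_WORDS is a deliberately short, assumed word list.

```python
CIPHERTEXT = "DAPI BLU YWFRUJ SC BLU YUQ VPJBK PYD QPAB CSJ PY PYZQUJ"

# Illustrative, incomplete list of two-letter English words (an assumption).
TWO_LETTER_WORDS = {"AM", "AN", "AS", "AT", "BE", "BY", "DO", "GO", "HE",
                    "IF", "IN", "IS", "IT", "ME", "MY", "NO", "OF", "ON",
                    "OR", "SO", "TO", "UP", "US", "WE"}

def partial_decrypt(ciphertext, known):
    """Show letters covered by `known`; mark the rest with '.'."""
    return "".join(ch if ch == " " else known.get(ch, ".") for ch in ciphertext)

def trials(ciphertext, guess="THE"):
    """Try `guess` on each distinct cipher word of the same length."""
    results = []
    for word in dict.fromkeys(ciphertext.split()):  # keeps first-seen order
        if len(word) == len(guess):
            known = dict(zip(word, guess))
            results.append((word, partial_decrypt(ciphertext, known)))
    return results

def plausible(partial):
    """Reject a trial if a fully decoded two-letter word is not a real word."""
    return all(w in TWO_LETTER_WORDS
               for w in partial.split()
               if len(w) == 2 and "." not in w)

for word, partial in trials(CIPHERTEXT):
    verdict = "keep" if plausible(partial) else "dismiss"
    print(f"{word} = THE  ({verdict})")
    print(CIPHERTEXT)
    print(partial)
    print()
```

Only two-letter words whose letters are both decoded are checked, so a trial is dismissed only when it produces a complete non-word such as TH or HT.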

The next area to focus upon is the two-letter words, now represented as SC and PY. These cannot be BE, TO, or AT, because neither contains a letter already identified as “T” or “E.” That leaves few other possibilities, and through some trial and error their identity can be ascertained (SC is OF and PY is AN), such that the code now appears as the following:

DAPI BLU YWFRUJ SC BLU YUQ VPJBK PYD QPAB CSJ PY PYZQUJ
  A  THE N   E  OF THE NE   A T  AN   A T FO  AN AN  E

Assuming that PYD must be AND, and CSJ is FOR:

DAPI BLU YWFRUJ SC BLU YUQ VPJBK PYD QPAB CSJ PY PYZQUJ
D A  THE N   ER OF THE NE   ART  AND  A T FOR AN AN  ER

It is only a matter of time and trial and error until the rest of the puzzle falls into place.
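
The trial-and-error process amounts to adding word guesses to a partial key and rejecting any guess that contradicts an earlier one. The sketch below illustrates that bookkeeping; add_guess and partial_decrypt are hypothetical helper names, and unknown letters are printed as dots rather than blanks.

```python
CIPHERTEXT = "DAPI BLU YWFRUJ SC BLU YUQ VPJBK PYD QPAB CSJ PY PYZQUJ"

def add_guess(key, cipher_word, plain_word):
    """Return a copy of `key` extended with cipher_word -> plain_word.

    Raises ValueError if a cipher letter would get two meanings or if
    two cipher letters would share one plaintext letter.
    """
    if len(cipher_word) != len(plain_word):
        raise ValueError("length mismatch")
    new_key = dict(key)
    for c, p in zip(cipher_word, plain_word):
        if new_key.get(c, p) != p:
            raise ValueError(f"{c} already stands for {new_key[c]}, not {p}")
        if p in new_key.values() and new_key.get(c) != p:
            raise ValueError(f"{p} is already assigned to another letter")
        new_key[c] = p
    return new_key

def partial_decrypt(ciphertext, key):
    """Show decoded letters; mark still-unknown letters with '.'."""
    return "".join(ch if ch == " " else key.get(ch, ".") for ch in ciphertext)

key = {}
guesses = [("BLU", "THE"), ("SC", "OF"), ("PY", "AN"),
           ("PYD", "AND"), ("CSJ", "FOR")]
for cipher_word, plain_word in guesses:
    key = add_guess(key, cipher_word, plain_word)

partial = partial_decrypt(CIPHERTEXT, key)
print(CIPHERTEXT)
print(partial)
```

With the five guesses from the text, nine cipher letters are fixed, and the remaining gaps (for example N...ER and .A.T) are short enough to fill in by inspection.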

Another Way

In modern English, some characters are used far more often than others. The “Q,” for example, is rarely used, and when it is used it is almost always followed by a “U.” According to Cryptography: An Introduction to Computer Security (Seberry and Pieprzyk, Prentice Hall, 1989), the following relative frequency of use can be applied to each letter:

E	12.75
T	9.25
R	8.50
N	7.75
I	7.75
O	7.50
A	7.25
S	6.00
D	4.25
L	3.75
H	3.50
C	3.50
F	3.00
U	3.00
M	2.75
P	2.75
Y	2.25
G	2.00
W	1.50
V	1.50
B	1.35
K	0.50
X	0.50
Q	0.50
J	0.25
Z	0.25

The number of times each character appears in the encrypted message is as follows:

A	2
B	4
C	2
D	2
E	0
F	1
G	0
H	0
I	1
J	4
K	1
L	2
M	0
N	0
O	0
P	6
Q	3
R	1
S	2
T	0
U	5
V	1
W	1
X	0
Y	5
Z	1
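
The tally above can be reproduced with Python's collections.Counter. This is a minimal sketch; the variable names are illustrative.

```python
from collections import Counter
import string

CIPHERTEXT = "DAPI BLU YWFRUJ SC BLU YUQ VPJBK PYD QPAB CSJ PY PYZQUJ"

# Count letters only; spaces are skipped.
counts = Counter(ch for ch in CIPHERTEXT if ch.isalpha())

# Print every letter, including those that never occur (count 0).
for letter in string.ascii_uppercase:
    print(f"{letter}\t{counts[letter]}")

# The five most frequent cipher letters with their counts.
print(counts.most_common(5))
```

A Counter returns 0 for letters it has never seen, which is what lets the loop print the unused letters without special handling.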

Naturally, the larger the piece of encrypted data, the more closely it will follow the frequency chart. Nevertheless, applying the chart to this message, the most frequently used characters in the code are:

B, J, P, U, and Y.

If the frequency theory is correct, these should be replaceable with:

E, T, R, N, and I, though not necessarily in that order.

The most common characters in the actual, unencrypted message are:

A, E, N, R, and T.

Thus, four of the five characters the chart predicts (E, T, R, and N) are among the five most common characters in the actual message; only I is displaced by A. This is a respectable beginning. Again, the larger the encrypted text, the more likely the frequency distribution is to be accurate.
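
The comparison can be checked with a few lines of Python. The sketch assumes the top five entries of the chart quoted earlier (E, T, R, N, and I, with N and I tied) and counts the letters of the known plaintext; the names are illustrative.

```python
from collections import Counter

# Five most frequent letters according to the chart quoted in this section.
CHART_TOP5 = {"E", "T", "R", "N", "I"}

PLAINTEXT = "DIAL THE NUMBER OF THE NEW PARTY AND WAIT FOR AN ANSWER"

# Five most frequent letters actually present in the message.
counts = Counter(PLAINTEXT.replace(" ", ""))
actual_top5 = {letter for letter, _ in counts.most_common(5)}

matches = CHART_TOP5 & actual_top5
print("predicted:", sorted(CHART_TOP5))
print("actual:   ", sorted(actual_top5))
print("matches:  ", sorted(matches))
```

An analyst working only from the ciphertext would not have the plaintext to compare against; the check is possible here only because the example message is known.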
