homeworks Homework 2 NB: Homeworks are for submission in singles. Please, share ideas, attack methods,
but not the results.

Hints for HW2

0. On pen&paper solution:
Again there is no penalty for computer solutions, however all of the homework is solvable
by hand: Grilles up to 12x12 are a classical hand solution exercise, even without hints.
Vigenere and ADFGX are also within reach (you can also use the IC programs below).

1. Note that 4.A can be solved by trying all the keys, such solutions will not count.

2. You are welcome to share scripts/programs that calculate IC (index of coincidence) of a
string and MC (mutual index of coincidence), and plaintext recognition, etc. I must note
that a perl script (by Jason) and a C program (by Baruch) for HW1 were very popular among
solvers.

3. Here is a C program that calculates frequencies and the IC (note that it counts all the
frequencies, so delete spaces if they are in your file, here is an example of the test input). Credit
goes to george of the ACA site.

4. Here is a set of programs that perform plaintext recognition (I don't know how good
they are, but you can use the ideas):

It may be interesting to compare a trigram statistics vs. the conditional bigrams (letter
transition probabilities P(AB | given A*)).

(from this point down, it is new,   03.04.2000)
5. By request of some of you [advanced hill climbing]:
 Hill Climbing Using Genetic Algorithms, by A.J.Bagnall, et.al.
"A Fast Method of the Cryptanalysis of Substitution Ciphers", by Thomas Jakobsen
Experienced hill climbers may also need this or this.

6. The code I used for Beaufort encryption/decryption (1st puzzle):
/* One student was misslead by the Beaufor table at the ACA site.*/
/* Beaufort encryption below is exactly the same as was shown at */
/* the lecture. Just cipher = Key[count]-plain;                  */

....
  unsigned char plain, cipher;
  int count = 0;
  int five = 0;

  while(!feof(file_handle)){
    plain = fgetc(file_handle);
    if((plain >= 'A') && (plain <='Z')) {
      plain =plain - 'A';
      cipher = Key[count]-plain; if (cipher < 'A') cipher +=26;
      count = (count+1)% period; printf("%c", cipher); five = (five+1)%5;
      if(five == 0) printf(" ");
    }
    else if ((plain >= 'a') && (plain <='z')) {
      plain= plain -'a';
      cipher = Key[count]-plain; if (cipher < 'A') cipher +=26;
      count = (count+1)% period; printf("%c", cipher); five = (five+1)%5;
      if(five == 0) printf(" ");
    }
  }
....

7. Here are several programs donated to you by Baruch:
"Here are 2 programs for grills and one for Vigenere:
In these two change a const to match grills of different sizes;

One is the one that puts the grill on the cyphertext to get the
plaintext (has been improved a little).
Another one is a program that gets a grill with a few holes and erases the
letters that these holes cover (so if you're sure you have some holes right,
you can run it to take the letters you can't use anymore from the
cyphertext).

The last one is a program for finding repeated strings of given length
in the ciphertext - good for figuring out the length of the key for
Vigenere cipher (or its Beaufort variant)."

Period of the Vigenere
Grill
Grill with removed letters

8. Variance of the IC statistics (follow up to a question of Leonid):
In general IC of more than 30 letters helps to distinguish between mono-alphabetic and poly-alphabetic
cases:

Expected number of coincidences in a  30-letter sample:

    Random :    30*29*IC(Random ) = 30*29*0,0385  ~ 34
    English:    30*29*IC(English) = 30*29*0,0667  ~ 58

The sigma (square root of the variance) for this case is about 15. Note that distance of 4*sigma  between
the modes is preferable for high reliability.

General formula for variance of the IC statistics for the English text of length N letters:

sigma^2  =  (0,004344*N^3  + 0,110448*N^2  - 0,114794*N) /(N^2 * (N-1)^2)
 

9. Most of the 12x12 grilles contain the word "elephants" (9 letters)
(Correction 09.04.2000)
For Grille #6  please use crib: "cautious drive of"
Grille #7 contains "elephant is" (not "elephants")
 

10. Some answers to questions of Baruch Oxman which may be very usefull to all
(problems 3 and 4.a).

11. The homework problems in the order of increasing complexity (as I see it):
1, 4.a, 4.b, 2.b, 2.a, 3

12. Some crypto student probably boroughed the Enigma machine for doing the homework.
Note that HW2 can be done without it. If you are responsible for it, please return the machine
to the BP.

(from this point down, it is new,   04.04.2000)
13. In 4.b you may assume that the Reflector is known. Your aim in 4.b is to minimize the number
of queries to the machine (A query is a result of pressing the key on the keyboard. You are also alowed
to change the indicator setting, at each query if you need. However count the number of time  you do it.)

(from this point down, it is new,   05.04.2000)
14. Here are: trigram counts, conditional probabilities of trigrams,  and log -conditional probabilities
of trigrams presented to you by Eran Ofek.

(from this point down, it is new,   06.04.2000)
15. Problem 3: The ADFGX checkerboard in this problem was generated using a pronouncable keyword (will not necessarily appear
in your dictionary). Like this:

   A D F G   X

A  K E Y W   O
D  R D A B   C
F  F G H I/J L
G  M N P Q   S
X  T U V X   Z

It is obvious that the main problem is the unknown transposition. So:

1. guess the period

2. write out the rectangles of the two cryptograms and ... look at them
your aim is to recover the broken pair columns, which if done correctly
will show mono-alphabetic substitution.

3. if you still do not have any ideas, re-read the problem statement carefully
[I know of at least 2 ways of reducing the  number of possible transpositions
drastically, one is just by staring at the messages]

4.Figuring the rest of the transposition is a pen & paper  excercise for about 1 hour,
especially if you use  the IC calculating program given above.

5.If you are stuck at the point of solving the mono-alphabetic substitution, try
the vowel-consonants separation method. Or, anagramming to check for frequent
digrams, or any other method.

(new 11.04.2000)
The period of the transposition is more than 4 and less than 22. The transposition
key is not a word from a dictionary. Remember that if messages are shorter than
the period the padding symbol is used.

Please mail me the answer of problem 3 as soon as you get it.