FSem 132 Methods and Issues in Cryptology Fall 2005

Lab 1

due Monday, September 26 in class or via email


This assignment will give you some practice with breaking several types of monoalphabetic and polyalphabetic ciphers and will illustrate several applications for the index of coincidence.

Getting Started / Logging In

Log on to a computer. Go to my web page cs.colgate.edu/faculty/nevison and click the link for our seminar. There is a link from there to the lab. In our lab you can use the cosc101 logon.

You will be using a collection of tools to help you with your decryptions. They can be found on the tools link from the syllabus (right-click on the link and choose "open in new window").

The Index of Coincidence

Imagine putting all of the letters of some section of text in a bag, and drawing two at random. The index of coincidence tells you the probability that you'll draw two identical letters.

The index of coincidence is useful because the probability of getting two identical letters depends on the distribution of letters in the bag. In general, the probability of two independent events occurring is the product of the probabilities of each event occurring separately. Also, the probability of any one of a collection of mutually exclusive events occurring is the sum of the probabilities of each event occurring separately. Let n be the total number of letters in the bag and let nA be the number of As, nB the number of Bs, etc.; we'll use ni (note the lowercase i) to mean the number of some generic letter i. The probability of drawing some letter i once is ni/n and drawing it twice is [ni/n][(ni-1)/(n-1)] (because once the first letter has been drawn, the number of letters in the bag and the number of i letters have both decreased by 1). The index of coincidence I is the sum of [ni/n][(ni-1)/(n-1)] for all of the letters.

Imagine that the bag contains a very large, completely random collection of letters. The probability of picking any particular letter is 1/26; the probability of picking a second of the same letter is also 1/26 (we assume that there are so many letters in the bag, removing one doesn't significantly change the number of letters remaining or the probability of picking another of the same letter). Thus I = sum of (1/26)2 for all letters = 1/26 = 0.0385. Call this value kr ("r" for "random").

Now imagine that the bag contains a very large collection of English text. Now the probability of picking any particular letter depends on the typical frequency of that letter in English text. Again, we assume that there are so many letters that removing one doesn't significantly change the probability of picking another. Thus I = sum of the squares of the probabilities of picking each letter = (8.2/100)2+(1.5/100)2+(2.8/100)2+(4.3/100)2+... = 0.0667. (This value varies slightly depending on the exact frequencies used for each letter.) Call this value kp = 0.0667 ("p" for plaintext).

The index of coincidence can be used to distinguish between monoalphabetic ciphertexts and polyalphabetic ciphertexts.

The difference in values between kr and kp can be used to distinguish between monoalphabetic and polyalphabetic substitution ciphers. For a monoalphabetic cipher, the letter frequencies are not disguised and so the index of coincidence for the ciphertext is expected to be near kp. On the other hand, a polyalphabetic cipher is designed to scramble the letter frequencies and make them appear more like random text, so the index of coincidence for this ciphertext is expected to be near kr.

The index of coincidence can be used to determine the length of the keyword in a Vigenère cipher.

Arrange the ciphertext in rows under the keyword, so that all of the letters in a given column are encrypted with the same keyword letter. Each column thus represents a monoalphabetic cipher, and the index of coincidence can be computed for the column to verify this property. If the keyword length is incorrect, the columns will contain letters encrypted with different keyword letters and will have an index of coincidence closer to kr than kp.

The index of coincidence can be used to match up polyalphabetic ciphertexts which use the same portion of a long key.

Consider writing two English texts, one above the other. What is the probability that any column contains two identical characters? As it happens, 0.0667. Each column represents one choice of two letters from the bag of English text and so has a probability of 0.0667 of containing identical letters. Counting the number of columns with identical characters and dividing by the number of columns should yield a number close to this value.

Next write two monoalphabetically-encrypted English texts in the same manner, one above the other. The probability that any column contains identical characters is still 0.0667 because the substitution did not change the frequency characteristics of the underlying English text.

Now write two polyalphabetically-encrypted English texts in the same manner. If both texts have the same keyword and they are aligned such that both letters in each column are encrypted with the same keyword letter, the probability of having identical letters in a column is once again 0.0667. This is because the only way to have the same ciphertext letters in a column when the same keyword letter is used is for the underlying plaintext letters to be the same, and the probability of the underlying plaintext letters matching is 0.0667. If, however, the keywords are different or not properly aligned, then the letters in each column are encrypted with different keyword letters and the probability of having identical letters in a column is near random.

As a result, the index of coincidence can also be used to identify and align polyalphabetic ciphertexts produced by the same portion of a long key. For each possible alignment of the two texts, compute the fraction of columns containing identical letters and compare this fraction to kr and kp. If the value is near kp, then the proper alignment has been found. Alternatively, simply count the number of columns containing identical letters and look for an alignment which produces a large number of matching columns (since kp > kr).


Part I: Monoalphabetic or Polyalphabetic?

In this part, you will use the index of coincidence to determine if ciphertexts are monoalphabetic or polyalphabetic.
  1. Use the Index of Coincidence tool to compute the index of coincidence for ciphers #1 and #2. Record the actual index of coincidence and whether the ciphers are monoalphabetic or polyalphabetic below:

     index of coincidencemonoalphabetic or polyalphabetic?
    cipher #1  
    cipher #2  


Part II: Breaking Vigenère Ciphers

In this part, you will break a Vigenère-encrypted text using the index of coincidence, the Kasiski test, and frequency counts to find the keyword and thus the plaintext.
  1. Use the Index of Coincidence tool to compute the index of coincidence for cipher #3. Record the actual index of coincidence and whether the cipher is monoalphabetic or polyalphabetic below:

     index of coincidencemonoalphabetic or polyalphabetic?
    cipher #3  

    (What should the answer be?)

  2. Knowing the length of the key is very valuable for breaking a Vigenère cipher. Use the Kasiski Analysis tool to apply the Kasiski test to find the keyword length.

    keyword length 

  3. Use the Splitter tool to separate out the columns once you have determined the length of the keyword. This tool will arrange the text in two ways: as a matrix, with the columns aligned vertically, and as a list of columns, with each column's text arranged in a row. The second form is the most convenient for cutting and pasting.

  4. Use the Index of Coincidence tool to find the index of coincidence for each column of ciphertext to verify that the correct length was found for the keyword. (Note: it is easier to paste the entire ciphertext into the "ciphertext" box and use "test period" than it is to compute the index of coincidence for each column separately. Enter your guess for the length of the keyword in the "period" box.) Record the actual value for each column below. (There are more spaces in the table than you'll need.) Are the values appropriate for monoalphabetic ciphers? Explain any anomalies you find. You might also want to try a couple of incorrect periods to see what it looks like when the keyword length is wrong.

    column numberindex of coincidencecolumn numberindex of coincidence
        
        
        
        
        
        
        
        
        
        
    appropriate values?
    also explanation (as needed)
     

  5. Use the Additive Cracker tool or Frequency tool to analyze each column of text to determine the keyword and thus the plaintext. Record both.

    keyword 
    plaintext 


Part III: Rotation Ciphers

In The Pleasures of Counting, Körner describes a rotation cipher called Rk. Let ir be the integer equivalent of the rth letter of the plaintext, where a=0, b=1, c=2, etc. The ciphertext letter is then the letter associated with the integer ir+k(r-1) mod 26.

For example, ROTATION is encoded as RPVDXNUU with k=1, RQXGBSAB with k=2, and RRZJFXGI with k=3. Make sure you understand how this works before continuing on.

  1. k is the key. How many different keys are there which produce useful, non-trivial ciphers? (A trivial cipher is one in which the ciphertext is the same as the plaintext; a non-useful cipher is one in which several different plaintext letters are enciphered to the same ciphertext letter.) Explain.

    answer 

  2. Applying a rotation of k1 followed by a rotation of k2, written k2k1, is equivalent to applying a single rotation of k1+k2. Explain why this is true.

    answer 

  3. For each of the following values of k, state the inverse key k' (where applying Rk' to a ciphertext generated by Rk results in the original plaintext). How did you arrive at your answers?

    k1k3k5
    k' k' k' 
    method 

  4. Is a rotation cipher very secure? Explain. If you say no, include an explanation of how you would break the cipher.

    answer 


Part IV: Körner Passage 1

Combining several weak ciphers can lead to a stronger cipher.

Cipher #10 ("Passage 1" on page 325 of Körner) has been enciphered using a cipher of the form RkS - first a monoalphabetic substitution cipher (which is easily broken) was applied, and then the rotation cipher Rk (also easily broken) was applied to the result of the substitution to get the final ciphertext.

  1. The key for RkS is the cipher alphabet used for the substitution cipher plus the value of k. How many keys are there? (Include trivial keys but do not include keys which cause multiple plaintext letters to be represented by the same ciphertext letter.) Explain.

    answer 

  2. Use the Index of Coincidence tool to compute the index of coincidence for cipher #10. Record the actual index of coincidence and whether the cipher is monoalphabetic or polyalphabetic below:

     index of coincidencemonoalphabetic or polyalphabetic?
    cipher #10  

    (What would you expect the answer to be?)

  3. To break this cipher, note that if the inverse key k' was known, then Rk' could be applied to the ciphertext to undo Rk - and the result would be just the plaintext encrypted by S. So that you don't have to try every possibility, assume that k is either 3, 5, 21, or 23. What is the correct value for k'? How do you know?

    number of rotations 
    method 

  4. Use frequency analysis and the Substitution Cipher tool to crack the unrotated ciphertext. Write down the cipher alphabet you discover. What is the message? (Hint: a particular letter is used instead of a space to indicate a word break.)

    plaintext 

This cipher is not especially strong, though it is better than the individual techniques (especially if you do not know the encryption algorithm); see Körner's "Passage 2" for another combination of rotation and substitution which is stronger than "Passage 1".


Part V: Breaking Vigenère Ciphers, Take Two

In this part, you'll attempt to break a harder instance of a Vigenère-encrypted text.
  1. Use the Index of Coincidence tool to compute the index of coincidence for cipher #4. Record the actual index of coincidence and whether the cipher is monoalphabetic or polyalphabetic below:

     index of coincidencemonoalphabetic or polyalphabetic?
    cipher #4  

    (What should the answer be?)

  2. Knowing the length of the key is very valuable for breaking a Vigenère cipher. Use the Kasiski Analysis tool to apply the Kasiski test to find the keyword length. What do you get? Are any of these values likely to be the correct keyword length? Explain.

    keyword length 
    explanation 

  3. Decryption of messages with a long key is easier if multiple messages can be found which share a key. Use the Polyalphabetic Alignment tool to determine which of ciphers #5-#9 are encrypted with the same key as #4 and what the offset is.

     same key as #4?offset
    (if same key)
    cipher #5  
    cipher #6  
    cipher #7  
    cipher #8  
    cipher #9  

  4. There aren't enough messages to apply frequency analysis to the collection of letters for each keyword letter, so try to decipher cipher #4 by guessing sections of the plaintext. You can use the other ciphers with the same key to help with this process, since the guessed plaintext for all of the messages must result in the same keyword. Use the Vigenère Cracker #2 tool - it may be useful to have several copies of it open simultaneously, one for each of the relevant ciphers.

    plaintext for #4 

    This is a tricky problem. If you can't completely solve the cipher, write down as much as you got with an explanation of what you attempted and where you got stuck.


Ciphers

#1:
TFCMM TVSBL VLWWE ENEKU QPRXZ GCUKY BADTV SSWPQ CDPQF ZPRZT HPCNM
FIMPX PDVKP LTKTB VYJSG UKYOM OGAUF YQBEC IZOGJ NZQJM QFTPQ ZBBZR
GDJPA XJOUZ VGVZW CKTBA R 
#2:
XGSZD CUWTI TCATT CAFLT CAESU KESKG WTWAF TZGCG DVKFD HTAJL WEHWA
JAEAJ GCLJK FTSMD WKTSX TLAVU DPTOE DATJX UGEMD CATPA GTCWE STAFD
AUGES WTSID CAMDW ECDHV TAGST DPUGE SKSJI DATLG SSTWK GCPTC LTGSJ
XUGEM DCATP AGKSG ATLAU GESPJ DSUXS GZAFT KSUJC BTUTW GXUGE SWKGE
WTAFT CAFTG VPXDW FJGCT PAUKT GXLJK FTSMD WJPTD VZGCG DVKFD HTAJL
WEHWA JAEAJ GCWMD WOEJL RTDWU AGEWT DCPWT LESTD BDJCW AKTGK VTECW
LFGGV TPJCL SUKAD CDVUW JW 
#3:
IVPNO WNHLQ LTJGI WNLAE JYEPJ MPGKW RCNPG LCJTA MTXGH FAEVP VFWBB
KIGDR RPCNT IFNBC AGZUH LITAL ILQVB INFMZ GLCPQ ITKRO ECDED BSFXA
MCSRT RIZCH LFJYW TIFPO UTGLD JRHZF TFEMI XESEN IGBVW SGGGP CLRFG
BXJTF OGSND BRBKX LOPBJ NEZES EGVHU GBDWM YTCNI U 
#4:
MOIOC KMWSL VVJCV TGQMZ NVVCF RLGSW ITRGK QRGAY UGUKY CVEGH PJWBL
MJJHD KZXJB VFMBA XUHFF CIICH ZVVFU WNBLO GXGPY UPWLH PGVAY FPFUV
ZUFWG MQZHJ EMNOP PKMGA JDIBJ IRCUI RPJZQ INXVN HWIOJ FHYNV GZIXS
FMSJG UMCSL ROTIB UWLSN RMFTE XPAAR ETIFS EWTAQ WPLAG WZHVS GPDNI
EJEUK YGWRZ NIXAR RCTPH PBXYG LXNIJ ILXZW FXLLV DMAFB VRJSH BEFLZ
LVHDV RGUWW WEWOL W 
#5:
APWUB NXVGB FVYCV PRZLK NYIAC RMQSF IXRKH PHGAY TMYTY IVHRR LBWDA
KWVOA KNSMM DCFFO BXKXS VJISK FVVPW AMMST RSPWG GUMNX CGCYY JOIFE
OXWSU ZBSKH MMHUC IIIPD XWEML VYGRJ JPKSZ ICYMT IVAAC UVIBC TZWCI
JTWXU RNVVL RHBVX HPANR HMBXG QEIQQ UHPGA ZVEAV IYZKC ASAGS QPNGR
EKULE EUDER ELWDA EUPIG ALNJS XMOIL NTPHW GGHWZ NTWLW RRAIE ATBRV
EJACD IOCXW KUMLA YEYJO GGRHW VGNGV FYL
#6:
BCFOT AVFLD UITRG VQAZX YIUHY UARQN IOIJW CHNYA FPJRQ NXTNI NVCSM
ICPKY VVRZG KNNLV QHHEV VKQMJ LEIOU HZABD FWPGW GCZQH LGCGD KVNAC
AWSYW SWUZR VYCQB GTECZ GGKNU WMPPX ZSNFM CYMYQ SSUHP MDUJM FHAWB
VTMIF DSIXA MWHUA WQIGM YMLYR BVRVC PGHUV VRCOW SZOMK CPIIR XMWGJ
BJSCM IFOHU EGTET NMEJW TIOIF WAHZF GTWVM QKAIO THHVP TNVVS LSEXU
ZQECM OIOMN LLZGK FDNGU TSCXI MDBYC CBUXH VISGY UGEUG ARNMF XUMFX
IGEGB HXQ 
#7:
IYSBC KXBBT EMTAT VYMUG HRRGC EQAGB TPRHY CRLBH JRTLK TGTUO HCNMJ
IVOPR RMPWB NEZOA ABSPS DUIEO VBJBK BAMOO UPRMZ XPVLH PGVAR SUCLW
CQTBS QFJBY IDEYH IFNVG IOUBM JUQXP KLCRQ UYVXF YEJIE QCJFQ OAPDX
LBRMG VNLSN QDRED ASZWZ CKBQN LHUVR GOSQE KPIJK AAHAY ECUHK VMLCB
SEKBR PGPTZ XTGES ZSTIL BGHBD YXSIH TTTAS NAILZ FJWXW FROMF SJSFN
INPLD DGEPS JXWMB WFLFT SHIHO LLYVX XCESR GMAWT JEDTV BYGWB ZR 
#8:
JNOJN KXLEQ ULESZ XCERG ICHME QIYAA XZMJU HTRGI JZYPV NMEZL MRNCX
RKCQH FYMKM UHRPR LBGXL ALMUW HQMCF GLJFM MDKPG LPSOL FPNXW GANAZ
GASFF RYSYG HTTVB KXLVX OQTAR QMPAT CIELF YGVNV KHRVB PWVJS TRKBP
VQBIG AHBLM ZFUYN ZSUEM OCVSL WVRDO ACIHQ LRPGV FKSMK UNXVV YLPGP
HGIVJ IKNOG HPCQE EZZNM DBVRV AASCG BLXYL DWTGE XWVVW IOISS TPMKX
NAVLN UVBMA LHEXB SZBGK XUSZG BXHYE WBBUL JQEOJ PATFA GVMVS 
#9:
DRBDK ORNAG JJHWO YTGVM GLWKM VIXKV IVGCA MUIIK RQFSY EILOJ NSLCX
XYPKW CWXRR CLJDX LZGAL SAPNI YBUCC NVFDS YBPAW ZQQUC ZFJJD KVDUL
RMVXJ ZCHSX XZXQC XILRT SGFMM RRQIA WLRTF NBKHQ VVXYH SHQWY UXWJO
CBKBQ AFLWL RHVIA WCNAX MVOUS XUGAC SLLUF IGVIF XULKS CFKXU YKHKI
OFSUV LWBHR RDDSR HIFJN LGKEK EYPJL OSWVY WRHQC MXHTP DDXYW QVMZX
JQIFW PJGMP WHP
#10:
IBUAC XPYRS RDNQE IEQGN ZTMFL FGDUA UNWDP RKYBK PKTVF WVIGF LHXAL
ZDBGY MSENA DWIXQ GUACW PYFRH XZAWG KGZVW YRYCP QOIMF IFMOZ JJJVI
JDEPW YOCIK AMPCI FSGAL SUQCD TJIVX HPYKM JNOZN WCMVM OKRUY DSGDU
ETVCE YQNWR TZIRI YMXLE XPGPU PYAKB BQDAP SLWBF BNOIB UAXTM ZNPUV


chris@cs.colgate.edu (original by Stina Bridgeman) last updated: --Friday, Sept 16, 2005--