| FSem 132 | Methods and Issues in Cryptology | Fall 2005 |
Log on to a computer. Go to my web page cs.colgate.edu/faculty/nevison and click the link for our seminar. There is a link from there to the lab. In our lab you can use the cosc101 logon.
You will be using a collection of tools to help you with your decryptions. They can be found on the tools link from the syllabus (right-click on the link and choose "open in new window").
Imagine putting all of the letters of some section of text in a bag, and drawing two at random. The index of coincidence tells you the probability that you'll draw two identical letters.
The index of coincidence is useful because the probability of getting two identical letters depends on the distribution of letters in the bag. In general, the probability of two independent events occurring is the product of the probabilities of each event occurring separately. Also, the probability of any one of a collection of mutually exclusive events occurring is the sum of the probabilities of each event occurring separately. Let n be the total number of letters in the bag and let nA be the number of As, nB the number of Bs, etc.; we'll use ni (note the lowercase i) to mean the number of some generic letter i. The probability of drawing some letter i once is ni/n and drawing it twice is [ni/n][(ni-1)/(n-1)] (because once the first letter has been drawn, the number of letters in the bag and the number of i letters have both decreased by 1). The index of coincidence I is the sum of [ni/n][(ni-1)/(n-1)] for all of the letters.
Imagine that the bag contains a very large, completely random collection of letters. The probability of picking any particular letter is 1/26; the probability of picking a second of the same letter is also 1/26 (we assume that there are so many letters in the bag, removing one doesn't significantly change the number of letters remaining or the probability of picking another of the same letter). Thus I = sum of (1/26)2 for all letters = 1/26 = 0.0385. Call this value kr ("r" for "random").
Now imagine that the bag contains a very large collection of English text. Now the probability of picking any particular letter depends on the typical frequency of that letter in English text. Again, we assume that there are so many letters that removing one doesn't significantly change the probability of picking another. Thus I = sum of the squares of the probabilities of picking each letter = (8.2/100)2+(1.5/100)2+(2.8/100)2+(4.3/100)2+... = 0.0667. (This value varies slightly depending on the exact frequencies used for each letter.) Call this value kp = 0.0667 ("p" for plaintext).
The index of coincidence can be used to distinguish between monoalphabetic ciphertexts and polyalphabetic ciphertexts.
The difference in values between kr and kp can be used to distinguish between monoalphabetic and polyalphabetic substitution ciphers. For a monoalphabetic cipher, the letter frequencies are not disguised and so the index of coincidence for the ciphertext is expected to be near kp. On the other hand, a polyalphabetic cipher is designed to scramble the letter frequencies and make them appear more like random text, so the index of coincidence for this ciphertext is expected to be near kr.
The index of coincidence can be used to determine the length of the keyword in a Vigenère cipher.
Arrange the ciphertext in rows under the keyword, so that all of the letters in a given column are encrypted with the same keyword letter. Each column thus represents a monoalphabetic cipher, and the index of coincidence can be computed for the column to verify this property. If the keyword length is incorrect, the columns will contain letters encrypted with different keyword letters and will have an index of coincidence closer to kr than kp.
The index of coincidence can be used to match up polyalphabetic ciphertexts which use the same portion of a long key.
Consider writing two English texts, one above the other. What is the probability that any column contains two identical characters? As it happens, 0.0667. Each column represents one choice of two letters from the bag of English text and so has a probability of 0.0667 of containing identical letters. Counting the number of columns with identical characters and dividing by the number of columns should yield a number close to this value.
Next write two monoalphabetically-encrypted English texts in the same manner, one above the other. The probability that any column contains identical characters is still 0.0667 because the substitution did not change the frequency characteristics of the underlying English text.
Now write two polyalphabetically-encrypted English texts in the same manner. If both texts have the same keyword and they are aligned such that both letters in each column are encrypted with the same keyword letter, the probability of having identical letters in a column is once again 0.0667. This is because the only way to have the same ciphertext letters in a column when the same keyword letter is used is for the underlying plaintext letters to be the same, and the probability of the underlying plaintext letters matching is 0.0667. If, however, the keywords are different or not properly aligned, then the letters in each column are encrypted with different keyword letters and the probability of having identical letters in a column is near random.
As a result, the index of coincidence can also be used to identify and align polyalphabetic ciphertexts produced by the same portion of a long key. For each possible alignment of the two texts, compute the fraction of columns containing identical letters and compare this fraction to kr and kp. If the value is near kp, then the proper alignment has been found. Alternatively, simply count the number of columns containing identical letters and look for an alignment which produces a large number of matching columns (since kp > kr).
| index of coincidence | monoalphabetic or polyalphabetic? | |
|---|---|---|
| cipher #1 | ||
| cipher #2 |
| index of coincidence | monoalphabetic or polyalphabetic? | |
|---|---|---|
| cipher #3 |
(What should the answer be?)
| keyword length |
|---|
| column number | index of coincidence | column number | index of coincidence |
|---|---|---|---|
| appropriate values? also explanation (as needed) | |||
| keyword | |
|---|---|
| plaintext |
In The Pleasures of Counting, Körner describes a rotation cipher called Rk. Let ir be the integer equivalent of the rth letter of the plaintext, where a=0, b=1, c=2, etc. The ciphertext letter is then the letter associated with the integer ir+k(r-1) mod 26.
For example, ROTATION is encoded as RPVDXNUU with k=1, RQXGBSAB with k=2, and RRZJFXGI with k=3. Make sure you understand how this works before continuing on.
| answer |
|---|
| answer |
|---|
| k | 1 | k | 3 | k | 5 |
|---|---|---|---|---|---|
| k' | k' | k' | |||
| method | |||||
| answer |
|---|
Combining several weak ciphers can lead to a stronger cipher.
Cipher #10 ("Passage 1" on page 325 of Körner) has been enciphered using a cipher of the form RkS - first a monoalphabetic substitution cipher (which is easily broken) was applied, and then the rotation cipher Rk (also easily broken) was applied to the result of the substitution to get the final ciphertext.
| answer |
|---|
| index of coincidence | monoalphabetic or polyalphabetic? | |
|---|---|---|
| cipher #10 |
(What would you expect the answer to be?)
| number of rotations | |
|---|---|
| method |
| plaintext |
|---|
| index of coincidence | monoalphabetic or polyalphabetic? | |
|---|---|---|
| cipher #4 |
(What should the answer be?)
| keyword length | |
|---|---|
| explanation |
| same key as #4? | offset (if same key) | |
|---|---|---|
| cipher #5 | ||
| cipher #6 | ||
| cipher #7 | ||
| cipher #8 | ||
| cipher #9 |
| plaintext for #4 |
|---|
This is a tricky problem. If you can't completely solve the cipher, write down as much as you got with an explanation of what you attempted and where you got stuck.
TFCMM TVSBL VLWWE ENEKU QPRXZ GCUKY BADTV SSWPQ CDPQF ZPRZT HPCNM FIMPX PDVKP LTKTB VYJSG UKYOM OGAUF YQBEC IZOGJ NZQJM QFTPQ ZBBZR GDJPA XJOUZ VGVZW CKTBA R#2:
XGSZD CUWTI TCATT CAFLT CAESU KESKG WTWAF TZGCG DVKFD HTAJL WEHWA JAEAJ GCLJK FTSMD WKTSX TLAVU DPTOE DATJX UGEMD CATPA GTCWE STAFD AUGES WTSID CAMDW ECDHV TAGST DPUGE SKSJI DATLG SSTWK GCPTC LTGSJ XUGEM DCATP AGKSG ATLAU GESPJ DSUXS GZAFT KSUJC BTUTW GXUGE SWKGE WTAFT CAFTG VPXDW FJGCT PAUKT GXLJK FTSMD WJPTD VZGCG DVKFD HTAJL WEHWA JAEAJ GCWMD WOEJL RTDWU AGEWT DCPWT LESTD BDJCW AKTGK VTECW LFGGV TPJCL SUKAD CDVUW JW#3:
IVPNO WNHLQ LTJGI WNLAE JYEPJ MPGKW RCNPG LCJTA MTXGH FAEVP VFWBB KIGDR RPCNT IFNBC AGZUH LITAL ILQVB INFMZ GLCPQ ITKRO ECDED BSFXA MCSRT RIZCH LFJYW TIFPO UTGLD JRHZF TFEMI XESEN IGBVW SGGGP CLRFG BXJTF OGSND BRBKX LOPBJ NEZES EGVHU GBDWM YTCNI U#4:
MOIOC KMWSL VVJCV TGQMZ NVVCF RLGSW ITRGK QRGAY UGUKY CVEGH PJWBL MJJHD KZXJB VFMBA XUHFF CIICH ZVVFU WNBLO GXGPY UPWLH PGVAY FPFUV ZUFWG MQZHJ EMNOP PKMGA JDIBJ IRCUI RPJZQ INXVN HWIOJ FHYNV GZIXS FMSJG UMCSL ROTIB UWLSN RMFTE XPAAR ETIFS EWTAQ WPLAG WZHVS GPDNI EJEUK YGWRZ NIXAR RCTPH PBXYG LXNIJ ILXZW FXLLV DMAFB VRJSH BEFLZ LVHDV RGUWW WEWOL W#5:
APWUB NXVGB FVYCV PRZLK NYIAC RMQSF IXRKH PHGAY TMYTY IVHRR LBWDA KWVOA KNSMM DCFFO BXKXS VJISK FVVPW AMMST RSPWG GUMNX CGCYY JOIFE OXWSU ZBSKH MMHUC IIIPD XWEML VYGRJ JPKSZ ICYMT IVAAC UVIBC TZWCI JTWXU RNVVL RHBVX HPANR HMBXG QEIQQ UHPGA ZVEAV IYZKC ASAGS QPNGR EKULE EUDER ELWDA EUPIG ALNJS XMOIL NTPHW GGHWZ NTWLW RRAIE ATBRV EJACD IOCXW KUMLA YEYJO GGRHW VGNGV FYL#6:
BCFOT AVFLD UITRG VQAZX YIUHY UARQN IOIJW CHNYA FPJRQ NXTNI NVCSM ICPKY VVRZG KNNLV QHHEV VKQMJ LEIOU HZABD FWPGW GCZQH LGCGD KVNAC AWSYW SWUZR VYCQB GTECZ GGKNU WMPPX ZSNFM CYMYQ SSUHP MDUJM FHAWB VTMIF DSIXA MWHUA WQIGM YMLYR BVRVC PGHUV VRCOW SZOMK CPIIR XMWGJ BJSCM IFOHU EGTET NMEJW TIOIF WAHZF GTWVM QKAIO THHVP TNVVS LSEXU ZQECM OIOMN LLZGK FDNGU TSCXI MDBYC CBUXH VISGY UGEUG ARNMF XUMFX IGEGB HXQ#7:
IYSBC KXBBT EMTAT VYMUG HRRGC EQAGB TPRHY CRLBH JRTLK TGTUO HCNMJ IVOPR RMPWB NEZOA ABSPS DUIEO VBJBK BAMOO UPRMZ XPVLH PGVAR SUCLW CQTBS QFJBY IDEYH IFNVG IOUBM JUQXP KLCRQ UYVXF YEJIE QCJFQ OAPDX LBRMG VNLSN QDRED ASZWZ CKBQN LHUVR GOSQE KPIJK AAHAY ECUHK VMLCB SEKBR PGPTZ XTGES ZSTIL BGHBD YXSIH TTTAS NAILZ FJWXW FROMF SJSFN INPLD DGEPS JXWMB WFLFT SHIHO LLYVX XCESR GMAWT JEDTV BYGWB ZR#8:
JNOJN KXLEQ ULESZ XCERG ICHME QIYAA XZMJU HTRGI JZYPV NMEZL MRNCX RKCQH FYMKM UHRPR LBGXL ALMUW HQMCF GLJFM MDKPG LPSOL FPNXW GANAZ GASFF RYSYG HTTVB KXLVX OQTAR QMPAT CIELF YGVNV KHRVB PWVJS TRKBP VQBIG AHBLM ZFUYN ZSUEM OCVSL WVRDO ACIHQ LRPGV FKSMK UNXVV YLPGP HGIVJ IKNOG HPCQE EZZNM DBVRV AASCG BLXYL DWTGE XWVVW IOISS TPMKX NAVLN UVBMA LHEXB SZBGK XUSZG BXHYE WBBUL JQEOJ PATFA GVMVS#9:
DRBDK ORNAG JJHWO YTGVM GLWKM VIXKV IVGCA MUIIK RQFSY EILOJ NSLCX XYPKW CWXRR CLJDX LZGAL SAPNI YBUCC NVFDS YBPAW ZQQUC ZFJJD KVDUL RMVXJ ZCHSX XZXQC XILRT SGFMM RRQIA WLRTF NBKHQ VVXYH SHQWY UXWJO CBKBQ AFLWL RHVIA WCNAX MVOUS XUGAC SLLUF IGVIF XULKS CFKXU YKHKI OFSUV LWBHR RDDSR HIFJN LGKEK EYPJL OSWVY WRHQC MXHTP DDXYW QVMZX JQIFW PJGMP WHP#10:
IBUAC XPYRS RDNQE IEQGN ZTMFL FGDUA UNWDP RKYBK PKTVF WVIGF LHXAL ZDBGY MSENA DWIXQ GUACW PYFRH XZAWG KGZVW YRYCP QOIMF IFMOZ JJJVI JDEPW YOCIK AMPCI FSGAL SUQCD TJIVX HPYKM JNOZN WCMVM OKRUY DSGDU ETVCE YQNWR TZIRI YMXLE XPGPU PYAKB BQDAP SLWBF BNOIB UAXTM ZNPUV
| chris@cs.colgate.edu (original by Stina Bridgeman) | last updated: --Friday, Sept 16, 2005-- |