Automatic Decryption of Historical Manuscripts

Thousands of enciphered historical manuscripts are buried in libraries and archives. Examples of such material are diplomatic correspondence and intelligence reports, private letters and diaries, and manuscripts related to secret societies or other (religious) groups on the margins of society. The bulk of these historical manuscripts will remain undeciphered unless we can automate – in part or in full – the processes involved in decoding them. Our aim is to develop resources and computer-aided tools for automatic and semi-automatic decoding of historical source material through cross-disciplinary research involving computational linguistics, computer science, history, linguistics and philology.

Within the DECODE project, we release resources and tools with open access to facilitate research in historical cryptology in general, allowing collection, analysis and decryption of historical ciphertexts.

The DECODE database contains a collection of digitized images of ciphertexts and encryption keys, along with metadata about their provenance, location, transcription, and any cryptanalysis or commentary. The database is searchable and all records are open to the public. Due to license restrictions, some images are private and cannot be viewed or downloaded, but information about their whereabouts is documented so that those interested can order images directly from the archive or library. Users with a database account can also upload and download ciphertexts and keys, together with metadata and related documents. We also provide freely available transcriptions of many ciphers, which can be used for cryptanalysis of each cipher or for training automatic cryptology tools for cipher detection and decryption.

HistCorp is a collection of historical corpora and other useful resources and tools for researchers working with historical text. Currently, you may download historical corpora for fourteen different European languages: Czech, Dutch, English, French, German, Greek, Hungarian, Icelandic, Italian, Latin, Portuguese, Slovene, Spanish, and Swedish. We also provide language models derived from these historical sources, which may be downloaded from the Language Models section. You may also create your own language models by uploading historical sources of your choice. Furthermore, we provide tools for the automatic processing of historical texts. So far, we provide a tool for spelling normalisation, which automatically transforms historical spelling into modern spelling. You may enter a text or upload a file to have it normalised, or you may download the necessary tools to do the normalisation locally on your own computer.

Shortly, we will also release tools to facilitate the transcription, cryptanalysis and decryption of ciphertexts, including a tool that maps a key to a ciphertext of your choice and returns a decrypted version. The transcription tool can be used to transcribe or transliterate an image interactively, using image processing with clustering. The transcribed cipher can then be analysed statistically using standard methods for historical ciphertexts, such as n-gram frequencies, n-gram distances, index of coincidence, entropy measures, and pattern dictionaries. The results of this analysis can be used in the interactive decryption tool, where the user maps symbols in the ciphertext to the alphabet of the plaintext language (that is, the underlying, original language) and obtains a decrypted file.
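
To make the statistical measures and the key mapping concrete, here is a minimal Python sketch. It is not the DECODE tools' implementation; the function names, the space-separated transcription format and the sample key are all assumptions made for illustration. It computes n-gram frequencies, the (unnormalised) index of coincidence and the unigram entropy of a transcribed ciphertext, and then applies a symbol-to-letter key.

```python
import math
from collections import Counter


def ngram_frequencies(symbols, n=1):
    """Count every run of n consecutive symbols in the transcription."""
    return Counter(tuple(symbols[i:i + n]) for i in range(len(symbols) - n + 1))


def index_of_coincidence(symbols):
    """Probability that two symbols drawn without replacement are identical."""
    total = len(symbols)
    if total < 2:
        return 0.0
    counts = Counter(symbols)
    return sum(c * (c - 1) for c in counts.values()) / (total * (total - 1))


def shannon_entropy(symbols):
    """Unigram entropy of the symbol distribution, in bits per symbol."""
    total = len(symbols)
    counts = Counter(symbols)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def apply_key(symbols, key, unknown="?"):
    """Replace each cipher symbol by its plaintext value.

    Symbols that the key does not cover are rendered as `unknown`,
    so a partial key still yields a readable partial decryption.
    """
    return "".join(key.get(s, unknown) for s in symbols)


# Hypothetical transcription: numeric cipher symbols separated by spaces.
cipher = "12 7 3 3 9 12 7 3 3 9".split()
# Made-up key for illustration only.
key = {"12": "h", "7": "e", "3": "l", "9": "o"}

print(ngram_frequencies(cipher, 2))
print(index_of_coincidence(cipher))
print(shannon_entropy(cipher))
print(apply_key(cipher, key))
```

Treating the ciphertext as a list of symbols rather than a string of characters is a deliberate choice in this sketch: historical ciphers often use multi-digit numbers or graphic signs as single cipher symbols, so each transcribed token is counted as one unit.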

Lastly, we provide a link to the CrypTool 2 (CT2) platform, developed by Prof. Bernhard Esslinger and his team. CT2 is an open-source e-learning platform for cryptography and cryptanalysis that offers a visual programming GUI for experimenting with cryptographic procedures. It provides a variety of cryptanalytical tools to analyse or break classical (as well as modern) ciphers and can be downloaded for Windows.

The DECODE project team cooperates with the CrypTool team, and the DECODE database is now available within CrypTool 2 as well. In addition, we also submit ciphers from the DECODE database to MysteryTwister C3 (MTC3), which is an international Crypto Cipher Contest offering a broad variety of challenges, a moderated forum and an ongoing hall-of-fame of the best cipher crackers.

We would be happy to hear your comments and suggestions for improvement.

The DECODE team
Beáta Megyesi, PI