Shakespeare’s Monkeys

Shakespeare’s Monkeys refers to the monkeys of the Infinite Monkey Theorem, which states that an infinite number of monkeys typing on typewriters for an infinite amount of time will eventually type a work of Shakespeare. I replaced the monkeys with a random character generator that uses Markov models to take language-specific character sequence statistics into account. The resulting implementation is not only more practical than placing actual monkeys in front of typewriters, but also generates texts that look surprisingly similar to actual German or English texts, albeit without any meaningful structure or content.
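To illustrate the idea, here is a minimal sketch of a character-level Markov text generator. It is written in Python for brevity and is not the Perl code of this project; the function names `train` and `generate`, the default context length of two characters, and the restart rule for dead ends are all assumptions made for this example.

```python
import random
from collections import Counter, defaultdict


def train(text, order=2):
    """Count which character follows each `order`-character context."""
    model = defaultdict(Counter)
    for i in range(len(text) - order):
        context = text[i:i + order]
        model[context][text[i + order]] += 1
    return model


def generate(model, length, seed=None):
    """Emit `length` characters by sampling followers in proportion to their counts.

    The model must contain at least one context, i.e. the training text
    must be longer than the context length.
    """
    rng = random.Random(seed)
    contexts = sorted(model)          # sorted so a fixed seed is reproducible
    order = len(contexts[0])
    out = rng.choice(contexts)
    while len(out) < length:
        followers = model.get(out[-order:])
        if not followers:
            # Dead end (context never seen with a follower): restart
            # from a random known context. This rule is an assumption.
            out += rng.choice(contexts)
            continue
        chars = list(followers)
        weights = list(followers.values())
        out += rng.choices(chars, weights=weights)[0]
    return out[:length]
```

Training on a larger corpus and raising `order` makes the output look more like the training language, at the cost of a larger table of contexts.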

For convenience, my implementation comes with character sequence statistics for German and English. It can easily be extended to other languages by running the provided training script on a large amount of training text in the desired language. As a more serious side effect of the need for training, the obtained character sequence statistics can be compared against an arbitrary text to determine its language with surprisingly high accuracy (given a large amount of training data). Hence, my implementation also enables language recognition, with German and English supported by default.
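One way to compare such statistics against a text can be sketched as follows. Again, this is a Python illustration and not the project's Perl code: the trigram length, the add-one smoothing, and the names `ngram_counts`, `log_likelihood` and `guess_language` are assumptions, and the scoring method actually used by the scripts may differ.

```python
import math
from collections import Counter


def ngram_counts(text, n=3):
    """Collect the frequency of every n-character sequence in the text."""
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def log_likelihood(text, counts, n=3):
    """Score a text under one language's n-gram counts.

    Add-one smoothing (an assumption of this sketch) keeps unseen
    n-grams from producing a probability of zero.
    """
    total = sum(counts.values())
    vocab = len(counts) + 1           # one extra slot for unseen n-grams
    score = 0.0
    for i in range(len(text) - n + 1):
        gram = text[i:i + n]
        score += math.log((counts[gram] + 1) / (total + vocab))
    return score


def guess_language(text, models, n=3):
    """Return the key of the model under which the text scores highest."""
    return max(models, key=lambda lang: log_likelihood(text, models[lang], n))
```

Here `models` is a dictionary that maps a language name to the counts obtained from training text in that language, so supporting another language only requires one more entry.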

My implementation consists of multiple Perl scripts, accompanied by HTML pages for convenience. This way, language generation, recognition and training can be accessed easily by copying the files onto a Web server. Both the implementation and the elaboration of the corresponding theoretical background (see below) were a term project for the Pattern Recognition course which I took during the winter term 2006/07.

This project is no longer maintained.