“Die Bibliothek der Milliarden Wörter” is a collaborative project between the Leipzig University Library and two groups at the Institute of Computer Science at Leipzig University: the Natural Language Processing Group and the Image and Signal Processing Group. The project addresses the technical tasks required for a digitization infrastructure that covers the full processing chain, from scans through to the generation of text statistics and visualizations. This includes archiving all intermediate products and completing and correcting metadata. Simple OCR results are converted into the richer XML-based TEI format and presented in a digital citation infrastructure. Finally, information visualization for a large number of texts is developed.
Project duration: May 2013 - December 2014