From Traditional Descriptive Grammars to Digital Language Profiles
Traditionally, researchers have studied the diversity of the world's languages by reading and comparing grammatical descriptions by hand. Today, a large number of linguistic descriptions and books are readily available in digital form. Reading all of them for broad-scale comparison and analysis is far beyond what any individual can do. Text technology, i.e., computer-based processing of natural-language text, is now powerful enough that it could be used to harvest facts at different levels of detail within a given domain (here, information about the world's languages). In this project, we aim to use a collection of 9,000 digitized grammatical descriptions covering more than a thousand languages to substantially expand the scope of cross-linguistic comparison. To this end, the project will develop methods that enable computers to read grammatical descriptions and automatically extract information ("linguistic facts"). We will also explore and develop the notion of a "language profile": a structured digital collection and representation of a language that encapsulates all available knowledge about it, extracted from various sources.
This project was originally titled From Dust to Dawn: Multilingual Grammar Extraction from Grammars, but we have since adopted a catchier title.