Lectures on Language Technology

Uppsala, January 19, 2018

The computational linguistics group at Uppsala University is delighted to invite you to an afternoon of public lectures on language technology by leading experts in the field. The lectures will take place in Room 22-0031, English Park Campus, Uppsala University, on the 19th of January according to the schedule below. Attendance is free for anyone interested.

13.15-14.00 Graeme Hirst
Department of Computer Science
University of Toronto
Classifying Verbal Autopsy Records by Cause of Death using Neural Networks and Temporal Reasoning
A verbal autopsy is a post-hoc written interview report of the symptoms preceding a person’s death in cases where no official cause of death was determined by a physician. Current leading automated prediction methods rely primarily on the structured data from verbal autopsies to assign a cause-of-death category. We present a neural network classifier that predicts cause-of-death categories from the free-text verbal autopsy narratives alone. For individual cause-of-death prediction, our best classifier achieves a sensitivity of .770 for adult deaths, compared to the best previously reported sensitivity of .57. When predicting the cause-of-death distribution at the population level, our best classifier achieves a CSMF accuracy of .962. Work in progress adds temporal reasoning to the feature set. [Joint work with Serena Jeblee, Mireille Gomes, and Prabhat Jha.]
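For readers unfamiliar with the metric, CSMF (cause-specific mortality fraction) accuracy is commonly defined, following Murray et al., as follows; the abstract does not give the formula, so this is the standard definition rather than necessarily the exact variant used in the talk:

```latex
\text{CSMF accuracy} = 1 - \frac{\sum_{j=1}^{k} \left| \mathrm{CSMF}_j^{\text{true}} - \mathrm{CSMF}_j^{\text{pred}} \right|}{2 \left( 1 - \min_{j} \mathrm{CSMF}_j^{\text{true}} \right)}
```

Here $k$ is the number of cause-of-death categories, and the denominator normalizes so that the worst possible prediction scores 0 and a perfect prediction scores 1.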
14.00-14.30 Walter Daelemans
CLiPS Research Center
University of Antwerp
Personality Profiling Methodology
Personality profiling from text, e.g. determining whether an author is extraverted or introverted on the basis of what he or she has written, is an intriguing application of author profiling. In recent shared tasks, such as PAN, high accuracies have been reported for this task with simple features and methods on specific datasets. However, wildly different features are selected for different datasets, and models trained on one dataset do not work at all on others. In this short talk, I will show that there may be a problem of spurious correlations and overfitting, by comparing standard cross-validation with embedded cross-validation. I propose “Backward Compatible Evaluation” as a methodology for stylometry in general: models developed on new datasets should, after optimization, be evaluated on all previously existing relevant datasets.
14.30-15.00 Break
15.00-15.30 Sara Stymne
Department of Linguistics and Philology
Uppsala University
Dependency Parsing with Treebank Embeddings
It is common for several treebanks to be available for a given language, differing in language variant, domain, genre and annotation style. For data-driven dependency parsers, performance typically improves with more training data, but the divergence between treebanks can make it difficult to take advantage of all available data; simply concatenating the treebanks often does not work well. In this talk, we show how the concept of language embeddings, which have been used to represent different languages in cross-lingual systems, can be extended to the monolingual case by applying a similar framework at the treebank level, in the form of treebank embeddings. We extend a neural transition-based dependency parser with treebank embeddings, and show that this framework leads to large improvements for several languages. We also show that the framework can be extended to combine several corpora for one language with corpora for related languages.
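The core idea can be sketched very simply: each treebank gets its own learned identifier vector, concatenated to every token representation, so that one parser can be trained on a mix of treebanks while still distinguishing their conventions. The toy sketch below (treebank names, dimensions, and vocabulary are hypothetical, and random vectors stand in for learned embeddings) illustrates only the input construction, not the authors’ parser:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical treebank IDs and toy dimensions (not from the talk).
TREEBANKS = ["sv_talbanken", "sv_lines"]
WORD_DIM, TB_DIM = 8, 3

# Random vectors stand in for embeddings that would be learned during training.
word_emb = {w: rng.standard_normal(WORD_DIM) for w in ["en", "katt", "sover"]}
tb_emb = {t: rng.standard_normal(TB_DIM) for t in TREEBANKS}

def token_repr(word, treebank):
    """Token input = word embedding concatenated with treebank embedding."""
    return np.concatenate([word_emb[word], tb_emb[treebank]])

vec = token_repr("katt", "sv_talbanken")
# vec has shape (WORD_DIM + TB_DIM,) = (11,)
```

The same word gets an identical word-embedding part in every treebank, while the treebank part tells the parser which annotation style the example comes from.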
15.30-16.15 Marcel Cori
Université Paris Nanterre
What Is (Actually) Natural Language Processing?
We start by giving a first, very simple definition of Natural Language Processing (NLP). We then observe, through a short historical review, that several terms have competed as names for the discipline. A fundamental question arises: what is the relationship between NLP and other fields such as linguistics, computer science, mathematics, and artificial intelligence? A corollary of this question is where NLP is located between science and engineering. In conclusion, we focus on NLP’s contributions to research in linguistics, namely to formal linguistics and to corpus linguistics.