Lectures on Language Technology

Uppsala, November 26, 2019

The computational linguistics group at Uppsala University is delighted to invite you to these public lectures on language technology by leading experts in the field. The lectures will take place in Room 16-0043, English Park Campus, Uppsala University, on November 26 according to the schedule below. Attendance is free and open to anyone interested.

10.15-10.45 Emily M. Bender
Department of Linguistics
University of Washington
Language Models Do Not Encode Meaning
In this talk, presenting joint work with Alexander Koller, I will look at current state-of-the-art approaches to ostensibly meaning-sensitive tasks and argue that much of the recent progress, though undoubtedly useful for several practical tasks, does not represent actual progress towards natural language understanding. More succinctly: a system trained on form alone cannot in principle learn meaning. If the field is to move towards true natural language understanding, we need to (i) make sure that systems are given information about meaning as well as form in the training data; (ii) join bottom-up and top-down approaches to computational semantics; and (iii) pay attention to the linguistic structures involved in mediating between form and meaning.
10.45-11.15 Richard Johansson
Department of Computer Science and Engineering
University of Gothenburg
Discovering Semantic Shifts Using Diachronic Word Sense Embeddings
Word embedding methods provide a simple and practical representation of lexical semantics that can be learned in an unsupervised fashion from unannotated corpora. Because these methods discover meaning representations automatically, they are attractive as a research tool for discovering semantic shifts in diachronic corpora. However, most previous applications of word embeddings to the study of semantic change have been limited because the common types of word embeddings do not distinguish between different senses of words, which makes the results harder to interpret. In this talk, we describe a number of extensions of diachronic word embedding models that allow them to express sense distinctions, and some preliminary experiments in which we apply these models to diachronic English corpora.
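
For readers who want a concrete picture of the kind of pipeline involved, the sketch below shows a common neighbourhood-based baseline for detecting semantic shift: train a separate embedding model per time slice and rank words by how much their nearest neighbours change. It is an illustration under assumed inputs (the corpus file names are hypothetical placeholders), not the speaker's sense-aware models.

```python
# Minimal sketch of neighbour-based semantic shift detection, assuming
# gensim 4.x and two tokenised corpora split by period (file names are
# hypothetical placeholders).
from gensim.models import Word2Vec

def read_corpus(path):
    """Yield one tokenised sentence per line of a plain-text file."""
    with open(path, encoding="utf-8") as f:
        for line in f:
            yield line.lower().split()

# Train one embedding space per time slice.
early = Word2Vec(sentences=list(read_corpus("corpus_1800s.txt")),
                 vector_size=100, min_count=10, workers=4)
late = Word2Vec(sentences=list(read_corpus("corpus_1900s.txt")),
                vector_size=100, min_count=10, workers=4)

def shift_score(word, k=20):
    """1 minus the Jaccard overlap of the word's k nearest neighbours
    in the two periods; higher values suggest a semantic shift."""
    n_early = {w for w, _ in early.wv.most_similar(word, topn=k)}
    n_late = {w for w, _ in late.wv.most_similar(word, topn=k)}
    return 1 - len(n_early & n_late) / len(n_early | n_late)

# Rank words attested in both periods by how much their neighbourhoods moved.
shared = set(early.wv.index_to_key) & set(late.wv.index_to_key)
candidates = sorted(shared, key=shift_score, reverse=True)[:50]
```

Comparing neighbour sets rather than raw vectors sidesteps the fact that two independently trained spaces are not aligned; the sense-unaware nature of this baseline is exactly the limitation the talk's models address.
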
11.15-11.30 Break
11.30-12.00 Paola Merlo
Department of Linguistics
University of Geneva
Word Embeddings and Linguistically-Informed Notions of Similarity
In the computational study of intelligent behaviour, the domain of language is distinguished by the complexity of its representations and the vast amounts of quantitative text-driven data available. Recent neural-network-based and distributional-semantics techniques have brought these two aspects of language research together in systems of considerable practical success and impressive performance. In the spirit of better understanding the properties of the distributed spaces such systems construct, I will investigate the notion of similarity by presenting two recent case studies.

On the one hand, we study whether the notion of similarity in the intervention theory of locality is related to current notions of similarity in word embedding spaces. We present results showing that word embeddings, and the similarity spaces they define, do not correlate with experimental results on intervention similarity in long-distance dependencies. This suggests that the linguistic encoding in distributed representations does not match theoretical, syntactic definitions of similarity.
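
As a bare-bones illustration of how such a correlation is tested (with placeholder vectors and scores, not the talk's data or its intervention-similarity measures):

```python
# Sketch: does cosine similarity in an embedding space track an
# independently obtained similarity score? All data here are random
# placeholders standing in for word-pair vectors and experimental scores.
import numpy as np
from scipy.stats import spearmanr

def cosine(u, v):
    return u @ v / (np.linalg.norm(u) * np.linalg.norm(v))

rng = np.random.default_rng(0)
pairs = [(rng.normal(size=100), rng.normal(size=100)) for _ in range(50)]
theoretical = rng.uniform(size=50)  # stand-in for intervention-similarity scores

embedding_sim = [cosine(u, v) for u, v in pairs]
rho, p = spearmanr(embedding_sim, theoretical)
print(f"Spearman rho = {rho:.2f} (p = {p:.3f})")
```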

On the other hand, research on the bilingual lexicon has uncovered fascinating interactions between the lexicons of the native language and of the second language in bilingual speakers. In particular, it has been found that the lexicon of the underlying native language affects the organisation of the second language and its similarity structure. We compare cross-lingual word embeddings to the shared-translation effect and to the cross-lingual coactivation effects of false and true friends (cognates) found in humans. We find that the similarity structure of the cross-lingual word embedding space yields the same effects as the human bilingual lexicon.
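
Cross-lingual spaces of the kind compared here are often obtained by mapping one monolingual embedding space onto another; the orthogonal-Procrustes recipe below is one standard way to do this (an assumption about the setup, not necessarily the method used in the talk), after which cognates and false friends can be probed with cosine similarity.

```python
# Sketch: orthogonal Procrustes alignment of two monolingual embedding
# matrices, a standard construction for a cross-lingual space. X and Y
# hold row-aligned vectors for a seed dictionary of translation pairs
# (random placeholders here).
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(5000, 300))  # source-language vectors
Y = rng.normal(size=(5000, 300))  # target-language vectors

# The orthogonal map W minimising ||XW - Y||_F comes from the SVD of X^T Y.
U, _, Vt = np.linalg.svd(X.T @ Y)
W = U @ Vt

def cosine(u, v):
    return u @ v / (np.linalg.norm(u) * np.linalg.norm(v))

# Map a source word into the target space and compare it to a candidate
# translation, e.g. a true or false friend.
similarity = cosine(X[0] @ W, Y[0])
```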

The preliminary conclusion, then, is that the notion of similarity in these spaces corresponds to the notion of semantic neighbourhood that emerges from psycholinguistic work on the lexicon, but not to the notion of feature-based similarity defined in syntax and sentence processing.

12.00-12.30 Daniel Zeman
Institute of Formal and Applied Linguistics
Charles University in Prague
Towards Deep Universal Dependencies
Many linguistic theories and annotation frameworks contain a deep-syntactic and/or semantic layer. While many of these frameworks have been applied to more than one language, none of them comes anywhere near the number of languages covered in Universal Dependencies (UD). I will present a prototype of Deep Universal Dependencies, a two-speed concept in which minimal deep annotation can be derived automatically from surface UD trees, while richer annotation can be added for datasets where appropriate resources are available. The talk is based on joint work with Kira Droganova.
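
As a toy illustration of what "minimal deep annotation derived automatically from surface UD trees" can mean in practice, the sketch below reads a ten-column CoNLL-U sentence and extracts shallow predicate-argument frames from the surface relations. The example sentence and the deprel-to-role mapping are illustrative assumptions, not the actual Deep UD derivation rules.

```python
# Sketch: derive shallow predicate-argument frames from a surface UD tree.
# The sentence and the toy ROLE mapping are illustrative only.
CONLLU = """\
1\tMary\tMary\tPROPN\t_\t_\t2\tnsubj\t_\t_
2\tgave\tgive\tVERB\t_\t_\t0\troot\t_\t_
3\tJohn\tJohn\tPROPN\t_\t_\t2\tiobj\t_\t_
4\ta\ta\tDET\t_\t_\t5\tdet\t_\t_
5\tbook\tbook\tNOUN\t_\t_\t2\tobj\t_\t_
"""

ROLE = {"nsubj": "ARG0", "obj": "ARG1", "iobj": "ARG2"}  # toy mapping

# Parse the ten-column CoNLL-U lines into token dicts.
tokens = []
for line in CONLLU.strip().splitlines():
    cols = line.split("\t")
    tokens.append({"id": int(cols[0]), "form": cols[1], "lemma": cols[2],
                   "upos": cols[3], "head": int(cols[6]), "deprel": cols[7]})

# For each verb, collect its core arguments from the surface relations.
for tok in tokens:
    if tok["upos"] == "VERB":
        frame = [(ROLE[t["deprel"]], t["form"]) for t in tokens
                 if t["head"] == tok["id"] and t["deprel"] in ROLE]
        print(tok["lemma"], frame)
# prints: give [('ARG0', 'Mary'), ('ARG2', 'John'), ('ARG1', 'book')]
```

In the two-speed picture, rules of roughly this flavour would supply the automatic minimal layer, while manually curated resources enrich it where they exist.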