This page is the home page for the course on advanced information retrieval. The course covers conventional information retrieval, information extraction, and neural approaches to information retrieval. The content is designed for students with a background in computer science and computational linguistics. By the end of the course, you should be able to:
- explain in detail the most common techniques of text indexing, text classification, and information extraction
- explain various types of information retrieval models
- explain various types of information extraction models
- explain the common techniques of document representation and document classification
- evaluate information retrieval and information extraction systems
- analyze and critically review scientific publications in the field of information retrieval/extraction
- apply basic tools for indexing and information retrieval
- implement some of the basic tools of information retrieval
- formulate and critically discuss the methodological assumptions made by the approaches mentioned in the course
Course literature:
- IntIR: Christopher D. Manning, Prabhakar Raghavan, and Hinrich Schütze, Introduction to Information Retrieval, Cambridge University Press, 2008.
- SpLan: Dan Jurafsky and James H. Martin, Speech and Language Processing (3rd ed. draft), Ch. 18 and 25.
- NeuIR: Bhaskar Mitra and Nick Craswell, An Introduction to Neural Information Retrieval, 2018.
Examination:
- Individual assignment reports
- Lab reports
- Seminar presentation
- Literature review
- Term project: project proposal and report
Meeting ID: 63573171418 (password required)
Introduction and course outline
An introduction to the course in general including the content, teaching and examination.
IntIR Ch 1, 2.3: Boolean retrieval, inverted index, processing Boolean queries.
Scoring, Term Weighting & the Vector Space Model
IntIR Ch 6.1-3
IntIR Ch 8 and Common Evaluation Measures
Relevance Feedback and Query Expansion
IntIR Ch 9 and Relevance feedback in IR
Probabilistic Information Retrieval
IntIR Ch 11 and Robertson and Zaragoza (2009)
Language Models for Information Retrieval
IntIR Ch 12.1-3
Text Classification and Naïve Bayes
IntIR Ch 13.1-4
Vector Space Classification
IntIR Ch 14
Neural Networks for Information Retrieval
NeuIR (extra reading: LTR4IR and NN4IR)
SpLan Ch 17
The initial submission deadlines for all assignments, lab reports, literature reviews, and project reports are given either on this page or in the corresponding instructions. The second submission deadline for all of these items is the 14th of August.
The assignments are individual. Prepare a report for each series of exercises listed here and upload your reports to the student portal by the 21st of May.
Seminars extend selected sections of the course books. They are group activities, and each presentation should not take more than 15 minutes; if a group needs more time, this can be discussed. Each group can decide how to present its topic. The topics listed below are those presented by former students.
- Boolean and Ranked Retrieval - Submission deadline: 2021-04-23
- Test Collections - Submission deadline: 2021-05-07
- Evaluation - Submission deadline: 2021-05-21
The literature review is a group activity. Each group, consisting of two students, should prepare a short summary (max. 3 pages) of three of the papers listed below. The summaries should be uploaded to the student portal before 2021-05-21 00:00 and should address the following items:
- the research questions of the paper
- the contributions of the paper
- a brief summary of the method if it introduces a new method
- the interesting points of the paper
- the unclear parts of the paper
- for a VG grade: 1) the summary should be comprehensive and well written, and 2) it should introduce novel ideas on how to improve at least one of the papers
Here are the papers to review.
- Ellen M. Voorhees (1999), Natural Language Processing for Information Retrieval, in Information Extraction: Towards Scalable, Adaptable Systems, pp. 32–48
- Adam Berger and John Lafferty (1999), Information retrieval as statistical translation, SIGIR '99
- Changki Lee and Gary Geunbae Lee (2005), Probabilistic information retrieval model for a dependency structured indexing system, Information Processing & Management 41(2): 161–175
- Stefan Riezler, Alexander Vasserman, Ioannis Tsochantaridis, Vibhu Mittal, Yi Liu (2007), Statistical Machine Translation for Query Expansion in Answer Retrieval, ACL 2007
- Mike Mintz et al. (2009), Distant supervision for relation extraction without labeled data, ACL 2009
- Delphine Bernhard (2010), Query Expansion based on Pseudo Relevance Feedback from Definition Clusters, Coling 2010
- Y. Zhang, V. Zhong, D. Chen, G. Angeli, and C. D. Manning (2017), Position-aware attention and supervised data improve slot filling, EMNLP 2017
- Guoliang Ji et al. (2017), Distant supervision for relation extraction with sentence-level attention and entity descriptions, AAAI 2017
- Bhaskar Mitra, Fernando Diaz, and Nick Craswell (2017), Learning to Match using Local and Distributed Representations of Text for Web Search, WWW '17
- Guozheng Rao, Weihang Huang, Zhiyong Feng, Qiong Cong (2018) LSTM with sentence representations for document-level sentiment classification, Neurocomputing
- George Brokos, Polyvios Liosis, Ryan McDonald, Dimitris Pappas, and Ion Androutsopoulos (2018), AUEB at BioASQ 6: Document and Snippet Retrieval, BioASQ 2018
- Sean MacAvaney, Andrew Yates, Arman Cohan, and Nazli Goharian (2019), CEDR: Contextualized Embeddings for Document Ranking, SIGIR '19
- Canjia Li, Yingfei Sun, Ben He, Le Wang, Kai Hui, Andrew Yates, Le Sun, and Jungang Xu (2018), NPRF: A Neural Pseudo Relevance Feedback Framework for Ad-hoc Information Retrieval, EMNLP 2018
- Tapas Nayak and Hwee Tou Ng (2020), Effective Modeling of Encoder-Decoder Architecture for Joint Entity and Relation Extraction, AAAI 2020
- Jibril Frej, Didier Schwab, and Jean-Pierre Chevallet (2020), WIKIR: A Python Toolkit for Building a Large-scale Wikipedia-based English Information Retrieval Dataset, LREC 2020
- Lanza et al. (2020), Towards Automatic Thesaurus Construction and Enrichment, LREC 2020
- Luyu Gao, Zhuyun Dai, Tongfei Chen, Zhen Fan, Benjamin Van Durme, Jamie Callan (2021), Complementing Lexical Retrieval with Semantic Residual Embedding, ECIR 2021
- Zhepei Wei, Yantao Jia, Yuan Tian, Mohammad Javad Hosseini, Mark Steedman, Yi Chang (2021), Joint Extraction of Entities and Relations with a Hierarchical Multi-task Tagging Model, ECIR 2021
- Anu Shrestha, Francesca Spezzano (2021), Textual Characteristics of News Title and Body to Detect Fake News: A Reproducibility Study, ECIR 2021
- Alberto Purpura, Karolina Buchner, Gianmaria Silvello, Gian Antonio Susto (2021), Neural Feature Selection for Learning to Rank
- Anastasia Taranova and Martin Braschler (2021), Textual Complexity as an Indicator of Document Relevance, ECIR 2021
The term project is a group activity. You can work on your own topic or choose one from our list of IR project topics; you are strongly encouraged to develop your own ideas. All project topics must be approved by the course instructor. You need to write a project proposal of at most two pages and submit it to the student portal by the 8th of May.
For the project report, you can choose between two deadlines: the 1st of June or the 14th of August. By one of these deadlines, you need to upload your project report (max. three pages) to the student portal. The second deadline is mainly intended for those who failed at the first deadline and need to resubmit their report, but you can also use it if you did not submit a report by the first deadline. After that, the next opportunity to resubmit is the next time the course is given.