Ali Basirat



Information Retrieval (2021)

This is the home page for the course on advanced information retrieval. The course covers conventional information retrieval, information extraction, and neural approaches to information retrieval. The content is designed for students with a background in computer science and computational linguistics. By the end of the course, you should be able to

Main References

Examination

The Zoom link to the online sessions

Meeting ID: 63573171418 (password required)

Topics and Slides

Deadlines

The initial submission deadlines for all assignments, lab reports, literature reviews, and project reports are given either on this page or in the corresponding instructions. The secondary submission deadline for all of these items is the 14th of August.

Homework, Assignments, and Exercises

Individual assignment! Prepare a report for each series of exercises here and upload your report to the student portal by the 21st of May.

Seminars

Seminars are extensions of some sections of the books. They are group activities and should not take more than 15 minutes. If a topic needs more time, this can be discussed. Group members can decide how to present. The topics listed below are those presented by former students.

Topic | Source | Group - slide | Date
Variant Tf-idf Functions | IntIR Section 6.4 | IR Group Division 1 | 15/4
PPMI Weighting | SpLan 6.6-6.7, Church and Hanks (89) | IR Group Division 2 | 15/4
Results Snippets | IntIR Section 8.7, Tombros and Sanderson (98), Turpin et al. (2007) | IR Group Division 4 | 20/4
Statistical Machine Translation for Query Expansion | Riezler et al., ACL-2007 | IR Group Division 3 | 22/4
Okapi BM25 | IntIR Section 11.4.3, Jones, K. S. (2004) | IIR Group Division 5 | 29/4
Two Studies on Probabilistic Information Retrieval | Lillis et al. (2010), and Zhao et al. (2011) | IR Group Division 6 | 4/5
Query by document retrieval approach | Weng et al. (2010), Query by document via a decomposition-based two-level retrieval approach | IR Group Division 7 | 6/5
Rhetorical relations for information retrieval | Lioma et al. (2012), Rhetorical relations for information retrieval | IR Group Division 8 | 12/5
A novel TF-IDF weighting scheme | Jiaul H. Paik (2013), A novel TF-IDF weighting scheme for effective ranking | IR Group Division 9 | 12/5
Information Retrieval Models Based on Word Embeddings | Vulić, I. and Moens, M. (2015), Monolingual and Cross-Lingual Information Retrieval Models Based on (Bilingual) Word Embeddings | IR Group Division 10 | 12/5
Learning to Match for Web Search | Bhaskar Mitra, Fernando Diaz, and Nick Craswell (2017), Learning to Match using Local and Distributed .... | IR Group Division 12 | 12/5
Dual Embeddings for Document Ranking | Bhaskar Mitra, Eric Nalisnick, Nick Craswell, Rich Caruana (2016), A Dual Embedding Space Model for Document Ranking | IR Group Division 13 | 12/5
Word Embeddings for Query Expansion | Fernando Diaz, Bhaskar Mitra, Nick Craswell (2016), Query Expansion with Locally-Trained Word Embeddings | IR Group Division 15 | 12/5

Labs

Group assignment!

  1. Boolean and Ranked Retrieval - Submission deadline: 2021-04-23
  2. Test Collections - Submission deadline: 2021-05-07
  3. Evaluation - Submission deadline: 2021-05-21

Literature review

This is a group activity. Each group, consisting of two students, should prepare a short summary (max 3 pages) of three of the papers listed below. The summaries should be uploaded to the student portal before 2021-05-21 00:00 and address the following items:

  1. the research questions of the paper
  2. the contributions of the paper
  3. a brief summary of the method if it introduces a new method
  4. the interesting points of the paper
  5. the unclear parts of the paper
  6. for a VG score: 1) the summary should be comprehensive and well written, and 2) it should introduce novel ideas on how to improve at least one of the papers

Here are the papers to review.

  1. Ellen M. Voorhees, Natural Language Processing for Information Retrieval, Information Extraction: Towards Scalable, Adaptable Systems, January 1999, pages 32–48
  2. Adam Berger and John Lafferty, Information retrieval as statistical translation, SIGIR 99
  3. Lee, Changki, and Gary Geunbae Lee. Probabilistic information retrieval model for a dependency structured indexing system. Information Processing & Management 41.2 (2005): 161-175.
  4. Stefan Riezler, Alexander Vasserman, Ioannis Tsochantaridis, Vibhu Mittal, Yi Liu (2007), Statistical Machine Translation for Query Expansion in Answer Retrieval, ACL 2007
  5. Mintz, Mike, et al. (2009) Distant supervision for relation extraction without labeled data. ACL 2009.
  6. Delphine Bernhard (2010), Query Expansion based on Pseudo Relevance Feedback from Definition Clusters, Coling 2010
  7. Zhang, Y., Zhong, V., Chen, D., Angeli, G., and Manning, C. D. (2017). Position-aware attention and supervised data improve slot filling. EMNLP 2017
  8. Ji, Guoliang, et al. (2017), Distant supervision for relation extraction with sentence-level attention and entity descriptions. AAAI 2017.
  9. Bhaskar Mitra, Fernando Diaz, and Nick Craswell. (2017), Learning to Match using Local and Distributed Representations of Text for Web Search. WWW '17.
  10. Guozheng Rao, Weihang Huang, Zhiyong Feng, Qiong Cong (2018) LSTM with sentence representations for document-level sentiment classification, Neurocomputing
  11. George Brokos, Polyvios Liosis, Ryan McDonald, Dimitris Pappas, and Ion Androutsopoulos (2018), AUEB at BioASQ 6: Document and Snippet Retrieval, BioASQ 2018
  12. MacAvaney, Sean and Yates, Andrew and Cohan, Arman and Goharian, Nazli (2019), CEDR: Contextualized Embeddings for Document Ranking, SIGIR'19
  13. Canjia Li, Yingfei Sun, Ben He, Le Wang, Kai Hui, Andrew Yates, Le Sun, and Jungang Xu (2018), NPRF: A Neural Pseudo Relevance Feedback Framework for Ad-hoc Information Retrieval, EMNLP 2018
  14. Tapas Nayak, Hwee Tou Ng (2020) Effective Modeling of Encoder-Decoder Architecture for Joint Entity and Relation Extraction, AAAI
  15. Jibril Frej, Didier Schwab, Jean-Pierre Chevallet (2020), WIKIR: A Python Toolkit for Building a Large-scale Wikipedia-based English Information Retrieval Dataset, LREC 2020
  16. Lanza et al. (2020), Towards Automatic Thesaurus Construction and Enrichment, LREC 2020
  17. Luyu Gao, Zhuyun Dai, Tongfei Chen, Zhen Fan, Benjamin Van Durme, Jamie Callan (2021), Complementing Lexical Retrieval with Semantic Residual Embedding, ECIR 2021
  18. Zhepei Wei, Yantao Jia, Yuan Tian, Mohammad Javad Hosseini, Mark Steedman, Yi Chang (2021), Joint Extraction of Entities and Relations with a Hierarchical Multi-task Tagging Model, ECIR 2021
  19. Anu Shrestha, Francesca Spezzano (2021), Textual Characteristics of News Title and Body to Detect Fake News: A Reproducibility Study, ECIR 2021
  20. Alberto Purpura, Karolina Buchner, Gianmaria Silvello, Gian Antonio Susto (2021), Neural Feature Selection for Learning to Rank
  21. Anastasia Taranova, Martin Braschler (2021), Textual Complexity as an Indicator of Document Relevance, ECIR 2021

Projects

This is a group activity. You can work on your own topic or choose one from our list of IR project topics. You are strongly encouraged to develop your own ideas. Project topics have to be confirmed by the course instructor. You need to write a project proposal of at most two pages and submit it to the student portal by the 8th of May. For the project report, you can choose between two deadlines: the 1st of June or the 14th of August. By one of these deadlines, you need to upload your project report (max three pages) to the student portal. The second deadline is mainly intended for those who fail at the first deadline and need to resubmit their report. However, you can also use the second deadline if you have not submitted a report earlier. In case of failure, the next opportunity for resubmission is the next time the course is given.