Below is a list of topics that can be considered for a course project.
- Implement or extend a language-specific IR system (integrate linguistic processing of the documents and queries: tokenization, stop list, etc.).
- Implement a Boolean query converter that takes an information need expressed in natural language and converts it to a Boolean query.
- A study of the impact of document length on retrieval results.
- A study of the impact of lexical translation on retrieval results.
- A survey on the use of word embeddings in information retrieval systems.
- A survey on the use of syntactic parsing in information retrieval.
- A survey on the use of neural networks in information retrieval.
- Implement a document classifier based on Latent Semantic Analysis.
- Compare different weighting schemes.
- Search engine for Wikipedia: implement the vector space retrieval model. Baseline for evaluation: compare against the results of one freely chosen search engine, restricted with site:wikipedia.org.
- Relevance feedback project: implement and test methods for relevance feedback (see chapter 9 of the IIR book), using the relevance judgements of the Cranfield collection.
- An IR system for a specific domain, e.g. IMDb.
- Speech integration with IR.
- Your own topic.
You are welcome to use datasets that were introduced during the course. You can also consider using existing tools: Indri or Lucene for indexing and searching a large text corpus; the Stanford NLP parser, NLTK, or OpenNLP for text analysis; and MALLET or WEKA for classification or clustering.
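
As a possible starting point for the vector space and weighting-scheme topics, here is a minimal sketch in Python using only the standard library. It assumes log-scaled term frequency, plain idf, and cosine similarity; this is only one of many weighting schemes, and the function names (`tokenize`, `build_index`, `search`) are hypothetical, chosen for illustration. A real project would more likely build on Lucene or Indri.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Lowercase and keep runs of letters/digits; no stop list or stemming.
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs):
    # Returns (idf, vectors): idf per term and one sparse tf-idf dict per document.
    tokenized = [tokenize(d) for d in docs]
    n = len(docs)
    df = Counter()
    for toks in tokenized:
        df.update(set(toks))  # count each term once per document
    idf = {t: math.log(n / df[t]) for t in df}
    vectors = []
    for toks in tokenized:
        tf = Counter(toks)
        vectors.append({t: (1 + math.log(c)) * idf[t] for t, c in tf.items()})
    return idf, vectors


def cosine(u, v):
    # Cosine similarity between two sparse vectors stored as dicts.
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def search(query, idf, vectors):
    # Weight the query like a document and rank documents by cosine score.
    tf = Counter(tokenize(query))
    q = {t: (1 + math.log(c)) * idf[t] for t, c in tf.items() if t in idf}
    scored = [(cosine(q, v), i) for i, v in enumerate(vectors)]
    scored.sort(key=lambda p: (-p[0], p[1]))
    return [i for score, i in scored if score > 0]


if __name__ == "__main__":
    docs = [
        "the cat sat on the mat",
        "dogs chase cats",
        "information retrieval with vector space models",
    ]
    idf, vectors = build_index(docs)
    print(search("vector space", idf, vectors))
```

Swapping the tf or idf formula inside `build_index` and `search` is one simple way to compare weighting schemes on the same collection.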