Nazli Goharian

Introduction to Information Retrieval

Success Story:

The course was developed with NSF support at the Information Retrieval Lab and evaluated for 3 consecutive years. It has been taught since 2001 (17 times) and is regularly updated. Students who completed this course successfully have found jobs at Microsoft, Google, Yahoo, Facebook, Amazon, etc.

Course Description:

Information retrieval is the identification of textual components, be they web pages, blogs, microblogs, documents, medical transcriptions, mobile data, or other big data elements, relevant to the needs of the user. Relevancy is determined either as a global absolute or within a given context or viewpoint. The course teaches practical yet theoretically grounded algorithms, both foundational and advanced, for identifying such relevant components. It covers information-retrieval techniques and theory, addressing both the effectiveness and the run-time performance of information-retrieval systems. The focus is on algorithms and heuristics used to find textual components relevant to the user request, and to find them fast. The course covers the architecture and components of search engines, such as the parser, index builder, and query processor, along with various retrieval models, relevance ranking, evaluation methodologies, and efficiency considerations. Students learn the material by building a prototype of such a search engine. These approaches are in daily use at search and social media companies.
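To make the three components named above concrete, here is a minimal sketch of a parser, an index builder, and a query processor. It is an illustration only, not the course's project specification: the function names (`tokenize`, `build_index`, `search`), the sample documents, and the choice of simple Boolean AND matching are all assumptions made for this example.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Parser: lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs):
    """Index builder: map each term to the set of document IDs containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return index


def search(index, query):
    """Query processor: return sorted IDs of documents containing every query term."""
    terms = tokenize(query)
    if not terms:
        return []
    # Copy the first posting set so intersections do not modify the index.
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return sorted(result)


# Hypothetical sample collection, for illustration only.
docs = {
    1: "Information retrieval finds relevant documents.",
    2: "Search engines index web pages.",
    3: "An index makes retrieval of documents fast.",
}
index = build_index(docs)
print(search(index, "retrieval documents"))  # prints [1, 3]
```

A real engine would add the pieces listed in the course outline, such as stemming, relevance ranking, and index compression; this sketch only shows how the components fit together.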

Prerequisite:

Data structures, and being comfortable with programming.

Recommended Texts:

Handouts:

The course handouts will be available on the class Forum for most topics that are covered in the class.

Grading & Due Dates (Tentative; will be finalized by the 1st day of class):

Project (40%): 3-4 projects (incrementally building the engine). Any programming language may be used (your choice). The projects require designing and implementing various components of a search engine per the assignment requirements, running experiments, and analyzing the results. Deliverables for each project part (details will be specified when the project is assigned) include: Cover Page, Design Document, Software, Results & Analysis, and [potentially] Demo. Projects are individual (solo) tasks.
Research Presentation (8%): Students must attend all presentations. A pool of papers will be made available; each student will be asked to pick several choices, one of which will be finalized for the student to present in class. Whether this is an individual (solo) or a group assignment will be announced.
Exams (52%): 2-3 exams.

Course Outline (Tentative!):

Slides
Introduction, Overview of IR
IR Utilities: Parser/Tokenizer, Phrase Recognition, Stemming, N-Grams
Efficiency: Indexing
IR Models: Boolean, Vector Space Model; Similarity Measures
IR Models: Probabilistic Model
IR Models: Language Model & Topic Model
Relational Approach
IR Evaluation
IR Utility: Passage Based Retrieval
IR Utility: Relevance Feedback and other Query Expansions
Efficiency: Compression
Efficiency: Top Docs, Query Threshold
Text Clustering
Web Search Ranking
Word Embedding
Neural Information Retrieval
Intro to Text Classification
Web Personalization and Recommender Systems
Research Paper Presentations (Student Presentations)

Late Assignment Policy:

The policy will be posted in the syllabus given to the class by the 1st day of class.

Academic Integrity:

Visit the Honor System Website at http://gervaseprograms.georgetown.edu/honor/