Doctoral Seminar on Large Scale Statistical Machine Learning
COSC-878
Department of Computer Science
Georgetown University

Course Description: This doctoral seminar studies topics in statistical machine learning in the big data era. "Big data" is more than a buzzword: it challenges data analysis methods from three perspectives: bigger data volume, higher data complexity, and faster data change rate. In this seminar, we will focus on foundations and recent developments in large scale online learning, reinforcement learning, and non-parametric classification and clustering methods. In class, we will read textbooks and survey milestone papers. Students are expected to submit questions about the readings before each class and to give presentations when it is their turn. A term paper or term project is encouraged, but not required.
Prerequisites:
  • Doctoral students.
Time and Location:

Class: Tuesday 11-12:50. Location: STM 326.

Instructor: Grace Hui Yang
Google Group: https://groups.google.com/forum/#!forum/cosc878
Textbooks:

  • (Elements) Trevor Hastie, Robert Tibshirani, Jerome Friedman. The Elements of Statistical Learning: Data Mining, Inference, and Prediction. Springer Texts in Statistics, Springer-Verlag, New York, 2001. download the pdf book
  • (NP) Larry Wasserman. All of Nonparametric Statistics. Springer Texts in Statistics, Springer-Verlag, New York, 2005. pdf
  • (RL) Richard S. Sutton and Andrew G. Barto. Reinforcement Learning: An Introduction. MIT Press, Cambridge, MA, 1998. online version

Grading: Questions submitted before each class: 20%. Presentation: 70%. Class participation: 10%.
Guidelines:
  • Question submission: Please submit three questions about the readings for the next class to the class mailing list by 5pm the day before. This way, the presenter for the next class can address your questions and improve their slides.
  • Presentation:

    You should team up with a fellow student; these readings require two of you working together to produce the slides.

    Prepare a 90-minute talk, and anticipate many questions during it.

    Your talk will be evaluated on the depth you reach, the clarity of your presentation, the correctness of your math, and how well you answer your peers' questions. You will NOT be evaluated on presentation passion or skills. However, please do make sure we can hear you clearly.

    A simple guideline I use to judge whether a talk is clear enough: could we implement the algorithm based on your presentation slides? (See the sketch after these guidelines for the level of detail this implies.)

    As the presenter, it is your talk and you should manage the pace. Feel free to address clarification questions, and also feel free to defer an answer if you have a slide for it coming in the next few minutes.

    You will need to include answers to the submitted questions in your slides.

    Last but not least, please remember to email your updated slides to the class mailing list by 11:59pm on your presentation day.

  • Class participation: The class is created for academic research, so we expect active, intellectual, heated academic discussions throughout the presentations. For many of the papers and selected book chapters, none of us has read them before, which means we all have equally little background about them. Therefore, don't be shy: please shout out your questions when you don't follow what is going on.
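
A concrete illustration of the clarity bar mentioned in the presentation guidelines: from a good set of slides on, say, TD-Learning (class 3, Chp 5 and 6 of RL), a peer should be able to write something like the sketch below. This is a minimal sketch for illustration only, not course material; the environment interface (reset(), step()) and the parameter values are assumptions made up for this example.

    from collections import defaultdict

    def td0_evaluate(env, policy, episodes=1000, alpha=0.1, gamma=0.95):
        """Tabular TD(0) policy evaluation (as in Chp 6 of RL).

        Assumes a hypothetical environment with reset() -> state and
        step(action) -> (next_state, reward, done)."""
        V = defaultdict(float)          # state-value estimates, default 0.0
        for _ in range(episodes):
            s = env.reset()
            done = False
            while not done:
                a = policy(s)           # sample an action from the fixed policy
                s_next, r, done = env.step(a)
                # TD(0): nudge V(s) toward the bootstrapped one-step target
                target = r + (0.0 if done else gamma * V[s_next])
                V[s] += alpha * (target - V[s])
                s = s_next
        return V

If your slides let a peer reconstruct the update rule and the loop structure at roughly this level, the talk is clear enough.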
Syllabus
Date | Class | Readings | Presenter(s) | Slides | Topic
1. 1/13 | Introduction & Reinforcement Learning | Kaelbling, Littman, Moore. Reinforcement Learning: A Survey. | Grace | slides | RL
2. 1/20 | Markov Decision Process | Chp 3 and 4 of RL | Yuankai, Tavish | slides | RL
3. 1/27 | Monte Carlo Methods, TD-Learning | Chp 5 and 6 of RL | Yuankai, Tavish | slides | RL
4. 2/3 | No class (WSDM) | | | |
5. 2/10 | Generalization | (1) Chp 8 of RL; (2) Peshkin et al. Learning to Cooperate via Policy Search. UAI 2000. | Brendan, Yifang | slides | RL
6. 2/17 | Gradient Descent | 9.1, 9.2, and 9.3 of Boyd, Stephen, and Lieven Vandenberghe. Convex Optimization. Cambridge University Press, 2004. | Brendan, Yifang | slides | Online learning
7. 2/24 | Stochastic Gradient Descent | (1) Bottou, Léon, and Yann Le Cun. Large Scale Online Learning. Advances in Neural Information Processing Systems 16. MIT Press, 2004; (2) Nemirovski, Arkadi, et al. Robust Stochastic Approximation Approach to Stochastic Programming. SIAM Journal on Optimization 19.4 (2009): 1574-1609. | Jiyun, Sicong | slides | Online learning
8. 3/3 | Margin-based Methods | Chp 12 of Elements | Henry, Brad | lda, svm | Classification
3/10 | No class (Spring Break) | | | |
9. 3/17 | Large Margin Classification | Freund, Yoav, and Robert E. Schapire. "Large Margin Classification Using the Perceptron Algorithm." Machine Learning 37.3 (1999): 277-296. | Henry, Sicong | part I, part II | Classification
10. 3/24 | Regression | Chp 5 of Elements | Yuankai, Tavish | slides | Regression
11. 3/31 | No class (ECIR) | | | |
12. 4/7 | Kernel | Chp 6 of Elements | Henry, Brad | part I, part II | Kernel
13. 4/14 | Non-parametric Regression | Chp 5 of NP | Jiyun, Sicong | slides | Regression
14. 4/21 | Clustering | Handouts will be distributed before class | Brendan, Yifang | slides | Clustering