COSC878 Doctoral Seminar on Large Scale Statistical Machine Learning

Doctoral Seminar on Large Scale Statistical Machine Learning
COSC-878
Department of Computer Science
Georgetown University

Course Description:

This doctoral seminar studies topics in statistical machine learning in the big data era. "Big data" is more than a buzzword: it challenges data analysis methods from three perspectives, namely larger data volume, higher data complexity, and faster rates of data change. In this seminar, we will focus on the foundations and recent developments of large-scale online learning, reinforcement learning, and non-parametric classification and clustering methods. In class, we will read textbook chapters and survey milestone papers. Students are expected to submit questions on the readings before each class and to give presentations when it is their turn. A term paper or term project is encouraged but not required.

Prerequisites:

Doctoral students.

Time and Location:

Class: Tuesday 11-12:50. Location: STM 326.

Instructor:

Grace Hui Yang

Google Group:

https://groups.google.com/forum/#!forum/cosc878

Textbooks:

(Elements) Trevor Hastie, Robert Tibshirani, Jerome Friedman. The Elements of Statistical Learning: Data Mining, Inference, and Prediction. Springer Series in Statistics, Springer-Verlag, New York, 2001. Download the PDF book.
(NP) Larry Wasserman. All of Nonparametric Statistics. Springer Texts in Statistics, Springer-Verlag, New York, 2005. PDF.
(RL) Richard S. Sutton and Andrew G. Barto. Reinforcement Learning: An Introduction. MIT Press, Cambridge, MA, 1998. Online version.

Grading:

Questions submitted before each class: 20%. Presentation: 70%. Class participation: 10%.

Guidelines:

Question submission: Please submit three questions about the readings for the next class to the class mailing list by 5pm the day before class. This way, the presenters for the next class can address your questions and improve their slides.
Presentation:
You should team up with a fellow student. The readings require two of you to work together to produce the slides.

Prepare a 90-minute talk, and expect many questions during it.

Your talk will be evaluated on its depth, its clarity, the correctness of your math, and how well you answer your fellow students' questions. You will NOT be evaluated on presentation passion or polish. However, please do make sure we can hear you clearly.

A simple guideline I use to judge whether a talk is clear enough: could we code the algorithm based on your presentation slides?

As the presenter, it is your talk, and you are responsible for its pace. Feel free to answer clarification questions as they come up, and also feel free to defer an answer if a slide in the next few minutes covers it.

You will need to include answers to the submitted questions in your slides.

Last but not least, please remember to email your updated slides to the class mailing list by 11:59pm on the day of your presentation.
Class participation: This class is designed for academic research. Therefore, we expect active, intellectual, and even heated academic discussion throughout the presentations. None of us have read many of the papers or selected book chapters before, so we all start with equally little background. So don't be shy: shout out your questions whenever you don't follow what is going on.

Syllabus

	Date	Class	Readings	Presenter(s)	Slides	Topic
1.	1/13	Introduction & Reinforcement Learning	Kaelbling, Littman, Moore. Reinforcement Learning: A Survey	Grace	slides	RL
2.	1/20	Markov Decision Process	Chp 3 and 4 of RL	Yuankai, Tavish	slides	RL
3.	1/27	Monte Carlo Methods, TD-Learning	Chp 5 and 6 of RL	Yuankai, Tavish	slides	RL
4.	2/3	No class. WSDM.
5.	2/10	Generalization	(1) Chp 8 of RL and (2) Peshkin et al. Learning to Cooperate via Policy Search. UAI 2000.	Brendan, Yifang	slides	RL
6.	2/17	Gradient Descent	Sections 9.1, 9.2, and 9.3 of Boyd, Stephen, and Lieven Vandenberghe. Convex Optimization. Cambridge: Cambridge University Press, 2004.	Brendan, Yifang	slides	Online learning
7.	2/24	Stochastic Gradient Descent	(1) Bottou, Léon, and Yann Le Cun. Large Scale Online Learning. Advances in Neural Information Processing Systems 16: Proceedings of the 2003 Conference. Vol. 16. MIT Press, 2004; and (2) Nemirovski, Arkadi, et al. Robust stochastic approximation approach to stochastic programming. SIAM Journal on Optimization 19.4 (2009): 1574-1609.	Jiyun, Sicong	slides	Online learning
8.	3/3	Margin-based Methods	Chp 12 of Elements	Henry, Brad	LDA, SVM	Classification
	3/10	No class. Spring Break
9.	3/17	Large margin classification	Freund, Yoav, and Robert E. Schapire. "Large margin classification using the perceptron algorithm." Machine Learning 37.3 (1999): 277-296.	Henry, Sicong	part I, part II	Classification
10.	3/24	Regression	Chp 5 of Elements	Yuankai, Tavish	slides	Regression
11.	3/31	No class. ECIR.
12.	4/7	Kernel	Chp 6 of Elements	Henry, Brad	part I, part II	Kernel
13.	4/14	Non-parametric regression	Chp 5 of NP	Jiyun, Sicong	slides	Regression
14.	4/21	Clustering	Handouts will be distributed before class	Brendan, Yifang	slides	Clustering
