Tentative Schedule of Topics
Specific reading assignments will be announced in class, on the Canvas site, or via these webpages. Please make sure that you stay current with the assigned work, and check the Canvas site regularly for updates. All chapters (and sections within those chapters) refer to the 2nd edition of the textbook.
- Week 1: Preliminaries; Probability and elementary statistics; Data structures and algorithms basics. Introduction to Data Mining (Chapter 1); Working with Jupyter notebooks.
- Weeks 2, 3 and 4: Detecting Similarity (Chapter 3); Content-based and collaborative filtering (Chapter 9.1 - 9.3). Introduction to the Python ecosystem for scientific and numerical data processing and exploratory data analysis: pandas and matplotlib (or bokeh). Start of project 1.
- Week 5: Streaming algorithms (Chapter 4.1 - 4.5); Probabilistic Sketches (Notes).
- Weeks 6 and 7: Frequent itemsets and association rules (Chapter 6); Introduction to scikit-learn.
- Week 8: Project 1 presentations; Clustering (Chapter 7). Start of project 2.
- Week 9: Clustering, continued (rest of Chapter 7).
- Weeks 10 and 11: A comprehensive review of linear algebra basics; Link Analysis (Chapter 5.1 - 5.3; 5.5); Dimensionality reduction (Chapter 11).
- Weeks 12 and 13: Classical machine learning, including regression, support-vector machines and decision trees (Chapter 12).
- Week 14: Review; Project 2 presentations.