Resources
Primary textbook
Mining of Massive Datasets by Anand Rajaraman, Jure Leskovec and Jeffrey D. Ullman, 3rd edition. The book’s contents can also be downloaded free of charge from the book’s website, courtesy of the publisher, but you may find it useful to buy a hard copy for reference.
Please note that this site also gives you access to all the lecture videos from the Stanford MOOC based on this book. I may assign some of these videos as required viewing so that in class we can concentrate on solving specific problems or understanding analyses in more depth.
Other Resources
For coding purposes, we will exclusively use Python 3.8+ as our programming language, along with libraries and tools like:
Matplotlib (or other plotting libraries like Bokeh)
Many of these libraries can be installed locally on your laptop by using a distribution like Anaconda. Please install any Anaconda distribution compatible with Python 3.8+ on your laptop and check that the software above is available.
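One way to run that check is sketched below. It is only a minimal illustration, not an official course script: the function name `check_environment` and the package list are assumptions (only Matplotlib and Bokeh are named on this page), so extend the list with whatever other libraries the class ends up using.

```python
import importlib.util
import sys

# Assumed package list: only Matplotlib and Bokeh are named on this page.
# Add the import names of any other required libraries here.
PACKAGES = ["matplotlib", "bokeh"]


def check_environment(packages=PACKAGES, min_version=(3, 8)):
    """Return (python_ok, installed), where installed maps each
    package name to True if it can be found by the import system."""
    # sys.version_info compares element-wise against a plain tuple.
    python_ok = sys.version_info >= min_version
    # find_spec returns None when a top-level package is not installed.
    installed = {
        name: importlib.util.find_spec(name) is not None
        for name in packages
    }
    return python_ok, installed


if __name__ == "__main__":
    python_ok, installed = check_environment()
    print("Python version:", sys.version.split()[0],
          "(OK)" if python_ok else "(need 3.8 or newer)")
    for name, found in installed.items():
        print(f"{name}: {'found' if found else 'MISSING'}")
```

Running the script from the Anaconda prompt (or a notebook cell) reports the interpreter version and flags any package that still needs to be installed.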
Google also makes available a browser-based notebook interface called Colaboratory (or Colab) that can be used in lieu of local software. Colab notebooks are Jupyter notebooks that run in the cloud and are integrated with Google Drive.
We will use either the Google Cloud Platform (GCP) or Amazon Web Services (AWS) for class project work. Once access has been properly set up, links to appropriate labs and modules (called Qwiklabs) will be provided a few weeks into the semester.
This class is supported by DataCamp, a learning platform for data science and analytics. DataCamp offers 350+ short courses by expert instructors on topics such as importing data, data visualization, and machine learning. They are constantly expanding their curriculum to keep up with the latest technology trends and to provide the best learning experience for all skill levels. Students in this class will have free access to all DataCamp courses for the entire semester and will be required to complete selected, relevant courses as part of assigned, for-credit homework.