CPSC/AMTH 445/545 - Introduction to Data Mining - Fall 2017 Yale

Instructor: Guy Wolf (guy.wolf@yale.edu)

TA: Jay Stanley (jay.stanley@yale.edu)
ULAs: Tyler Dohrn (tyler.dohrn@yale.edu) & Scott Stankey (scott.stankey@yale.edu)

The ability to process and extract insightful information from large amounts of data has become a desirable, if not necessary, skill in almost every field of industry and science. Among other benefits, such information can provide useful knowledge, support decision-making, uncover hidden trends, and enable a deeper understanding of observed phenomena. This course will cover some of the main problems and challenges encountered in data analysis and its applications, and provide fundamental tools and techniques for solving them. We will discuss popular algorithms for data organization and visualization, such as principal component analysis (PCA) and multidimensional scaling (MDS). Students will become familiar with a variety of machine learning and data mining approaches. These will include both supervised approaches, such as classification (e.g., with decision trees, Bayesian classifiers, and SVMs), and unsupervised ones, such as clustering (e.g., with k-means, density estimators, and linkage-based agglomeration).

The lectures and discussions in class will be accompanied by homework exercises that combine theoretical questions, which emphasize the understanding of underlying data mining principles, with programming tasks (e.g., in MATLAB and/or Python) that demonstrate practical implementations of the data mining techniques studied. Grades in this course will be based on these exercises, a project, and an exam.

The course assumes basic prior knowledge of probability, linear algebra, data structures, algorithms, and programming.



Tuesdays & Thursdays 1:00-2:15


Wednesdays 6:00 PM, AKW 307

Office Hours:

Instructor: Fridays 4:00 PM - 6:00 PM, AKW 103
TA: Wednesdays 10:00 AM - 12:00 PM, AKW 307, or by appointment
ULA: Mondays 4:00 PM - 6:00 PM, HLH17 zoo annex


No required textbook, but the following books are recommended for the course:


This is a tentative list of topics we intend to cover, which may change as we progress through the course:


Extra topics (slides not prepared specifically for this course):

Final grade composition:

The final grade in this class will be based on three components:

In-class exam:

Group projects:


Note that only the top four exercise grades will be used when computing the final grade in the course, so you can skip at most one exercise during the semester.