Intro to Data Mining
ECE50836
Credit Hours:
3
Description:
This course introduces fundamental techniques in data mining, i.e., techniques that extract useful knowledge from large amounts of data. Topics include data preprocessing, exploratory data analysis, association rule mining, clustering, classification, anomaly detection, recommendation, and graph analysis. Students are expected to gain the skills to formulate data mining problems, solve them using data mining techniques, and interpret the output.
Topics:
- Background and introduction
- Data: Types of data, data quality, data preprocessing, measures of similarity and dissimilarity, data exploration and visualization
- Association analysis: Frequent itemset generation, rule generation, compact representation of frequent itemsets and evaluation
- Association analysis on special data types: Relational data, sequences and graphs
- Clustering: K-means, hierarchical clustering, spectral clustering and density-based clustering
- Clustering on special data types: Text clustering, subspace clustering and clustering spatial-temporal data
- Classification: Decision trees, rule-based classifiers, nearest-neighbor classifiers, support vector machines, and ensemble methods
- Classification on special data types: Classification in a network setting and sequence labeling
- Anomaly detection: Statistical, distance-based, density-based and clustering-based approaches
- Recommendation: Collaborative filtering, matrix factorization and applications
- Graph analysis: Node ranking, link prediction and graph embedding
Topics Covered:
Computer Engineering
Prerequisites:
Linear algebra, statistics, and Python programming
Applied / Theory:
50 / 50
Homework:
Quizzes and 4 programming assignments
Projects:
One project
Exams:
Final exam
Textbooks:
Required:
- Introduction to Data Mining, 2nd ed. Pang-Ning Tan, Michael Steinbach, Anuj Karpatne, and Vipin Kumar, Pearson, 2019, ISBN: 9780133128901.
- Data Mining: Concepts and Techniques, 3rd ed. Jiawei Han, Micheline Kamber, and Jian Pei, Morgan Kaufmann Publishers, 2011, ISBN: 9780123814791.
  - This book is available in full text online through the Purdue Libraries.