Intro to Data Mining

This course introduces fundamental techniques in data mining, i.e., techniques for extracting useful knowledge from large amounts of data. Topics include data preprocessing, exploratory data analysis, association rule mining, clustering, classification, anomaly detection, recommendation and graph analysis. Students are expected to gain the skills to formulate data mining problems, solve them using data mining techniques and interpret the results.

ECE50836

Credit Hours:

3

Topics:

  • Background and introduction
  • Data: Types of data, data quality, data preprocessing, measures of similarity and dissimilarity, data exploration and visualization
  • Association analysis: Frequent itemset generation, rule generation, compact representation of frequent itemsets and evaluation
  • Association analysis on special data types: Relational data, sequences and graphs
  • Clustering: K-means, hierarchical clustering, spectral clustering and density-based clustering
  • Clustering on special data types: Text clustering, subspace clustering and clustering spatial-temporal data
  • Classification: Decision trees, rule-based classifiers, nearest-neighbor classifiers, support vector machines and ensemble methods
  • Classification on special data types: Classification in a network setting and sequence labeling
  • Anomaly detection: Statistical, distance-based, density-based and clustering-based approaches
  • Recommendation: Collaborative filtering, matrix factorization and applications
  • Graph analysis: Node ranking, link prediction and graph embedding

Topics Covered:

Computer Engineering

Prerequisites:

Linear algebra, statistics, and Python programming

Applied / Theory:

50 / 50

Homework:

Quizzes and 4 programming assignments

Projects:

One project

Exams:

Final exam

Textbooks:

Required:
  • Introduction to Data Mining. Pang-Ning Tan, Michael Steinbach, Anuj Karpatne, and Vipin Kumar, Second Edition, Pearson, 2019, ISBN: 9780133128901.
Recommended:
  • Data Mining: Concepts and Techniques, 3rd ed. Jiawei Han, Micheline Kamber, and Jian Pei, Morgan Kaufmann Publishers, 2011, ISBN: 9780123814791.
    • Available in full text online through the Purdue Libraries.
