ECE 50836 - Introduction to Data Mining

Course Details

Lecture Hours: 3
Credits: 3

Areas of Specialization:

  • Computer Engineering

Counts as:

  • EE Elective
  • CMPE Selective

Normally Offered:

Each Fall: on-campus and online

Requisites by Topic:

Linear algebra, statistics, and Python programming

Catalog Description:

This course introduces fundamental data mining techniques, i.e., techniques that extract useful knowledge from large amounts of data. Topics include data preprocessing, exploratory data analysis, association rule mining, clustering, classification, anomaly detection, recommendation, and graph analysis. Students are expected to gain the skills to formulate data mining problems, solve them using data mining techniques, and interpret the output.

Required Text(s):

  1. Introduction to Data Mining, Second Edition, Pang-Ning Tan, Michael Steinbach, Anuj Karpatne, and Vipin Kumar, Pearson, 2018, ISBN No. 9780133128901

Recommended Text(s):

  1. Data Mining: Concepts and Techniques, Third Edition, Jiawei Han, Micheline Kamber, and Jian Pei, Morgan Kaufmann Publishers, 2011, ISBN No. 9780123814791

Learning Outcomes

  • Describe and explain the process of data mining
  • Formulate real-world application problems as data mining tasks and solve them using data mining techniques
  • Implement software programs that perform data mining, and evaluate the output of those programs
  • Present data mining solutions to people in scientific and other disciplines

Lecture Outline:

  1. Background and introduction
  2. Data: Types of data, data quality, data preprocessing, measures of similarity and dissimilarity, and data exploration and visualization
  3. Association analysis: Frequent itemset generation, rule generation, compact representation of frequent itemsets, and evaluation
  4. Association analysis on special data types: Relational data, sequences, and graphs
  5. Clustering: K-means, hierarchical clustering, spectral clustering, and density-based clustering
  6. Clustering on special data types: Text clustering, subspace clustering, and clustering of spatio-temporal data
  7. Classification: Decision trees, rule-based classifiers, nearest-neighbor classifiers, support vector machines, and ensemble methods
  8. Classification on special data types: Classification in network settings and sequence labeling
  9. Anomaly detection: Statistical, distance-based, density-based, and clustering-based approaches
  10. Recommendation: Collaborative filtering, matrix factorization, and applications
  11. Graph analysis: Node ranking, link prediction, and graph embedding
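To give a rough sense of the association analysis concepts in topic 3 (frequent itemsets and rule evaluation), here is a minimal Python sketch. It is not course material. The transaction data and the helper names `support`, `confidence`, and `frequent_itemsets` are hypothetical and chosen only for illustration. The itemset search is a simplified brute-force, level-wise enumeration that stops once no itemset of a given size is frequent. This stopping rule relies on the anti-monotonicity of support. Full Apriori candidate generation and pruning are not shown.

```python
from itertools import combinations

# Hypothetical toy market-basket data, for illustration only.
transactions = [
    {"bread", "milk"},
    {"bread", "diapers", "beer", "eggs"},
    {"milk", "diapers", "beer", "cola"},
    {"bread", "milk", "diapers", "beer"},
    {"bread", "milk", "diapers", "cola"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)


def confidence(antecedent, consequent, transactions):
    """Confidence of the rule X -> Y, computed as s(X ∪ Y) / s(X)."""
    both = set(antecedent) | set(consequent)
    return support(both, transactions) / support(antecedent, transactions)


def frequent_itemsets(transactions, min_support):
    """Level-wise brute-force search for itemsets with support >= min_support.

    Every k-item combination is checked at each level, so the cost can grow
    exponentially with the number of distinct items. The search stops early
    once no k-itemset is frequent: by anti-monotonicity of support, no
    (k+1)-itemset can then be frequent either.
    """
    items = sorted(set().union(*transactions))
    result = {}
    for k in range(1, len(items) + 1):
        found_any = False
        for combo in combinations(items, k):
            s = support(combo, transactions)
            if s >= min_support:
                result[combo] = s
                found_any = True
        if not found_any:
            break
    return result


if __name__ == "__main__":
    print("support({diapers, beer}) =", support({"diapers", "beer"}, transactions))
    print("confidence({diapers} -> {beer}) =",
          round(confidence({"diapers"}, {"beer"}, transactions), 3))
    for itemset, s in frequent_itemsets(transactions, 0.6).items():
        print(itemset, s)
```

The brute-force search keeps the example short. Topic 3 covers the more efficient candidate generation and pruning strategies that larger data sets require.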

Assessment Method:

Quizzes, programming assignments, group project, and exams. (3/2022)