Data Science for Smart Cities

CE56401

Credit Hours:

3

Learning Objectives:

A student completing this course is expected to be able to:

  1. Understand the different types of data generated by smart cities
  2. Understand the basics of various data mining techniques
  3. Understand what type of data mining analysis is appropriate for various smart city applications
  4. Gain basic knowledge of using Python for data analytics and results visualization
  5. Apply the methods and techniques learned in this course to an applied smart city project

Description:

The availability of low-cost, ubiquitous sensors in city infrastructure provides highly granular data at unprecedented spatio-temporal scales. Smart cities aim to use this data to create a resilient and sustainable urban ecosystem by integrating information and communication technology (ICT), the Internet of Things (IoT), and citizen participation to manage and use the city's infrastructure and services effectively. Data science provides fast and efficient ways to analyze heterogeneous data, both to understand the current dynamics of cities and to find ways to improve different services. This course introduces scientific techniques for the analysis, inference, and prediction of large-scale temporal data present in city networks (e.g., GPS vehicular data, social media data, mobile phone data, and individual social network data). A special focus is on data-driven methods for problems that have a network structure. The course covers both the methods and their application to smart-city problems. Python will be used to demonstrate the application of each method on real-world datasets available to the instructor. Examples of problems that will be discussed in class include ridesharing platforms, smart and energy-efficient buildings, evacuation modeling, decision making during extreme events, and urban resilience.

Fall 2023 Syllabus

Topics Covered:

Main topics:

  • Introduction to data mining for smart cities
  • Data pre-processing and task identification
  • Introduction to Python for data mining
  • Supervised/unsupervised machine learning approaches
  • Understanding and interpretation of the results
  • Mining of massive datasets: parallelization
  • Introduction to network science and mining social network graphs
  • Applications to ridesharing platforms, extreme-event modeling, and urban resilience

Prerequisites:

Undergraduate calculus and basic knowledge of statistical analysis

Homework:

Problem sets will be assigned, and the analysis of these assignments will be the basis for some class discussion. Problem sets are due at the beginning of class on designated days. You may, and are encouraged to, discuss the problem sets with other students, but the final written solution must be your own work. Class notes may be used during the exam.

Projects:

Students are expected to work in groups of two on a problem related to a data science application or an algorithm implementation.

Exams:

There will be one in-class exam in which you will be tested on readings and materials/discussions covered in class.

Textbooks:

Required readings will be posted in a Box folder that will be shared with students.

Computer Requirements:

Python (an interpreted, high-level, general-purpose programming language) and its libraries for data mining (NumPy, SciPy, Matplotlib, etc.) are required. Python is available as a free download at https://www.python.org/. Details will be provided in class.
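
As an informal way to confirm that a Python installation has the libraries named above, a short check along the following lines may help. This is a minimal sketch and not official course material: the helper name check_libraries is hypothetical, and the list covers only the three libraries mentioned here, so the setup instructions given in class take precedence.

```python
import importlib.util

# Libraries named in the computer requirements; the course may announce more.
REQUIRED_LIBRARIES = ("numpy", "scipy", "matplotlib")


def check_libraries(names=REQUIRED_LIBRARIES):
    """Return a dict mapping each top-level library name to True if it is importable.

    Hypothetical helper for illustration only. It looks the module up
    without importing it, so it runs even when a library is missing.
    """
    return {name: importlib.util.find_spec(name) is not None for name in names}


if __name__ == "__main__":
    for name, installed in check_libraries().items():
        if installed:
            print(f"{name}: found")
        else:
            print(f"{name}: missing (try: python -m pip install {name})")
```

Running the script prints one line per library. Any library reported as missing can usually be installed with pip, as suggested in the printed hint.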
