2023 Research Projects

Projects are posted below; new projects will continue to be posted. To learn more about the type of research conducted by undergraduates, view the archived symposium booklets and search the past SURF projects.

This is a list of research projects that may have opportunities for undergraduate students. Please note that it is not a complete list of every SURF project. Undergraduates will discover other projects when talking directly to Purdue faculty.

You can browse all the projects on the list or view only projects in the following categories:


Cybersecurity (4)

 

Data-Free Model Extraction

Description:
*** Desired experience: Strong coding skills and a strong motivation for research are required. Background in deep learning, security, or natural language processing is not required but is a plus.

*** Possible industry involvement: Some of these projects are funded by Meta/Facebook research awards and J.P. Morgan AI research awards.

*** We especially encourage applications from women, Aboriginal peoples, and other groups underrepresented in computing.

*** Project 1. Data-Free Model Extraction

Many deployed machine learning models, such as ChatGPT and Codex, are accessible via a pay-per-query system. It is profitable for an adversary to steal these models, whether for outright theft or for reconnaissance. Recent model-extraction attacks on Machine Learning as a Service (MLaaS) systems have moved toward data-free approaches, showing the feasibility of stealing models trained with difficult-to-access data. However, these attacks are limited by the low accuracy of the extracted models and the high number of queries made to the models under attack. The high query cost makes such techniques infeasible for online MLaaS systems that charge per query.

In this project, we will design novel approaches that achieve higher accuracy and query efficiency than prior data-free model extraction techniques.
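
To make the setting concrete, here is a minimal Python/PyTorch sketch of query-based model extraction, in which a local "student" model is distilled from the outputs of a black-box "victim". In a genuinely data-free attack, the random queries below are replaced by a learned generator that crafts informative queries; every name and size here is an illustrative stand-in, not the project's actual method.

    import torch
    import torch.nn as nn

    # Stand-in for the remote pay-per-query model; the attacker sees only outputs.
    victim = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 10))
    victim.eval()

    # Local clone the adversary trains to imitate the victim.
    student = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 10))
    opt = torch.optim.Adam(student.parameters(), lr=1e-3)

    for step in range(1000):                  # each iteration spends query budget
        x = torch.randn(64, 32)               # "data-free": synthetic queries
        with torch.no_grad():
            teacher_logits = victim(x)        # the only supervision available
        loss = nn.functional.kl_div(          # distill victim outputs into student
            torch.log_softmax(student(x), dim=1),
            torch.softmax(teacher_logits, dim=1),
            reduction="batchmean")
        opt.zero_grad()
        loss.backward()
        opt.step()

The research challenge is to drive the student's accuracy up while spending as few victim queries as possible.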

Early work and background can be found here: 
https://www.cs.purdue.edu/homes/lintan/publications/disguide-aaai23.pdf

*** Project 2. Language Models for Detecting and Fixing Software Bugs and Vulnerabilities

In this project, we will develop machine learning approaches, including code language models, that automatically learn bug, vulnerability, and fix patterns from historical data in order to detect and fix software bugs and security vulnerabilities. We will also study and compare general code language models with domain-specific language models.
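
As one concrete but hedged illustration (not the project's actual pipeline), the Python sketch below asks an off-the-shelf code language model, Salesforce/codet5-base loaded through the HuggingFace transformers library, to propose candidate patches for a masked buggy line; a real repair system would first fine-tune on historical bug-fix pairs and then validate each candidate against the project's test suite.

    from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

    tok = AutoTokenizer.from_pretrained("Salesforce/codet5-base")
    model = AutoModelForSeq2SeqLM.from_pretrained("Salesforce/codet5-base")

    # Mask the suspicious expression; CodeT5 was pre-trained to fill such spans.
    buggy = "def is_even(n):\n    return n % 2 == <extra_id_0>"
    ids = tok(buggy, return_tensors="pt").input_ids

    # Beam search yields a ranked list of candidate patches.
    outs = model.generate(ids, max_length=16, num_beams=10, num_return_sequences=5)
    for o in outs:
        print(tok.decode(o, skip_special_tokens=True))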

Early work and background can be found here: 
Impact of Code Language Models on Automated Program Repair. ICSE 2023. Forthcoming.
KNOD: Domain Knowledge Distilled Tree Decoder for Automated Program Repair. ICSE 2023. Forthcoming.
https://www.cs.purdue.edu/homes/lintan/publications/cure-icse21.pdf
https://www.cs.purdue.edu/homes/lintan/publications/deeplearn-tse18.pdf

*** Project 3. Inferring Specifications from Software Text for Finding Bugs and Vulnerabilities

A fundamental challenge in detecting or preventing software bugs and vulnerabilities is knowing programmers’ intentions, formally called specifications. If we know the specification of a program (e.g., where a lock is needed, or what input a deep learning model expects), a bug detection tool can check whether the code matches the specification.

Building on our expertise as the first to extract specifications from code comments to automatically detect software bugs and bad comments, in this project we will analyze various new sources of software text (such as API documents and StackOverflow posts) to extract specifications for bug detection. For example, the API documents of deep learning libraries such as TensorFlow and PyTorch contain rich input-constraint information about tensors. Language models may be explored.
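
For a flavor of the idea, here is a minimal sketch (plain Python, illustrative only) that turns one TensorFlow-style documentation sentence into a checkable specification; real extraction must cope with far more varied phrasing, which is where language models may help.

    import re

    doc = "input: A 2-D Tensor of type float32."   # hypothetical API document line

    m = re.search(r"(\d+)-D Tensor of type (\w+)", doc)
    if m:
        ndims, dtype = int(m.group(1)), m.group(2)
        # Emit a runtime check that a bug detector can verify code against.
        print(f"assert input.ndim == {ndims} and input.dtype == '{dtype}'")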

Early work and background can be found here: 
https://www.cs.purdue.edu/homes/lintan/projects.html

*** Project 4. Testing Deep Learning Systems

We will build cool and novel techniques to make deep learning code such as TensorFlow and PyTorch reliable and secure. We will build them on top of our award-winning paper (ACM SIGSOFT Distinguished Paper Award)!

Machine learning systems, including deep learning (DL) systems, demand reliability and security. DL systems consist of two key components: (1) models and algorithms that perform complex mathematical calculations, and (2) software that implements the algorithms and models. Here, software includes DL infrastructure code (e.g., code that performs core neural network computations) and application code (e.g., code that loads model weights). Thus, for the entire DL system to be reliable and secure, both the software implementation and the models/algorithms must be reliable and secure. If the software fails to faithfully implement a model (e.g., due to a bug in the software), the output from the software can be wrong even if the model is correct, and vice versa.

This project aims to use novel approaches, including differential testing, to detect and localize bugs in DL software (including code and data) to address the test oracle challenge (see the sketch after the links below).

Early work and background can be found here:
https://www.cs.purdue.edu/homes/lintan/publications/eagle-icse22.pdf
https://www.cs.purdue.edu/homes/lintan/publications/fairness-neurips21.pdf
https://www.cs.purdue.edu/homes/lintan/publications/variance-ase20.pdf
https://www.cs.purdue.edu/homes/lintan/publications/cradle-icse19.pdf
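
As a rough illustration of the differential-testing idea above, the Python sketch below runs two implementations that should agree, NumPy and PyTorch softmax, on the same random inputs; any disagreement beyond floating-point tolerance flags a potential bug, with no hand-written oracle required. The pairing and tolerance are illustrative, not the project's actual design.

    import numpy as np
    import torch

    def softmax_np(x):
        e = np.exp(x - x.max(axis=1, keepdims=True))   # numerically stabilized
        return e / e.sum(axis=1, keepdims=True)

    for trial in range(100):
        x = np.random.randn(8, 16).astype(np.float32)  # randomly generated input
        a = softmax_np(x)
        b = torch.softmax(torch.from_numpy(x), dim=1).numpy()
        if not np.allclose(a, b, atol=1e-5):           # equivalence is the oracle
            print(f"trial {trial}: implementations disagree; inspect for a bug")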

Research categories:
Big Data/Machine Learning, Cybersecurity, Deep Learning, Other
Preferred major(s):
  • Computer Science
  • Computer Engineering
  • Software Engineering
School/Dept.:
Computer Science
Professor:
Lin Tan

More information: https://www.cs.purdue.edu/homes/lintan/

 

Finding cybersecurity vulnerabilities in IoT/embedded systems 

Description:
Embedded systems provide control and operational intelligence for high-value cyber-physical systems such as smart cars, smart tractors, and smart city components. These systems must be secured against adversarial interactions. Vulnerabilities in embedded systems occur primarily in the external-facing components, especially in networking protocol stacks. One vulnerability detection technique widely used for IT software, such as web services, is a form of dynamic analysis called fuzz testing ("fuzzing"). We believe fuzzing will also find vulnerabilities in embedded systems. However, there are many challenges in adapting fuzzers to embedded systems software.

This project will develop new techniques to enable dynamic security analysis of embedded systems. The student will express research ideas in computer software, especially C/C++/Python code. The student will conduct experiments to identify and analyze discovered security vulnerabilities.
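
To illustrate the core loop, the hedged Python sketch below mutates a seed input and watches a packet parser for unexpected failures; `parse_packet` and its bug are invented for illustration, and real embedded fuzzing typically runs the firmware under emulation (e.g., QEMU) with a coverage-guided fuzzer such as AFL++.

    import random

    def parse_packet(data: bytes):            # hypothetical target under test
        if len(data) < 4:
            raise ValueError("short packet")
        length = data[0]
        checksum = data[1 + length]           # out-of-bounds read if length lies
        return data[1:1 + length], checksum

    seed = bytes([2, 0x41, 0x42, 0x43])       # a well-formed packet to mutate

    for i in range(10_000):
        buf = bytearray(seed)
        for _ in range(random.randint(1, 4)): # flip a few random bytes
            buf[random.randrange(len(buf))] = random.randrange(256)
        try:
            parse_packet(bytes(buf))
        except ValueError:
            pass                              # graceful rejection is fine
        except IndexError as e:               # anything else is a finding
            print(f"crash on input {bytes(buf).hex()}: {e}")
            break
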
Research categories:
Cybersecurity, Internet of Things (IoT)
Preferred major(s):
  • No Major Restriction
Desired experience:
Strong C/C++ programming skills, Python, familiarity with Linux programming environment (e.g. you are comfortable on the terminal), some knowledge of cybersecurity exploits (e.g. buffer overflows). Knowledge of embedded systems context is a plus. Successful applicants are likely EE, CompEng, or CS majors.
School/Dept.:
Electrical & Computer Engineering
Professor:
James Davis
 

Promoting secure software supply chains with Sigstore 

Description:
Many software products rely on components developed by other teams or companies. These components are known as the software supply chain, and it is important that they be trustworthy. The Sigstore project aims to help engineers guarantee that the components on which they rely were really produced by organizations they trust. Sigstore is a fast-growing open-source project, but we need help innovating on its feature set and promoting its use.
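
For intuition, the Python sketch below shows only the bare sign/verify primitive (via the `cryptography` package) that supply-chain signing builds on; Sigstore itself goes well beyond this, with "keyless" signing backed by short-lived certificates (Fulcio) and a public transparency log (Rekor), so treat this as background, not Sigstore's API.

    from cryptography.exceptions import InvalidSignature
    from cryptography.hazmat.primitives import hashes
    from cryptography.hazmat.primitives.asymmetric import ec

    artifact = b"contents of a release tarball"

    # The producer signs the artifact with its private key.
    private_key = ec.generate_private_key(ec.SECP256R1())
    signature = private_key.sign(artifact, ec.ECDSA(hashes.SHA256()))

    # A consumer with the matching public key detects any tampering.
    public_key = private_key.public_key()
    try:
        public_key.verify(signature, artifact + b"tampered", ec.ECDSA(hashes.SHA256()))
        print("signature valid")
    except InvalidSignature:
        print("artifact was modified after signing")
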
Research categories:
Cybersecurity
Preferred major(s):
  • No Major Restriction
Desired experience:
Strong programming skills, familiarity with the Linux programming environment (e.g. you are comfortable on the terminal), and some knowledge of cybersecurity exploits (e.g. buffer overflows). Knowledge of web systems is a plus. Successful applicants are likely EE, CompEng, or CS majors.
School/Dept.:
Electrical & Computer Engineering
Professor:
James Davis

More information: https://www.sigstore.dev/

 

Trustworthy Re-use of Pre-Trained Neural Networks 

Description:
Deep neural networks (DNNs) are widely used, from image recognition in autonomous vehicles to detecting anomalies in system logs. Training these networks incurs a huge carbon footprint. Reusing pre-trained neural networks (PTNNs) reduces this cost and improves engineering efficiency. However, little attention has been paid so far to improving the software engineering infrastructure that supports the trustworthiness of PTNNs. At present, PTNNs are shared across industry via model hubs: collections of PTNNs organized by problem domain and machine learning framework. These hubs imitate traditional software registries, such as NPM and Maven, through which engineers share software packages. The exchange of PTNNs is still in its infancy, and there are many unknowns regarding how engineering teams can share them trustworthily.

Undergraduate student(s) will work with graduate students on projects related to analyzing PTNNs, developing tools to standardize them (e.g. ONNX), and developing tools to measure them.
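
As a small example of the standardization angle, the hedged Python sketch below exports a stand-in PyTorch model to ONNX, validates the result with the ONNX checker, and records a hash that a consumer could pin; the model and file names are illustrative only.

    import hashlib
    import torch
    import onnx

    model = torch.nn.Linear(4, 2)              # stand-in for a downloaded PTNN
    model.eval()

    torch.onnx.export(model, torch.randn(1, 4), "model.onnx")
    onnx.checker.check_model(onnx.load("model.onnx"))   # structural validity

    digest = hashlib.sha256(open("model.onnx", "rb").read()).hexdigest()
    print(f"sha256:{digest}")                  # pin this to detect tampering
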
Research categories:
Cybersecurity, Deep Learning
Preferred major(s):
  • No Major Restriction
Desired experience:
Successful applicants should have most of the following: introductory coursework or equivalent experience in machine learning and deep learning; strong programming skills; familiarity with the Linux programming environment (e.g. you are comfortable on the terminal); basic knowledge of cybersecurity concepts (e.g. buffer overflows); knowledge of web systems (you know what React and Flask are, and you've used one of them before); and data analysis skills (e.g. with Pandas). Successful applicants are likely EE, CompEng, or CS majors.
School/Dept.:
Electrical & Computer Engineering
Professor:
James Davis

More information: https://davisjam.github.io