2023 Research Projects
Projects are posted below; new projects will continue to be posted. To learn more about the type of research conducted by undergraduates, view the archived symposium booklets and search the past SURF projects.
This is a list of research projects that may have opportunities for undergraduate students. Please note that it is not a complete list of every SURF project. Undergraduates will discover other projects when talking directly to Purdue faculty.
You can browse all the projects on the list or view only projects in the following categories:
Cybersecurity (4)
Data-Free Model Extraction
*** Possible industry involvement: Some of these projects are funded by Meta/Facebook research awards and J.P. Morgan AI research awards. *** We especially encourage applications from women, Aboriginal peoples, and other groups underrepresented in computing.
*** Project 1. Data-Free Model Extraction
Many deployed machine learning models such as ChatGPT and Codex are accessible via a pay-per-query system. An adversary can profit from extracting these models, whether for outright theft or for reconnaissance ahead of further attacks. Recent model-extraction attacks on Machine Learning as a Service (MLaaS) systems have moved towards data-free approaches, showing the feasibility of stealing models trained with difficult-to-access data. However, these attacks remain limited by the low accuracy of the extracted models and the high number of queries issued to the models under attack. The high query cost makes such techniques infeasible against online MLaaS systems that charge per query.
In this project, we will design novel approaches that achieve higher accuracy and query efficiency than prior data-free model extraction techniques.
Early work and background can be found here:
https://www.cs.purdue.edu/homes/lintan/publications/disguide-aaai23.pdf
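The core loop of model extraction can be illustrated with a deliberately tiny sketch: query a black-box "victim" with synthetic inputs (no real training data), train a surrogate on the answers, and track the query budget, since per-query cost is the bottleneck the project targets. Everything here (the linear victim, the perceptron-style surrogate, all names) is a hypothetical toy, not the approach from the paper above.

```python
import random

def victim_predict(x):
    """Hypothetical black-box victim: a pay-per-query binary classifier.
    In a real attack, each call here would be one billed API query."""
    return 1 if 0.3 * x[0] + 0.7 * x[1] > 0.5 else 0

def extract_surrogate(query_budget=500, lr=0.1, seed=0):
    """Train a surrogate linear classifier from synthetic (data-free) queries.
    Returns the learned weights and the number of queries spent."""
    rng = random.Random(seed)
    w, b, queries = [0.0, 0.0], 0.0, 0
    for _ in range(query_budget):
        x = [rng.random(), rng.random()]   # synthetic input, no real data needed
        y = victim_predict(x)              # one paid query to the victim
        queries += 1
        pred = 1 if w[0] * x[0] + w[1] * x[1] + b > 0 else 0
        if pred != y:                      # perceptron-style update on disagreement
            sign = 1 if y == 1 else -1
            w[0] += lr * sign * x[0]
            w[1] += lr * sign * x[1]
            b += lr * sign
    return w, b, queries

w, b, used = extract_surrogate()
# Measure agreement between surrogate and victim on fresh random points.
rng = random.Random(42)
agree = sum(
    (1 if w[0] * p[0] + w[1] * p[1] + b > 0 else 0) == victim_predict(p)
    for p in ([rng.random(), rng.random()] for _ in range(1000))
)
print(f"queries used: {used}, agreement: {agree / 1000:.0%}")
```

The research question is exactly the tension this sketch exposes: how to maximize surrogate agreement while minimizing the `queries` counter.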
*** Project 2. Language Models for Detecting and Fixing Software Bugs and Vulnerabilities
In this project, we will develop machine learning approaches including code language models to automatically learn bug and vulnerability patterns and fix patterns from historical data to detect and fix software bugs and security vulnerabilities. We will also study and compare general code language models and domain-specific language models.
Early work and background can be found here:
Impact of Code Language Models on Automated Program Repair. ICSE 2023. Forthcoming.
KNOD: Domain Knowledge Distilled Tree Decoder for Automated Program Repair. ICSE 2023. Forthcoming.
https://www.cs.purdue.edu/homes/lintan/publications/cure-icse21.pdf
https://www.cs.purdue.edu/homes/lintan/publications/deeplearn-tse18.pdf
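The idea of learning fix patterns from historical data can be sketched with a toy, non-neural stand-in: abstract past (buggy, fixed) line pairs into rewrite templates and count recurrences, which is roughly the statistical regularity a code language model would capture. The example pairs and the placeholder abstraction are illustrative assumptions, not mined from any real project.

```python
import re
from collections import Counter

# Hypothetical "historical data": (buggy line, fixed line) pairs from past commits.
HISTORY = [
    ("if (x = 0)", "if (x == 0)"),
    ("if (y = 1)", "if (y == 1)"),
    ("strcpy(dst, src)", "strncpy(dst, src, sizeof(dst))"),
]

def mine_fix_patterns(history):
    """Abstract each (buggy, fixed) pair into a rewrite pattern by replacing
    identifiers and literals with a placeholder, then count how often each
    pattern recurs -- a crude stand-in for what a code LM learns statistically."""
    patterns = Counter()
    for buggy, fixed in history:
        b = re.sub(r"\b\w+\b", "_", buggy)   # "if (x = 0)" -> "_ (_ = _)"
        f = re.sub(r"\b\w+\b", "_", fixed)   # "if (x == 0)" -> "_ (_ == _)"
        patterns[(b, f)] += 1
    return patterns

patterns = mine_fix_patterns(HISTORY)
# The assignment-in-condition fix pattern was seen twice, so it ranks first.
top = patterns.most_common(1)[0]
print(top)  # (('_ (_ = _)', '_ (_ == _)'), 2)
```

A real code language model replaces the regex abstraction with learned token representations, and the counting with a trained decoder that generates candidate patches.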
*** Project 3. Inferring Specifications from Software Text for Finding Bugs and Vulnerabilities
A fundamental challenge of detecting or preventing software bugs and vulnerabilities is to know programmers’ intentions, formally called specifications. If we know the specification of a program (e.g., where a lock is needed, what input a deep learning model expects, etc.), a bug detection tool can check if the code matches the specification.
Building on our prior work, which was the first to extract specifications from code comments to automatically detect software bugs and bad comments, this project will analyze various new sources of software textual information (such as API documents and Stack Overflow posts) to extract specifications for bug detection. For example, the API documents of deep learning libraries such as TensorFlow and PyTorch contain a lot of input constraint information about tensors. Language models may also be explored.
Early work and background can be found here:
https://www.cs.purdue.edu/homes/lintan/projects.html
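The pipeline from text to checkable specification can be sketched in miniature: parse a natural-language parameter description into a machine-checkable constraint, then check inputs against it. The document fragment, the regex patterns, and the nested-list "tensor" are all illustrative assumptions; a real system would use NLP or language models rather than hand-written regexes.

```python
import re

# Hypothetical fragment of a deep-learning API document describing an input tensor.
DOC = "input: a 2-D tensor of type int32 with non-negative values"

def extract_spec(doc_line):
    """Extract a machine-checkable input constraint (rank and dtype) from a
    natural-language parameter description using simple lexical patterns."""
    spec = {}
    rank = re.search(r"(\d+)-D", doc_line)
    if rank:
        spec["ndim"] = int(rank.group(1))
    dtype = re.search(r"type (\w+)", doc_line)
    if dtype:
        spec["dtype"] = dtype.group(1)
    return spec

def check(value, spec):
    """Check a nested-list 'tensor' against the extracted specification.
    Only rank is checkable here; dtype would require real tensor objects."""
    def ndim(v):
        return 1 + ndim(v[0]) if isinstance(v, list) else 0
    return ndim(value) == spec.get("ndim", ndim(value))

spec = extract_spec(DOC)
print(spec)                           # {'ndim': 2, 'dtype': 'int32'}
print(check([[1, 2], [3, 4]], spec))  # True: rank 2 matches the spec
print(check([1, 2, 3], spec))         # False: rank 1 violates the spec
```

Once constraints like these are extracted at scale, a bug detector can flag call sites whose arguments cannot satisfy them.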
*** Project 4. Testing Deep Learning Systems
We will develop novel techniques to make deep learning code such as TensorFlow and PyTorch reliable and secure, building on our award-winning paper (ACM SIGSOFT Distinguished Paper Award)!
Machine learning systems including deep learning (DL) systems demand reliability and security. DL systems consist of two key components: (1) models and algorithms that perform complex mathematical calculations, and (2) software that implements the algorithms and models. Here software includes DL infrastructure code (e.g., code that performs core neural network computations) and the application code (e.g., code that loads model weights). Thus, for the entire DL system to be reliable and secure, both the software implementation and models/algorithms must be reliable and secure. If software fails to faithfully implement a model (e.g., due to a bug in the software), the output from the software can be wrong even if the model is correct, and vice versa.
This project aims to use novel approaches including differential testing to detect and localize bugs in DL software (including code and data) to address the testing oracle challenge.
Early work and background can be found here:
https://www.cs.purdue.edu/homes/lintan/publications/eagle-icse22.pdf
https://www.cs.purdue.edu/homes/lintan/publications/fairness-neurips21.pdf
https://www.cs.purdue.edu/homes/lintan/publications/variance-ase20.pdf
https://www.cs.purdue.edu/homes/lintan/publications/cradle-icse19.pdf
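Differential testing, mentioned above as a way around the test-oracle challenge, can be sketched with two independent implementations of the same function: if they disagree beyond floating-point tolerance on some input, at least one has a bug, and no ground-truth oracle is needed. The softmax pair below is a hypothetical illustration, not code from the referenced papers.

```python
import math

def softmax_reference(xs):
    """Straightforward softmax: exponentiate, then normalize."""
    exps = [math.exp(x) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def softmax_stable(xs):
    """Numerically stable softmax: shift by the max before exponentiating.
    Mathematically identical to the reference, so they should agree."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def differential_test(f, g, inputs, tol=1e-9):
    """Run two implementations on the same inputs and collect disagreements.
    Cross-implementation agreement serves as the test oracle."""
    mismatches = []
    for xs in inputs:
        a, b = f(xs), g(xs)
        if any(abs(x - y) > tol for x, y in zip(a, b)):
            mismatches.append(xs)
    return mismatches

inputs = [[0.0, 1.0, 2.0], [-3.0, 0.5, 4.0], [10.0, 10.0, 10.0]]
mismatches = differential_test(softmax_reference, softmax_stable, inputs)
print(mismatches)  # [] -> the two implementations agree on all inputs
```

In the project, the compared implementations are real DL backends and libraries rather than two hand-written functions, and localizing which side is wrong is part of the research.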
- Computer Science
- Computer Engineering
- Software Engineering
More information: https://www.cs.purdue.edu/homes/lintan/
Finding cybersecurity vulnerabilities in IoT/embedded systems
This project will develop new techniques to enable dynamic security analysis of embedded systems. The student will implement research ideas in software, primarily in C/C++/Python, and will conduct experiments to identify and analyze discovered security vulnerabilities.
- No Major Restriction
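One common form of dynamic security analysis for embedded code is fuzzing: feed randomly generated inputs to a target and watch for unexpected failures. The sketch below plants a classic firmware-style bug (trusting a length field in a packet header) in a hypothetical parser and finds it with a minimal random fuzzer; both the parser and the fuzzer are illustrative assumptions, not project code.

```python
import random

def parse_packet(data):
    """Hypothetical embedded-firmware packet parser with a planted bug:
    it trusts the length byte in the header, as real firmware often does."""
    if len(data) < 2:
        raise ValueError("truncated header")
    declared_len = data[0]
    payload = data[1:]
    # Bug: no check that declared_len <= len(payload), so summing
    # declared_len bytes can index past the end of the payload.
    return sum(payload[i] for i in range(declared_len))

def fuzz(target, trials=1000, seed=1):
    """Minimal random fuzzer: feed random byte strings to the target and
    collect inputs that trigger unexpected exceptions (potential bugs)."""
    rng = random.Random(seed)
    crashes = []
    for _ in range(trials):
        data = bytes(rng.randrange(256) for _ in range(rng.randrange(8)))
        try:
            target(data)
        except ValueError:
            pass                      # expected, documented error path
        except IndexError:            # unexpected: a crash worth triaging
            crashes.append(data)
    return crashes

crashes = fuzz(parse_packet)
print(f"{len(crashes)} crashing inputs found")
```

On real embedded targets the same loop runs against firmware under emulation or on-device, and a crash corresponds to memory corruption rather than a Python exception.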
Promoting secure software supply chains with Sigstore
- No Major Restriction
More information: https://www.sigstore.dev/
Trustworthy Re-use of Pre-Trained Neural Networks
Undergraduate student(s) will work with graduate students on projects related to analyzing pre-trained neural networks (PTNNs), developing tools to standardize them (e.g., via ONNX), and developing tools to measure them.
- No Major Restriction
More information: https://davisjam.github.io