Deep Learning Models with Sparsity Constraints: Applications to Chemical Process Systems

Interdisciplinary Areas: Data and Engineering Applications

Project Description

Deep learning models can serve as surrogate models in chemical process systems applications, representing various aspects of these complex systems, such as chemical process flowsheets and computational fluid dynamics (CFD) models. However, these models often suffer from poor interpretability and overfitting, partly due to their large number of parameters. To address this issue, this project proposes imposing sparsity constraints on deep learning models. Incorporating such constraints effectively reduces the number of parameters and increases the interpretability of the models.

Training a sparse deep learning model can follow several approaches. One technique builds on the lottery ticket hypothesis, which posits that a dense network contains sparse subnetworks (i.e., "winning tickets") that, when trained from their original initialization, can match the performance of the full network. Alternatively, one can study key properties of the training process itself (such as initialization schemes, training algorithms, and neuron activation functions) to identify favorable inflection points in the accuracy vs. scale tradeoff, leading to better learned representations that are useful in downstream tasks such as transfer to new datasets. Sparsity can also stem from knowledge distillation techniques, which use the posterior probability outputs of a trained teacher model to train a more parsimonious, high-fidelity student model. Beyond sparsifying the parameters, a related problem is identifying key data points, or training coresets, that reduce the training data size while maintaining fidelity to the model trained on the full data. Furthermore, integer programming, a branch of mathematical optimization, can impose hard constraints on the number of active neurons in a deep learning model, thereby enforcing sparsity.
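To make the pruning idea concrete, the following is a minimal sketch (in NumPy, not the project's actual methodology) of the magnitude-based pruning step used in lottery-ticket-style experiments: the smallest-magnitude weights are zeroed out, and a binary mask records which connections survive for retraining. The function name and the toy 4x4 weight matrix are illustrative choices, not part of any established API.

```python
import numpy as np

def magnitude_prune(weights, sparsity):
    """Zero out the smallest-magnitude entries of a weight matrix.

    Returns the pruned weights and a boolean mask of surviving
    connections (the candidate "winning ticket" structure).
    """
    flat = np.abs(weights).ravel()
    k = int(sparsity * flat.size)            # number of weights to remove
    if k == 0:
        return weights.copy(), np.ones_like(weights, dtype=bool)
    threshold = np.partition(flat, k - 1)[k - 1]   # k-th smallest magnitude
    mask = np.abs(weights) > threshold
    return weights * mask, mask

rng = np.random.default_rng(0)
W = rng.normal(size=(4, 4))
W_sparse, mask = magnitude_prune(W, sparsity=0.75)
print(int(mask.sum()))  # -> 4 (of 16 weights survive at 75% sparsity)
```

In lottery-ticket experiments this mask would be applied to the weights rewound to their original initialization, and the masked subnetwork retrained; iterating prune-and-retrain typically reaches much higher sparsity than one-shot pruning.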
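Similarly, the soft-target term of knowledge distillation can be sketched in a few lines (again a toy NumPy illustration under assumed names, not the project's implementation): the student is trained to match the teacher's temperature-softened output distribution via cross-entropy.

```python
import numpy as np

def softmax(z, T=1.0):
    """Temperature-scaled softmax; higher T gives softer distributions."""
    z = np.asarray(z, dtype=float) / T
    z = z - z.max(axis=-1, keepdims=True)    # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def distillation_loss(student_logits, teacher_logits, T=2.0):
    """Cross-entropy between the softened teacher and student
    distributions (the soft-target term of Hinton-style distillation)."""
    p_teacher = softmax(teacher_logits, T)
    p_student = softmax(student_logits, T)
    return float(-np.sum(p_teacher * np.log(p_student), axis=-1).mean())

teacher = np.array([[5.0, 1.0, 0.5]])   # logits from a trained teacher
student = np.array([[4.0, 1.5, 0.2]])   # logits from the smaller student
loss = distillation_loss(student, teacher, T=2.0)
```

In practice this term is combined with the ordinary cross-entropy against the true labels, so the parsimonious student learns both from ground truth and from the teacher's "dark knowledge" about relative class similarities.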
The project aims to develop novel theory-guided methods for enforcing sparsity constraints in deep learning models. Over the course of the project, the ideal candidate is expected to understand and build upon the existing body of research on theoretical and, in collaboration with graduate students, practical sparse optimization techniques for neural network training. This research will contribute to expanding the applications of artificial intelligence in chemical engineering, forging new pathways for process modeling and optimization.


Start Date

Anytime after February 2024


Postdoc Qualifications

PhD in applied math, computer science, industrial engineering, chemical engineering, electrical engineering, mechanical engineering, or related fields.
Fluency in at least one of the following programming languages: Python, Julia, or C++.



Advisor 1:
Name: Can Li
Affiliation: Davidson School of Chemical Engineering

Advisor 2:
Name: Rajiv Ashu Khanna
Affiliation: Department of Computer Science


Short Bibliography

Frankle, J., & Carbin, M. (2018). The lottery ticket hypothesis: Finding sparse, trainable neural networks. arXiv preprint arXiv:1803.03635.
Dedieu, A., Hazimeh, H., & Mazumder, R. (2021). Learning sparse classifiers: Continuous and mixed integer optimization perspectives. The Journal of Machine Learning Research, 22(1), 6008-6054.
Gou, J., Yu, B., Maybank, S. J., & Tao, D. (2020). Knowledge distillation: A survey. arXiv preprint arXiv:2006.05525.
Hubbs, C. D., Li, C., Sahinidis, N. V., Grossmann, I. E., & Wassick, J. M. (2020). A deep reinforcement learning approach for chemical production scheduling. Computers & Chemical Engineering, 141, 106982.
Zhang, Y., Khanna, R., Kyrillidis, A., & Koyejo, O. (2021). Bayesian coresets: An optimization perspective. Artificial Intelligence and Statistics (AISTATS) 2021.
Utrera, F., Kravitz, E., Erichson, N. B., Khanna, R., & Mahoney, M. W. (2021). Adversarially-trained deep nets transfer better. International Conference on Learning Representations (ICLR) 2021.