Preliminary Exam Seminar: Kat Nykiel
Event Date: December 9, 2022
Location: DLRC 143 B or WebEx
School or Program: Materials Engineering
“Semi-Supervised Machine Learning Methods and their Applications to Materials Informatics”
Kat Nykiel, MSE PhD Candidate
Advisor: Professor Alejandro Strachan
Machine learning is a powerful tool with the potential to circumvent complex physical problems by mapping directly from inputs to outputs. In datasets where labels are abundant, supervised models are trained on the labeled data and used to infer labels for unlabeled samples. When only unlabeled samples are present, unsupervised methods attempt to identify interesting patterns in the data. When only a small portion of the data is labeled, however, a third set of tools applies, collectively known as semi-supervised learning. These methods use context from the unlabeled data to better inform the model, typically achieving higher accuracy than purely supervised learning. Large datasets with sparse labels are common to many problems in materials science, where acquiring new labels is often expensive. For example, semi-supervised learning is beneficial for predicting new materials, as this is a vast domain with a large amount of unlabeled data and few positive samples. Nevertheless, semi-supervised learning has so far seen limited use in materials science. This paper reviews the assumptions and classes of semi-supervised learning methods and demonstrates how they might be applied to a materials problem. A self-training, pseudo-label approach is applied to a dataset of density-functional theory calculations to predict formation energy from a small amount of labeled data.
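As a rough illustration of the self-training, pseudo-label idea mentioned in the abstract, the sketch below shows one way such a loop could look for a regression target. It is not the method presented in the seminar: the ridge regressor, the bootstrap-ensemble spread used as a confidence measure, the function names (`fit_ridge`, `predict_ridge`, `self_train`), and the synthetic data standing in for a density-functional theory dataset are all assumptions made for this example.

```python
import numpy as np

def fit_ridge(X, y, alpha=1e-3):
    """Closed-form ridge regression with a bias column; returns the weights."""
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])
    A = Xb.T @ Xb + alpha * np.eye(Xb.shape[1])
    return np.linalg.solve(A, Xb.T @ y)

def predict_ridge(w, X):
    """Predict targets for X using weights from fit_ridge."""
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])
    return Xb @ w

def self_train(X_lab, y_lab, X_unl, n_rounds=5, n_models=10,
               keep_frac=0.2, seed=0):
    """Hypothetical self-training loop (an assumption, not the talk's method).

    Each round: fit a bootstrap ensemble on the current training set,
    predict the unlabeled pool, and move the samples with the smallest
    ensemble spread into the training set with their mean prediction
    as a pseudo-label.
    """
    rng = np.random.default_rng(seed)
    X_train, y_train = X_lab.copy(), y_lab.copy()
    pool = X_unl.copy()
    for _ in range(n_rounds):
        if len(pool) == 0:
            break
        preds = []
        for _ in range(n_models):
            # Resample the training set with replacement
            idx = rng.integers(0, len(X_train), size=len(X_train))
            w = fit_ridge(X_train[idx], y_train[idx])
            preds.append(predict_ridge(w, pool))
        preds = np.array(preds)
        mean, std = preds.mean(axis=0), preds.std(axis=0)
        # Keep the most confident fraction (lowest ensemble spread)
        n_keep = max(1, int(keep_frac * len(pool)))
        confident = np.argsort(std)[:n_keep]
        X_train = np.vstack([X_train, pool[confident]])
        y_train = np.concatenate([y_train, mean[confident]])
        pool = np.delete(pool, confident, axis=0)
    # Final model trained on labeled plus pseudo-labeled samples
    return fit_ridge(X_train, y_train), len(y_train)

# Synthetic stand-in data: 200 samples, only the first 20 are labeled
rng = np.random.default_rng(42)
X = rng.normal(size=(200, 3))
y = X @ np.array([1.5, -2.0, 0.5]) + 0.1 * rng.normal(size=200)
X_lab, y_lab, X_unl = X[:20], y[:20], X[20:]

w, n_train = self_train(X_lab, y_lab, X_unl)
print("weights:", np.round(w, 3), "training set size:", n_train)
```

With a linear model like this one, pseudo-labels mostly reinforce what the model already predicts, so the example shows the mechanics of the loop rather than any accuracy gain; in practice the choice of model and confidence criterion largely determines whether self-training helps.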