
Using Topological Data Analysis in Social Science Research: Unpacking Decisions and Opportunities for a New Method

Authors: Allison Godwin, Aaron Robert Hamilton Thielmeyer, Jacqueline Rohde, Dina Verdín, Brianna Shani Benedict, Rachel Ann Baker & Jacqueline Doyle

Abstract: This research paper describes a new statistical method for engineering education, Topological Data Analysis (TDA), and considers the important decisions made during analysis and their impact on the quality of the results. We also describe why this new method may provide novel ways of understanding multidimensional data for student attitudes, beliefs, and mindsets.

TDA is a statistical method that can map structure within high-dimensional, noisy, and incomplete data. It is also insensitive to the particular distance function chosen to detect the persistent structure or typology in the data. In some ways, TDA is like a robust cluster analysis. However, unlike cluster analysis, which attempts to break datasets into distinct (or probabilistic) groups, TDA allows for data with progressions rather than clear distinctions. Rather than breaking data into defined groups, TDA maps the connections among data points and reveals details of the data structure that cluster analysis cannot capture. Since its development in 2009, TDA has been used in a number of fields, including medicine, business, and sports. However, few studies have used this technique with social science data. We believe that this technique can be particularly useful to engineering education researchers, who deal with complex data that are often multidimensional, noisy, and incomplete.

In this paper, we discuss the considerations that researchers must understand in conducting TDA with engineering education data. In analysis, a researcher must choose a filtering method, number of nearest neighbors (k), number of filter slices (n), overlap in data, and cut height (ε) for each dimension. The importance of each decision, and its effect on the consistency and quality of the data, differ. Some decisions have a large impact on the results of the analysis [e.g., cut height (ε)], while others have a moderate impact on the appearance of the resulting map but not on the key structural features identified [e.g., number of filter slices (n)].
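To make the roles of these parameters concrete, the following is a minimal sketch of a Mapper-style TDA pipeline in plain Python. It is not the authors' implementation. It assumes a k-nearest-neighbor mean-distance filter, equal-length overlapping filter slices, and single-linkage clustering cut at height ε within each slice. All function names (`knn_filter`, `cover`, `single_linkage`, `mapper`) are hypothetical and chosen only for illustration.

```python
import math
from itertools import combinations


def knn_filter(points, k):
    """Assumed filter: mean distance from each point to its k nearest neighbors."""
    values = []
    for i, p in enumerate(points):
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        values.append(sum(dists[:k]) / k)
    return values


def cover(values, n, overlap):
    """Split the filter range into n equal-length slices overlapping by a fraction."""
    lo, hi = min(values), max(values)
    # Solve length + (n - 1) * length * (1 - overlap) = hi - lo for the slice length.
    length = (hi - lo) / (1 + (n - 1) * (1 - overlap))
    step = length * (1 - overlap)
    return [(lo + i * step, lo + i * step + length) for i in range(n)]


def single_linkage(indices, points, eps):
    """Clusters from single linkage cut at height eps (components of the eps-graph)."""
    parent = {i: i for i in indices}

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path compression
            i = parent[i]
        return i

    for i, j in combinations(indices, 2):
        if math.dist(points[i], points[j]) <= eps:
            parent[find(i)] = find(j)

    clusters = {}
    for i in indices:
        clusters.setdefault(find(i), set()).add(i)
    return list(clusters.values())


def mapper(points, k, n, overlap, eps):
    """Return (nodes, edges): nodes are sets of point indices, edges link nodes sharing points."""
    values = knn_filter(points, k)
    nodes = []
    for a, b in cover(values, n, overlap):
        # Small tolerance so floating-point rounding does not drop boundary points.
        members = [i for i, v in enumerate(values) if a - 1e-9 <= v <= b + 1e-9]
        nodes.extend(single_linkage(members, points, eps))
    edges = [(u, v) for u, v in combinations(range(len(nodes)), 2)
             if nodes[u] & nodes[v]]
    return nodes, edges


if __name__ == "__main__":
    toy = [(0,), (1,), (2,), (4,), (7,), (11,)]  # made-up one-dimensional data
    nodes, edges = mapper(toy, k=1, n=2, overlap=0.5, eps=3)
    print(nodes, edges)
```

In this sketch, ε decides which points merge into a single node, so changing it can alter the structure of the map directly. Changing n or the overlap mainly changes how many nodes and edges are drawn, which is in line with the differing impacts described above.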

We illustrate these methodological decisions, as well as the results of TDA and its usefulness for engineering education, using data from a project investigating first-year engineering students’ underlying attitudes, beliefs, and mindsets to characterize the latent diversity of these students. A paper-and-pencil survey was administered to 3,855 students at 32 ABET-accredited institutions across the U.S. in fall 2017. After cleaning the data using attention checks within the survey, a total of 3,711 student responses were examined for validity evidence. Exploratory factor analysis (for newly developed scales) and confirmatory factor analysis (for existing scales) were conducted. The resulting factors with strong validity evidence and high variability among engineering students were used in the TDA to map students’ latent diversity. The results of this map indicate six distinct data progressions as well as a sparse group of students whose responses were not similar to the majority of the dataset. This work illustrates the opportunities for using TDA and provides a discussion of the different researcher decisions that are involved in this statistical technique.

Read online for free at ASEE PEER: Using Topological Data Analysis in Social Science Research: Unpacking Decisions and Opportunities for a New Method
