XPS: CLCCA: On the Hunt for Correctness and Performance Bugs in Large-scale Programs

Milind Kulkarni, Michael Gribskov, Saurabh Bagchi

Funded by National Science Foundation

Starting: August 2013

The scale of computing applications has grown dramatically over the past several years. As applications in domains such as computational genomics, data mining, and machine learning are applied to ever-more-complex problems, the scale of their inputs has risen sharply. And as the pursuit of parallelism has driven up core counts in servers, and the numbers of servers and racks in data centers, the scale of the systems these applications must run on has risen as well. A critical problem in developing large-scale applications is detecting and debugging scaling issues: problems in program behavior that emerge only as a program scales up, manifesting as correctness bugs or performance bottlenecks. Unfortunately, detecting bugs that arise only at large scale is difficult. Manually poring through logs or performance-profiling individual application processes is impractical, and the developer may not have access to the inputs and systems needed to run the application at large scale. In this research project, we are developing automated techniques to detect and diagnose correctness and performance bugs in large-scale programs by modeling program behavior, training the models on small-scale runs, and extrapolating to large-scale runs. Specifically, we are working with computational genomics applications, namely BLAST, Bowtie, Trinity/Butterfly, and Margin.

To achieve our objectives, we build statistical models that incorporate scale. By relating program scale to program behavior, we can predict how a program behaves at large scales, without ever observing correct behavior at those scales, and use those predictions to detect and diagnose bugs. The project is structured around three thrusts, each using the computational genomics applications for context. In the first, we build statistical models of program behavior that incorporate scale. In the second, we develop statistical techniques for detecting when an error has occurred and then drilling down to identify potential root causes in the software. In the third, we build a testing tool that allows us to uncover such scaling issues in an accelerated manner. In aggregate, the project combines static analysis, dynamic instrumentation, modeling, and machine learning-based data analysis in innovative ways.
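To illustrate the core idea of the first two thrusts, the sketch below shows one simple way to train a scale-aware behavioral model on small-scale runs and use its extrapolation to flag a large-scale anomaly. It fits a power-law trend (linear in log-log space) to a single metric, such as runtime, versus input scale; all function names, the example data, and the deviation threshold are illustrative assumptions, not the project's actual models.

```python
# Hypothetical sketch: fit log(metric) = a + b*log(scale) to small-scale
# training runs, then flag a large-scale observation that falls outside
# the extrapolated trend. Data and thresholds are illustrative only.
import math

def fit_power_law(scales, metrics):
    """Least-squares fit in log-log space; returns (a, b, sigma)."""
    xs = [math.log(s) for s in scales]
    ys = [math.log(m) for m in metrics]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
         / sum((x - mx) ** 2 for x in xs))
    a = my - b * mx
    # residual standard deviation in log space, used for the tolerance band
    resid = [y - (a + b * x) for x, y in zip(xs, ys)]
    sigma = math.sqrt(sum(r * r for r in resid) / max(n - 2, 1))
    return a, b, sigma

def is_anomalous(model, scale, observed, k=3.0):
    """Flag the observation if it deviates by more than k bands (log space)."""
    a, b, sigma = model
    predicted_log = a + b * math.log(scale)
    return abs(math.log(observed) - predicted_log) > k * max(sigma, 0.05)

# Train on small-scale runs: input size -> runtime in seconds (made-up data
# exhibiting roughly linear scaling).
model = fit_power_law([1000, 2000, 4000, 8000], [1.1, 2.0, 4.2, 7.9])

# At one million inputs, the extrapolated trend predicts runtime on the
# order of 800 seconds; 9000 seconds is flagged, 850 is not.
print(is_anomalous(model, 1_000_000, 9000.0))  # True: scaling bug suspected
print(is_anomalous(model, 1_000_000, 850.0))   # False: consistent with trend
```

In practice the project's models cover richer behavioral features than a single scalar metric, but the same principle applies: the model is trained only at scales where correct behavior can be observed, and detection at large scale relies entirely on extrapolation.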