Purdue University has long been a leader in programming languages (PL) and compilers, but its leadership role is sometimes overlooked because the research is divided between the Computer Science and Electrical and Computer Engineering (ECE) schools. To better showcase this expertise and improve collaboration, the University created an umbrella group called the Purdue University Programming Languages Group (PurPL). The group unites the leading minds in programming research from both schools, spanning research into PL theory, design, and implementation, as well as language-based security, compiler optimizations, verification, and program synthesis.
“Purdue is unique in that its ECE school has a lot of people who do computationally focused research that might normally be found in the School of Computer Science,” explains Milind Kulkarni, associate professor of electrical and computer engineering and one of PurPL’s leading researchers. “Purdue has more people working in the PL and compiler space than just about any other university, but they’re spread between two schools. PurPL lets us combine our strengths.”
PurPL’s researchers are grappling with PL challenges posed by the need to optimize complex software to run more efficiently on a profusion of new hardware technologies. Kulkarni boils it down to a single question: “How can we optimize performance while ensuring correctness?”
“As software has become more complicated, we needed a higher level of programming,” says Kulkarni. “Programmers are optimizing increasingly complex software to run more efficiently on complex hardware. Yet, this makes it harder for us to reason about the code and know whether it’s behaving correctly.
“Writing at a higher level of abstraction makes it easier for others to understand, but once a program has been made to run efficiently on a big distributed system with different kinds of hardware and accelerators, it looks nothing like it does at the high level. That’s why one big thrust at PurPL is adding verification for better guarantees.”
These challenges are exacerbated by the trend toward moving datacenter intelligence, such as AI algorithms, to the edge. “If I take the same machine-learning algorithm for image recognition that I run in the datacenter and I want to run it effectively on my phone, I have to consider specialized hardware including GPUs and different coprocessors,” says Kulkarni.
Much of the research under PurPL’s aegis involves balancing the tradeoff between optimization and correctness. Here we focus on two projects: the first, spearheaded by Kulkarni, concerns the optimization of irregular applications with locality transformations; the other focuses on deductive synthesis of correct-by-construction programs, which involves improving proofs for programs written by other programs, a practice known as generative programming.
Locality Transformations for Irregular Applications
One of Kulkarni’s principal interests in recent years has been optimizing compilers for increasingly complex software. His research into locality transformations aims to automatically optimize irregular applications.
“Much of the work in optimizing compilers is focused on high-performance computing (HPC) applications running big simulations, such as simulating the way a building shakes in an earthquake,” explains Kulkarni. “These problems tend to be very structured with predictable computational and usage patterns.”
Despite their “regularity,” such HPC applications are syntactically complex with huge amounts of data. To optimize them, software engineers have learned to carefully schedule their computations.
“If you have billions of operations happening on the same data, you want to schedule them so they happen in close succession,” says Kulkarni. “If you try to do operation A on each piece of data and then go back and do operation B, it’s too slow because you’re only looking at one small piece at a time. It’s faster to do multiple operations on one part of the grid and then move to the next. This loop tiling capability makes it easier to operate in parallel. There’s also a concept called loop fusion where you merge multiple loops together to make the process even faster.”
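The payoff is easiest to see in a small sketch. The following C code is our own illustration, not drawn from any Purdue tool: the staged version sweeps a grid twice, once per operation, while the tiled, fused version finishes both operations on each small block while that block is still in cache.

    #define N 4096
    #define TILE 64

    /* Staged: sweep the whole grid with operation A, then sweep it
     * again with operation B. By the time B revisits an element,
     * the first sweep has long since pushed it out of cache. */
    void staged(double grid[N][N]) {
        for (int i = 0; i < N; i++)
            for (int j = 0; j < N; j++)
                grid[i][j] *= 2.0;          /* operation A */
        for (int i = 0; i < N; i++)
            for (int j = 0; j < N; j++)
                grid[i][j] += 1.0;          /* operation B */
    }

    /* Tiled and fused: finish both operations on one TILE x TILE
     * block while it is still in cache, then move to the next. */
    void tiled_fused(double grid[N][N]) {
        for (int ii = 0; ii < N; ii += TILE)
            for (int jj = 0; jj < N; jj += TILE)
                for (int i = ii; i < ii + TILE; i++)
                    for (int j = jj; j < jj + TILE; j++)
                        grid[i][j] = grid[i][j] * 2.0 + 1.0;
    }

Fusing the two elementwise operations is legal here because neither depends on the other’s results at neighboring points; an optimizing compiler has to prove that kind of independence before it can apply the transformation.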
Loop-based techniques, however, break down when working with more unpredictable, “irregular” applications, which often involve highly recursive code. Considering that most applications are irregular, this is a major problem.
“There have always been irregular applications, but now there’s more interest in increasing performance on these applications and pushing the limits of computational hardware,” says Kulkarni. “In a machine-learning application, for example, you might want to find which data points are closer. If more points are green, the application calls it green, and if more are red, then it’s red. But taking a new data point and comparing it with every other data point can be slow when you’re talking about millions of data points.”
One approach to addressing this problem is to organize points into a tree structure. “Tree processing makes it much more efficient to figure out what region of the data I should be looking at to reduce the scope of the operation,” says Kulkarni. “Say I want to find the nearest person to me who shares my name. I could ask every person in the country and then find who is closest, but it might be faster if I asked everyone in Indiana, and then in other states. Algorithms that use tree structure techniques with this ‘look close first’ approach are very irregular because if you’re using it in California, you would want to look in California first.”
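As a rough sketch of the difference (again our own illustrative C, reduced to one dimension), a brute-force search compares the query against every point, while a tree search descends toward the query first and prunes any subtree that cannot hold a closer point than the best match so far.

    #include <math.h>
    #include <stddef.h>

    typedef struct Node {
        double point;                /* coordinate of this data point */
        struct Node *left, *right;   /* binary-search-tree children */
    } Node;

    /* Brute force: "ask every person in the country." */
    double nearest_scan(const double *pts, int n, double query) {
        double best = INFINITY;
        for (int i = 0; i < n; i++)
            if (fabs(pts[i] - query) < fabs(best - query))
                best = pts[i];
        return best;
    }

    /* Tree search: "look close first." */
    void nearest_tree(const Node *nd, double query, double *best) {
        if (nd == NULL) return;
        double d = fabs(nd->point - query);
        if (d < fabs(*best - query))
            *best = nd->point;
        const Node *toward = query < nd->point ? nd->left  : nd->right;
        const Node *away   = query < nd->point ? nd->right : nd->left;
        nearest_tree(toward, query, best);
        if (d < fabs(*best - query))     /* prune: skip the far side */
            nearest_tree(away, query, best);
    }

Real systems use k-d trees or similar structures over many dimensions, but the irregularity is already visible here: which subtrees get visited, and in what order, depends entirely on the query, so the compiler cannot predict the access pattern the way it can for a loop over a grid.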
Applying traditional loop tiling and loop fusion techniques to such an irregular problem is inefficient. Instead, Kulkarni created a tool called Treefusor designed for irregular applications. “Treefusor enables multiple operations to run over the tree structure simultaneously,” he says. “It’s the analog of loop fusion. My other work in traversal splicing lets you merge operations on one region of the tree. You can then structure computations that need to work in this part of the tree and schedule them. This makes it easier for multiple tasks to work on a part of the tree at the same time.”
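In the same spirit as the loop example above, a hand-fused version of two tree traversals shows the kind of transformation Treefusor automates (our simplified C illustration; the tool handles far more general traversals and establishes that the fusion is safe):

    typedef struct Tree {
        double val;
        struct Tree *left, *right;
    } Tree;

    /* Two separate traversals: the second walk re-touches every node
     * after the first has already evicted it from cache. */
    void scale(Tree *t, double s) {
        if (t == NULL) return;
        t->val *= s;
        scale(t->left, s);
        scale(t->right, s);
    }

    void offset(Tree *t, double o) {
        if (t == NULL) return;
        t->val += o;
        offset(t->left, o);
        offset(t->right, o);
    }

    /* Fused: both operations run in a single traversal, the tree
     * analog of loop fusion. */
    void scale_offset(Tree *t, double s, double o) {
        if (t == NULL) return;
        t->val = t->val * s + o;
        scale_offset(t->left, s, o);
        scale_offset(t->right, s, o);
    }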
Better Approaches for Program Synthesis
High-level languages can reduce the amount of code programmers must write by providing abstractions, but they are still operational in that they dictate what to do at every step. The continuing pressure for improved performance has led to interest in generative programming: creating programs that write other programs.
Several faculty associated with PurPL are working on programming techniques that backstop such program synthesis with “deductive proofs of correctness,” says Kulkarni. “They provide you with guarantees that the code is solving the problem. If you have a representation of what the problem is going to be, you can guarantee that the program you generate will do what it’s supposed to do.”
One approach builds upon the Coq proof assistant, a language that offers strong mathematical guarantees in the form of machine-checked proofs. “All programmers translate high-level programming into low-level code,” explains Kulkarni. “At each step, you replace slow high-level code with a more efficient low-level version. When you do a Coq proof, you know that each step is correct.”
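To give the flavor of such a step, here is a tiny machine-checked fusion proof, written in Lean rather than Coq for brevity (our own example, not from the Purdue project): replacing two passes over a list with one fused pass is proved correct for every function and every input, not merely tested.

    -- A verified refinement step: two map passes equal one fused pass.
    -- Proved by induction on the list, so it holds for all inputs.
    theorem map_fusion (f : α → β) (g : β → γ) :
        ∀ xs : List α, (xs.map f).map g = xs.map (g ∘ f)
      | []      => rfl
      | x :: xs => congrArg (g (f x) :: ·) (map_fusion f g xs)

A chain of such lemmas lets each lowering step in a compilation pipeline carry its own proof, which is exactly the guarantee Kulkarni describes.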
The problem with Coq is that it’s relatively difficult to use, and “it’s not meant to be used for high-performance code,” says Kulkarni. One PurPL research project led by Benjamin Delaware, assistant professor of computer science, in collaboration with John Wiegley, formerly of BAE Systems, aims to make Coq easier to use with the popular, higher-level Haskell language. The project focuses on making it easier to synthesize programs that use abstract data types, finding efficient implementations that match the way a program uses a data structure. The work also supports the extraction of stateful code that uses heap-manipulating operations.
Another approach for improving synthesis, led by Xiaokang Qiu, assistant professor of electrical and computer engineering, is described in a paper titled “Natural Synthesis of Provably-Correct Data-Structure Manipulations,” published in the October 2017 edition of Proceedings of the ACM on Programming Languages. “To synthesize a program, we typically search through the space of possible programs to try to prove that this potential synthetic program does the same thing as the original program,” says Kulkarni.
Because this is such a slow process, however, it can only be used to search for relatively simple programs. “The magic lies in how to do the search effectively,” says Kulkarni. “Qiu developed an efficient search process that restricts the search space in clever ways for complex programs. The technique focuses on generating versions of the program where it is relatively easy to determine whether they do the same thing. As a result, you can now do the proofs to generate more complex synthesized programs.”
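A caricature of that search-and-check loop fits in a few lines of C (our toy example; the candidate programs and the brute-force testing are ours, and Qiu’s technique replaces the testing with genuine logic-based equivalence proofs):

    #include <stdio.h>

    typedef int (*Prog)(int);

    /* The reference program we want to re-derive. */
    static int spec(int x)   { return 2 * x + 2; }

    /* A tiny "space of possible programs" to search through. */
    static int cand_a(int x) { return x + 2; }
    static int cand_b(int x) { return 2 * (x + 1); }  /* equivalent */
    static int cand_c(int x) { return x * x; }

    int main(void) {
        Prog space[] = { cand_a, cand_b, cand_c };
        for (int p = 0; p < 3; p++) {
            int agrees = 1;
            /* Testing on sample inputs can only rule candidates out;
             * a real synthesizer proves equivalence on all inputs. */
            for (int x = -100; x <= 100 && agrees; x++)
                if (space[p](x) != spec(x))
                    agrees = 0;
            if (agrees) {
                printf("candidate %d matches the spec on all tests\n", p);
                break;
            }
        }
        return 0;
    }

The hard part, and the contribution of the paper, is keeping the space of candidates rich enough to contain useful programs while keeping the equivalence proofs tractable.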
These are only some of the projects underway at PurPL. Others include the Quelea project, which makes it easier to write distributed programs that share data. There’s also a project that looks at how to use machine learning to repair programs and another that aims to generate more efficient code for database engines.
“Every programmer in every part of the software world is running into the problem of how to make sure their code is right, their code is fast, and their code is easy to maintain,” says Kulkarni. “PurPL researchers are working on making their lives easier.”