Purdue University traffic research program cuts data analysis and batching from hours to minutes with BigQuery

About Purdue University JTRP

Purdue University’s Joint Transportation Research Program works with the Indiana Department of Transportation, higher education institutions, and industry to improve the planning, design, construction, management, and economic efficiency of Indiana’s transportation infrastructure.

Industries: Education, Government & Public Sector
Location: United States
Products: Google Cloud, BigQuery

Using BigQuery, a Purdue University traffic research program analyzes massive connected and autonomous car datasets in just minutes.

With the availability of massive datasets from connected and autonomous vehicles (CAVs), researchers and traffic engineers have the opportunity to improve the safety, efficiency, maintenance, and planning of road and street systems. Purdue University’s Joint Transportation Research Program (JTRP) is using BigQuery and Google Cloud to convert vast amounts of CAV data into information that can be used to study, learn from, and inform innovation in transportation technology.

“JTRP’s mission is to help agencies do things better, cheaper, faster, and safer,” says JTRP Director Darcy Bullock. “Data is a big focus here because it touches on all of those things.” The CAV data can help governments and agencies that use public funds to make data-driven decisions on everything from traffic signal timings to large capital infrastructure investments.

By replacing on-premises servers and software with BigQuery, the JTRP research team can uncover findings faster and more precisely, even as datasets grow larger. Running analysis on demand in the cloud lets the organization turn big data into actionable information, for example by helping governments and agencies adjust traffic signal timing and deploy warnings to drivers.

Overcoming big data challenges

How can driving be made safer? Education and smarter cars can help, but there’s also the challenge of better managing the roads, intersections, and traffic signals that drivers have to navigate. There are more than 400,000 traffic signals in the United States, and the way they’re timed to turn green, yellow, or red has a significant impact on urban mobility and traffic safety. Likewise, there are over 800 fatalities in construction work zones annually in the United States, so it’s important for agencies to have fast and effective tools for evaluating work zone safety and operations.

For researchers like those at JTRP, measuring and analyzing traffic used to be a time- and labor-intensive task. “A decade ago, just measuring vehicle travel time required chaining a sensor to a speed limit sign or light pole,” says Howell Li, Senior Software Engineer for JTRP. “These systems were labor-intensive to install and maintain.”

The advent of connected and autonomous vehicles brought researchers a wealth of data, but with it came new challenges: how to store huge amounts of CAV data cost-effectively while keeping it easy to access, and how to run analysis on datasets of that size.

“We used to use on-premises servers, but as the amounts of data we needed to store and work with increased, we quickly reached a point where it simply wasn’t feasible to continue this way,” says Li. “The servers became very expensive, and it wasn’t a scalable system.” JTRP works with data from 10 states besides Indiana and wanted to add another 160 billion records, but storage and analytics challenges slowed progress toward that goal.

In addition, performance of on-site servers was poor. “Even with the very fastest solid state drives, some of our batching operations could take a month to complete,” Li says. “The servers would always get maxed out before research paper submission deadlines.”

Bringing big data to researchers

As JTRP technology leaders researched alternatives to on-premises servers, they learned about BigQuery, which provided a solution for housing data, performing analysis, and generating reports—all in the cloud.

“BigQuery was the way to go, given its ability to ingest large volumes of data and perform analytics quickly,” Li says. The CAV data that’s used by JTRP researchers comes from Wejo, JTRP’s data vendor, and is easy to import to BigQuery. “One of the strengths of BigQuery is that everything is done with SQL—a language that we were already using and that everyone was already familiar with. So all we had to do was update our old on-premises database endpoints with new ones for BigQuery.”
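Because the analysis is expressed in SQL, a query of the kind described in this story can be prototyped on a laptop before it is pointed at a cloud endpoint. The sketch below is only an illustration: it uses Python's built-in sqlite3 module as a local stand-in rather than BigQuery, and the `waypoints` table, its columns, and the sample rows are hypothetical, not JTRP's or Wejo's actual schema or data.

```python
import sqlite3

# Local stand-in database. Table name, columns, and rows are made up
# for illustration and do not reflect any real CAV data format.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE waypoints (
        captured_at TEXT,   -- ISO 8601 timestamp
        route       TEXT,   -- for example 'I-65'
        mile_marker REAL,
        speed_mph   REAL
    )
    """
)
conn.executemany(
    "INSERT INTO waypoints VALUES (?, ?, ?, ?)",
    [
        ("2021-06-01T08:00:00", "I-65", 100.2, 68.0),
        ("2021-06-01T08:05:00", "I-65", 100.7, 22.0),
        ("2021-06-01T17:30:00", "I-65", 101.4, 30.0),
        ("2021-06-01T09:00:00", "I-65", 140.0, 70.0),  # outside the segment
        ("2021-06-02T08:00:00", "I-65", 100.5, 65.0),  # outside the 24 hours
        ("2021-06-01T08:00:00", "I-70", 100.5, 60.0),  # different route
    ],
)

# Average speed per mile of one route segment within a time window.
SEGMENT_QUERY = """
    SELECT CAST(mile_marker AS INTEGER) AS mile,
           COUNT(*)                     AS samples,
           AVG(speed_mph)               AS avg_speed_mph
    FROM waypoints
    WHERE route = ?
      AND mile_marker >= ? AND mile_marker < ?
      AND captured_at >= ? AND captured_at < ?
    GROUP BY mile
    ORDER BY mile
"""


def segment_speeds(connection, route, start_mile, end_mile, window_start, window_end):
    """Run the segment query with bound parameters and return all rows."""
    params = (route, start_mile, end_mile, window_start, window_end)
    return connection.execute(SEGMENT_QUERY, params).fetchall()


rows = segment_speeds(
    conn, "I-65", 100, 110, "2021-06-01T00:00:00", "2021-06-02T00:00:00"
)
for mile, samples, avg_speed in rows:
    print(f"mile {mile}: {samples} samples, {avg_speed:.1f} mph average")
```

The query text is kept separate from the connection that runs it, so the same statement could in principle be reused against a different backend, although parameter placeholder syntax and some functions differ between SQL dialects.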

BigQuery also offers native support for Apache Parquet files. “Rather than extracting, transforming, and reloading those files, we simply imported them into BigQuery, and they were ready to use immediately,” says Li.
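As a rough illustration of what a direct Parquet import can look like, the sketch below builds a BigQuery `LOAD DATA` SQL statement for Parquet files in Cloud Storage. It is not JTRP's actual process: the table name `traffic.cav_waypoints`, the bucket path, and the helper `build_parquet_load_statement` are all hypothetical, and the helper only produces the statement text without submitting it.

```python
from typing import Sequence


def build_parquet_load_statement(table: str, uris: Sequence[str]) -> str:
    """Return a BigQuery LOAD DATA statement for Parquet source files.

    Intended for trusted, hard-coded values only: the inputs are
    interpolated directly into the SQL text without escaping.
    """
    if not uris:
        raise ValueError("at least one source URI is required")
    uri_list = ", ".join(f"'{uri}'" for uri in uris)
    return (
        f"LOAD DATA INTO {table}\n"
        f"FROM FILES (format = 'PARQUET', uris = [{uri_list}]);"
    )


# Hypothetical destination table and Cloud Storage path.
statement = build_parquet_load_statement(
    "traffic.cav_waypoints",
    ["gs://example-bucket/cav/*.parquet"],
)
print(statement)
```

Parquet files carry their own schema, which is why a statement like this does not need to list column definitions.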

Making data and analysis more accessible

Many of the research questions JTRP seeks to answer—such as the impact of traffic slowdowns—require continuous, near-real-time data collection and analysis. Using BigQuery, data loading and queries can run in parallel, resulting in dramatic improvements in efficiency and productivity compared with on-premises systems.

For example, a single query to examine a section of interstate highway over a 24-hour period takes just seven minutes on BigQuery, compared to 90 minutes with a single on-premises server system. Batching 10 billion records could take a month with the old servers; the process takes just five minutes with BigQuery.
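To put those figures on a common scale, the short calculation below converts them into speedup ratios. The query times come straight from the numbers above; the batching ratio depends on an assumption, since the source says only "a month," which is taken here as 30 days.

```python
MINUTES_PER_DAY = 24 * 60
ASSUMED_DAYS_PER_MONTH = 30  # assumption: the source says only "a month"


def speedup(before_minutes: float, after_minutes: float) -> float:
    """Return how many times faster the new duration is than the old one."""
    return before_minutes / after_minutes


# Single 24-hour interstate query: 90 minutes on-premises vs. 7 on BigQuery.
query_speedup = speedup(90, 7)

# Batching 10 billion records: about a month vs. 5 minutes.
batch_speedup = speedup(ASSUMED_DAYS_PER_MONTH * MINUTES_PER_DAY, 5)

print(f"Single query: about {query_speedup:.1f}x faster")
print(f"Batching: about {batch_speedup:,.0f}x faster")
```

Under the 30-day assumption, the batching improvement is several orders of magnitude, compared with roughly one order of magnitude for the single query.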

“Just having approachable access to large datasets will open up a lot of possibilities for research,” Li says. “Weather, which produces huge amounts of data, is another data-intensive area that can tell us how to manage our roads better and safer, especially during inclement conditions.”

The affordability of BigQuery compared with servers also allows researchers to dream big when it comes to new projects.

“There’s no question that BigQuery is more cost-effective than buying servers,” Bullock says. “With $10 worth of queries, we can do what would cost $20,000 of engineering and labor in the field. This is a real game-changer for using CAV data to provide agencies with actionable information for improving the safety and efficiency of their transportation systems.”

“The ease of scaling up in cloud computing presents an opportunity across the public sector for managing roadways. Our vehicles know more about our infrastructure than we do, and now we can make use of that knowledge.”

Howell Li, Senior Software Engineer, Joint Transportation Research Program, Purdue University
