November 5, 2025

Purdue research earns Best Paper Award runner-up at top networking conference

A Purdue University research team led by Purdue ECE Prof. Christopher Brinton has earned the Best Paper Award Runner-Up at MobiHoc 2025.
Clockwise from top left: Liangqi Yuan (first author), Dong-Jun Han, Christopher Brinton, and Shiqiang Wang

A Purdue University research team led by Christopher Brinton, Elmore Associate Professor in the Elmore Family School of Electrical and Computer Engineering, has earned the Best Paper Award Runner-Up at the ACM International Symposium on Theory, Algorithmic Foundations, and Protocol Design for Mobile Networks and Mobile Computing (MobiHoc 2025).

Purdue ECE PhD student Liangqi Yuan is first author on the paper, titled “Local-Cloud Inference Offloading for LLMs in Multi-Modal, Multi-Task, Multi-Dialogue Settings.” The recognition places the team’s work among the top three papers accepted to MobiHoc 2025, one of the premier international conferences in networking.

Large language models (LLMs), such as ChatGPT and similar systems, can handle a wide range of tasks, from writing and translation to analyzing images. But deploying these massive models efficiently is a growing challenge.

“Running LLMs entirely on local devices like smartphones or laptops can overwhelm their processing power, while running them in the cloud can introduce high costs and slow response times,” said Brinton. “Our work looks at how to balance those trade-offs intelligently.”

The team’s proposed system, called TMO (Three-M Offloading), decides when tasks should be handled locally and when they should be offloaded to the cloud. The “Three-M” in the name refers to the system’s ability to handle multi-modal data (like text and images), multi-task environments (different types of problems), and multi-dialogue interactions (ongoing conversations).

At the heart of TMO is a reinforcement learning algorithm, a type of machine learning that optimizes decisions through trial and error. In this case, it learns the best times to process data locally or in the cloud based on factors such as task complexity, response quality, latency, and cost.
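As a rough illustration only (the paper contains the actual formulation), the sketch below shows how a simple reinforcement-learning-style policy might weigh response quality against latency and cost when choosing between local and cloud inference. It uses an epsilon-greedy bandit as a stand-in for the full decision algorithm; all names, weights, and simulated values are hypothetical.

```python
import random

# Hypothetical reward weights trading off quality against latency and cost.
# These numbers are illustrative, not taken from the paper.
W_QUALITY, W_LATENCY, W_COST = 1.0, 0.3, 0.2

ACTIONS = ["local", "cloud"]

def reward(quality, latency_s, cost_usd):
    """Scalar reward: higher quality is rewarded; latency and cost are penalized."""
    return W_QUALITY * quality - W_LATENCY * latency_s - W_COST * cost_usd

class EpsilonGreedyOffloader:
    """Toy epsilon-greedy policy that learns when to run a query locally vs. in the cloud."""

    def __init__(self, epsilon=0.1):
        self.epsilon = epsilon
        self.value = {a: 0.0 for a in ACTIONS}  # running average reward per action
        self.count = {a: 0 for a in ACTIONS}

    def choose(self):
        if random.random() < self.epsilon:          # explore occasionally
            return random.choice(ACTIONS)
        return max(ACTIONS, key=self.value.get)     # otherwise exploit best-so-far action

    def update(self, action, r):
        self.count[action] += 1
        # Incremental mean update of the action's estimated reward.
        self.value[action] += (r - self.value[action]) / self.count[action]

def simulate(action):
    """Made-up outcomes (quality, latency in seconds, cost in dollars) for each choice."""
    if action == "local":
        return 0.6, 0.5, 0.0    # lower quality, fast, free
    return 0.9, 2.0, 0.02       # higher quality, slower, costs money

policy = EpsilonGreedyOffloader()
for _ in range(1000):
    a = policy.choose()
    policy.update(a, reward(*simulate(a)))

print(policy.value)  # learned average reward for "local" vs. "cloud"
```

The real system conditions its decisions on the query itself (its modalities, task type, and dialogue history), so a state-aware policy rather than this stateless bandit would be needed in practice.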

Application scenario of the TMO system as an LLM assistant. In this example, the cloud LLM is selected along with first-person and overhead views, as the query requires visual information.

To support this system, the researchers created a new dataset called M4A1, which contains real-world measurements of how different AI models perform across various data types, tasks, and dialogue settings.
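As a purely hypothetical illustration of what one entry in such a dataset might look like (the actual M4A1 schema is defined in the paper and may differ), a record could pair a query's modality, task, and dialogue position with the measured quality, latency, and cost of a given model:

```python
# Hypothetical record; field names and values are illustrative, not the real M4A1 schema.
example_record = {
    "modalities": ["text", "image"],      # multi-modal input
    "task": "visual_question_answering",  # one of several task types
    "dialogue_turn": 3,                   # position within an ongoing conversation
    "model": "cloud_llm",                 # which model produced the response
    "quality_score": 0.87,                # measured response quality
    "latency_seconds": 1.9,               # end-to-end response time
    "cost_usd": 0.015,                    # monetary cost of the call
}
```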

“Our approach achieved significant improvements in response quality, speed, and cost efficiency compared to other methods,” said Yuan. “It shows how AI systems can be both high-performing and resource-aware.”

The implications of this research go beyond the lab. Smarter local-cloud coordination could make advanced AI tools more responsive and accessible on everyday devices, which is an important step toward scalable, sustainable AI deployment.

“Our work bridges the gap between local and cloud AI computing,” Brinton said. “It’s a move toward more adaptive systems that can deliver top-tier performance while staying practical for real-world use.”

The research was supported in part by the National Science Foundation, the Office of Naval Research, and the Air Force Office of Scientific Research.