ARify: Leveraging Narrated Instructional Videos to Create Augmented Reality Tutorials for Procedural Tasks

Apr 13, 2026

Authors: Xiyun Hu, Chenfei Zhu, Shao-Kang Hsia, Dizhi Ma, Rahul Jain, Karthik Ramani
In Proceedings of the 2026 CHI Conference on Human Factors in Computing Systems
https://doi.org/10.1145/3772318.3790715

Augmented Reality (AR) tutorials enhance procedural task learning by providing situated, step-by-step guidance. Yet, creating such tutorials requires AR authoring expertise, posing a significant entry barrier. To lower this barrier, we introduce ARify, an authoring system that semi-automatically transforms narrated instructional videos into AR tutorials. To guide system design, we conducted a content analysis of video tutorials and derived a design space of instructional intents, tactics, and AR representations. Building on this, ARify generates AR tutorials by integrating a vision–language model to plan tutorial structures and an AR builder to configure AR representations, and offers interfaces that allow users to refine and customize the results. A numerical study on three machine tasks and a user study with 18 participants showed that ARify achieves promising performance across task types, and allows novices to author effective AR tutorials, validating its effectiveness and usability.

Xiyun Hu

PhD student in Mechanical Engineering, Robotics area, focusing on human-computer interaction in AR/VR/MR/XR and mechatronics.