First-Person View Hand Segmentation of Multi-Modal Hand Activity Video Dataset

by Sangpil Kim | Aug 7, 2020

Authors: Sangpil Kim, Hyung-gun Chi, Xiao Hu, Anirudh Vegesana, Karthik Ramani

In proceedings of the 31st British Machine Vision Conference (BMVC)

https://www.bmvc2020-conference.com/conference/papers/paper_0570.html

Abstract: First-person-view videos of hands interacting with tools are widely used in the computer vision industry. However, creating a dataset with pixel-wise segmentation of hands is challenging since most videos are captured with fingertips occluded by the hand dorsum and grasped tools. Current methods often rely on manually segmenting hands to create annotations, which is inefficient and costly. To address this challenge, we create a method that uses thermal information of hands for efficient pixel-wise hand segmentation to build a multi-modal activity video dataset. Our method is not affected by fingertip and joint occlusions and does not require hand pose ground truth. We show our method to be 24 times faster than the traditional polygon labeling method while maintaining high quality. With the segmentation method, we propose a multi-modal hand activity video dataset with 790 sequences and 401,765 frames of “hands using tools” videos captured by thermal and RGB-D cameras with hand segmentation data. We analyze multiple models for hand segmentation performance and benchmark four segmentation networks. We show that fusing the LWIR and RGB-D frames of our multi-modal dataset achieves 5% better hand IoU performance than using RGB frames alone.
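For readers unfamiliar with the hand IoU metric mentioned in the abstract, here is a minimal sketch of how intersection over union is commonly computed for a pair of binary masks. It is an illustration only and not the paper's evaluation code: the function name `hand_iou`, the nested-list mask format, and the choice to score two empty masks as 1.0 are all assumptions made for this example.

```python
from typing import Sequence


def hand_iou(pred: Sequence[Sequence[int]], truth: Sequence[Sequence[int]]) -> float:
    """Intersection over union of two binary masks (1 = hand pixel, 0 = background).

    Hypothetical helper for illustration; masks are assumed to have equal shape.
    """
    intersection = 0
    union = 0
    for pred_row, truth_row in zip(pred, truth):
        for p, t in zip(pred_row, truth_row):
            if p and t:
                intersection += 1  # pixel labeled as hand in both masks
            if p or t:
                union += 1  # pixel labeled as hand in at least one mask
    # Assumed convention: two empty masks count as a perfect match.
    return intersection / union if union else 1.0


# Toy 2x3 masks: 2 pixels overlap, 4 pixels are marked in at least one mask.
predicted = [[0, 1, 1],
             [0, 1, 0]]
ground_truth = [[0, 1, 0],
                [0, 1, 1]]
score = hand_iou(predicted, ground_truth)
print(score)  # 0.5
```

A higher score means the predicted hand region overlaps the annotated region more closely; a benchmark would typically average this value over all evaluated frames.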

Learn more about BMVC: https://bmvc2020.github.io/

*Kim, Chi, and Ramani are members of the Convergence Design Lab, Purdue University, West Lafayette
**Hu and Vegesana are students in Electrical and Computer Engineering, Purdue University, West Lafayette

Sangpil Kim

Sangpil Kim is a Ph.D. student in the School of Computer Engineering at Purdue University. He works on deep learning algorithms and virtual reality. More specifically, he develops generative models, video segmentation, and hand pose estimation with depth sensors. Currently, he is working on combining virtual reality with deep learning algorithms.