
December 2, 2021

FAZT: A framework to learn from little or no data - few-shot and zero-shot learning for tempo-visual events

Event Date: December 2, 2021
Sponsor: Dr. Juan Wachs
Time: 3:00 pm EST
Location: https://purdue-edu.zoom.us/j/7475961144?pwd=RTBSRkRTUVBvb0x3UkNhY3M1cU5OQT09

Meeting ID: 747 596 1144
Passcode: zeroshot
School or Program: Industrial Engineering
Naveen Madapana, Ph.D. Candidate


ABSTRACT

Supervised classification methods based on deep learning have achieved great success in many domains and on tasks that were previously unimaginable. However, such approaches build on learning paradigms that require hundreds of labeled examples per category in order to learn to classify objects or events. Hence, their immediate application to domains with never-before-seen observations is limited. Further, these approaches lack the ability to rapidly generalize from a few examples or from high-level descriptions of categories. There thus exists a significant gap between the way humans represent and learn categories (activities, actions, or events) and the way current models do. In this context, this research represents categories as semantic trees in a high-level attribute space and proposes a general framework that utilizes these representations to conduct N-Shot, Few-Shot, One-Shot, and Zero-Shot Learning (ZSL).
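
To make the attribute-space idea concrete, the following is a minimal sketch (not the dissertation's implementation) of how zero-shot recognition typically works once a model can score semantic attributes: an unseen category is recognized by matching predicted attribute scores against per-category attribute signatures. The attribute names and signature values below are hypothetical.

import numpy as np

def zero_shot_classify(attr_scores, signatures):
    """Pick the category whose attribute signature has the highest
    cosine similarity with the predicted attribute scores."""
    best_label, best_sim = None, -np.inf
    for label, sig in signatures.items():
        sim = attr_scores @ sig / (
            np.linalg.norm(attr_scores) * np.linalg.norm(sig) + 1e-8)
        if sim > best_sim:
            best_label, best_sim = label, sim
    return best_label

# Hypothetical attribute signatures (e.g., "one-handed", "lateral", "repetitive"):
signatures = {
    "wave":  np.array([1.0, 0.0, 1.0]),
    "swipe": np.array([1.0, 1.0, 0.0]),
}
print(zero_shot_classify(np.array([0.9, 0.1, 0.8]), signatures))  # -> "wave"

Because the classifier only needs a category's attribute signature, a new gesture or action can be recognized without a single training example of it.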
 
To tackle the challenges associated with ZSL and with learning semantic attributes, this work introduces an approach based on recurrent neural networks, referred to as the Joint Sequential Semantic Encoder (JSSE), which explores temporal patterns and simultaneously optimizes both a semantic and a classification loss. Lastly, the problem of systematically obtaining semantic attributes from domain-specific ontologies is addressed. The proposed framework is validated in the domains of hand gesture and action/activity recognition; however, it is applicable to other domains such as video understanding, the study of human behavior, and emotion recognition. First, an attribute-based gesture dataset is developed systematically by drawing on the literature on gestures and semantics and on crowdsourcing platforms such as Amazon Mechanical Turk. To the best of our knowledge, this is the first ZSL dataset for hand gestures (the ZSGL dataset). Next, the framework is evaluated in two experimental conditions: (1) within-category, to test attribute recognition power, and (2) across-category, to test the ability to recognize an unknown category. In addition, experiments are conducted in zero-shot, one-shot, few-shot, and continuous learning conditions, in both open-set and closed-set scenarios. Results show that the framework performs favorably on the ZSGL, Kinetics, UIUC Action, UCF101, and HMDB51 action datasets in all experimental conditions.
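
The abstract describes JSSE only at a high level (a recurrent encoder over temporal input, jointly optimizing semantic and classification losses), so the PyTorch sketch below is an illustrative assumption of that structure, not the actual architecture: layer sizes, the GRU choice, the two linear heads, and the unweighted sum of losses are all placeholders.

import torch
import torch.nn as nn

class JSSESketch(nn.Module):
    """Recurrent encoder with two heads: semantic attribute scores
    and seen-class logits, trained with a joint loss."""
    def __init__(self, feat_dim=512, hidden_dim=256,
                 num_attributes=64, num_seen_classes=20):
        super().__init__()
        self.rnn = nn.GRU(feat_dim, hidden_dim, batch_first=True)
        self.attr_head = nn.Linear(hidden_dim, num_attributes)   # semantic head
        self.cls_head = nn.Linear(hidden_dim, num_seen_classes)  # classification head

    def forward(self, frames):                 # frames: (B, T, feat_dim)
        _, h = self.rnn(frames)                # h: (1, B, hidden_dim)
        h = h.squeeze(0)
        return self.attr_head(h), self.cls_head(h)

model = JSSESketch()
frames = torch.randn(8, 30, 512)                     # a batch of 30-frame clips
attr_logits, cls_logits = model(frames)
attr_targets = torch.randint(0, 2, (8, 64)).float()  # binary attribute labels
cls_targets = torch.randint(0, 20, (8,))             # seen-class labels
loss = (nn.BCEWithLogitsLoss()(attr_logits, attr_targets)    # semantic loss
        + nn.CrossEntropyLoss()(cls_logits, cls_targets))    # classification loss
loss.backward()

At test time, only the semantic head is needed for zero-shot recognition: its attribute scores can be matched against unseen-category signatures as in the earlier sketch.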