Multimodal Image Perception System for Blind or Visually Impaired People

Currently, no suitable substitute technology enables blind or visually impaired (BVI) people to interpret, in real time, the visual scientific data commonly generated during laboratory experimentation, such as light microscopy, spectrometry, and the observation of chemical reactions. This reliance on visual interpretation of scientific data impedes BVI students and scientists from advancing in careers in medicine, biology, chemistry, and other scientific fields. To address this challenge, a real-time multimodal image perception system is developed that transforms standard laboratory blood smear images into a form BVI users can perceive, employing a combination of auditory, haptic, and vibrotactile feedback. These sensory channels convey visual information through alternative perceptual pathways, creating a palette of multimodal sensory information.

I. Introduction

According to the 2011 National Health Interview Survey (NHIS) Preliminary Report, an estimated 21.2 million adult Americans, more than 10% of the adult population, have trouble seeing. Among the 6.6 million working-age adults who are BVI, 64% did not finish high school and only about 6% earned a Bachelor's or higher degree [1]. The lack of proper and effective assistive technologies (AT) is a major roadblock preventing BVI individuals from actively participating in science and advanced research [2]. It remains a challenge for them to perceive and understand scientific visual data acquired during wet-lab experimentation, such as viewing live specimens through a stereo microscope or histological samples through light microscopy (LM). According to the NSF's Science and Engineering Indicators 2014, no more than 1% of blind or visually impaired people are involved in advanced science and engineering research and receive doctoral degrees [3].

With current single-modality human-computer interfaces (HCIs), only limited visual information can be accessed because each sense has its own limitations. Although tactile-vision sensory substitution (TVSS) technologies, such as the tongue electrotactile array [4] and tactile pictures [5], have been shown to convey visual information [6] about spatial phenomena [7], the low resolution of somatosensory display arrays has always limited their ability to convey complex image information. Auditory-vision sensory substitution has also been studied for image perception [8], [9]. Trained early-blind participants showed improved localization and object-recognition performance [10] through this substitution. However, auditory-vision substitution requires memorizing distinct audio patterns, and training is needed to map audio stimuli to visual cues. In addition, focusing on auditory feedback can reduce a subject's ability to gather information from the environment [11]. The remaining gap is that existing solutions cannot convey the richness, complexity, and volume of data available to users without disabilities. In this study, a real-time multimodal image perception approach is investigated that delivers feedback through multiple sensory channels, including auditory, haptic, and vibrotactile. Through the integration of multiple sensory substitutions, participants using the proposed platform showed higher analytic performance than when using a standard interface based on a single sensory feedback channel.
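To make the idea of mapping visual features onto several sensory channels concrete, the Python sketch below illustrates one plausible scheme; it is not the implementation used in this work, and all function and parameter names (e.g., multimodal_cues, patch, f_min) are hypothetical. Under this assumed mapping, the image patch beneath the user's pointer is summarized by its mean brightness (driving an audio pitch), its edge density (driving vibrotactile intensity), and its dominant gradient direction (driving the direction of a haptic force cue).

```python
import numpy as np

def multimodal_cues(image, x, y, patch=16, f_min=220.0, f_max=880.0):
    """Map the image patch under pointer (x, y) to hypothetical sensory cues.

    image : 2-D float array in [0, 1] (e.g., a grayscale blood smear image)
    Returns a dict with:
      'pitch_hz'  - mean brightness mapped to an audio pitch (auditory channel)
      'vibration' - edge density mapped to intensity in [0, 1] (vibrotactile channel)
      'force_dir' - dominant gradient direction in radians (haptic channel)
    """
    h, w = image.shape
    half = patch // 2
    r0, r1 = max(0, y - half), min(h, y + half)
    c0, c1 = max(0, x - half), min(w, x + half)
    region = image[r0:r1, c0:c1]

    # Auditory channel: brighter regions map to higher pitch.
    brightness = float(np.mean(region))
    pitch_hz = f_min + brightness * (f_max - f_min)

    # Vibrotactile channel: stronger local edges map to stronger vibration.
    gy, gx = np.gradient(region)
    edge_strength = np.hypot(gx, gy)
    vibration = float(np.clip(edge_strength.mean() * 10.0, 0.0, 1.0))

    # Haptic channel: average gradient direction gives a force-direction cue.
    force_dir = float(np.arctan2(gy.mean(), gx.mean()))

    return {'pitch_hz': pitch_hz, 'vibration': vibration, 'force_dir': force_dir}


if __name__ == '__main__':
    # Query the cues at the centre of a synthetic 128x128 test image.
    img = np.random.default_rng(0).random((128, 128))
    print(multimodal_cues(img, x=64, y=64))
```

In a deployed system, the pitch value would drive a tone generator while the vibration and force values would drive the vibrotactile and haptic hardware; those device interfaces are omitted from this sketch.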