Assistive Vision Technology using Deep Learning Techniques

Dr. Neeta Verma
2021 International Journal for Research in Applied Science and Engineering Technology  
Automatically describing a scene is one of the most important functions of the human visual system, and caption generation remains one of the more interesting and actively studied areas of AI, with numerous challenges to overcome. An application that automatically captions the scene a person is in and converts the caption into a clear message would benefit people in a variety of ways. In this paper, we present a deep learning model that automatically detects objects or features in images, produces descriptions for the images, and converts the descriptions to audio so they can be read aloud. The model uses a pre-trained CNN and an LSTM to extract objects or features and generate captions. The first task is to detect objects within the image using a pre-trained MobileNet CNN (Convolutional Neural Network); the second is to caption the image based on the detected objects using an LSTM (Long Short-Term Memory) network and to convert the caption into speech, read aloud to the user through the SpeechSynthesisUtterance interface of the Web Speech API. The web interface is served by a Node.js backend. Caption generation entails a number of steps, including selecting the dataset, training the model, validating the model, creating pre-trained models to check the images, detecting objects in the images, and finally generating captions.
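The captioning step described above can be illustrated with a minimal sketch of word-by-word decoding. This is not the authors' implementation: the abstract does not state the decoding strategy, so greedy decoding is an assumption here, and `greedy_caption`, `toy_next_word`, and the `<start>`/`<end>` tokens are hypothetical names. The toy predictor stands in for a trained LSTM decoder conditioned on CNN image features.

```python
from typing import Callable, Dict, List, Sequence

# Hypothetical sentinel tokens marking the caption boundaries.
START, END = "<start>", "<end>"


def greedy_caption(
    image_features: Sequence[float],
    next_word: Callable[[Sequence[float], List[str]], Dict[str, float]],
    max_len: int = 20,
) -> str:
    """Build a caption by repeatedly taking the most probable next word.

    `next_word` plays the role of the LSTM decoder: given the image
    features and the words generated so far, it returns a probability
    for each vocabulary word. Greedy decoding is an assumption.
    """
    words: List[str] = [START]
    for _ in range(max_len):
        probs = next_word(image_features, words)
        best = max(probs, key=probs.get)  # most probable next word
        if best == END:
            break
        words.append(best)
    return " ".join(words[1:])  # drop the start token


def toy_next_word(features: Sequence[float], words: List[str]) -> Dict[str, float]:
    """Hypothetical stand-in for a trained decoder: follows a fixed script."""
    script = ["a", "dog", "on", "grass", END]
    target = script[min(len(words) - 1, len(script) - 1)]
    return {w: (1.0 if w == target else 0.0) for w in script}


# Placeholder feature vector; a real system would use CNN outputs here.
caption = greedy_caption([0.1, 0.9], toy_next_word)
print(caption)
```

In the system the abstract describes, the resulting caption string would then be passed to the browser's SpeechSynthesisUtterance interface to be read aloud.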
doi:10.22214/ijraset.2021.36815 fatcat:rgrjpywcivcmpfwhq5gnyhobay