An Application to Convert Lip Movement into Readable Text
International Journal of Engineering Research and
Lip-reading is the process of interpreting spoken words by observing lip movements, with or without (possibly noisy) audio. The input to the application is the sequence of frames from an entire video. The application must detect the face, and within it the lips, in each frame, and then trace and learn the patterns of lip movement over time. This can be done using computer vision (feature extraction) and a deep convolutional neural network (CNN) model.
Lip-reading systems are difficult to implement due to complex image processing, hard-to-train classifiers, and long recognition processes. Automatic lip-reading is an important component of human-computer interaction technology, and it matters for both human language communication and visual perception. Traditional lip-reading systems usually consist of two stages: feature extraction and classification. In the first stage, many methods use pixel values extracted from the mouth region of interest (ROI) as the visual representation. More recently, deep learning has made significant progress in computer vision (image representation, object detection, human behaviour recognition, and video recognition), so automatic lip-reading has shifted from traditional hand-crafted feature-extraction-and-classification methods to end-to-end deep learning architectures.
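The traditional two-stage pipeline described above (mouth-ROI pixel values as features, then a classifier) can be sketched in a few lines of NumPy. This is an illustrative toy, not the paper's actual method: the frame size, the ROI box (assumed to come from some face/landmark detector), and the nearest-centroid classifier are all assumptions made for the sketch.

```python
import numpy as np

def mouth_roi(frame, box):
    """Crop the mouth region of interest (ROI) from one grayscale frame.
    box = (top, left, height, width) -- assumed given by a face detector."""
    t, l, h, w = box
    return frame[t:t + h, l:l + w]

def roi_features(frames, box):
    """Stage 1 (feature extraction): stack flattened ROI pixel values
    over time, one row per video frame."""
    return np.stack([mouth_roi(f, box).ravel() for f in frames])

def nearest_centroid_predict(feat, centroids):
    """Stage 2 (classification): toy classifier that returns the index
    of the closest class centroid in feature space."""
    dists = np.linalg.norm(centroids - feat, axis=1)
    return int(np.argmin(dists))
```

With 48x48 frames and a 16x24 ROI, `roi_features` yields one 384-dimensional pixel vector per frame; a real system would feed these (or learned features) to a trained classifier rather than fixed centroids.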
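For the end-to-end deep-learning direction the text points to, the core operation is convolution over the frames. The minimal NumPy sketch below shows a valid-mode 2D convolution (cross-correlation, as CNN layers actually compute) applied to differences of consecutive frames, producing a crude motion feature over time. It is a hand-written illustration of the operation, not a trained lip-reading network; the kernel and frame sizes are arbitrary assumptions.

```python
import numpy as np

def conv2d(img, kernel):
    """Valid-mode 2D cross-correlation, the building block of a CNN layer."""
    kh, kw = kernel.shape
    H, W = img.shape
    out = np.zeros((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(img[i:i + kh, j:j + kw] * kernel)
    return out

def temporal_motion_energy(frames, kernel):
    """One scalar per consecutive frame pair: convolve the frame
    difference, then sum the absolute response. Static video gives
    zeros; lip movement gives nonzero energy over time."""
    return [float(np.abs(conv2d(b - a, kernel)).sum())
            for a, b in zip(frames, frames[1:])]
```

In an end-to-end model these hand-picked kernels are replaced by learned filters (typically spatiotemporal, e.g. 3D convolutions over the frame stack), and the summed energy by further layers ending in a word or phoneme classifier.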