Speech Recognition Supported by Lip Analysis

Waqqas ur Rehman Butt
2016 ELCVIA Electronic Letters on Computer Vision and Image Analysis  
Computers have become more pervasive than ever with a wide range of devices and multiple ways of interaction. Traditional ways of human computer interaction using keyboards, mice and display monitors are being replaced by more natural modes such as speech, touch, and gesture. The continuous progress of technology brings to an irreversible change of paradigms of interaction between human and machine. They are now used in daily life in many devices that have revolutionized the way users interact
more » ... ith machines. In fact new PCs, tablets and smartphones are moving increasingly toward a direction that will bring in a short time to have interaction paradigms so advanced that will be completely transparent to users. The various modes of human-machine interaction, through voice recognition are without doubt one of the most considered. A number of researchers have revealed that a speech reading system is beneficial complement to an audio speech recognition system by using of visual cues of the speakers, such as face in noisy environment. However, robust and precise extraction of visual features is a challenging problem in object recognition, due to high variation in pose, lighting and facial makeup. Most of the existing approaches use constraints such as the use of reflective marker on subjects lips, lip movements recorded with a fixed camera position (head mounted camera) and lip segmentation in organized illumination conditions. Furthermore, there is no common consensus about the visual features selection and their significance for a particular phoneme. Speech is the natural procedure of communication. Therefore speech would be an apparently preferred option for human computer interaction. In the past years, development in technology, combined with a significant reduction in cost, has led to the pervasive use of automated speech recognition in variety of systems such as telephony, human-computer interaction and robotics. Visual speech cues are prospective source of speech information and they are apparently not affected in noisy acoustic environmental condition and cross talking between speakers. Visual information of a speaker is the key component of Speech Recognition system such as outside area of mouth, mouth gestures and facial expressions. The major problem to develop robust speech recognition system is to find the precise visual feature extraction method. Sometime hearer observes improper from speaker because of the incompatible effect of visual features. These visual features have great role in the lip reading process. These interpretations gave a motivation for developing a computer speech recognition system.
doi:10.5565/rev/elcvia.953 fatcat:umvzzkkd6ferld4yweorr5s3z4