I worked on a project titled Audio Visual Synthesis. The problem statement was: "For a given speech signal, generate the corresponding lip movements." To achieve this, I used a phonetically rich audio-visual database containing over 9000 sentences spoken by 4 subjects. In this work I chose an LSTM-RNN model for predicting the lip shape, as LSTM networks are capable of learning long-term dependencies. Based on the speech input, lip shapes were predicted, and a short video of a talking head was generated.
Pre-processing of the video and audio was done in MATLAB. Further, a bidirectional LSTM-RNN model was implemented using the Keras library.
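The sketch below illustrates the kind of bidirectional LSTM-RNN described above, built with Keras. It is a minimal example, not the original project code: the feature sizes (N_AUDIO_FEATURES, N_LIP_PARAMS), layer width, and dummy training data are illustrative assumptions; the actual model maps per-frame speech features to lip-shape parameters in the same spirit.

# Minimal sketch (assumed dimensions, not the original code): a bidirectional
# LSTM-RNN in Keras that maps per-frame audio features to lip-shape parameters.
import numpy as np
from keras.models import Sequential
from keras.layers import Bidirectional, LSTM, TimeDistributed, Dense

N_AUDIO_FEATURES = 39   # assumed: e.g. MFCCs plus deltas per audio frame
N_LIP_PARAMS = 20       # assumed: dimensionality of the lip-shape representation

model = Sequential([
    # The bidirectional LSTM reads the utterance forwards and backwards,
    # so each frame's prediction can use both past and future acoustic context.
    Bidirectional(LSTM(128, return_sequences=True),
                  input_shape=(None, N_AUDIO_FEATURES)),
    # One lip-shape vector is predicted for every audio frame.
    TimeDistributed(Dense(N_LIP_PARAMS)),
])
model.compile(optimizer="adam", loss="mse")

# Usage with dummy data: 8 utterances, 100 frames each.
X = np.random.rand(8, 100, N_AUDIO_FEATURES)
y = np.random.rand(8, 100, N_LIP_PARAMS)
model.fit(X, y, epochs=1, batch_size=4)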