Visual_Synthesis


I worked on a project titled Audio Visual Synthesis. The problem statement is: "For a given speech signal, generate the corresponding lip movements." To achieve this, I used a phonetically rich audio-visual database containing over 9000 sentences spoken by 4 subjects. I chose an LSTM-RNN model to predict the lip shape, because LSTM units allow a recurrent network to learn long-term dependencies. Lip shapes were predicted from the speech input, and a short video of a talking head was generated.

Pre-processing of the video and audio was done in MATLAB. Further, a bidirectional LSTM-RNN model was implemented using the Keras library.
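
For readers who want a starting point, here is a minimal sketch of what a bidirectional LSTM regressor in Keras could look like. It is not the model used in this project: the feature sizes (`N_AUDIO_FEATURES`, `N_LIP_PARAMS`), the layer width, the loss, and the helper name `build_lip_model` are all assumptions made for illustration, and the data is random stand-in data rather than the audio-visual database.

```python
import numpy as np
import keras
from keras import layers

# Hypothetical sizes: the README does not state the real feature dimensions.
N_AUDIO_FEATURES = 13  # assumed number of audio features per frame
N_LIP_PARAMS = 4       # assumed number of lip-shape parameters per frame


def build_lip_model(n_in=N_AUDIO_FEATURES, n_out=N_LIP_PARAMS, units=64):
    """Bidirectional LSTM that maps an audio feature sequence to one
    lip-shape vector per time step (illustrative architecture only)."""
    model = keras.Sequential([
        keras.Input(shape=(None, n_in)),  # variable-length sequences
        # Reads each sequence forwards and backwards, keeping every time step.
        layers.Bidirectional(layers.LSTM(units, return_sequences=True)),
        # Linear output applied independently at each time step.
        layers.TimeDistributed(layers.Dense(n_out)),
    ])
    model.compile(optimizer="adam", loss="mse")
    return model


model = build_lip_model()

# Random stand-in data: 8 utterances of 50 frames each.
x = np.random.rand(8, 50, N_AUDIO_FEATURES).astype("float32")
y = np.random.rand(8, 50, N_LIP_PARAMS).astype("float32")
model.fit(x, y, epochs=1, batch_size=4, verbose=0)

pred = model.predict(x, verbose=0)
print(pred.shape)  # one lip-shape vector per input frame
```

With `return_sequences=True` the network emits a prediction for every input frame, so the output has the same number of time steps as the input. This assumes the audio features and lip-shape targets have already been aligned frame by frame during pre-processing.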