This project aids visually impaired individuals by converting images to text using CNNs for analysis and LSTM models for description. Text-to-speech technology then vocalizes the information, enabling a comprehensive understanding of surroundings through audio.

KesavaSravan/Seeing-with-Your-Ears-An-AI-powered-Image-to-Speech-Solution-for-the-Visually-Challenged

This project aims to help people who are visually impaired understand their surroundings by converting images into text and then into speech. The system combines two types of deep learning models: Convolutional Neural Networks (CNNs) and Long Short-Term Memory (LSTM) networks. The CNN analyzes the image and extracts visual features, while the LSTM generates a corresponding text description from that visual information. Once the image has been converted to text, the system uses text-to-speech technology to turn the description into spoken words.
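The CNN-encoder/LSTM-decoder design described above can be sketched roughly as follows. This is a minimal illustration, not the repository's actual code: the feature size (2048, as produced by a typical pretrained CNN's pooling layer), the vocabulary size, the maximum caption length, and the layer widths are all assumed values common in Flickr8k captioning tutorials.

```python
# Minimal sketch of a merge-style CNN + LSTM captioning model.
# All sizes below are assumptions for illustration, not values from the repo.
from tensorflow.keras.layers import Input, Dense, Dropout, Embedding, LSTM, add
from tensorflow.keras.models import Model

VOCAB_SIZE = 5000   # assumed vocabulary size after tokenizing the captions
MAX_LEN = 34        # assumed maximum caption length (in tokens)
FEAT_DIM = 2048     # assumed CNN feature size (e.g. a pooled InceptionV3 output)

def build_caption_model():
    # Image branch: precomputed CNN features projected into the decoder space.
    img_in = Input(shape=(FEAT_DIM,))
    img_vec = Dense(256, activation="relu")(Dropout(0.5)(img_in))

    # Text branch: the partial caption so far, encoded by an LSTM.
    seq_in = Input(shape=(MAX_LEN,))
    seq_emb = Embedding(VOCAB_SIZE, 256, mask_zero=True)(seq_in)
    seq_vec = LSTM(256)(Dropout(0.5)(seq_emb))

    # Merge both branches and predict the next word of the caption.
    merged = Dense(256, activation="relu")(add([img_vec, seq_vec]))
    out = Dense(VOCAB_SIZE, activation="softmax")(merged)
    return Model(inputs=[img_in, seq_in], outputs=out)

model = build_caption_model()
```

At inference time, a caption is generated word by word: feed the image features together with the caption-so-far, take the most probable next word, append it, and repeat until an end token. The finished sentence can then be vocalized with a text-to-speech library such as gTTS or pyttsx3.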

Dataset Details: https://www.kaggle.com/datasets/adityajn105/flickr8k

The dataset consists of 8,000 images, each paired with five different captions that provide clear descriptions of the salient entities and events. … The images were chosen from six different Flickr groups and tend not to contain any well-known people or locations; they were manually selected to depict a variety of scenes and situations.
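For illustration, here is one way the caption file might be grouped by image before training. This assumes the Kaggle download's captions.txt layout (a header row `image,caption` followed by one `filename,caption` row per caption, five per image); the helper name is hypothetical.

```python
# Sketch: group Flickr8k caption rows by image filename.
# Assumes a CSV-like layout with a header row, as in the Kaggle download.
from collections import defaultdict

def load_captions(lines):
    """Map each image filename to the list of its captions."""
    captions = defaultdict(list)
    for line in lines[1:]:  # skip the "image,caption" header row
        filename, caption = line.split(",", 1)  # captions may contain commas
        captions[filename].append(caption.strip())
    return captions

# Tiny inline sample in the assumed file format:
sample = [
    "image,caption",
    "1000268201.jpg,A child in a pink dress climbs stairs .",
    "1000268201.jpg,A girl going into a wooden building .",
]
caps = load_captions(sample)
```

Each image's five captions give the model several reference descriptions of the same scene, which is what makes Flickr8k a standard benchmark for caption generation.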
