This project develops a system that helps visually impaired people understand their surroundings by converting images into text and then into speech. The system combines two types of deep learning models: a Convolutional Neural Network (CNN) and a Long Short-Term Memory (LSTM) network. The CNN analyzes the image and extracts its visual features, while the LSTM generates a text caption from those features. Once the image has been converted to text, the system uses text-to-speech technology to read the caption aloud.
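Below is a minimal sketch of how such a CNN + LSTM captioning model might be wired together in Keras. The choice of InceptionV3 as the feature extractor, the layer sizes, and placeholder constants such as VOCAB_SIZE and MAX_LEN are illustrative assumptions, not the project's tuned configuration.

```python
# Sketch of a CNN encoder + LSTM decoder for image captioning.
# All sizes below are assumed placeholders, not tuned values.
from tensorflow.keras.applications import InceptionV3
from tensorflow.keras.layers import Input, Dense, Dropout, Embedding, LSTM, add
from tensorflow.keras.models import Model

VOCAB_SIZE = 8000   # assumed vocabulary size after tokenizing the captions
MAX_LEN = 34        # assumed maximum caption length in tokens
EMBED_DIM = 256

# CNN encoder: a pretrained InceptionV3 with its classification head removed,
# used as a fixed feature extractor (one 2048-d vector per image).
cnn = InceptionV3(weights="imagenet", include_top=False, pooling="avg")

# Decoder: image features and the partial caption are merged, and an LSTM
# predicts the next word at each step.
img_in = Input(shape=(2048,))
img_feat = Dropout(0.5)(img_in)
img_feat = Dense(EMBED_DIM, activation="relu")(img_feat)

seq_in = Input(shape=(MAX_LEN,))
seq_feat = Embedding(VOCAB_SIZE, EMBED_DIM, mask_zero=True)(seq_in)
seq_feat = Dropout(0.5)(seq_feat)
seq_feat = LSTM(EMBED_DIM)(seq_feat)

merged = add([img_feat, seq_feat])
merged = Dense(EMBED_DIM, activation="relu")(merged)
out = Dense(VOCAB_SIZE, activation="softmax")(merged)

model = Model(inputs=[img_in, seq_in], outputs=out)
model.compile(loss="categorical_crossentropy", optimizer="adam")
```

For the final step, any text-to-speech library can speak the decoded caption; one common option (an assumption here, since the project text does not name an engine) is gTTS, e.g. `gTTS(text=caption, lang="en").save("caption.mp3")`.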
Dataset Details: https://www.kaggle.com/datasets/adityajn105/flickr8k
The dataset consists of 8,000 images, each paired with five different captions that provide clear descriptions of the salient entities and events. The images were chosen from six different Flickr groups and tend not to contain any well-known people or locations; they were manually selected to depict a variety of scenes and situations.
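As a starting point, the captions can be grouped by image before tokenization. The sketch below assumes the Kaggle release's layout: an Images/ folder plus a captions.txt CSV whose rows pair an image filename with one of its five captions.

```python
# Sketch of loading the Flickr8k captions, assuming a captions.txt CSV
# with an "image,caption" header (the layout of the Kaggle release).
import csv
from collections import defaultdict

def load_captions(path="captions.txt"):
    """Map each image filename to its list of (roughly five) captions."""
    captions = defaultdict(list)
    with open(path, newline="", encoding="utf-8") as f:
        reader = csv.DictReader(f)  # expects header: image,caption
        for row in reader:
            captions[row["image"]].append(row["caption"].strip())
    return captions

captions = load_captions()
print(len(captions))                  # expected: about 8,000 images
print(next(iter(captions.items())))   # one image with its caption list
```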