StyleScribe-Using-Generative-Adversarial-Network

StyleScribe is a sophisticated text-to-image generation project that leverages state-of-the-art techniques in Natural Language Processing (NLP) and Generative Adversarial Networks (GANs). The primary objective of this project is to generate high-quality images of fashion products based on textual descriptions. By bridging the semantic gap between text and visual content, StyleScribe offers an innovative solution for the fashion industry, enabling the creation of visual content from descriptive text inputs.

Key Features

Text-to-Image Generator: StyleScribe uses advanced GAN models to convert textual descriptions into realistic images of fashion products.
Large Dataset: The project utilizes a meticulously curated dataset of 60,000 unique fashion product images and their corresponding descriptions to train the model.
NLP Integration: Extensive preprocessing of textual data ensures semantic coherence and relevance, enhancing the quality of generated images (a minimal preprocessing sketch follows this list).
Custom GAN Architecture: A tailored GAN model, comprising a generator and discriminator, is designed to produce high-fidelity fashion images.
Scalable Infrastructure: The project is hosted on Google Cloud RDP and uses Flask for the web interface and Firebase for real-time database management.
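
The preprocessing code itself is not part of this excerpt; the following is a minimal sketch of what such a cleaning pass might look like. The stopword list and function name are illustrative assumptions, not taken from the project.

```python
import re

# Illustrative stopword list; the project's actual preprocessing is not shown.
STOPWORDS = {"a", "an", "the", "with", "and", "of", "in", "for", "on"}

def preprocess_description(text: str) -> list[str]:
    """Lowercase, strip non-letters, tokenize on whitespace, and drop stopwords."""
    text = re.sub(r"[^a-z\s]", " ", text.lower())
    return [tok for tok in text.split() if tok not in STOPWORDS and len(tok) > 2]

print(preprocess_description("A sleeveless red cotton midi-dress with floral print"))
# ['sleeveless', 'red', 'cotton', 'midi', 'dress', 'floral', 'print']
```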

System Architecture

Frontend:
● Developed using Flask, the frontend provides an intuitive web interface for users to input text descriptions and view generated images.
Backend:
● The backend is powered by a Flask server, which handles incoming requests and communicates with the model server (a minimal sketch follows this list).
Model Server:
● Hosted on Google Cloud RDP, the model server processes text inputs through the GAN model and generates images.
Database:
● Firebase Firestore/Realtime Database stores metadata and text descriptions, ensuring real-time data synchronization.
Storage:
● Google Cloud Storage or Firebase Storage is used to store the training dataset and generated images.
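
The server code is not included in this excerpt; below is a minimal sketch of how the Flask backend could relay a text prompt to the model server and return a generated image. The route, the model-server URL, and the response format are illustrative assumptions, not the project's actual API.

```python
import base64

import requests
from flask import Flask, jsonify, request

app = Flask(__name__)
MODEL_SERVER_URL = "http://model-server:5001/generate"  # assumed address

@app.route("/generate", methods=["POST"])
def generate():
    """Accept a text description and relay it to the GAN model server."""
    description = request.json.get("description", "")
    if not description:
        return jsonify({"error": "description is required"}), 400

    # Forward the prompt to the model server, which runs the GAN.
    resp = requests.post(MODEL_SERVER_URL, json={"description": description}, timeout=60)
    resp.raise_for_status()

    # Return the generated image as base64 for the web UI to display.
    image_b64 = base64.b64encode(resp.content).decode("ascii")
    return jsonify({"description": description, "image": image_b64})

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=5000)
```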

Proposed Algorithm

The success of the project hinges on the development and implementation of an advanced algorithm that seamlessly bridges textual descriptions of fashion concepts with visually stunning design outputs. The proposed algorithm, underpinned by deep learning techniques and state-of-the-art technology, outlines the key steps and components that drive the transformative power of the platform.
  1. Textual Input Interpretation: The algorithm processes user input, extracting relevant information using natural language processing (NLP). Whether it’s a simple phrase or a detailed description, the NLP techniques identify color, style, length, and design elements.
  2. Neural Network for Text-to-Image Conversion (a minimal code sketch follows this list):
    ● Generator: This component creates an initial image representation based on the textual input. It translates extracted details (e.g., color, shape) into a visual concept.
    ● Discriminator: Evaluates the quality and realism of generated images by comparing them to a real fashion image dataset. The feedback loop refines the generator’s output over time.
  3. Training the AI Model: The neural network trains on a diverse fashion image dataset, enabling it to generate a wide range of design concepts.
  4. Feedback Loop and Improvement: As users interact with the platform, the model learns from successes and mistakes, continuously improving its designs.
  5. User Customization: Users can further customize designs, specifying color choices, patterns, and fabric textures.
  6. Output Generation: The algorithm produces high-resolution images reflecting the user’s fashion concept.
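
The GAN code itself is not shown in this excerpt; the sketch below illustrates the generator/discriminator pairing from step 2 as a conditional GAN in PyTorch. All layer sizes, the text-embedding dimension, and the class names are assumptions made for illustration.

```python
import torch
import torch.nn as nn

# Assumed dimensions; the project's actual architecture is not shown.
TEXT_DIM, NOISE_DIM, IMG_DIM = 256, 100, 64 * 64 * 3

class Generator(nn.Module):
    """Maps a noise vector plus a text embedding to a flattened image."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(NOISE_DIM + TEXT_DIM, 512), nn.ReLU(),
            nn.Linear(512, 1024), nn.ReLU(),
            nn.Linear(1024, IMG_DIM), nn.Tanh(),  # pixel values in [-1, 1]
        )

    def forward(self, noise, text_emb):
        return self.net(torch.cat([noise, text_emb], dim=1))

class Discriminator(nn.Module):
    """Scores how real an image looks, conditioned on the same text embedding."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(IMG_DIM + TEXT_DIM, 512), nn.LeakyReLU(0.2),
            nn.Linear(512, 1), nn.Sigmoid(),
        )

    def forward(self, image, text_emb):
        return self.net(torch.cat([image, text_emb], dim=1))

# One simplified adversarial step: the discriminator's verdict trains the generator.
G, D = Generator(), Discriminator()
noise = torch.randn(8, NOISE_DIM)
text_emb = torch.randn(8, TEXT_DIM)  # stand-in for an NLP-derived text embedding
fake = G(noise, text_emb)
g_loss = nn.BCELoss()(D(fake, text_emb), torch.ones(8, 1))  # generator aims for "real"
g_loss.backward()
```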

Use-Case Diagrams

Actors:
● User: Users act as creators by providing textual descriptions of their fashion concepts, customizing generated designs, and offering feedback. They drive the creative process, shaping the AI-generated fashion designs to align with their vision.
● Administrator: The administrator can modify the application by updating existing models or adding new ones, and can also add new features if required.

Analysis Modeling

ER Diagram

The ER diagram outlines the backend workings of the application, including input processing and the flow of activities. Once the initial screen appears, functions are executed based on user actions, with the input prompt playing a crucial role in this execution.

Activity Diagram

The Activity Diagram outlines the stages in the application process. It begins with opening the application and focusing on the input field for manual text entry. Users can then interact with the displayed image, either closing the application or using additional features based on text prompts.

Functional Modeling

DFD: Level 0

In the Level 0 DFD, the user queries the system for information. The administrator has additional privileges, such as changing functions and models. The system then creates the desired output and displays it on screen.

DFD: Level 1

In the Level 1 DFD, the user opens the application on a smartphone or tablet and gives the text input using the text field on the UI. The input is first analysed to understand its meaning and extract features. Once the features are extracted, the image generation process starts and the image is generated pixel by pixel.

DFD: Level 2

In the Level 2 DFD, the user's input is passed to the NLP algorithm, which extracts the relevant features. These features are used to query the labelled image dataset, whose labels indicate the attributes the user expects in the output. The image generator then creates variations of the expected image through the feature-matching and precise image selector functions (a minimal sketch of the matching step follows).
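
The feature-matching function is only named in the diagram, so the following is a speculative minimal sketch of one plausible implementation: ranking labelled dataset images by cosine similarity between the query's feature vector and each image's feature vector. All names and data here are illustrative.

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-9))

def select_reference_images(query_vec, dataset_vecs, labels, top_k=5):
    """Rank labelled dataset images by similarity to the query's feature vector."""
    scores = [cosine_similarity(query_vec, v) for v in dataset_vecs]
    best = np.argsort(scores)[::-1][:top_k]
    return [(labels[i], round(scores[i], 3)) for i in best]

# Toy example: random vectors stand in for extracted text/image features.
rng = np.random.default_rng(0)
dataset = [rng.random(16) for _ in range(100)]
labels = [f"img_{i:03d}" for i in range(100)]
print(select_reference_images(rng.random(16), dataset, labels))
```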

Dataset

The project utilizes a meticulously curated dataset of 60,000 unique fashion product images and their corresponding descriptions to train the model. These images are labelled with multiple attributes that describe each product in the form of metadata. This metadata is embedded using term frequency-inverse document frequency (TF-IDF); a minimal sketch of the embedding step follows.
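
A minimal sketch of the TF-IDF embedding step, assuming scikit-learn's TfidfVectorizer; the example descriptions are made up, and the project's exact vectorization code is not shown in the excerpt.

```python
from sklearn.feature_extraction.text import TfidfVectorizer

# Made-up metadata strings standing in for the dataset's attribute labels.
metadata = [
    "red sleeveless cotton midi dress floral print",
    "blue denim slim fit jeans men casual",
    "black leather ankle boots women heel",
]

vectorizer = TfidfVectorizer()
tfidf_matrix = vectorizer.fit_transform(metadata)  # one sparse TF-IDF row per product

print(tfidf_matrix.shape)                      # (3, vocabulary size)
print(vectorizer.get_feature_names_out()[:5])  # first few vocabulary terms
```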

Metadata

Image data

Results and UI Design
