CommentCrafter

https://cc-web-client-737754814579.us-central1.run.app/


Full-Stack LLM-Powered Training Data Generation Platform

The idea is to use an LLM to generate test and training data for comment filtering on e-commerce sites. Customers want to see relevant product reviews as part of a good user experience. Unfiltered comment sections are also a real danger, because bad actors can use them to communicate and coordinate illicit activity (e.g. drug dealing or organized crime). Scoring every comment with an LLM is expensive, and a newly launched website has no training data of its own. The purpose of this tool is to provide the bulk LLM-generated training data needed for classification models and other supervised machine learning methods, which are much cheaper for comment filtering than an LLM itself.

After the web client is given a URL to an online product, the service produces the requested amount of mock User Generated Content (UGC), with each comment paired with relevancy and offensiveness scores. Users can also view past queries and export the data for later use.
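To make the output concrete, here is a minimal sketch of what one generated comment record and its export could look like, using only the Python standard library. The class name `GeneratedComment`, its field names, and the 0.0 to 1.0 score scale are assumptions for illustration, not CommentCrafter's actual schema; the three export formats (JSON, CSV, XML) are the ones the Service section lists.

```python
"""Hypothetical record shape for one generated comment, plus JSON/CSV/XML export."""
import csv
import io
import json
import xml.etree.ElementTree as ET
from dataclasses import asdict, dataclass, fields


@dataclass
class GeneratedComment:
    # Field names and the 0.0-1.0 score scale are assumptions, not the real schema.
    text: str
    relevancy: float      # how on-topic the comment is for the product
    offensiveness: float  # how offensive or illicit the comment is


def to_json(comments: list[GeneratedComment]) -> str:
    # One JSON object per comment, wrapped in a list.
    return json.dumps([asdict(c) for c in comments], indent=2)


def to_csv(comments: list[GeneratedComment]) -> str:
    # Header row taken from the dataclass fields, then one row per comment.
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=[f.name for f in fields(GeneratedComment)])
    writer.writeheader()
    for c in comments:
        writer.writerow(asdict(c))
    return buf.getvalue()


def to_xml(comments: list[GeneratedComment]) -> str:
    # <comments><comment><text>...</text><relevancy>...</relevancy>...</comment></comments>
    root = ET.Element("comments")
    for c in comments:
        node = ET.SubElement(root, "comment")
        for key, value in asdict(c).items():
            ET.SubElement(node, key).text = str(value)
    return ET.tostring(root, encoding="unicode")
```

For example, `to_csv([GeneratedComment("Fits true to size, great fabric.", 0.95, 0.0)])` yields a header row plus one quoted row, ready to load into a spreadsheet or a training pipeline. Keeping the record as a flat dataclass means every export format is derived from the same field list, so the three formats cannot drift apart.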

Tech Stack

Python
Flask
Firebase Firestore
Firebase Authentication
Next.js
TypeScript
HTML/CSS
Docker
Google Cloud Run
Vertex AI (Gemini API)

Motivation

With information-rich sites like Reddit beginning to update their robots.txt files to block AI crawlers and most search engines from accessing their content (unless they pay up), the data needed to train AI models is becoming less available and less democratized. This gives companies that have already scraped the data (OpenAI, Big Tech) an even more monopolistic position in the AI market, since young and small companies have less data to train on, which is not healthy for competition.
While browsing the online store for a YouTube channel's clothing line, I saw people using a product's comment section to conduct what seemed to be a drug deal. Whatever it was, it was completely irrelevant to the listed item, which hurts both the site's integrity and its revenue. Without a comment filter, a site is vulnerable to hosting illicit activity while also suffering lower purchase rates, since customers rely on insightful reviews. A week later, the site had no choice but to remove its comment sections.

Without an adequate comment filter, small businesses that lack large datasets must either accept the lower purchase rates that come with having no user comment/review section, or leave themselves exposed to offensive, irrelevant, or illicit comments.

Service

CommentCrafter aims to provide the bulk training data needed to train comment filters. Comment filters are supervised classification models, which need detailed labeled data to be trained properly. Put simply, a trained classifier scores n comments in O(n) time with a very small cost per comment, whereas running every comment through an LLM also scales with n but with a far larger cost per comment (transformer self-attention alone grows roughly quadratically with the length of each input).
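As an illustration of the kind of cheap downstream model this data is meant to train, here is a minimal multinomial naive Bayes filter with add-one (Laplace) smoothing, written with the standard library only. CommentCrafter itself produces the data rather than the classifier, so the model choice, the `NaiveBayesFilter` name, and the "clean"/"flagged" labels are assumptions for the sketch. Training is a single pass over the labeled comments, and scoring a comment costs a few dictionary lookups and logarithms per token, which is the linear, low-constant behavior described above.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text: str) -> list[str]:
    # Lowercase and keep runs of letters/apostrophes as tokens.
    return re.findall(r"[a-z']+", text.lower())


class NaiveBayesFilter:
    """Multinomial naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, texts: list[str], labels: list[str]) -> "NaiveBayesFilter":
        # One pass over the data: count documents per label and words per label.
        self.class_counts = Counter(labels)
        self.word_counts: dict[str, Counter] = defaultdict(Counter)
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        return self

    def predict(self, text: str) -> str:
        # Pick the label with the highest log prior + sum of log word likelihoods.
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, doc_count in self.class_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(doc_count / total_docs)
            for word in tokenize(text):
                if word in self.vocab:  # ignore words never seen in training
                    score += math.log((counts[word] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label
```

Usage, with a handful of made-up training comments standing in for generated data:

```python
train_texts = [
    "great fabric and the fit is true to size",
    "love the color, shipped fast",
    "hit me up for cheap pills, cash only",
    "dm me to buy weed, meet behind the store",
]
train_labels = ["clean", "clean", "flagged", "flagged"]
model = NaiveBayesFilter().fit(train_texts, train_labels)
print(model.predict("dm me for pills"))
```

In practice the generated relevancy and offensiveness scores would be thresholded into labels like these, and a few thousand generated comments would replace the four toy examples.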

Given a valid link to an e-commerce product, the number of comments to generate, and the pollution level (the proportion of "bad" data), CommentCrafter uses an LLM to understand the nature of the product being sold and then returns synthetically generated training data accordingly. This data can be exported as JSON, CSV, or XML, and it is archived to the user's account for later reference.
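One way to turn the three inputs into a generation request is sketched below: validate them, split the requested count into clean and "polluted" comments according to the pollution level, and fold both counts into the LLM prompt. The `GenerationRequest` class, `build_prompt`, the `product_summary` argument (assumed to be a description extracted from the product page), and the prompt wording are all hypothetical; this is not CommentCrafter's actual prompt or backend code.

```python
from dataclasses import dataclass


@dataclass
class GenerationRequest:
    # Hypothetical request shape: product link, number of comments, and the
    # pollution level (fraction of "bad" comments, between 0.0 and 1.0).
    product_url: str
    num_comments: int
    pollution: float

    def __post_init__(self) -> None:
        # Reject obviously invalid input before spending an LLM call on it.
        if not self.product_url.startswith(("http://", "https://")):
            raise ValueError("product_url must be an http(s) link")
        if self.num_comments <= 0:
            raise ValueError("num_comments must be positive")
        if not 0.0 <= self.pollution <= 1.0:
            raise ValueError("pollution must be between 0 and 1")

    def split_counts(self) -> tuple[int, int]:
        """Return (clean, polluted) comment counts that sum to num_comments."""
        polluted = round(self.num_comments * self.pollution)
        return self.num_comments - polluted, polluted


def build_prompt(req: GenerationRequest, product_summary: str) -> str:
    # Illustrative prompt only; the real service's wording is not documented here.
    clean, polluted = req.split_counts()
    return (
        f"Product: {product_summary}\n"
        f"Write {clean} realistic, on-topic customer reviews and {polluted} "
        "irrelevant or offensive comments. For each comment, return JSON with "
        '"text", "relevancy" and "offensiveness" scores between 0 and 1.'
    )
```

For example, `GenerationRequest("https://example.com/products/hoodie", 20, 0.25).split_counts()` gives 15 clean and 5 polluted comments. Computing the polluted count first and subtracting it guarantees the two counts always add up to the requested total, even when the product of count and pollution level is not a whole number.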

Home Page

Home Page: pre-login

Firebase Auth login through a Google account

Home Page: logged in, input fields filled and ready for generation

Generation Results Page: product information and generated comments

Generation Results Page: export data feature

Generation History Page: list of all past products and their generations

Product Record Page: export all past generated data

IvanHornung/CommentCrafter

Folders and files

Latest commit

History

Repository files navigation

https://cc-web-client-737754814579.us-central1.run.app/

CommentCrafter

Full-Stack LLM-Powered Training Data Generation Platform

Tech Stack

Motivation

Service

Home Page

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages