This is an open-source tool for collecting AI conversation datasets to fine-tune Large Language Models (LLMs) easily and effectively. Currently a work in progress.
- Create Projects: Kick things off by creating a project where you can configure the settings and system prompts for your project.
- Link with Langfuse: Connect your projects to Langfuse, an open-source LLM engineering platform that will be used to store conversation/annotation data and user interactions.
- Get Contributors: Generate a shareable link for your project for human experts to have mock conversations and annotate AI responses, building you a tailored, high-quality dataset to help train your model!
In this demo, I use the LLM Data Engine to collect data for training a model to behave as a helpful elementary school tutor.
- I create a project and link it to Langfuse.
- For demonstration, I act as an annotator, using the shareable link to simulate a conversation as a student and annotate/refine the AI responses.
- I go to Langfuse to see the collected dataset, which can be used to fine-tune my model for my tutoring use case!
Demo video: LLM.Data.Engine.Demo.mp4
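To make the last step of the demo more concrete, here is a minimal sketch of how annotated conversations could be turned into chat-style JSONL training examples (one JSON object with a `messages` array per line, the layout OpenAI's chat fine-tuning accepts). The `AnnotatedTurn` shape, the `refinedContent` field, and both function names are assumptions made for illustration; they are not the Data Engine's actual schema or the format Langfuse exports.

```typescript
// Hypothetical shapes for illustration only; not the project's or Langfuse's real schema.
type Role = "system" | "user" | "assistant";

interface AnnotatedTurn {
  role: "user" | "assistant";
  content: string;
  // Annotator's refined version of an assistant reply, if one was provided.
  refinedContent?: string;
}

interface ChatMessage {
  role: Role;
  content: string;
}

interface FineTuneExample {
  messages: ChatMessage[];
}

// Build one training example, preferring the annotator's refined reply over the original.
function toFineTuneExample(systemPrompt: string, turns: AnnotatedTurn[]): FineTuneExample {
  const messages: ChatMessage[] = [{ role: "system", content: systemPrompt }];
  for (const turn of turns) {
    const content =
      turn.role === "assistant" && turn.refinedContent !== undefined
        ? turn.refinedContent
        : turn.content;
    messages.push({ role: turn.role, content });
  }
  return { messages };
}

// Serialize examples as JSONL: one JSON object per line.
function toJsonl(examples: FineTuneExample[]): string {
  return examples.map((e) => JSON.stringify(e)).join("\n");
}

const example = toFineTuneExample("You are a helpful elementary school tutor.", [
  { role: "user", content: "What is 3 x 4?" },
  {
    role: "assistant",
    content: "12.",
    refinedContent: "3 x 4 means three groups of four, which makes 12.",
  },
]);
console.log(toJsonl([example]));
```

Keeping the refined reply and dropping the original means the model is trained only on responses an annotator approved, which is the point of the annotation step.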
If you have ideas, find bugs, or want to help build the Data Engine, don’t hesitate to open an issue, submit a pull request, or reach out directly!
This project uses TypeScript, Next.js, Prisma/PostgreSQL, NextAuth, and Tailwind CSS, along with the OpenAI and Langfuse APIs.