📥 feat: Import Conversations from LibreChat, ChatGPT, Chatbot UI #2355
Hey @DenisPalnitsky, thanks for taking a look at this! This is a great feature that I look forward to using myself. There is a lot to do and I can't get to everything at once, so I appreciate you picking this up. I took a quick glance and there are some issues with the approach, so I will use this opportunity to outline my ideas on how to optimize and design this with the project's long-term direction in mind. I don't know about you, but from before I built and started heavily using LibreChat (I never use ChatGPT or any other UI anymore), I have thousands of conversations, especially from December 2022 to February 2023. I can only imagine the same is true for users who are just discovering the project, so getting some things right now will prevent future headaches.
Hi @danny-avila
I understand that's reasonable, but it would be very easy to add another method for each. For example, I simply prompted GPT-4 in LibreChat with the relevant parts of the code, and it did a pretty good job (after some steering): Create_SaveBulkMessages Method.md. It includes validation using schemas that already exist; please also use those existing zod schemas.
This is not so much about immediate performance as about the number of concurrent database transactions taking place, especially when compounded by concurrent use of LibreChat. The existing per-item methods work fine, but as things grow and more users arrive, you really want to lean towards bulk operations: they ease the load on your database and help keep costs down if you're using something like MongoDB Atlas.
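To make the bulk-operation point concrete, here is a minimal sketch of batching, assuming a hypothetical `saveInBatches` helper and a caller-supplied `writeBatch` callback (in practice something like a Mongoose `Model.insertMany(batch)` or `Model.bulkWrite(ops)` call). Neither name comes from the PR; the point is only that N documents cost ceil(N / batchSize) database round-trips instead of N.

```javascript
/**
 * Hypothetical helper (not from the PR): split items into fixed-size batches
 * and hand each batch to a single write call.
 *
 * @template T
 * @param {T[]} items - Documents to persist.
 * @param {number} batchSize - Maximum documents per write call.
 * @param {(batch: T[]) => Promise<void>} writeBatch - Performs one bulk write,
 *   e.g. a wrapper around Model.insertMany(batch) (assumption).
 * @returns {Promise<number>} Number of write calls made.
 */
async function saveInBatches(items, batchSize, writeBatch) {
  if (!Number.isInteger(batchSize) || batchSize < 1) {
    throw new RangeError('batchSize must be a positive integer');
  }
  let calls = 0;
  for (let i = 0; i < items.length; i += batchSize) {
    // Await each batch in turn so at most one bulk write is in flight.
    await writeBatch(items.slice(i, i + batchSize));
    calls += 1;
  }
  return calls;
}

module.exports = { saveInBatches };
```

Writing batches sequentially trades a little latency for a hard cap on concurrent database operations, which matters most when many users import at once.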
@danny-avila If we use a job scheduler for the task, how do you see feedback to the user (whether the import succeeded or not) being implemented? Are there any examples in the code that I can refer to?
That's a good question. No example exists, as we don't have job scheduling yet. For the sake of rolling this out sooner rather than later, you can tackle this PR with just points 1 and 2-4 from earlier, and let me worry about the job queue handling, as that can be an iterative improvement. In my opinion, starting off with bulk/batch operations is more important, and easier to implement with what you've already built.
@danny-avila I added batch writes, rate limiting, and a job scheduler. I also added an endpoint to poll for a job's status; that's the easiest way to implement feedback for now.
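A client consuming such a status endpoint typically polls until the job reaches a terminal state or a timeout expires. The sketch below assumes a hypothetical `fetchStatus` callback (e.g. wrapping a GET to the import job status route) and assumes the terminal status strings are `'completed'` and `'failed'`; the actual route and status names in the PR may differ.

```javascript
/**
 * Hypothetical polling helper (not from the PR).
 *
 * @param {() => Promise<{ status: string }>} fetchStatus - Fetches the job's
 *   current status; the response shape is an assumption.
 * @param {{ intervalMs?: number, timeoutMs?: number }} [options]
 * @returns {Promise<{ status: string }>} The first terminal status seen.
 */
async function pollJobStatus(fetchStatus, { intervalMs = 500, timeoutMs = 60000 } = {}) {
  const deadline = Date.now() + timeoutMs;
  const terminal = new Set(['completed', 'failed']); // assumed status names
  while (Date.now() < deadline) {
    const result = await fetchStatus();
    if (terminal.has(result.status)) {
      return result;
    }
    // Pause between requests so the client doesn't hammer the server.
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
  throw new Error(`Job did not finish within ${timeoutMs} ms`);
}

module.exports = { pollJobStatus };
```

The timeout keeps the UI from waiting forever on a stuck job; the caller can surface the thrown error as an import failure message.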
I took a quick glance and it looks good to me (aside from the pending conflict)! Thanks a lot for working on this. One note about working in the /api workspace: later in the project, I started using JSDoc comments and typedefs to provide "types" for the CommonJS code. Typing some of the new functions and classes you made would be extremely helpful for me. This is also a case where AI excels: prompt it with the relevant code, say "Please write JSDocs for this function/class, to provide intellisense in VS Code", and you can also give it examples of these definitions from the project.
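As an illustration of the JSDoc style described here, the following sketch types a hypothetical `ImportJob` record and a hypothetical `toJobStatusResponse` function. Neither name is from the PR; the point is that `@typedef` plus `@param`/`@returns` give VS Code IntelliSense for plain CommonJS code.

```javascript
/**
 * Hypothetical shape of an import job record (illustration only).
 *
 * @typedef {Object} ImportJob
 * @property {string} id - Job identifier.
 * @property {string} userId - Owner of the import.
 * @property {'pending' | 'running' | 'completed' | 'failed'} status - Current state.
 * @property {string} [error] - Failure reason, present only when the job failed.
 */

/**
 * Builds the payload a client receives when it polls a job.
 *
 * @param {ImportJob} job - The job to summarize.
 * @returns {{ id: string, status: ImportJob['status'], done: boolean, error?: string }}
 */
function toJobStatusResponse(job) {
  const done = job.status === 'completed' || job.status === 'failed';
  const response = { id: job.id, status: job.status, done };
  // Only expose the error field when there is something to report.
  if (job.error) {
    response.error = job.error;
  }
  return response;
}

module.exports = { toJobStatusResponse };
```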
Force-pushed from a469971 to 6f9ab88.
Force-pushed from 1e83bc0 to 75bec09.
@danny-avila This one is ready for review.
Force-pushed from 75bec09 to 1b1f625.
Thank you, hope to review it this week.
I have no idea why the test is failing. It passes locally. Is there a chance that…
Did you put the .env file inside the api folder?
Yes. If I run the tests locally they pass; however, on CI they fail.
It seems some params are missing from the .env file. Can you share the output of npm run test:api from your local machine?
Sure
Force-pushed from 8fdfb11 to 737d9f2.
Try running npx jest [path/filename], where filename is the file that failed.
I'll add a MONGO_URI so the tests can hopefully run in the CI environment.
I think I know what could be a problem. Let me try to fix it. |
It fails here, so adding a dummy MONGO_URI should do the job.
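One way to supply a dummy URI for unit tests is a Jest setup module registered through the `setupFiles` option of the Jest config, which runs before each test file. This is a sketch under assumptions: the file name (e.g. `jestSetup.js`) and placeholder URI are hypothetical, and it presumes the failure comes from code that only checks that MONGO_URI is set rather than connecting to it.

```javascript
// Hypothetical jestSetup.js, listed under `setupFiles` in the Jest config.
// Unit tests get a placeholder connection string; a real MONGO_URI (e.g. for
// E2E runs) is left untouched because we only fill it in when it is missing.
const PLACEHOLDER_MONGO_URI = 'mongodb://127.0.0.1:27017/dummy-test-uri';

if (!process.env.MONGO_URI) {
  process.env.MONGO_URI = PLACEHOLDER_MONGO_URI;
}
```

Keeping the placeholder out of .env means CI needs no secrets for unit tests, while E2E jobs can still inject a real URI.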
Fixing the package issue, and also seeing if I can handle the Jest open handles issue.
I actually need to be added as a collaborator to your fork, but the tests are running fine for me locally.
I added you.
Force-pushed from fcc30f2 to 0adafd1.
…e case and replace ChatGtp with ChatGpt
my changes so far
And now I'm seeing that the tree structure is not maintained in general. On second thought, I do want to see that in this PR, as it's a big value-add. Going to see if I can implement it easily for now.
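Maintaining the tree on import mostly means giving each imported message a fresh ID while rewriting every parent pointer through the same mapping. The sketch below assumes messages carry `messageId` and `parentMessageId` and that root messages point at an all-zero UUID sentinel; the hypothetical `remapMessageTree` helper is an illustration, not the PR's implementation.

```javascript
const { randomUUID } = require('node:crypto');

// Assumed sentinel for "no parent" (an all-zero UUID); treat as an assumption.
const NO_PARENT = '00000000-0000-0000-0000-000000000000';

/**
 * Hypothetical helper: assign new IDs while preserving parent links.
 *
 * @param {{ messageId: string, parentMessageId: string, text: string }[]} messages
 * @returns {{ messageId: string, parentMessageId: string, text: string }[]}
 */
function remapMessageTree(messages) {
  // Pass 1: map every original ID to a new one. This must finish before any
  // parent is rewritten, because a child can appear before its parent.
  const idMap = new Map();
  for (const msg of messages) {
    idMap.set(msg.messageId, randomUUID());
  }
  // Pass 2: rewrite IDs and parent pointers; parents absent from the export
  // (including the sentinel) become roots.
  return messages.map((msg) => ({
    ...msg,
    messageId: idMap.get(msg.messageId),
    parentMessageId: idMap.get(msg.parentMessageId) ?? NO_PARENT,
  }));
}

module.exports = { remapMessageTree, NO_PARENT };
```

Because siblings keep the same new parent ID, branches (e.g. regenerated responses) survive the import rather than being flattened into one thread.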
… add userId to log message
@danny-avila Thanks for fixing my bugs and merging this 👍
Thanks for your hard work and patience on this 🙏
…ny-avila#2355)

* Basic implementation of ChatGPT conversation import
* remove debug code
* Handle citations
* Fix updatedAt in import
* update default model
* Use job scheduler to handle import requests
* import job status endpoint
* Add wrapper around Agenda
* Rate limits for import endpoint
* rename import api path
* Batch save import to mongo
* Improve naming
* Add documenting comments
* Test for importers
* Change button for importing conversations
* Frontend changes
* Import job status endpoint
* Import endpoint response
* Add translations to new phrases
* Fix conversations refreshing
* cleanup unused functions
* set timeout for import job status polling
* Add documentation
* get extra spaces back
* Improve error message
* Fix translation files after merge
* fix translation files 2
* Add zh translation for import functionality
* Sync Meilisearch index after import
* chore: add dummy uri for jest tests, as MONGO_URI should only be real for E2E tests
* docs: fix links
* docs: fix conversationsImport section
* fix: user role issue for librechat imports
* refactor: import conversations from json
  - organize imports
  - add additional jsdocs
  - use multer with diskStorage to avoid loading file into memory outside of job
  - use filepath instead of loading data string for imports
  - replace console logs and some logger.info() with logger.debug
  - only use multer for import route
* fix: undefined metadata edge case and replace ChatGtp -> ChatGpt
* Refactor importChatGptConvo function to handle undefined metadata edge case and replace ChatGtp with ChatGpt
* fix: chatgpt importer
* feat: maintain tree relationship for librechat messages
* chore: use enum
* refactor: saveMessage to use single object arg, replace console logs, add userId to log message
* chore: additional comment
* chore: multer edge case
* feat: first pass, maintain tree relationship
* chore: organize
* chore: remove log
* ci: add hierarchy test for chatgpt
* ci: test maintaining of hierarchy for librechat
* wip: allow non-text content type messages
* refactor: import content part object json string
* refactor: more content types to format
* chore: consolidate messageText formatting
* docs: update on changes, bump data-provider/config versions, update readme
* refactor(indexSync): singleton pattern for MeiliSearchClient
* refactor: debug log after batch is done
* chore: add back indexSync error handling

---------

Co-authored-by: jakubmieszczak <jakub.mieszczak@zendesk.com>
Co-authored-by: Danny Avila <danny@librechat.ai>
Summary
Issue
Import conversations from JSON exports (ChatGPT, Chatbot UI, and LibreChat).
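For a sense of what the ChatGPT path involves: a ChatGPT export stores each conversation's messages as a `mapping` of nodes linked by `parent` IDs, and some nodes (such as an empty root) carry no message. The sketch below flattens such a mapping into a parent-linked list, skipping message-less nodes and re-attaching their children to the nearest kept ancestor. The field names (`id`, `parent`, `message.author.role`, `message.content.parts`) reflect our understanding of the export format and should be verified against a real file; the `flattenChatGptMapping` helper and the all-zero root sentinel are hypothetical, and this sketch keeps only string parts, so non-text content is dropped.

```javascript
/**
 * Hypothetical helper: flatten a ChatGPT export `mapping` into messages.
 *
 * @param {Record<string, { id: string, parent: string | null, message: any }>} mapping
 * @param {string} [noParent] - Sentinel used for root messages (assumed).
 * @returns {{ messageId: string, parentMessageId: string, sender: string, text: string }[]}
 */
function flattenChatGptMapping(mapping, noParent = '00000000-0000-0000-0000-000000000000') {
  // Keep only nodes that carry a message with a parts array.
  const kept = new Set(
    Object.values(mapping)
      .filter((node) => node.message && Array.isArray(node.message.content?.parts))
      .map((node) => node.id),
  );
  // Walk up past skipped nodes to the nearest kept ancestor, if any.
  const nearestKeptAncestor = (id) => {
    let current = mapping[id]?.parent;
    while (current && !kept.has(current)) {
      current = mapping[current]?.parent;
    }
    return current ?? noParent;
  };
  return [...kept].map((id) => {
    const { message } = mapping[id];
    return {
      messageId: id,
      parentMessageId: nearestKeptAncestor(id),
      sender: message.author.role,
      // Join string parts only; non-string (non-text) parts are ignored here.
      text: message.content.parts.filter((p) => typeof p === 'string').join(''),
    };
  });
}

module.exports = { flattenChatGptMapping };
```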
Important design points:
Diagram
UI
Change Type
Testing
Examples of exported conversations can be found in the /api/server/utils/import/__data__ directory. Apart from unit tests, the functionality was tested on conversation files of ~5 MB.
Checklist