
Implementing RWKV-LLM (#37) #209

Draft
wants to merge 64 commits into main

Conversation

@AvidEslami (Collaborator) commented Jun 1, 2023

This pull request implements RWKV's RNN model using Modal. (Issue #37) (Reference)

Summary: Created RWKV.py (it will be moved to openadapt/strategies/mixins and restructured to accept inputs).

Run it using the following commands (this will change once it is implemented as a mixin):

modal token new
modal run .\openadapt\RWKV\RWKV.py

Implementation:

  • Starts a Modal application; gpu must be set to a100 in order to run Raven-14B (the largest model)
  • Downloads the weights for the desired model from Hugging Face to the Modal server
  • Computes and returns the output

TODO:

  • Change structure to function as a mixin (take inputs such as prompts or task descriptions)
  • Allow parameters (Temperature, Top P, Presence Penalty, etc.) to be modified via the config file
  • Deploy the Modal application permanently to avoid startup time (on startup the weights must be downloaded; to avoid this we can deploy the Modal app externally and make requests to it instead)
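To make the config-file TODO concrete, here is a minimal sketch of how the three sampling parameters named above could be applied to a model's logits. The function name, token representation, and defaults are assumptions for illustration, not the PR's actual code; a real sampler would also draw randomly from the nucleus rather than picking greedily.

```python
# Hypothetical sketch of applying Temperature, Top P, and Presence Penalty
# to a logit distribution; names and defaults are assumptions, not the
# actual RWKV.py implementation.
import math
from collections import Counter

def sample_logits(logits, generated, temperature=1.0, top_p=0.9,
                  presence_penalty=0.2):
    """Pick the highest-probability token after applying the three knobs.

    logits: dict mapping token -> raw logit
    generated: list of tokens emitted so far (for the presence penalty)
    """
    counts = Counter(generated)
    # Presence penalty: subtract a flat penalty from any token already seen.
    adjusted = {tok: logit - (presence_penalty if counts[tok] else 0.0)
                for tok, logit in logits.items()}
    # Temperature: scale logits before the softmax (lower = sharper).
    scaled = {tok: logit / temperature for tok, logit in adjusted.items()}
    # Numerically stable softmax.
    m = max(scaled.values())
    exps = {tok: math.exp(v - m) for tok, v in scaled.items()}
    z = sum(exps.values())
    probs = {tok: e / z for tok, e in exps.items()}
    # Top P (nucleus): keep the smallest set of tokens whose mass >= top_p.
    kept, mass = [], 0.0
    for tok, p in sorted(probs.items(), key=lambda kv: -kv[1]):
        kept.append((tok, p))
        mass += p
        if mass >= top_p:
            break
    # Greedy pick within the nucleus (a real sampler would sample randomly).
    return max(kept, key=lambda kv: kv[1])[0]
```

Reading these three values from a config file would then be a matter of passing them through as keyword arguments.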

@abrichr (Member) commented Jul 4, 2023

As mentioned in Slack, I think it's time to start fine-tuning 🤓

AvidEslami added 24 commits July 7, 2023 13:55
… more epochs), prevent final newline character
…d datasets keeps prompts and outputs separate
@AvidEslami (Collaborator, Author) commented:
Signals Finetune Update:

  • The following sheet contains the outputs generated by the fine-tuned models: link
  • Any advice on the approach is appreciated; there are still several things to try:
    • Using a random number of signals, which could reduce the model's chance of memorizing results
    • Changing the prompt structure, which could improve understanding
    • Creating a more varied dataset (more signals / more tasks) to help the model catch relations
    • Fine-tuning Pile-14B for several epochs (will try); Pile-14B is the non-fine-tuned version of Raven-14B

As usual please let me know if you have any suggestions!
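The commits above mention restructuring the fine-tuning dataset so that prompts and outputs stay separate. A minimal sketch of what that might look like as a JSON Lines file is below; the field names ("prompt", "output") and the file layout are assumptions for illustration, not the PR's actual format.

```python
# Hypothetical prompt/output dataset layout as JSON Lines; field names
# are assumptions, not the PR's actual schema.
import json

def write_dataset(examples, path):
    """Write (prompt, output) pairs as JSON Lines, one record per line."""
    with open(path, "w") as f:
        for prompt, output in examples:
            f.write(json.dumps({"prompt": prompt, "output": output}) + "\n")

def read_dataset(path):
    """Read the pairs back, preserving the prompt/output separation."""
    with open(path) as f:
        return [(rec["prompt"], rec["output"])
                for rec in (json.loads(line) for line in f)]
```

Keeping the two fields separate makes it straightforward to mask the prompt tokens out of the loss during fine-tuning, so the model is only trained to produce the output.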

@AvidEslami AvidEslami marked this pull request as draft February 4, 2024 19:43