diff --git a/README.md b/README.md index 7c0410f..20a5f0c 100644 --- a/README.md +++ b/README.md @@ -24,7 +24,6 @@ Use [FATE-LLM deployment packages](https://github.com/FederatedAI/FATE/wiki/Download) ## Quick Start - [Offsite-tuning Tutorial: Model Definition and Job Submission](./doc/tutorial/offsite_tuning/Offsite_tuning_tutorial.ipynb) -- [FedIPR Tutorial: Add Watermarks to Your Model](./doc/tutorial/fed_ipr/FedIPR-tutorial.ipynb) -- [Federated ChatGLM-6B Training](./doc/tutorial/parameter_efficient_llm/ChatGLM-6B_ds.ipynb) -- [GPT-2 Training](./doc/tutorial/parameter_efficient_llm/GPT2-example.ipynb) -- [Builtin Models In PELLM](./doc/tutorial/builtin_models.md) \ No newline at end of file +- [Federated ChatGLM3-6B Training](./doc/tutorial/parameter_efficient_llm/ChatGLM3-6B_ds.ipynb) +- [Builtin Models In PELLM](./doc/tutorial/builtin_models.md) +- [Offsite Tuning Tutorial](./doc/tutorial/offsite_tuning/Offsite_tuning_tutorial.ipynb) \ No newline at end of file diff --git a/doc/tutorial/offsite_tuning/Offsite_tuning_tutorial.ipynb b/doc/tutorial/offsite_tuning/Offsite_tuning_tutorial.ipynb index b8b90ea..f522b08 100644 --- a/doc/tutorial/offsite_tuning/Offsite_tuning_tutorial.ipynb +++ b/doc/tutorial/offsite_tuning/Offsite_tuning_tutorial.ipynb @@ -13,7 +13,7 @@ "id": "9f1d728c-09e1-418e-8d80-53dd0ec467b1", "metadata": {}, "source": [ - "In this tutorial, we'll focus on how to leverage the Offsite-Tuning framework in FATE to fine-tune your LLM. You'll learn how to:\n", + "In this tutorial, we'll focus on how to leverage the Offsite-Tuning framework in FATE-LLM-2.0 to fine-tune your LLM. You'll learn how to:\n", "\n", "1. Define models, including main models (which are at the server side and will offer adapters and emulators) and sub models (which are at the client side and will load adapters and emulators for local fine-tuning), compatible with the Offsite-Tuning framework.\n", "2. Get hands-on experience with the Offsite-Tuning trainer.\n", @@ -31,12 +31,7 @@ "\n", "Offsite-Tuning addresses the challenge of unequal distribution of computational power and data. It allows the LLM owner to enhance the model's capabilities without direct access to private data, while also enabling data owners who may not have the resources to train a full-scale model to fine-tune a portion of it using less computational power. This mutually beneficial arrangement accommodates both parties involved.\n", "\n", - "Beyond the standard two-party setup involving the model owner and the data owner, in FATE-LLM, the Offsite-Tuning framework is also extendable to scenarios with multiple data owners. FATE supports multi-party Offsite-Tuning, allowing multiple data owners to fine-tune and aggregate their Adapters locally, further enhancing the flexibility and applicability of this framework. For more details of Offsite-Tuning, please refer to the [original paper](https://arxiv.org/pdf/2302.04870.pdf).\n", - "\n", - "\n", - "\n", - "\n", - "\n" + "Beyond the standard two-party setup involving the model owner and the data owner, in FATE-LLM, the Offsite-Tuning framework is also extendable to scenarios with multiple data owners.
FATE supports multi-party Offsite-Tuning, allowing multiple data owners to fine-tune and aggregate their Adapters locally, further enhancing the flexibility and applicability of this framework. For more details of Offsite-Tuning, please refer to the [original paper](https://arxiv.org/pdf/2302.04870.pdf).\n" ] }, { @@ -46,13 +41,14 @@ "source": [ "## Preliminary\n", "\n", - "We strongly recommend you finish reading our NN tutorial to get familiar with Model and Dataset customizations: [NN Tutorials](https://github.com/FederatedAI/FATE/blob/master/doc/tutorial/pipeline/nn_tutorial/README.md)\n", - "You can add python path so that you can run codes in the notebook." + "We strongly recommend you finish reading our NN tutorial to get familiar with Model and Dataset customizations: [NN Tutorials](https://github.com/FederatedAI/FATE/blob/master/doc/2.0/fate/components/pipeline_nn_cutomization_tutorial.md)\n", + "\n", + "In this tutorial, we assume that you have deployed FATE (including FATE-Flow & fate-client) and FATE-LLM-2.0. You can add the python path so that you can run the code in this notebook." ] }, { "cell_type": "code", - "execution_count": 1, + "execution_count": 4, "id": "f33516e8-0d28-4c97-bc38-ba28d60acf37", "metadata": {}, "outputs": [], "source": [ @@ -62,6 +58,14 @@ "sys.path.append(your_path_to_fate_python)" ] }, + { + "cell_type": "markdown", + "id": "2f2fc794", + "metadata": {}, + "source": [ + "If you installed FATE & FATE-LLM-2.0 via pip, you can directly run the following code." + ] + }, { "cell_type": "markdown", "id": "7309281b-5956-4158-9256-d6db230e086d", "metadata": {}, @@ -186,11 +190,7 @@ "source": [ "### Share additional parameters with clients\n", "\n", - "Additionally, beyond the weights of emulators and adapters, you may also want to share other model parameters, such as embedding weights, with your client partners. To achieve this, you'll need to implement two more interfaces: get_additional_param_state_dict and load_additional_param_state_dict for both the Main and Sub Models.\n", - "\n", - "### Special Attention for Large Objects\n", - "\n", - "Please note that special attention is required when you need to share large objects, any object potentially exceeding 2GB, such as embedding weights. You should slice these large objects to manage them more efficiently. Below is a code snippet demonstrating this practice, taken directly from FATE's native GPT-2 implementation:" + "Additionally, beyond the weights of emulators and adapters, you may also want to share other model parameters, such as embedding weights, with your client partners. To achieve this, you'll need to implement two more interfaces: `get_additional_param_state_dict` and `load_additional_param_state_dict` for both the Main and Sub Models (a minimal illustrative sketch is given at the end of this section)." ] }, { @@ -263,7 +263,7 @@ "\n", "### Prepare QA Dataset - Sciq\n", "\n", - "In this example, we use sciq dataset. You can use tools provided in our qa_dataset.py to tokenize the sciq dataset and save the tokenized result. " + "In this example, we use the SciQ dataset. You can use the tools provided in our qa_dataset.py to tokenize the SciQ dataset and save the tokenized result. **Remember to modify the save_path to your own path.** For the sake of simplicity, in this tutorial every party uses this same dataset to train the model."
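As referenced above, here is a minimal, hypothetical sketch of the two additional-parameter interfaces. The method names come from the tutorial text; everything else (free functions over a Hugging Face GPT-2 model, a `wte.` key prefix) is our own assumption for illustration, not FATE-LLM's built-in implementation, which defines these as methods on the Main and Sub Model classes:

```python
from transformers import GPT2LMHeadModel

# Hypothetical sketch: share GPT-2 word-embedding weights in addition to
# emulator/adapter weights. FATE-LLM's built-in gpt2 model is the real reference.
def get_additional_param_state_dict(model: GPT2LMHeadModel) -> dict:
    # export the embedding weights under a prefix so they can be sent to the other party
    wte = model.transformer.wte.state_dict()
    return {'wte.' + k: v.detach().cpu() for k, v in wte.items()}

def load_additional_param_state_dict(model: GPT2LMHeadModel, state_dict: dict) -> None:
    # keep only the embedding entries, strip the prefix, and load them back
    wte = {k[len('wte.'):]: v for k, v in state_dict.items() if k.startswith('wte.')}
    model.transformer.wte.load_state_dict(wte)
```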
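Before tokenizing, you may also want to inspect the raw data. A quick way to do so, assuming the Hugging Face `datasets` package is available, is:

```python
from datasets import load_dataset

# peek at one raw SciQ sample before tokenization
sciq = load_dataset('sciq', split='train')
sample = sciq[0]
print(sample['question'])        # the science question
print(sample['correct_answer'])  # the short answer the model should learn to generate
print(sample['support'])         # the supporting evidence passage
```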
] }, { @@ -276,7 +276,7 @@ "from fate_llm.dataset.qa_dataset import tokenize_qa_dataset\n", "from transformers import AutoTokenizer\n", "tokenizer_name_or_path = 'gpt2'\n", - "tokenizer = AutoTokenizer.from_pretrained(gpt2_path)\n", + "tokenizer = AutoTokenizer.from_pretrained(tokenizer_name_or_path)\n", "\n", "if 'llama' in tokenizer_name_or_path:\n", " tokenizer = AutoTokenizer.from_pretrained(tokenizer_name_or_path, unk_token=\"\", bos_token=\"\", eos_token=\"\", add_eos_token=True) \n", @@ -288,8 +288,8 @@ "\n", "import os\n", "# bind data path to name & namespace\n", - "fate_project_path = os.path.abspath('../../../')\n", - "rs = tokenize_qa_dataset('sciq', tokenizer, fate_project_path + '/sciq/', seq_max_len=600) # we save the cache dataset to the fate root folder" + "save_path = 'xxxx/sciq'\n", + "rs = tokenize_qa_dataset('sciq', tokenizer, save_path, seq_max_len=600) # save the tokenized dataset to your own save_path" ] }, { @@ -310,7 +310,7 @@ "from fate_llm.dataset.qa_dataset import QaDataset\n", "\n", "ds = QaDataset(tokenizer_name_or_path=tokenizer_name_or_path)\n", - "ds.load(fate_project_path + '/sciq/')" + "ds.load(save_path)" ] }, { @@ -340,86 +340,406 @@ "source": [ "## Submit a Task\n", "\n", - "Now the model and the dataset is prepared! We can submit a training task. \n", - "After we submit the task below, the following process will occur: The server and client each initialize their respective models. The server extracts shared parameters and sends them to the client. The client then loads these parameters and conducts training on a miniaturized GPT-2 model composed of an emulator and adapters on Sciq. We specify the OffsiteTuningTrainer via TrainerParam. If you are not familiar with trainer configuration, please refer to [FATE-NN Tutorial](https://github.com/FederatedAI/FATE/blob/master/doc/tutorial/pipeline/nn_tutorial/README.md).\n", + "Now the model and the dataset are prepared! We can submit a training task. In FATE-2.0, you can define your pipeline in a much simpler manner.\n", + "\n", + "After we submit the task below, the following process will occur: The server and client each initialize their respective models. The server extracts shared parameters and sends them to the client. The client then loads these parameters and conducts training on a miniaturized GPT-2 model composed of an emulator and adapters on SciQ.\n", + "\n", + "If you are not familiar with trainer configuration, please refer to [NN Tutorials](https://github.com/FederatedAI/FATE/blob/master/doc/2.0/fate/components/pipeline_nn_cutomization_tutorial.md).\n", + "\n", " Upon completion of the training, the client sends the adapter parameters back to the server. Since we are directly using Hugging Face's GPT2LMHeadModel, there's no need to supply a loss function. Simply inputting the preprocessed data and labels into the model will calculate the correct loss and proceed with gradient descent.\n", "\n", - "One thing to pay special attention to is that Offsite-Tuning differs from FedAvg within FATE. In Offsite-Tuning, the server (the arbiter role) needs to initialize the model. Therefore, please refer to the example below and set the 'nn_component' parameters separately for the client and the server. Also, don't forget to add the 'server_init=True' parameter to the server; otherwise, the arbiter side will not initialize the model.\n", + "One thing to pay special attention to is that Offsite-Tuning differs from FedAvg within FATE.
In Offsite-Tuning, the server (the arbiter role) needs to initialize the model. Therefore, please refer to the example below and set the runner conf separately for the client and the server.\n", + "\n", + "To make this a quick demo, we only select 100 samples from the original QA dataset; see 'select_num=100' in the LLMDatasetLoader." + ] + }, + { + "cell_type": "markdown", + "id": "261dfb43", + "metadata": {}, + "source": [ + "### Bind Dataset Path with Name & Namespace\n", "\n", - "To make this a quick demo, we only select 100 samples from the origin qa datset, see 'select_num=100' in the DatasetParam." + "Please execute the following code to bind the dataset path with the name & namespace. Remember to modify the path to your own dataset save path." ] }, { "cell_type": "code", - "execution_count": 1, + "execution_count": null, + "id": "8dc1e82b", + "metadata": {}, + "outputs": [], + "source": [ + "! flow table bind --namespace experiment --name sciq --path YOUR_SAVE_PATH" + ] + }, + { + "cell_type": "markdown", + "id": "0e8c5ff4", + "metadata": {}, + "source": [ + "### Pipeline codes" + ] + }, + { + "cell_type": "code", + "execution_count": 16, "id": "c9113d10-c3e7-4875-9502-ce46aa0b86b1", "metadata": {}, "outputs": [ { "data": { "text/plain": [ - "" + "" ] }, - "execution_count": 1, + "execution_count": 16, "metadata": {}, "output_type": "execute_result" } ], "source": [ - "import torch as t\n", - "from torch import nn\n", - "from pipeline import fate_torch_hook\n", - "from pipeline.component import HomoNN\n", - "from pipeline.backend.pipeline import PipeLine\n", - "from pipeline.component import Reader, Evaluation, DataTransform\n", - "from pipeline.interface import Data, Model\n", - "\n", - "t = fate_torch_hook(t)\n", - "\n", - "import os\n", - "# bind data path to name & namespace\n", - "fate_project_path = os.path.abspath('../../../')\n", - "guest = 9997\n", - "arbiter = 9997\n", - "pipeline = PipeLine().set_initiator(role='guest', party_id=guest).set_roles(guest=guest, arbiter=arbiter)\n", - "\n", - "# bind data path with name & namespace\n", - "data_0 = {\"name\": \"sciq\", \"namespace\": \"experiment\"}\n", - "data_path_0 = fate_project_path + '/sciq/'\n", - "pipeline.bind_table(name=data_0['name'], namespace=data_0['namespace'], path=data_path_0)\n", - "\n", - "reader_0 = Reader(name=\"reader_0\")\n", - "reader_0.get_party_instance(role='guest', party_id=guest).component_param(table=data_0)\n", - "\n", - "gpt2_type = 'gpt2'\n", - "\n", - "from pipeline.component.nn import DatasetParam\n", - "dataset_param = DatasetParam(dataset_name='qa_dataset', tokenizer_name_or_path=gpt2_type, select_num=100)\n", - "\n", - "from pipeline.component.homo_nn import TrainerParam # Interface\n", - "sub_model_client = t.nn.CustModel(module_name='offsite_tuning.gpt2_ot', class_name='GPT2LMHeadSubModel', model_name_or_path=gpt2_type \\\n", - " ,emulator_layer_num=4, adapter_top_layer_num=2, adapter_bottom_layer_num=2)\n", - "main_model_server = t.nn.CustModel(module_name='offsite_tuning.gpt2_ot', class_name='GPT2LMHeadMainModel', model_name_or_path=gpt2_type \\\n", - " ,emulator_layer_num=4, adapter_top_layer_num=2, adapter_bottom_layer_num=2)\n", - "\n", - "nn_component = HomoNN(name='nn_0')\n", - "\n", - "nn_component.get_party_instance(role='guest', party_id=guest).component_param(model=sub_model_client, dataset=dataset_param, # dataset\n", - " trainer=TrainerParam(trainer_name='offsite_tuning_trainer', epochs=3, batch_size=4, collate_fn='DataCollatorForTokenClassification', task_type='causal_ml', \\\n",
- " save_to_local_dir=True, cuda=0),\n", - " optimizer=t.optim.Adam(lr=5e-5)\n", - " )\n", - "nn_component.get_party_instance(role='arbiter', party_id=arbiter).component_param(model=main_model_server, \n", - " trainer=TrainerParam(trainer_name='offsite_tuning_trainer', collate_fn='DataCollatorForTokenClassification', save_to_local_dir=True),\n", - " # Attention here\n", - " server_init=True # This parameter must be set True !!!!!!!!!!!\n", - " )\n", - "pipeline.add_component(reader_0)\n", - "pipeline.add_component(nn_component, data=Data(train_data=reader_0.output.data))\n", + "import time\n", + "from fate_client.pipeline.components.fate.reader import Reader\n", + "from fate_client.pipeline import FateFlowPipeline\n", + "from fate_client.pipeline.components.fate.homo_nn import HomoNN, get_conf_of_ot_runner\n", + "from fate_client.pipeline.components.fate.nn.algo_params import Seq2SeqTrainingArguments, FedAVGArguments\n", + "from fate_client.pipeline.components.fate.nn.loader import LLMModelLoader, LLMDatasetLoader, LLMDataFuncLoader\n", + "from fate_client.pipeline.components.fate.nn.torch.base import Sequential\n", + "from fate_client.pipeline.components.fate.nn.torch import nn\n", + "\n", + "\n", + "guest = '10000'\n", + "host = '10000'\n", + "arbiter = '10000'\n", + "\n", + "pipeline = FateFlowPipeline().set_parties(guest=guest, arbiter=arbiter)\n", + "\n", + "reader_0 = Reader(\"reader_0\", runtime_parties=dict(guest=guest))\n", + "reader_0.guest.task_parameters(\n", + " namespace=\"experiment\",\n", + " name=\"sciq\"\n", + ")\n", + "\n", + "client_model = LLMModelLoader(\n", + " module_name='offsite_tuning.gpt2', item_name='GPT2LMHeadSubModel',\n", + " model_name_or_path='gpt2',\n", + " emulator_layer_num=4,\n", + " adapter_top_layer_num=1,\n", + " adapter_bottom_layer_num=1\n", + ")\n", + "\n", + "server_model = LLMModelLoader(\n", + " module_name='offsite_tuning.gpt2', item_name='GPT2LMHeadMainModel',\n", + " model_name_or_path='gpt2',\n", + " emulator_layer_num=4,\n", + " adapter_top_layer_num=1,\n", + " adapter_bottom_layer_num=1 \n", + ")\n", + "\n", + "train_args = Seq2SeqTrainingArguments(\n", + " per_device_train_batch_size=1,\n", + " learning_rate=5e-5,\n", + " disable_tqdm=False,\n", + " num_train_epochs=1,\n", + " logging_steps=10,\n", + " logging_strategy='steps',\n", + " use_cpu=False\n", + ")\n", + "\n", + "dataset = LLMDatasetLoader(\n", + " module_name='qa_dataset', item_name='QaDataset',\n", + " tokenizer_name_or_path='gpt2',\n", + " select_num=100\n", + ")\n", + "\n", + "data_collator = LLMDataFuncLoader(module_name='data_collator.cust_data_collator', item_name='get_seq2seq_data_collator', tokenizer_name_or_path='gpt2')\n", + "\n", + "client_conf = get_conf_of_ot_runner(\n", + " model=client_model,\n", + " dataset=dataset,\n", + " data_collator=data_collator,\n", + " training_args=train_args,\n", + " fed_args=FedAVGArguments(),\n", + " aggregate_model=False\n", + ")\n", + "\n", + "server_conf = get_conf_of_ot_runner(\n", + " model=server_model,\n", + " dataset=dataset,\n", + " data_collator=data_collator,\n", + " training_args=train_args,\n", + " fed_args=FedAVGArguments(),\n", + " aggregate_model=False\n", + ")\n", + "\n", + "homo_nn_0 = HomoNN(\n", + " 'nn_0',\n", + " train_data=reader_0.outputs[\"output_data\"],\n", + " runner_module=\"offsite_tuning_runner\",\n", + " runner_class=\"OTRunner\"\n", + ")\n", + "\n", + "homo_nn_0.guest.task_parameters(runner_conf=client_conf)\n", + "homo_nn_0.arbiter.task_parameters(runner_conf=server_conf)\n", + 
"pipeline.add_tasks([reader_0, homo_nn_0])\n", "pipeline.compile()" ] }, + { + "cell_type": "markdown", + "id": "e97c2823", + "metadata": {}, + "source": [ + "You can try to initialize your models, datasets to check if they can be loaded correctly." + ] + }, + { + "cell_type": "code", + "execution_count": 17, + "id": "872817e5", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "GPT2LMHeadSubModel(\n", + " (model): GPT2LMHeadModel(\n", + " (transformer): GPT2Model(\n", + " (wte): Embedding(50257, 768)\n", + " (wpe): Embedding(1024, 768)\n", + " (drop): Dropout(p=0.1, inplace=False)\n", + " (h): ModuleList(\n", + " (0): GPT2Block(\n", + " (ln_1): LayerNorm((768,), eps=1e-05, elementwise_affine=True)\n", + " (attn): GPT2Attention(\n", + " (c_attn): Conv1D()\n", + " (c_proj): Conv1D()\n", + " (attn_dropout): Dropout(p=0.1, inplace=False)\n", + " (resid_dropout): Dropout(p=0.1, inplace=False)\n", + " )\n", + " (ln_2): LayerNorm((768,), eps=1e-05, elementwise_affine=True)\n", + " (mlp): GPT2MLP(\n", + " (c_fc): Conv1D()\n", + " (c_proj): Conv1D()\n", + " (act): NewGELUActivation()\n", + " (dropout): Dropout(p=0.1, inplace=False)\n", + " )\n", + " )\n", + " (1): GPT2Block(\n", + " (ln_1): LayerNorm((768,), eps=1e-05, elementwise_affine=True)\n", + " (attn): GPT2Attention(\n", + " (c_attn): Conv1D()\n", + " (c_proj): Conv1D()\n", + " (attn_dropout): Dropout(p=0.1, inplace=False)\n", + " (resid_dropout): Dropout(p=0.1, inplace=False)\n", + " )\n", + " (ln_2): LayerNorm((768,), eps=1e-05, elementwise_affine=True)\n", + " (mlp): GPT2MLP(\n", + " (c_fc): Conv1D()\n", + " (c_proj): Conv1D()\n", + " (act): NewGELUActivation()\n", + " (dropout): Dropout(p=0.1, inplace=False)\n", + " )\n", + " )\n", + " (2): GPT2Block(\n", + " (ln_1): LayerNorm((768,), eps=1e-05, elementwise_affine=True)\n", + " (attn): GPT2Attention(\n", + " (c_attn): Conv1D()\n", + " (c_proj): Conv1D()\n", + " (attn_dropout): Dropout(p=0.1, inplace=False)\n", + " (resid_dropout): Dropout(p=0.1, inplace=False)\n", + " )\n", + " (ln_2): LayerNorm((768,), eps=1e-05, elementwise_affine=True)\n", + " (mlp): GPT2MLP(\n", + " (c_fc): Conv1D()\n", + " (c_proj): Conv1D()\n", + " (act): NewGELUActivation()\n", + " (dropout): Dropout(p=0.1, inplace=False)\n", + " )\n", + " )\n", + " (3): GPT2Block(\n", + " (ln_1): LayerNorm((768,), eps=1e-05, elementwise_affine=True)\n", + " (attn): GPT2Attention(\n", + " (c_attn): Conv1D()\n", + " (c_proj): Conv1D()\n", + " (attn_dropout): Dropout(p=0.1, inplace=False)\n", + " (resid_dropout): Dropout(p=0.1, inplace=False)\n", + " )\n", + " (ln_2): LayerNorm((768,), eps=1e-05, elementwise_affine=True)\n", + " (mlp): GPT2MLP(\n", + " (c_fc): Conv1D()\n", + " (c_proj): Conv1D()\n", + " (act): NewGELUActivation()\n", + " (dropout): Dropout(p=0.1, inplace=False)\n", + " )\n", + " )\n", + " (4): GPT2Block(\n", + " (ln_1): LayerNorm((768,), eps=1e-05, elementwise_affine=True)\n", + " (attn): GPT2Attention(\n", + " (c_attn): Conv1D()\n", + " (c_proj): Conv1D()\n", + " (attn_dropout): Dropout(p=0.1, inplace=False)\n", + " (resid_dropout): Dropout(p=0.1, inplace=False)\n", + " )\n", + " (ln_2): LayerNorm((768,), eps=1e-05, elementwise_affine=True)\n", + " (mlp): GPT2MLP(\n", + " (c_fc): Conv1D()\n", + " (c_proj): Conv1D()\n", + " (act): NewGELUActivation()\n", + " (dropout): Dropout(p=0.1, inplace=False)\n", + " )\n", + " )\n", + " (5): GPT2Block(\n", + " (ln_1): LayerNorm((768,), eps=1e-05, elementwise_affine=True)\n", + " (attn): GPT2Attention(\n", + " (c_attn): 
Conv1D()\n", + " (c_proj): Conv1D()\n", + " (attn_dropout): Dropout(p=0.1, inplace=False)\n", + " (resid_dropout): Dropout(p=0.1, inplace=False)\n", + " )\n", + " (ln_2): LayerNorm((768,), eps=1e-05, elementwise_affine=True)\n", + " (mlp): GPT2MLP(\n", + " (c_fc): Conv1D()\n", + " (c_proj): Conv1D()\n", + " (act): NewGELUActivation()\n", + " (dropout): Dropout(p=0.1, inplace=False)\n", + " )\n", + " )\n", + " )\n", + " (ln_f): LayerNorm((768,), eps=1e-05, elementwise_affine=True)\n", + " )\n", + " (lm_head): Linear(in_features=768, out_features=50257, bias=False)\n", + " )\n", + " (emulator): ModuleList(\n", + " (0): GPT2Block(\n", + " (ln_1): LayerNorm((768,), eps=1e-05, elementwise_affine=True)\n", + " (attn): GPT2Attention(\n", + " (c_attn): Conv1D()\n", + " (c_proj): Conv1D()\n", + " (attn_dropout): Dropout(p=0.1, inplace=False)\n", + " (resid_dropout): Dropout(p=0.1, inplace=False)\n", + " )\n", + " (ln_2): LayerNorm((768,), eps=1e-05, elementwise_affine=True)\n", + " (mlp): GPT2MLP(\n", + " (c_fc): Conv1D()\n", + " (c_proj): Conv1D()\n", + " (act): NewGELUActivation()\n", + " (dropout): Dropout(p=0.1, inplace=False)\n", + " )\n", + " )\n", + " (1): GPT2Block(\n", + " (ln_1): LayerNorm((768,), eps=1e-05, elementwise_affine=True)\n", + " (attn): GPT2Attention(\n", + " (c_attn): Conv1D()\n", + " (c_proj): Conv1D()\n", + " (attn_dropout): Dropout(p=0.1, inplace=False)\n", + " (resid_dropout): Dropout(p=0.1, inplace=False)\n", + " )\n", + " (ln_2): LayerNorm((768,), eps=1e-05, elementwise_affine=True)\n", + " (mlp): GPT2MLP(\n", + " (c_fc): Conv1D()\n", + " (c_proj): Conv1D()\n", + " (act): NewGELUActivation()\n", + " (dropout): Dropout(p=0.1, inplace=False)\n", + " )\n", + " )\n", + " (2): GPT2Block(\n", + " (ln_1): LayerNorm((768,), eps=1e-05, elementwise_affine=True)\n", + " (attn): GPT2Attention(\n", + " (c_attn): Conv1D()\n", + " (c_proj): Conv1D()\n", + " (attn_dropout): Dropout(p=0.1, inplace=False)\n", + " (resid_dropout): Dropout(p=0.1, inplace=False)\n", + " )\n", + " (ln_2): LayerNorm((768,), eps=1e-05, elementwise_affine=True)\n", + " (mlp): GPT2MLP(\n", + " (c_fc): Conv1D()\n", + " (c_proj): Conv1D()\n", + " (act): NewGELUActivation()\n", + " (dropout): Dropout(p=0.1, inplace=False)\n", + " )\n", + " )\n", + " (3): GPT2Block(\n", + " (ln_1): LayerNorm((768,), eps=1e-05, elementwise_affine=True)\n", + " (attn): GPT2Attention(\n", + " (c_attn): Conv1D()\n", + " (c_proj): Conv1D()\n", + " (attn_dropout): Dropout(p=0.1, inplace=False)\n", + " (resid_dropout): Dropout(p=0.1, inplace=False)\n", + " )\n", + " (ln_2): LayerNorm((768,), eps=1e-05, elementwise_affine=True)\n", + " (mlp): GPT2MLP(\n", + " (c_fc): Conv1D()\n", + " (c_proj): Conv1D()\n", + " (act): NewGELUActivation()\n", + " (dropout): Dropout(p=0.1, inplace=False)\n", + " )\n", + " )\n", + " )\n", + " (adapter_bottom): ModuleList(\n", + " (0): GPT2Block(\n", + " (ln_1): LayerNorm((768,), eps=1e-05, elementwise_affine=True)\n", + " (attn): GPT2Attention(\n", + " (c_attn): Conv1D()\n", + " (c_proj): Conv1D()\n", + " (attn_dropout): Dropout(p=0.1, inplace=False)\n", + " (resid_dropout): Dropout(p=0.1, inplace=False)\n", + " )\n", + " (ln_2): LayerNorm((768,), eps=1e-05, elementwise_affine=True)\n", + " (mlp): GPT2MLP(\n", + " (c_fc): Conv1D()\n", + " (c_proj): Conv1D()\n", + " (act): NewGELUActivation()\n", + " (dropout): Dropout(p=0.1, inplace=False)\n", + " )\n", + " )\n", + " )\n", + " (adapter_top): ModuleList(\n", + " (0): GPT2Block(\n", + " (ln_1): LayerNorm((768,), eps=1e-05, elementwise_affine=True)\n", + " (attn): 
GPT2Attention(\n", + " (c_attn): Conv1D()\n", + " (c_proj): Conv1D()\n", + " (attn_dropout): Dropout(p=0.1, inplace=False)\n", + " (resid_dropout): Dropout(p=0.1, inplace=False)\n", + " )\n", + " (ln_2): LayerNorm((768,), eps=1e-05, elementwise_affine=True)\n", + " (mlp): GPT2MLP(\n", + " (c_fc): Conv1D()\n", + " (c_proj): Conv1D()\n", + " (act): NewGELUActivation()\n", + " (dropout): Dropout(p=0.1, inplace=False)\n", + " )\n", + " )\n", + " )\n", + ")\n", + "**********\n", + "\n", + "**********\n", + "DataCollatorForSeq2Seq(tokenizer=GPT2TokenizerFast(name_or_path='gpt2', vocab_size=50257, model_max_length=1024, is_fast=True, padding_side='right', truncation_side='right', special_tokens={'bos_token': '<|endoftext|>', 'eos_token': '<|endoftext|>', 'unk_token': '<|endoftext|>', 'pad_token': '<|endoftext|>'}, clean_up_tokenization_spaces=True), added_tokens_decoder={\n", + "\t50256: AddedToken(\"<|endoftext|>\", rstrip=False, lstrip=False, single_word=False, normalized=True, special=True),\n", + "}, model=None, padding=True, max_length=None, pad_to_multiple_of=None, label_pad_token_id=-100, return_tensors='pt')\n" + ] + } + ], + "source": [ + "print(client_model())\n", + "print('*' * 10)\n", + "print(dataset())\n", + "print('*' * 10)\n", + "print(data_collator())" + ] + }, + { + "cell_type": "markdown", + "id": "898c3491", + "metadata": {}, + "source": [ + "It seems that everything is ready! Now we can submit the task: run the code below to submit it." + ] + }, { "cell_type": "code", "execution_count": 2, @@ -437,7 +757,7 @@ "source": [ "## Add Deepspeed Setting\n", "\n", - "By simply adding a ds_config, we can run our task with a deepspeed backend:" + "By simply adding a ds_config, we can run our task with a deepspeed backend. If you have deployed an Eggroll environment, you can submit the task with deepspeed to Eggroll to accelerate your training."
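The full pipeline below differs from the previous one in only two places: the ds_config is attached to the training arguments, and the client task is switched to the deepspeed launcher. Shown in isolation, as a sketch using the same names as in this tutorial (the minimal ds_config here is our own simplification of the full one below):

```python
from fate_client.pipeline.components.fate.nn.algo_params import Seq2SeqTrainingArguments

# a minimal deepspeed config; the full example below adds ZeRO stage-2 details and CPU offload
ds_config = {
    "train_micro_batch_size_per_gpu": 1,
    "optimizer": {"type": "Adam", "params": {"lr": 5e-5}},
    "fp16": {"enabled": True},
    "zero_optimization": {"stage": 2},
}

train_args = Seq2SeqTrainingArguments(
    per_device_train_batch_size=1,
    learning_rate=5e-5,
    num_train_epochs=1,
    fp16=True,
    deepspeed=ds_config,  # hand the config to the underlying trainer
)

# later, after defining homo_nn_0: run the client side with the deepspeed launcher
# homo_nn_0.guest.conf.set("launcher_name", "deepspeed")
```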
] }, { @@ -458,98 +778,136 @@ } ], "source": [ - "import torch as t\n", - "from torch import nn\n", - "from pipeline import fate_torch_hook\n", - "from pipeline.component import HomoNN\n", - "from pipeline.backend.pipeline import PipeLine\n", - "from pipeline.component import Reader, Evaluation, DataTransform\n", - "from pipeline.interface import Data, Model\n", - "\n", - "t = fate_torch_hook(t)\n", - "\n", - "import os\n", - "# bind data path to name & namespace\n", - "fate_project_path = os.path.abspath('../../../')\n", - "guest = 9997\n", - "arbiter = 9997\n", - "pipeline = PipeLine().set_initiator(role='guest', party_id=guest).set_roles(guest=guest, arbiter=arbiter)\n", - "\n", - "# bind data path with name & namespace\n", - "data_0 = {\"name\": \"sciq\", \"namespace\": \"experiment\"}\n", - "data_path_0 = fate_project_path + '/sciq/'\n", - "pipeline.bind_table(name=data_0['name'], namespace=data_0['namespace'], path=data_path_0)\n", - "\n", - "reader_0 = Reader(name=\"reader_0\")\n", - "reader_0.get_party_instance(role='guest', party_id=guest).component_param(table=data_0)\n", - "\n", - "# deepspeed config\n", + "import time\n", + "from fate_client.pipeline.components.fate.reader import Reader\n", + "from fate_client.pipeline import FateFlowPipeline\n", + "from fate_client.pipeline.components.fate.homo_nn import HomoNN, get_conf_of_ot_runner\n", + "from fate_client.pipeline.components.fate.nn.algo_params import Seq2SeqTrainingArguments, FedAVGArguments\n", + "from fate_client.pipeline.components.fate.nn.loader import LLMModelLoader, LLMDatasetLoader, LLMDataFuncLoader\n", + "from peft import LoraConfig, TaskType\n", + "from transformers.modeling_utils import unwrap_model\n", + "\n", + "\n", + "guest = '10000'\n", + "host = '10000'\n", + "arbiter = '10000'\n", + "\n", + "# pipeline = FateFlowPipeline().set_parties(guest=guest, host=host, arbiter=arbiter)\n", + "pipeline = FateFlowPipeline().set_parties(guest=guest, arbiter=arbiter)\n", + "\n", + "reader_0 = Reader(\"reader_0\", runtime_parties=dict(guest=guest))\n", + "reader_0.guest.task_parameters(\n", + " namespace=\"experiment\",\n", + " name=\"sciq\"\n", + ")\n", + "\n", + "client_model = LLMModelLoader(\n", + " module_name='offsite_tuning.gpt2', item_name='GPT2LMHeadSubModel',\n", + " model_name_or_path='gpt2',\n", + " emulator_layer_num=18,\n", + " adapter_top_layer_num=2,\n", + " adapter_bottom_layer_num=2\n", + ")\n", + "\n", + "server_model = LLMModelLoader(\n", + " module_name='offsite_tuning.gpt2', item_name='GPT2LMHeadMainModel',\n", + " model_name_or_path='gpt2',\n", + " emulator_layer_num=18,\n", + " adapter_top_layer_num=2,\n", + " adapter_bottom_layer_num=2 \n", + ")\n", + "\n", + "dataset = LLMDatasetLoader(\n", + " module_name='qa_dataset', item_name='QaDataset',\n", + " tokenizer_name_or_path='gpt2',\n", + " select_num=100\n", + ")\n", + "\n", + "data_collator = LLMDataFuncLoader(module_name='data_collator.cust_data_collator', item_name='get_seq2seq_data_collator', tokenizer_name_or_path='gpt2')\n", + "\n", + "batch_size = 1\n", + "lr = 5e-5\n", "ds_config = {\n", - " \"train_micro_batch_size_per_gpu\": 2,\n", - " \"gradient_accumulation_steps\": 2,\n", + " \"train_micro_batch_size_per_gpu\": batch_size,\n", " \"optimizer\": {\n", - " \"type\": \"AdamW\",\n", + " \"type\": \"Adam\",\n", " \"params\": {\n", - " \"lr\": 5e-5\n", + " \"lr\": lr,\n", + " \"torch_adam\": True,\n", + " \"adam_w_mode\": False\n", " }\n", - " }\n", - " ,\n", + " },\n", " \"fp16\": {\n", - " \"enabled\": False\n", - " }\n", - " ,\n", + " 
\"enabled\": True\n", + " },\n", + " \"gradient_accumulation_steps\": 1,\n", " \"zero_optimization\": {\n", - " \"stage\": 1,\n", + " \"stage\": 2,\n", + " \"allgather_partitions\": True,\n", + " \"allgather_bucket_size\": 1e8,\n", + " \"overlap_comm\": True,\n", + " \"reduce_scatter\": True,\n", + " \"reduce_bucket_size\": 1e8,\n", + " \"contiguous_gradients\": True,\n", " \"offload_optimizer\": {\n", " \"device\": \"cpu\"\n", " },\n", - " \"contiguous_gradients\": True,\n", - " \"overlap_comm\": True\n", + " \"offload_param\": {\n", + " \"device\": \"cpu\"\n", + " }\n", " }\n", "}\n", "\n", - "gpt2_type = 'gpt2'\n", - "\n", - "from pipeline.component.nn import DatasetParam\n", - "dataset_param = DatasetParam(dataset_name='qa_dataset', tokenizer_name_or_path=gpt2_type, select_num=100)\n", - "\n", - "from pipeline.component.homo_nn import TrainerParam # Interface\n", - "sub_model_client = t.nn.CustModel(module_name='offsite_tuning.gpt2_ot', class_name='GPT2LMHeadSubModel', model_name_or_path=gpt2_type \\\n", - " ,emulator_layer_num=4, adapter_top_layer_num=2, adapter_bottom_layer_num=2)\n", - "main_model_server = t.nn.CustModel(module_name='offsite_tuning.gpt2_ot', class_name='GPT2LMHeadMainModel', model_name_or_path=gpt2_type \\\n", - " ,emulator_layer_num=4, adapter_top_layer_num=2, adapter_bottom_layer_num=2)\n", - "\n", - "nn_component = HomoNN(name='nn_0')\n", - "\n", - "nn_component.get_party_instance(role='guest', party_id=guest).component_param(model=sub_model_client, dataset=dataset_param, # dataset\n", - " trainer=TrainerParam(trainer_name='offsite_tuning_trainer', epochs=3, batch_size=4, collate_fn='DataCollatorForTokenClassification', task_type='causal_ml', \\\n", - " save_to_local_dir=True),\n", - " optimizer=t.optim.Adam(lr=5e-5)\n", - " )\n", - "nn_component.get_party_instance(role='arbiter', party_id=arbiter).component_param(model=main_model_server, \n", - " trainer=TrainerParam(trainer_name='offsite_tuning_trainer', collate_fn='DataCollatorForTokenClassification', save_to_local_dir=True),\n", - " # Attention here\n", - " server_init=True # This parameter must be set True !!!!!!!!!!!\n", - " )\n", - "pipeline.add_component(reader_0)\n", - "pipeline.add_component(nn_component, data=Data(train_data=reader_0.output.data))\n", - "pipeline.compile()" - ] - }, - { - "cell_type": "code", - "execution_count": 8, - "id": "23320cb9-d06a-44ac-8966-398b0f7bbaae", - "metadata": {}, - "outputs": [], - "source": [ - "from pipeline.runtime.entity import JobParameters\n", - "pipeline.fit(JobParameters(task_conf={\n", - " \"nn_0\": {\n", - " \"launcher\": \"deepspeed\",\n", - " \"world_size\": 4\n", - " }\n", - "}))" + "train_args = Seq2SeqTrainingArguments(\n", + " per_device_train_batch_size=1,\n", + " learning_rate=5e-5,\n", + " disable_tqdm=False,\n", + " num_train_epochs=1,\n", + " logging_steps=10,\n", + " logging_strategy='steps',\n", + " dataloader_num_workers=4,\n", + " use_cpu=False,\n", + " deepspeed=ds_config, # Add deepspeed config here\n", + " remove_unused_columns=False,\n", + " fp16=True\n", + ")\n", + "\n", + "client_conf = get_conf_of_ot_runner(\n", + " model=client_model,\n", + " dataset=dataset,\n", + " data_collator=data_collator,\n", + " training_args=train_args,\n", + " fed_args=FedAVGArguments(),\n", + " aggregate_model=False,\n", + ")\n", + "\n", + "server_conf = get_conf_of_ot_runner(\n", + " model=server_model,\n", + " dataset=dataset,\n", + " data_collator=data_collator,\n", + " training_args=train_args,\n", + " fed_args=FedAVGArguments(),\n", + " 
aggregate_model=False\n", + ")\n", + "\n", + "\n", + "homo_nn_0 = HomoNN(\n", + " 'nn_0',\n", + " train_data=reader_0.outputs[\"output_data\"],\n", + " runner_module=\"offsite_tuning_runner\",\n", + " runner_class=\"OTRunner\"\n", + ")\n", + "\n", + "homo_nn_0.guest.task_parameters(runner_conf=client_conf)\n", + "homo_nn_0.arbiter.task_parameters(runner_conf=server_conf)\n", + "\n", + "# if you have deployed eggroll, you can add this line to submit your job to eggroll\n", + "homo_nn_0.guest.conf.set(\"launcher_name\", \"deepspeed\")\n", + "\n", + "pipeline.add_tasks([reader_0, homo_nn_0])\n", + "pipeline.conf.set(\"task\", dict(engine_run={\"cores\": 4}))\n", + "pipeline.compile()\n", + "pipeline.fit()\n" ] }, { @@ -560,11 +918,7 @@ "## Offsite-tuning + Multi Client Federation\n", "\n", "\n", - "The Offsite-Tuning + FedAVG federation is configured based on the standard Offsite-Tuning. The setup is a bit more complex, but we will walk you through it step by step. The pipeline code below contains detailed comments. When reading, please pay attention to the following points:\n", - "\n", - "1. In a multi-party scenario, please fill in different party_ids based on your deployment.\n", - "2. The operation to bind the data path with the name & namespace needs to be run on the machines of all parties. For convenience, we've placed the code in one location.\n", - "3. When configuring Trainer parameters, make sure to add the 'need_aggregate=True' parameter to the OffsiteTuningTrainer for each client and server. So adapters will be aggregated during training." + "The Offsite-Tuning + FedAVG federation is configured on top of the standard Offsite-Tuning setup. In this setting, you need to add data inputs & configurations for all clients. Also remember to set 'aggregate_model=True' in both the client & server conf so that model aggregation will be conducted during training."
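Conceptually, with `aggregate_model=True` each client fine-tunes its adapters locally and the server combines them FedAVG-style each aggregation round. The toy snippet below (plain PyTorch, purely illustrative, not FATE's implementation) shows the aggregation arithmetic on a single adapter tensor:

```python
import torch

# two clients' versions of the same adapter weight after local training
client_updates = [
    {'adapter.weight': torch.full((2, 2), 1.0)},
    {'adapter.weight': torch.full((2, 2), 0.0)},
]

# equal-weight FedAVG: element-wise mean over clients, per parameter name
aggregated = {
    name: torch.stack([u[name] for u in client_updates]).mean(dim=0)
    for name in client_updates[0]
}
print(aggregated['adapter.weight'])  # every entry is 0.5
```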
] }, { @@ -574,195 +928,98 @@ "metadata": {}, "outputs": [], "source": [ - "import torch as t\n", - "from torch import nn\n", - "from pipeline import fate_torch_hook\n", - "from pipeline.component import HomoNN\n", - "from pipeline.backend.pipeline import PipeLine\n", - "from pipeline.component import Reader, Evaluation, DataTransform\n", - "from pipeline.interface import Data, Model\n", - "\n", - "t = fate_torch_hook(t)\n", - "\n", - "import os\n", - "# bind data path to name & namespace\n", - "fate_project_path = os.path.abspath('../../../')\n", - "guest = 9997\n", - "hosts = [9999, 10000]\n", - "arbiter = 9997\n", - "pipeline = PipeLine().set_initiator(role='guest', party_id=guest).set_roles(guest=guest, arbiter=arbiter, host=hosts)\n", - "\n", - "data_9997 = {\"name\": \"sciq-9997-gpt2\", \"namespace\": \"experiment\"}\n", - "data_9999 = {\"name\": \"sciq-9999-gpt2\", \"namespace\": \"experiment\"}\n", - "data_10000 = {\"name\": \"sciq-10000-gpt2\", \"namespace\": \"experiment\"}\n", - "\n", - "# run the binding codes on 9997\n", - "data_path_9997 = fate_project_path + '/sciq/'\n", - "pipeline.bind_table(name=data_9997['name'], namespace=data_9997['namespace'], path=data_path_9997)\n", - "\n", - "# run the binding codes on 9998\n", - "data_path_9999 = fate_project_path + '/sciq/'\n", - "pipeline.bind_table(name=data_9999['name'], namespace=data_9999['namespace'], path=data_path_9999)\n", - "\n", - "# run the binding codes on 10000\n", - "data_path_10000 = fate_project_path + '/sciq/'\n", - "pipeline.bind_table(name=data_10000['name'], namespace=data_10000['namespace'], path=data_path_10000)" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "id": "253499d2-37a1-4fbe-9427-646d51fd6edd", - "metadata": {}, - "outputs": [], - "source": [ - "# deepspeed config\n", - "ds_config = {\n", - " \"train_micro_batch_size_per_gpu\": 2,\n", - " \"gradient_accumulation_steps\": 2,\n", - " \"optimizer\": {\n", - " \"type\": \"AdamW\",\n", - " \"params\": {\n", - " \"lr\": 5e-5\n", - " }\n", - " }\n", - " ,\n", - " \"fp16\": {\n", - " \"enabled\": False\n", - " }\n", - " ,\n", - " \"zero_optimization\": {\n", - " \"stage\": 1,\n", - " \"offload_optimizer\": {\n", - " \"device\": \"cpu\"\n", - " },\n", - " \"contiguous_gradients\": True,\n", - " \"overlap_comm\": True\n", - " }\n", - "}" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "id": "909dc4fb-8d1e-4831-a6f7-744cf7d826c1", - "metadata": {}, - "outputs": [], - "source": [ - "model_path = 'gpt2'" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "id": "2283025d-9acf-4ffa-8a25-648aa619528e", - "metadata": {}, - "outputs": [], - "source": [ - "reader_0 = Reader(name=\"reader_0\")\n", - "reader_0.get_party_instance(role='guest', party_id=guest).component_param(table=data_9997)\n", - "reader_0.get_party_instance(role='host', party_id=hosts[0]).component_param(table=data_9999)\n", - "reader_0.get_party_instance(role='host', party_id=hosts[1]).component_param(table=data_10000)" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "id": "5ce1cc8a-1003-4379-aa4f-bf3fa28237c8", - "metadata": {}, - "outputs": [], - "source": [ - "from pipeline.component.nn import DatasetParam\n", - "\n", - "# This demo utilizes the same dataset but selects distinct segments to mimic an equal data distribution across different parties. 
\n", - "# We adopt this strategy for the sake of convenience.\n", - "dataset_param_0 = DatasetParam(dataset_name='qa_ds', tokenizer_name_or_path=model_path, start_idx=0, select_num=3893)\n", - "dataset_param_1 = DatasetParam(dataset_name='qa_ds', tokenizer_name_or_path=model_path, start_idx=3893, select_num=3893)\n", - "dataset_param_2 = DatasetParam(dataset_name='qa_ds', tokenizer_name_or_path=model_path, start_idx=7786, select_num=3893)\n" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "id": "50ea1168-417c-41da-b7da-b2625c26af50", - "metadata": {}, - "outputs": [], - "source": [ - "from pipeline.component.homo_nn import TrainerParam # Interface\n", - "\n", - "# define model structure\n", - "sub_model_client = t.nn.CustModel(module_name='offsite_tuning.gpt2_ot', class_name='GPT2LMHeadSubModel', model_name_or_path=model_path \\\n", - " ,emulator_layer_num=4, adapter_top_layer_num=2, adapter_bottom_layer_num=2)\n", - "main_model_server = t.nn.CustModel(module_name='offsite_tuning.gpt2_ot', class_name='GPT2LMHeadMainModel', model_name_or_path=model_path \\\n", - " ,emulator_layer_num=4, adapter_top_layer_num=2, adapter_bottom_layer_num=2)" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "id": "dffcace2-0d59-411e-856f-512e7eafd793", - "metadata": {}, - "outputs": [], - "source": [ - "nn_component = HomoNN(name='nn_0')" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "id": "3c854117-3fe1-4a7b-9505-bb131d95f178", - "metadata": {}, - "outputs": [], - "source": [ - "epochs = 8\n", - "# We have 4 party to set\n", - "# Please make sure that need_aggregate is True, and epochs parameter of all parties are the same\n", - "nn_component.get_party_instance(role='guest', party_id=guest).component_param(model=sub_model_client, dataset=dataset_param_0, # dataset\n", - " trainer=TrainerParam(trainer_name='offsite_tuning_trainer', epochs=epochs, batch_size=4, collate_fn='DataCollatorForTokenClassification', task_type='causal_ml', \\\n", - " save_to_local_dir=True, need_aggregate=True), ds_config=ds_config)\n", - "\n", - "nn_component.get_party_instance(role='host', party_id=hosts[0]).component_param(model=sub_model_client, dataset=dataset_param_1, # dataset\n", - " trainer=TrainerParam(trainer_name='offsite_tuning_trainer', epochs=epochs, batch_size=4, collate_fn='DataCollatorForTokenClassification', task_type='causal_ml', \\\n", - " save_to_local_dir=True, need_aggregate=True), ds_config=ds_config)\n", - "\n", - "nn_component.get_party_instance(role='host', party_id=hosts[1]).component_param(model=sub_model_client, dataset=dataset_param_2, # dataset\n", - " trainer=TrainerParam(trainer_name='offsite_tuning_trainer', epochs=epochs, batch_size=4, collate_fn='DataCollatorForTokenClassification', task_type='causal_ml', \\\n", - " save_to_local_dir=True, need_aggregate=True), ds_config=ds_config)\n", - "\n", - "\n", - "nn_component.get_party_instance(role='arbiter', party_id=arbiter).component_param(model=main_model_server,\n", - " trainer=TrainerParam(trainer_name='offsite_tuning_trainer', epochs=epochs, save_to_local_dir=True,\n", - " need_aggregate=True),\n", - " server_init=True\n", - " )" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "id": "a5d173c1-5d72-4d25-9b78-91e6ef766d8c", - "metadata": {}, - "outputs": [], - "source": [ - "pipeline.add_component(reader_0)\n", - "pipeline.add_component(nn_component, data=Data(train_data=reader_0.output.data))\n", - "pipeline.compile()" - ] - }, - { - "cell_type": "code", - "execution_count": null, - 
"id": "f6674178-2c59-43d6-b6ce-888e426f27b3", - "metadata": {}, - "outputs": [], - "source": [ - "from pipeline.runtime.entity import JobParameters\n", - "pipeline.fit(JobParameters(task_conf={\n", - " \"nn_0\": {\n", - " \"launcher\": \"deepspeed\",\n", - " \"world_size\": 4\n", - " }\n", - "}))" + "import time\n", + "from fate_client.pipeline.components.fate.reader import Reader\n", + "from fate_client.pipeline import FateFlowPipeline\n", + "from fate_client.pipeline.components.fate.homo_nn import HomoNN, get_conf_of_ot_runner\n", + "from fate_client.pipeline.components.fate.nn.algo_params import Seq2SeqTrainingArguments, FedAVGArguments\n", + "from fate_client.pipeline.components.fate.nn.loader import LLMModelLoader, LLMDatasetLoader, LLMCustFuncLoader\n", + "from peft import LoraConfig, TaskType\n", + "\n", + "\n", + "guest = '10000'\n", + "host = '10000'\n", + "arbiter = '10000'\n", + "\n", + "pipeline = FateFlowPipeline().set_parties(guest=guest, host=host, arbiter=arbiter)\n", + "\n", + "reader_0 = Reader(\"reader_0\", runtime_parties=dict(guest=guest, host=host))\n", + "reader_0.guest.task_parameters(\n", + " namespace=\"experiment\",\n", + " name=\"sciq\"\n", + ")\n", + "reader_0.hosts[0].task_parameters(\n", + " namespace=\"experiment\",\n", + " name=\"sciq\"\n", + ")\n", + "\n", + "client_model = LLMModelLoader(\n", + " module_name='offsite_tuning.gpt2', item_name='GPT2LMHeadSubModel',\n", + " model_name_or_path='gpt2',\n", + " emulator_layer_num=4,\n", + " adapter_top_layer_num=1,\n", + " adapter_bottom_layer_num=1\n", + ")\n", + "\n", + "server_model = LLMModelLoader(\n", + " module_name='offsite_tuning.gpt2', item_name='GPT2LMHeadMainModel',\n", + " model_name_or_path='gpt2',\n", + " emulator_layer_num=4,\n", + " adapter_top_layer_num=1,\n", + " adapter_bottom_layer_num=1 \n", + ")\n", + "\n", + "dataset = LLMDatasetLoader(\n", + " module_name='qa_dataset', item_name='QaDataset',\n", + " tokenizer_name_or_path='gpt2',\n", + " select_num=100\n", + ")\n", + "\n", + "data_collator = LLMCustFuncLoader(module_name='cust_data_collator', item_name='get_seq2seq_tokenizer', model_path='gpt2')\n", + "\n", + "train_args = Seq2SeqTrainingArguments(\n", + " per_device_train_batch_size=1,\n", + " learning_rate=5e-5,\n", + " disable_tqdm=False,\n", + " num_train_epochs=1,\n", + " logging_steps=10,\n", + " logging_strategy='steps',\n", + " dataloader_num_workers=4\n", + ")\n", + "\n", + "client_conf = get_conf_of_ot_runner(\n", + " model=client_model,\n", + " dataset=dataset,\n", + " data_collator=data_collator,\n", + " training_args=train_args,\n", + " fed_args=FedAVGArguments(),\n", + " aggregate_model=True\n", + ")\n", + "\n", + "server_conf = get_conf_of_ot_runner(\n", + " model=server_model,\n", + " dataset=dataset,\n", + " data_collator=data_collator,\n", + " training_args=train_args,\n", + " fed_args=FedAVGArguments(),\n", + " aggregate_model=True\n", + ")\n", + "\n", + "homo_nn_0 = HomoNN(\n", + " 'nn_0',\n", + " train_data=reader_0.outputs[\"output_data\"],\n", + " runner_module=\"offsite_tuning_runner\",\n", + " runner_class=\"OTRunner\"\n", + ")\n", + "\n", + "homo_nn_0.guest.task_parameters(runner_conf=client_conf)\n", + "homo_nn_0.hosts[0].task_parameters(runner_conf=client_conf)\n", + "homo_nn_0.arbiter.task_parameters(runner_conf=server_conf)\n", + "\n", + "pipeline.add_tasks([reader_0, homo_nn_0])\n", + "\n", + "pipeline.compile()\n", + "pipeline.fit()" ] } ], diff --git a/doc/tutorial/parameter_efficient_llm/ChatGLM-6B_ds.ipynb 
b/doc/tutorial/parameter_efficient_llm/ChatGLM-6B_ds.ipynb deleted file mode 100644 index f3a43c1..0000000 --- a/doc/tutorial/parameter_efficient_llm/ChatGLM-6B_ds.ipynb +++ /dev/null @@ -1,463 +0,0 @@ -{ - "cells": [ - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "# Federated ChatGLM Tuning with Parameter Efficient methods in FATE-LLM" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "In this tutorial, we will demonstrate how to efficiently train federated ChatGLM-6B with deepspeed using the FATE-LLM framework. In FATE-LLM, we introduce the \"pellm\" (Parameter Efficient Large Language Model) module, specifically designed for federated learning with large language models. We enable the implementation of parameter-efficient methods in federated learning, reducing communication overhead while maintaining model performance. In this tutorial we particularly focus on ChatGLM-6B, and we will also emphasize the use of the Adapter mechanism for fine-tuning ChatGLM-6B, which enables us to effectively reduce communication volume and improve overall efficiency.\n" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## FATE-LLM: ChatGLM-6B\n", - "\n", - "### ChatGLM-6B\n", - "ChatGLM-6B is a large transformer-based language model with 6.2 billion parameters, trained on about 1T tokens of Chinese and English corpus. ChatGLM-6B is an open bilingual language model based on General Language Model. You can download the pretrained model from [here](https://huggingface.co/THUDM/chatglm-6b), or let the program automatically download it when you use it later.\n", - "\n", - "### Current Features\n", - "\n", - "In the current version, FATE-LLM: ChatGLM-6B supports the following features:\n", - "
\n", - " \n", - "" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Experiment Setting" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Before running the experiment, please make sure that [FATE-LLM Cluster](https://github.com/FederatedAI/FATE/wiki/Download#llm%E9%83%A8%E7%BD%B2%E5%8C%85) has been deployed. " - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Dataset: Advertising Text Generation\n", - "\n", - "This is an advertising text generation dataset; you can download the dataset from the following links and place it in the examples/data folder. \n", - "- [data link 1](https://drive.google.com/file/d/13_vf0xRTQsyneRKdD1bZIr93vBGOczrk/view)\n", - "- [data link 2](https://cloud.tsinghua.edu.cn/f/b3f119a008264b1cabd1/?dl=1) \n", - "\n", - "You can refer to the following link for more details about the [data](https://aclanthology.org/D19-1321.pdf)" - ] - }, - { - "cell_type": "code", - "execution_count": 5, - "metadata": {}, - "outputs": [], - "source": [ - "import pandas as pd\n", - "df = pd.read_json('${fate_install}/examples/data/AdvertiseGen/train.json', lines=True)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### ChatGLM-6B with Adapter\n", - "\n", - "In this section, we will guide you through the process of finetuning ChatGLM-6B with adapters using the FATE-LLM framework. Before starting this section, we recommend that you read through this tutorial first: [Model Customization](https://github.com/FederatedAI/FATE/blob/master/doc/tutorial/pipeline/nn_tutorial/Homo-NN-Customize-Model.ipynb)." - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "The ChatGLM model is located at fate_llm/model_zoo/pellm/chatglm.py and can be used directly" - ] - }, - { - "cell_type": "code", - "execution_count": 7, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "albert.py bert.py deberta.py gpt2.py\t\t\t __pycache__\r\n", - "bart.py chatglm.py distilbert.py parameter_efficient_llm.py roberta.py\r\n" - ] - } - ], - "source": [ - "! ls ../../../fate/python/fate_llm/model_zoo/pellm" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "#### Adapters" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "We can directly use adapters from peft. See [Adapter Methods](https://huggingface.co/docs/peft/index) for more details. 
By specifying the adapter name and the adapter\n", - "config dict we can insert adapters into our language models:" - ] - }, - { - "cell_type": "code", - "execution_count": 12, - "metadata": {}, - "outputs": [], - "source": [ - "from peft import LoraConfig, TaskType\n", - "\n", - "# define lora config\n", - "lora_config = LoraConfig(\n", - " task_type=TaskType.SEQ_CLS,\n", - " inference_mode=False, r=8, lora_alpha=32, lora_dropout=0.1,\n", - " target_modules=['c_attn'],\n", - ")" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "#### Init ChatGLM Model " - ] - }, - { - "cell_type": "code", - "execution_count": 14, - "metadata": {}, - "outputs": [], - "source": [ - "import torch as t\n", - "from pipeline import fate_torch_hook\n", - "from pipeline.component.nn import save_to_fate_llm\n", - "fate_torch_hook(t)\n", - "\n", - "model_path = \"your download chatglm path\"\n", - "model = t.nn.Sequential(\n", - " t.nn.CustModel(module_name='pellm.chatglm', class_name='ChatGLMForConditionalGeneration',\n", - " peft_config=lora_config.to_dict(), peft_type='LoraConfig',\n", - " pretrained_path=model_path)\n", - ")\n" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "**During the training process, all weights of the pretrained language model will be frozen, and only the weights of the adapters are trainable. Thus, FATE-LLM only trains the adapters in local training and aggregates the adapters' weights in the federation process.**\n", - "\n", - "See [Adapters Overview](https://huggingface.co/docs/peft/index) for the currently available adapters.\n" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "#### Init DeepSpeed Config" - ] - }, - { - "cell_type": "code", - "execution_count": 15, - "metadata": {}, - "outputs": [], - "source": [ - "ds_config = {\n", - " \"train_micro_batch_size_per_gpu\": 1,\n", - " \"optimizer\": {\n", - " \"type\": \"Adam\",\n", - " \"params\": {\n", - " \"lr\": 5e-4\n", - " }\n", - " },\n", - " \"fp16\": {\n", - " \"enabled\": True\n", - " },\n", - " \"zero_optimization\": {\n", - " \"stage\": 2,\n", - " \"allgather_partitions\": True,\n", - " \"allgather_bucket_size\": 5e8,\n", - " \"overlap_comm\": False,\n", - " \"reduce_scatter\": True,\n", - " \"reduce_bucket_size\": 5e8,\n", - " \"contiguous_gradients\": True\n", - " }\n", - "}\n" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Submit Federated Task\n", - "To run a federated task, please make sure to use fate>=v1.11.2 and deploy it with GPU machines. To run this code, make sure the training data path is already bound. The following code should be copied to a script and run from the command line, like \"python federated_chatglm.py\"" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "You can use this script to submit the job, but training will take a long time and generate a long log, so we won't run it here."
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "import torch as t\n", - "import os\n", - "from pipeline import fate_torch_hook\n", - "from pipeline.component import HomoNN\n", - "from pipeline.backend.pipeline import PipeLine\n", - "from pipeline.component import Reader\n", - "from pipeline.interface import Data\n", - "from pipeline.runtime.entity import JobParameters\n", - "\n", - "fate_torch_hook(t)\n", - "\n", - "\n", - "guest_0 = 9999\n", - "host_1 = 10000\n", - "pipeline = PipeLine().set_initiator(role='guest', party_id=guest_0).set_roles(guest=guest_0, host=host_1,\n", - " arbiter=guest_0)\n", - "data_guest = {\"name\": \"ad_guest\", \"namespace\": \"experiment\"}\n", - "data_host = {\"name\": \"ad_host\", \"namespace\": \"experiment\"}\n", - "guest_data_path = \"${fate_install}/examples/data/AdvertiseGen/train.json_guest\"\n", - "host_data_path = \"${fate_install}/examples/data/AdvertiseGen/train.json_host\"\n", - "# make sure the guest's and host's training data are already bound\n", - "\n", - "reader_0 = Reader(name=\"reader_0\")\n", - "reader_0.get_party_instance(role='guest', party_id=guest_0).component_param(table=data_guest)\n", - "reader_0.get_party_instance(role='host', party_id=host_1).component_param(table=data_host)\n", - "\n", - "## Add your pretrained model path here; the model & tokenizer will be loaded from this path\n", - "\n", - "from peft import LoraConfig, TaskType\n", - "lora_config = LoraConfig(\n", - " task_type=TaskType.CAUSAL_LM,\n", - " inference_mode=False, r=8, lora_alpha=32, lora_dropout=0.1,\n", - " target_modules=['query_key_value'],\n", - ")\n", - "ds_config = {\n", - " \"train_micro_batch_size_per_gpu\": 1,\n", - " \"optimizer\": {\n", - " \"type\": \"Adam\",\n", - " \"params\": {\n", - " \"lr\": 5e-4\n", - " }\n", - " },\n", - " \"fp16\": {\n", - " \"enabled\": True\n", - " },\n", - " \"zero_optimization\": {\n", - " \"stage\": 2,\n", - " \"allgather_partitions\": True,\n", - " \"allgather_bucket_size\": 5e8,\n", - " \"overlap_comm\": False,\n", - " \"reduce_scatter\": True,\n", - " \"reduce_bucket_size\": 5e8,\n", - " \"contiguous_gradients\": True\n", - " }\n", - "}\n", - "\n", - "model_path = \"your download chatglm path\"\n", - "from pipeline.component.homo_nn import DatasetParam, TrainerParam\n", - "model = t.nn.Sequential(\n", - " t.nn.CustModel(module_name='pellm.chatglm', class_name='ChatGLMForConditionalGeneration',\n", - " peft_config=lora_config.to_dict(), peft_type='LoraConfig',\n", - " pretrained_path=model_path)\n", - ")\n", - "\n", - "# DatasetParam\n", - "dataset_param = DatasetParam(dataset_name='glm_tokenizer', text_max_length=64, tokenizer_name_or_path=model_path,\n", - " padding_side=\"left\")\n", - "# TrainerParam\n", - "trainer_param = TrainerParam(trainer_name='fedavg_trainer', epochs=5, batch_size=4, \n", - " checkpoint_save_freqs=1, pin_memory=False, \n", - " task_type=\"seq_2_seq_lm\",\n", - " data_loader_worker=8, \n", - " save_to_local_dir=True, # pay attention to this parameter\n", - " collate_fn=\"DataCollatorForSeq2Seq\")\n", - "\n", - "\n", - "nn_component = HomoNN(name='nn_0', model=model , ds_config=ds_config)\n", - "\n", - "# set parameter for client 1\n", - "nn_component.get_party_instance(role='guest', party_id=guest_0).component_param(\n", - " dataset=dataset_param,\n", - " trainer=trainer_param,\n", - " torch_seed=100\n", - ")\n", - "\n", - "# set parameter for client 2\n", - "nn_component.get_party_instance(role='host',
party_id=host_1).component_param(\n", - " dataset=dataset_param,\n", - " trainer=trainer_param,\n", - " torch_seed=100\n", - ")\n", - "\n", - "# set parameter for server\n", - "nn_component.get_party_instance(role='arbiter', party_id=guest_0).component_param(\n", - " trainer=trainer_param\n", - ")\n", - "\n", - "pipeline.add_component(reader_0)\n", - "pipeline.add_component(nn_component, data=Data(train_data=reader_0.output.data))\n", - "pipeline.compile()\n", - "\n", - "pipeline.fit(JobParameters(task_conf={\n", - " \"nn_0\": {\n", - " \"launcher\": \"deepspeed\",\n", - " \"world_size\": 8 # world_size means num of gpus to train in a single client\n", - " }\n", - "}))\n" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Training With P-Tuning V2 Adapter" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "To use another adapter like P-Tuning V2, only slight changes are needed!" - ] - }, - { - "cell_type": "code", - "execution_count": 20, - "metadata": {}, - "outputs": [], - "source": [ - "from pipeline.component.homo_nn import DatasetParam, TrainerParam\n", - "model = t.nn.Sequential(\n", - " t.nn.CustModel(module_name='pellm.chatglm', class_name='ChatGLMForConditionalGeneration',\n", - " pre_seq_len=128, # only this parameter is needed\n", - " pretrained_path=model_path)\n", - ")" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Inference" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Models trained with FATE-LLM can be found under the directory `${fate_install}/fateflow/model/$jobids/$cpn_name/{model.pkl, checkpoint_xxx.pkl/adapter_model.bin}`; users must make sure \"save_to_local_dir=True\" is set. \n", - "The following code is an example of how to load trained LoRA adapter weights:" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "import json\n", - "import sys\n", - "import torch\n", - "from peft import PeftModel, PeftConfig, LoraConfig, TaskType, get_peft_model\n", - "from transformers import AutoModel, AutoTokenizer\n", - "\n", - "\n", - "def load_model(pretrained_model_path):\n", - " _tokenizer = AutoTokenizer.from_pretrained(pretrained_model_path, trust_remote_code=True)\n", - " _model = AutoModel.from_pretrained(pretrained_model_path, trust_remote_code=True)\n", - "\n", - " _model = _model.half()\n", - " _model = _model.eval()\n", - "\n", - " return _model, _tokenizer\n", - "\n", - "\n", - "def load_data(data_path):\n", - " with open(data_path, \"r\") as fin:\n", - " for _l in fin:\n", - " yield json.loads(_l.strip())\n", - "\n", - "chatglm_model_path = \"\"\n", - "model, tokenizer = load_model(chatglm_model_path)\n", - "\n", - "test_data_path = \"{fate_install}/examples/data/AdvertiseGen/dev.json\"\n", - "dataset = load_data(test_data_path)\n", - "\n", - "peft_path = trained_model_path\n", - "peft_config = LoraConfig(\n", - " task_type=TaskType.CAUSAL_LM,\n", - " inference_mode=False, r=8, lora_alpha=32, lora_dropout=0.1,\n", - " target_modules=['query_key_value'],\n", - ")\n", - "\n", - "model = get_peft_model(model, peft_config)\n", - "model.load_state_dict(torch.load(peft_path), strict=False)\n", - "model = model.half()\n", - "model.eval()\n", - "\n", - "for p in model.parameters():\n", - " if p.requires_grad:\n", - " print(p)\n", - "\n", - "model.cuda(\"cuda:0\")\n", - "\n", - "content = \"advertisement keywords\"\n", - "model.chat(tokenizer, content, do_sample=False)" - ] - }, - { - "cell_type": "markdown",
"metadata": {}, - "source": [] - } - ], - "metadata": { - "kernelspec": { - "display_name": "Python 3 (ipykernel)", - "language": "python", - "name": "python3" - }, - "language_info": { - "codemirror_mode": { - "name": "ipython", - "version": 3 - }, - "file_extension": ".py", - "mimetype": "text/x-python", - "name": "python", - "nbconvert_exporter": "python", - "pygments_lexer": "ipython3", - "version": "3.9.0" - } - }, - "nbformat": 4, - "nbformat_minor": 2 -}