Skip to content

Latest commit

 

History

History
41 lines (30 loc) · 2.43 KB

README.md

File metadata and controls

41 lines (30 loc) · 2.43 KB

On-Device Collaborative Language Modeling via a Mixture of Generalists and Specialists

This is the code base for On-Device Collaborative Language Modeling via a Mixture of Generalists and Specialists

We propose a novel Collaborative learning approach with a Mixture of Generalists and Specialists (CoMiGS), which successfully addresses both system heterogeneity and data heterogeneity on device. As in the following plot, we follow a MoE architecture, while letting the experts diversify into generalists and specialists via parameter aggregation or localization, to leverage both collective power and specialized knowledge. A key innovation of our method is the bi-level optimization formulation of the Mixture-of-Experts learning objective, which follows the hierarchical order and router and experts learning.

Our method achieves the finest-grained balance of personalizetion and collaboration, as witnessed by the Top1 expert choice from the 1st and last layer. (Orange - generalist, blue - Specialist)

Methods implemented

Our code repository is built up on nanoGPT and nanoGPT-LoRA. We implement the following baselines:

  • Local Fine-Tuning
  • FedAvg Fine-Tuning
  • FlexLoRA by Jiamu Bai et al.
  • HetLoRA by Yae Jee Cho et al.
  • FFA-LoRA by Youbang Sun et al.
  • Strategy 2 of PCL by Nicolas Wagner et al.
  • An adapted version of pFedMoE by Liping Yi et al.
  • Our CoMiGS method

Structure

Collab_runscripts contains the experiment configurations used for the experiments in the paper. These serve as examples on how to configure the runs for collab_run.py experiments.

New methods can be implemented by extending collab_utils/collaborations_strategies.py and collab_utils/aggregation_strategies.py.