forked from vllm-project/vllm
Refactoring for maintainability #4

Merged: ElizaWszola merged 97 commits into neuralmagic:marlin-moe-integration from DhruvaBansal00:gptq-marlin-refactor on Aug 22, 2024.
Commits (97):
e5c1a81  Refactoring for maintainability (DhruvaBansal00)
7da678e  Fixing tests (DhruvaBansal00)
641696b  Addressing repacking comment (DhruvaBansal00)
3cef667  gptq -> marlin renaming (DhruvaBansal00)
a6710af  Undo formatting changes (DhruvaBansal00)
e29107f  Final formatting change (DhruvaBansal00)
099d61e  Switching to mixtral file for quantized mixtral (DhruvaBansal00)
bdf6bdc  Bug fixes (DhruvaBansal00)
19c5c59  is quantized change (DhruvaBansal00)
3b7cc60  debug stat (DhruvaBansal00)
d2c4754  replace wiehgt name with param name (DhruvaBansal00)
f579cb2  typo (DhruvaBansal00)
79394eb  debug (DhruvaBansal00)
ec75f4e  more debug (DhruvaBansal00)
91ca970  only relevant logging (DhruvaBansal00)
1b9d5bb  log (DhruvaBansal00)
ec06719  log (DhruvaBansal00)
71d82e1  removing qzero weights (DhruvaBansal00)
d3465d0  Qzeors in expert mapping (DhruvaBansal00)
226ee26  Debug (DhruvaBansal00)
21d7d27  Load qzero (DhruvaBansal00)
2dabb4b  rm 2x (DhruvaBansal00)
6366976  Mapping for scales (DhruvaBansal00)
d63c096  rm logging (DhruvaBansal00)
360fef4  Adding lyaer wise logging (DhruvaBansal00)
c23d616  shard ids (DhruvaBansal00)
8d81d14  Loading qzero correctly (DhruvaBansal00)
22e1aa7  List operand (DhruvaBansal00)
81e01f3  If clause (DhruvaBansal00)
dcfd32d  Able to load layers (DhruvaBansal00)
f04cbea  Setting load quant to false (DhruvaBansal00)
a56821d  Disabling logging (DhruvaBansal00)
7f961c6  Removing *2 in marlin moe repack (DhruvaBansal00)
4a6c7ff  *4 in marlin moe repack (DhruvaBansal00)
e6cd286  bits (DhruvaBansal00)
90241c4  *4 (DhruvaBansal00)
67409e9  intermediate size (DhruvaBansal00)
539032e  repeat keyword (DhruvaBansal00)
57b1cbe  hidden size (DhruvaBansal00)
87f1dd4  intermediate size back (DhruvaBansal00)
4c073c2  permute scales w3 (DhruvaBansal00)
d732493  *2 (DhruvaBansal00)
fdc22c4  log (DhruvaBansal00)
272822e  shape as 2 (DhruvaBansal00)
3ce045e  test (DhruvaBansal00)
c4ba477  Increasing to 4 and changing assert (DhruvaBansal00)
2ea8370  logging (DhruvaBansal00)
8287025  marlin moe repack change (DhruvaBansal00)
53b23b9  mult qweight shape by pack factor (DhruvaBansal00)
bc40786  Potential support for 8 bit (DhruvaBansal00)
bea13de  undo change (DhruvaBansal00)
a3a9114  qzeros (DhruvaBansal00)
eb916f9  switching traffic to mixtral quant (DhruvaBansal00)
017d6f8  compat (DhruvaBansal00)
eb9c087  Passing intermediate tensor into mixtral in quant file (DhruvaBansal00)
ea3cf18  Removing intemediate tensors from forward (DhruvaBansal00)
4f6b4ca  load weights from quant (DhruvaBansal00)
7ec27d9  Mixtral load weights change (DhruvaBansal00)
aa1fe77  none shard id change (DhruvaBansal00)
ae8fb15  Use class from mixtral_quant (DhruvaBansal00)
b863981  Removing lora from mixtral model init (DhruvaBansal00)
5556d28  Adding empty intermediate tensors (DhruvaBansal00)
c484a37  Building quantMixtralModel (DhruvaBansal00)
0344e72  fused moe test (DhruvaBansal00)
8c8b3fa  Lora enabled mixtral (DhruvaBansal00)
dff59cd  LoRAMixtralModel compat (DhruvaBansal00)
33f7e51  remove prefix (DhruvaBansal00)
fdba917  use fused moe (DhruvaBansal00)
780471e  remove org num embeddings (DhruvaBansal00)
c0970f1  pass use fused moe into decoder (DhruvaBansal00)
6a1a838  Mixtral for causal lm load func (DhruvaBansal00)
5c3e857  Copying over quant mixtral (DhruvaBansal00)
8d327de  Passing prefix (DhruvaBansal00)
d337aea  Weight load (DhruvaBansal00)
379f3e8  Weight load back (DhruvaBansal00)
a5d356e  Load with name not weight name (DhruvaBansal00)
62c0135  params dict should load from old name (DhruvaBansal00)
d23c00c  logging name and parmas (DhruvaBansal00)
6dda447  log expert parmas map (DhruvaBansal00)
67ce7b6  parity with prev commits (DhruvaBansal00)
bd933c9  Adding qzeros to mapping (DhruvaBansal00)
77cd095  Remove log (DhruvaBansal00)
529191e  Remove is quantized (DhruvaBansal00)
2450543  Assume fused true (DhruvaBansal00)
8cba45e  rm fused true (DhruvaBansal00)
10940a5  Switching to mixtral moe (DhruvaBansal00)
895ffbe  Precision changes (DhruvaBansal00)
e54b2e4  Cleanup (DhruvaBansal00)
b4f23dc  Mixtral quant parity (DhruvaBansal00)
d59fe3b  fixing tests (DhruvaBansal00)
0d9cbdc  Tests working and correctness verified (DhruvaBansal00)
112aa40  Formating (DhruvaBansal00)
1ca9098  Moving single marlin alongside fused marlin (DhruvaBansal00)
4d41425  Removing unused imports (DhruvaBansal00)
4907f43  single marlin moe import (DhruvaBansal00)
8225037  Merge branch 'marlin-moe-integration' into gptq-marlin-refactor (ElizaWszola)
315e3b6  Unify shard_id to be of str w[1-3] format (ElizaWszola)
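
A long stretch of the commits above (roughly 7f961c6 through 53b23b9: "*2", "*4", "bits", "mult qweight shape by pack factor") is shape arithmetic around the GPTQ pack factor: GPTQ stores several low-bit weights in each 32-bit word, so shapes read from the checkpoint must be scaled back up when repacking for Marlin. Below is a minimal sketch of that arithmetic, assuming the standard GPTQ qweight layout; it is illustrative Python, not code from this PR.

```python
def gptq_qweight_shape(in_features: int, out_features: int, bits: int):
    """Stored shape of a GPTQ-packed qweight tensor, plus its pack factor.

    GPTQ packs `32 // bits` quantized values into each int32, so the
    checkpoint's first dimension is `in_features // pack_factor`. The
    *2/*4 experiments in the commits above are attempts to get this
    scaling right during the Marlin repack.
    """
    assert 32 % bits == 0, "bits must evenly divide the 32-bit word"
    pack_factor = 32 // bits  # 8 for 4-bit weights, 4 for 8-bit
    return (in_features // pack_factor, out_features), pack_factor

# Example with illustrative (not PR-specific) layer sizes:
shape, pf = gptq_qweight_shape(4096, 14336, bits=4)
print(shape, pf)  # (512, 14336) 8
```

Relatedly, the final commit (315e3b6) settles on the plain strings "w1", "w2", "w3" as the shard ids for the three expert weight matrices, so weight-loading code can dispatch on a single string key rather than mixed types.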
Review comment:
nit: I think it would be good to keep single_marlin_moe in the same place as fused_moe_marlin, even if the former is only used for testing.
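
A minimal sketch of the layout this comment asks for, with both entry points living in one module so model code and tests import them from the same place. The file name, signatures, and docstrings are assumptions for illustration, not taken from this PR's diff:

```python
# fused_moe_marlin.py -- hypothetical module co-locating both paths.

def fused_moe_marlin(hidden_states, w1, w2, gating_output, topk):
    """Fused Marlin MoE path used by the quantized model at runtime."""
    ...

def single_marlin_moe(hidden_states, w, gating_output, topk):
    """Single-matrix Marlin MoE path, currently exercised only by tests."""
    ...
```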