
Question about Experiment Settings #1

Open
soeun-22 opened this issue Jan 7, 2025 · 3 comments

Comments

soeun-22 commented Jan 7, 2025

Hello,
Thank you for providing such an excellent paper and code! I truly appreciate your contributions.

I’ve been running experiments using your code and encountered a question regarding the experimental settings. Specifically, my Wikitext2 PPL results seem to differ from the results reported in Table 1.

I conducted the experiment using the following settings:

  • Model: LLaMA2-7B
  • Rank: 256
  • BL, BR: 4-bit
  • Outer iterations: 15
  • Inner iterations: 10

With these settings, I obtained a PPL of 6.4685444831848145, which is higher than the results reported in Table 1.
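For reference, the number being compared here is standard token-level perplexity. A minimal sketch of the metric itself (not the repo's evaluation script):

```python
import math

def perplexity(token_nlls):
    """Perplexity = exp of the mean per-token negative log-likelihood
    (in nats), the metric typically reported for Wikitext2 evaluations."""
    return math.exp(sum(token_nlls) / len(token_nlls))

# If every token has probability 1/4, PPL is exactly 4:
print(perplexity([math.log(4.0)] * 100))
```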

I would like to ask:

  1. Could you provide more details on the exact configurations or hyperparameters used to achieve the results in Table 1?
  2. Regarding the Random Hadamard Matrix generation, it seems to be created randomly based on the seed value. Could you share the specific seed values used for each experiment?

For clarity, I have attached a screenshot of my experimental setup.

Thank you again for this remarkable project and for your support.
I look forward to your guidance and hope you have a great day!

NSagan271 (Collaborator) commented Jan 15, 2025

Hi, thank you for the question! Can you try setting Q_hessian_downdate to true and increasing to 20 outer iterations and 50 inner iterations? (Since the time bottleneck is the quantization of Q, increasing the number of inner iterations is reasonable.) Also, for the Hessian matrices, try using the ones from QuIP# if you are not already; the QuIP# Hessians were computed with a large calibration dataset.
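For anyone puzzling over what the outer/inner iteration counts control, here is a toy sketch of the W ≈ Q + L·R alternating structure under simplifying assumptions (round-to-nearest quantization, no Hessian weighting); the function and parameter names are hypothetical, not the repo's API:

```python
import numpy as np

def quantize(x, bits):
    """Uniform round-to-nearest quantizer on a symmetric grid
    (a toy stand-in for the actual quantizers)."""
    scale = np.max(np.abs(x)) / (2 ** (bits - 1) - 1) + 1e-12
    return np.round(x / scale) * scale

def toy_alternating_decomposition(W, rank=4, bits=4, outer=15, inner=10, seed=0):
    """Toy sketch of the alternating structure: each outer iteration
    re-quantizes the backbone Q against the current low-rank part, then
    `inner` least-squares steps refine the quantized low-rank factors
    L (m x rank) and R (rank x n). Hypothetical names, no Hessian weighting."""
    rng = np.random.default_rng(seed)
    m, n = W.shape
    L = rng.standard_normal((m, rank)) * 0.1
    R = rng.standard_normal((rank, n)) * 0.1
    Q = np.zeros_like(W)
    for _ in range(outer):
        Q = quantize(W - L @ R, bits)          # quantize the residual backbone
        residual = W - Q
        for _ in range(inner):                 # refine the low-rank factors
            L = quantize(residual @ np.linalg.pinv(R), bits)
            R = quantize(np.linalg.pinv(L) @ residual, bits)
    return Q, L, R
```

The sketch only illustrates why more inner iterations are cheap relative to re-quantizing Q: each inner step is a small least-squares refinement of the low-rank factors.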

Regarding the random seeds, we found that the impact of the random seed is minimal, due to the relatively high dimensions of the matrices and the iterative nature of the algorithm (especially if finetuning is performed over the diagonal matrices of the randomized Hadamard transform, though that is an optional step).
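To make the seed dependence concrete: in a randomized Hadamard transform, the seed typically only controls a random diagonal of ±1 signs. A minimal sketch (assuming a power-of-two dimension and a Sylvester-construction Hadamard matrix; not the repo's implementation):

```python
import numpy as np

def sylvester_hadamard(n):
    """Hadamard matrix via the Sylvester construction (n a power of two)."""
    H = np.array([[1.0]])
    while H.shape[0] < n:
        H = np.block([[H, H], [H, -H]])
    return H

def random_hadamard(n, seed=0):
    """Randomized Hadamard transform matrix H @ diag(+/-1) / sqrt(n).
    The random sign diagonal is the only seed-dependent part, and the
    result is orthogonal for every seed."""
    rng = np.random.default_rng(seed)
    signs = rng.choice([-1.0, 1.0], size=n)    # random +/-1 diagonal
    return sylvester_hadamard(n) * signs / np.sqrt(n)
```

Because the matrix is orthogonal regardless of which signs are drawn, different seeds give equally well-conditioned transforms, which is consistent with the seed having little effect in high dimensions.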

Also, in case you are interested in using the CALDERA-quantized version of LLaMa-2-7B that we computed, you can now find it on Hugging Face. This checkpoint was obtained with the above configuration and achieves the reported PPL.

Edit: 15 outer iterations should work, as long as the number of inner iterations is 50.

soeun-22 (Author) commented

I will proceed with the experiments based on the settings you kindly provided.
Thank you so much for your thoughtful and detailed response; I truly appreciate it.

rajarshisaha95 (Collaborator) commented

Hi @soeun-22, did the above configuration help? If so, please feel free to resolve the issue.
