[LoRA] allow loras to be loaded with low_cpu_mem_usage. (#9510)
* allow loras to be loaded with low_cpu_mem_usage.

* add flux support but note https://github.com/huggingface/diffusers/pull/9510#issuecomment-2378316687

* low_cpu_mem_usage.

* fix-copies

* fix-copies again

* tests

* _LOW_CPU_MEM_USAGE_DEFAULT_LORA

* _peft_version default.

* version checks.

* version check.

* version check.

* version check.

* require peft 0.13.1.

* explicitly specify low_cpu_mem_usage=False.

* docs.

* transformers version 4.45.2.

* update

* fix

* empty

* better name initialize_dummy_state_dict.

* doc todos.

* Apply suggestions from code review

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* style

* fix-copies

---------

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
sayakpaul and stevhliu authored Oct 9, 2024
1 parent ec9e526 commit 31058cd
Showing 5 changed files with 411 additions and 24 deletions.
6 changes: 6 additions & 0 deletions docs/source/en/tutorials/using_peft_for_inference.md
@@ -75,6 +75,12 @@ image

![pixel-art](https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/peft_integration/diffusers_peft_lora_inference_12_1.png)

<Tip>

By default, if the most up-to-date versions of PEFT and Transformers are detected, `low_cpu_mem_usage` is set to `True` to speed up the loading time of LoRA checkpoints.

</Tip>

## Merge adapters

You can also merge different adapter checkpoints for inference to blend their styles together.
