
Problem downloading huggingface models. #19

Open
etymotic opened this issue Jan 1, 2025 · 5 comments

Comments

@etymotic

etymotic commented Jan 1, 2025

After updating to the most recent version of the Docker container, the container is stuck in a crash/restart loop. Logs as follows:

2025-01-01 02:52:07.128 | INFO     | horde_safety:<module>:18 - AIWORKER_CACHE_HOME not set, using default huggingface cache paths.
Loading CLIP model ViT-L-14/openai...
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/site-packages/open_clip/pretrained.py", line 747, in download_pretrained_from_hf
    cached_file = hf_hub_download(
  File "/usr/local/lib/python3.10/site-packages/huggingface_hub/utils/_validators.py", line 114, in _inner_fn
    return fn(*args, **kwargs)
  File "/usr/local/lib/python3.10/site-packages/huggingface_hub/file_download.py", line 860, in hf_hub_download
    return _hf_hub_download_to_cache_dir(
  File "/usr/local/lib/python3.10/site-packages/huggingface_hub/file_download.py", line 1020, in _hf_hub_download_to_cache_dir
    _create_symlink(blob_path, pointer_path, new_blob=True)
  File "/usr/local/lib/python3.10/site-packages/huggingface_hub/file_download.py", line 584, in _create_symlink
    os.symlink(src_rel_or_abs, abs_dst)
FileNotFoundError: [Errno 2] No such file or directory: '../../blobs/c4e2d8e8f1a5bc97c76b9c9385d2fc2a1acd38846db5468f579fda12306a218b' -> '/root/.cache/huggingface/models--timm--vit_large_patch14_clip_224.openai/snapshots/689a4528f64ee8a01e0710a91fa2c70793428860/open_clip_pytorch_model.bin'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/app/lemmy_safety_object_storage.py", line 15, in <module>
    from fedi_safety.check import check_image
  File "/app/fedi_safety/check.py", line 9, in <module>
    interrogator = get_interrogator_no_blip()
  File "/usr/local/lib/python3.10/site-packages/horde_safety/interrogate.py", line 106, in get_interrogator_no_blip
    interrogator = Interrogator(
  File "/usr/local/lib/python3.10/site-packages/clip_interrogator/clip_interrogator.py", line 71, in __init__
    self.load_clip_model()
  File "/usr/local/lib/python3.10/site-packages/clip_interrogator/clip_interrogator.py", line 105, in load_clip_model
    self.clip_model, _, self.clip_preprocess = open_clip.create_model_and_transforms(
  File "/usr/local/lib/python3.10/site-packages/open_clip/factory.py", line 484, in create_model_and_transforms
    model = create_model(
  File "/usr/local/lib/python3.10/site-packages/open_clip/factory.py", line 367, in create_model
    checkpoint_path = download_pretrained(pretrained_cfg, cache_dir=cache_dir)
  File "/usr/local/lib/python3.10/site-packages/open_clip/pretrained.py", line 785, in download_pretrained
    target = download_pretrained_from_hf(model_id, cache_dir=cache_dir)
  File "/usr/local/lib/python3.10/site-packages/open_clip/pretrained.py", line 755, in download_pretrained_from_hf
    raise FileNotFoundError(f"Failed to download file ({filename}) for {model_id}. Last error: {e}")
FileNotFoundError: Failed to download file (open_clip_pytorch_model.bin) for timm/vit_large_patch14_clip_224.openai. Last error: [Errno 2] No such file or directory: '../../blobs/c4e2d8e8f1a5bc97c76b9c9385d2fc2a1acd38846db5468f579fda12306a218b' -> '/root/.cache/huggingface/models--timm--vit_large_patch14_clip_224.openai/snapshots/689a4528f64ee8a01e0710a91fa2c70793428860/open_clip_pytorch_model.bin'

I'm not sure how to directly access the container since it doesn't run for very long. I can usually access containers using something like docker exec -it "container name" /bin/bash, but it crashes almost immediately.

I've been running this successfully on CPU only (no GPU), so maybe something changed on my host machine, too.
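Some context on the final error: os.symlink does not require the link target (the relative ../../blobs/... path) to exist, but it does fail with FileNotFoundError (ENOENT) when the directory that should hold the new link (snapshots/<revision>/) is missing. The sketch below reproduces that in a temporary directory. The layout mimics the Hugging Face cache paths from the log, the shortened hashes are placeholders, and it assumes a POSIX system where unprivileged symlinks are allowed.

```python
import os
import tempfile


def try_create_pointer(cache_root: str, make_parent: bool) -> str:
    """Mimic the hub cache layout and try to create the snapshot symlink.

    Returns "created" on success, or the exception class name on failure.
    Paths are placeholders modeled on the log, not real cache entries.
    """
    # Relative target, resolved from snapshots/<rev>/ back up to blobs/.
    blob_rel = os.path.join("..", "..", "blobs", "c4e2d8e8")
    snapshot_dir = os.path.join(cache_root, "snapshots", "689a4528")
    pointer_path = os.path.join(snapshot_dir, "open_clip_pytorch_model.bin")
    if make_parent:
        # The directory that will contain the link must exist first.
        os.makedirs(snapshot_dir, exist_ok=True)
    try:
        # A missing target is fine here: a dangling link is still created.
        os.symlink(blob_rel, pointer_path)
        return "created"
    except FileNotFoundError as exc:
        return type(exc).__name__


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        print(try_create_pointer(tmp, make_parent=False))  # FileNotFoundError
        print(try_create_pointer(tmp, make_parent=True))   # created
```

This suggests checking whether the cache directory the container writes to is present and writable, rather than whether the blob itself downloaded.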

@db0
Owner

db0 commented Jan 1, 2025

Unfortunately I'm not using the docker container and I haven't built this part so I'm not quite sure why it stopped downloading the models here. That part didn't really change.

@tazlin

tazlin commented Jan 1, 2025

This problem probably has to do with recent changes to the huggingface libraries and the fact that the container runs as the root user (i.e., /root/.cache isn't analogous to a standard user's XDG_CACHE_HOME).

I suggest setting TRANSFORMERS_CACHE as an environment variable to a directory such as /app/cache and checking whether that fixes the issue (see also the huggingface docs on this subject). If it does, the Dockerfile can be adjusted accordingly.
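To show why downloads land under /root/.cache when the container runs as root, here is a stdlib-only sketch of the default hub cache resolution order described in the huggingface_hub documentation: HF_HUB_CACHE, then HF_HOME/hub, then XDG_CACHE_HOME/huggingface/hub, then ~/.cache/huggingface/hub. It mirrors those documented defaults and is not the library's own code. It ignores the legacy HUGGINGFACE_HUB_CACHE variable and any explicit cache_dir argument that a caller such as open_clip may pass, which bypasses this lookup.

```python
import os
from typing import Mapping, Optional


def resolve_hub_cache(env: Mapping[str, str], home: Optional[str] = None) -> str:
    """Sketch of the default hub cache directory for a given environment."""
    # When running as root, "~" expands to /root.
    home = home or os.path.expanduser("~")
    if env.get("HF_HUB_CACHE"):
        return env["HF_HUB_CACHE"]
    if env.get("HF_HOME"):
        return os.path.join(env["HF_HOME"], "hub")
    xdg = env.get("XDG_CACHE_HOME") or os.path.join(home, ".cache")
    return os.path.join(xdg, "huggingface", "hub")


if __name__ == "__main__":
    print(resolve_hub_cache({}, home="/root"))  # /root/.cache/huggingface/hub
    print(resolve_hub_cache({"HF_HOME": "/app/cache"}, home="/root"))  # /app/cache/hub
```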

@etymotic
Author

etymotic commented Jan 2, 2025

Thanks, that helped a lot. Instead of TRANSFORMERS_CACHE though, I had to set AIWORKER_CACHE_HOME, and now I'm back in business. I set it to the directory you suggested, /app/cache.
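The "AIWORKER_CACHE_HOME not set" line in the original log comes from module-level code in horde_safety, so the variable has to be in the environment before horde_safety, or fedi_safety.check (which imports it), is imported. In Docker it would normally be passed with -e or an environment entry. An alternative is a small entrypoint-side default, sketched below. The helper name ensure_cache_home and the /app/cache default are hypothetical, and the sketch assumes the directory is writable by the container user.

```python
import os


def ensure_cache_home(default: str = "/app/cache") -> str:
    """Hypothetical helper: default AIWORKER_CACHE_HOME and create the directory.

    Call this before importing horde_safety (or anything that imports it),
    because horde_safety reads the variable at import time.
    """
    # setdefault keeps a value already supplied via `docker run -e ...`.
    path = os.environ.setdefault("AIWORKER_CACHE_HOME", default)
    os.makedirs(path, exist_ok=True)
    return path


# Intended use at the top of the entrypoint script, before heavy imports:
#   ensure_cache_home()
#   from fedi_safety.check import check_image
```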

@db0
Owner

db0 commented Jan 2, 2025

OK cool, I'll see if I can adjust the examples and the doc

@etymotic
Author

etymotic commented Jan 2, 2025

Sounds good. Sorry I can't help; this stuff is over my head.

Another thing to note: while things are working, the logs say "FutureWarning: Using TRANSFORMERS_CACHE is deprecated and will be removed in v5 of Transformers. Use HF_HOME instead." I did try setting HF_HOME instead of the working AIWORKER_CACHE_HOME, and it didn't seem to do anything. Maybe I'll try setting both variables and see what happens.

Edit: I set both AIWORKER_CACHE_HOME and HF_HOME and still get the same warning.
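This warning is a FutureWarning emitted by transformers. It appears to be triggered by TRANSFORMERS_CACHE being present in the environment, which would explain why also setting HF_HOME does not remove it. Presumably something in the stack still sets TRANSFORMERS_CACHE, possibly horde_safety when AIWORKER_CACHE_HOME is set, but that is an assumption this thread does not confirm. Until it is addressed upstream, a standard-library warnings filter installed before transformers is imported could hide only this message. The helper name is hypothetical.

```python
import warnings


def silence_transformers_cache_warning() -> None:
    """Hypothetical helper: hide only the TRANSFORMERS_CACHE deprecation notice.

    Other FutureWarnings are still shown. Install it before importing
    transformers (or anything that imports it).
    """
    warnings.filterwarnings(
        "ignore",
        # Regex matched against the start of the message text.
        message=r".*TRANSFORMERS_CACHE",
        category=FutureWarning,
    )
```

This only affects what is printed and does not change which cache directory is used.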
