fix: fast tokenizer conversion should happen offline (#106)
#### Motivation

The server is launched with `HF_HUB_OFFLINE=1` and is meant to treat
model files as read-only. However, the fast tokenizer conversion in the
`launcher` does not respect this when a `revision` is not passed. If a
model on the HF Hub is updated, the conversion can download the
tokenizer files for the new commit of the model, while the server does
not download the new model files. The server then fails to load because
it cannot find the model files.

#### Modifications

- Set `local_files_only=True` both with and without the `revision` arg
when doing the fast tokenizer conversion
- Set `HF_HUB_OFFLINE=1` in the subprocess env as well, for good measure
- Small refactor so that the command-building code is shared by both cases
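
The shared command building described in the list above can be sketched as a standalone function. This is a minimal illustration rather than the code from the commit: the helper name `build_tokenizer_code` is hypothetical, and it takes plain `&str` arguments instead of the launcher's actual parameter types.

```rust
/// Builds the Python one-liner that is passed to `python -c`.
/// NOTE: `build_tokenizer_code` is a hypothetical name used for illustration.
fn build_tokenizer_code(model_name: &str, revision: Option<&str>, save_path: &str) -> String {
    // Escape the model name before embedding it in a Python string literal.
    let model_name = model_name.escape_default();
    // The revision argument is optional; when absent it contributes nothing.
    let revision_arg = match revision {
        Some(rev) => format!("revision=\"{}\", ", rev.escape_default()),
        None => String::new(),
    };
    // `local_files_only=True` is always present, with or without a revision.
    format!(
        "from transformers import AutoTokenizer; \
         AutoTokenizer.from_pretrained(\"{model_name}\", {revision_arg}local_files_only=True)\
         .save_pretrained(\"{save_path}\")"
    )
}
```

Because only the `revision` fragment varies, both cases go through the same `format!` call and cannot drift apart on `local_files_only`.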

#### Result

Fast tokenizer conversion in the launcher should never download new
files.
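
One way to check the environment part of this guarantee without spawning Python is to inspect the prepared `Command`. The sketch below assumes two hypothetical helpers, `offline_python_command` and `sets_offline_flag`, neither of which exists in the launcher; it relies only on `std::process::Command` methods (`env`, `get_envs`).

```rust
use std::ffi::OsStr;
use std::process::Command;

/// Prepares (but does not run) the conversion subprocess.
/// NOTE: hypothetical helper, not part of the launcher.
fn offline_python_command(code: &str) -> Command {
    let mut cmd = Command::new("python");
    // Force the Hugging Face Hub client into offline mode for the child process.
    cmd.args(["-c", code]).env("HF_HUB_OFFLINE", "1");
    cmd
}

/// Returns true if the command explicitly sets `HF_HUB_OFFLINE=1`.
/// NOTE: hypothetical helper; it only sees variables set on the command itself,
/// not ones inherited from the parent process.
fn sets_offline_flag(cmd: &Command) -> bool {
    cmd.get_envs().any(|(key, value)| {
        key == OsStr::new("HF_HUB_OFFLINE") && value == Some(OsStr::new("1"))
    })
}
```

Calling `sets_offline_flag(&offline_python_command("..."))` returns `true`, while a bare `Command::new("python")` does not carry the flag.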

#### Related Issues

- Fast tokenizer conversion added in #48
- Setting `local_files_only` if `revision` is passed: #63

Signed-off-by: Travis Johnson <tsjohnso@us.ibm.com>
tjohnson31415 authored Jul 31, 2024
1 parent 5b5938e commit 572e03f
Showing 1 changed file with 16 additions and 11 deletions.
27 changes: 16 additions & 11 deletions launcher/src/main.rs
@@ -870,19 +870,24 @@ fn save_fast_tokenizer(
     info!("Saving fast tokenizer for `{model_name}` to `{save_path}`");
     let model_name = model_name.escape_default();
     let revision = revision.map(|v| v.escape_default());
-    let code = if let Some(revision) = revision {
-        format!(
-            "from transformers import AutoTokenizer; \
-            AutoTokenizer.from_pretrained(\"{model_name}\", \
-            revision=\"{revision}\", local_files_only=True).save_pretrained(\"{save_path}\")"
-        )
+    let revision_arg = if let Some(revision) = revision {
+        format!("revision=\"{revision}\", ")
     } else {
-        format!(
-            "from transformers import AutoTokenizer; \
-            AutoTokenizer.from_pretrained(\"{model_name}\").save_pretrained(\"{save_path}\")"
-        )
+        "".to_string()
     };
-    match Command::new("python").args(["-c", &code]).status() {
+    let code = format!(
+        "from transformers import AutoTokenizer; \
+        AutoTokenizer.from_pretrained( \
+            \"{model_name}\", \
+            {revision_arg} \
+            local_files_only=True \
+        ).save_pretrained(\"{save_path}\")"
+    );
+    match Command::new("python")
+        .args(["-c", &code])
+        .env("HF_HUB_OFFLINE", "1")
+        .status()
+    {
         Ok(status) => {
             if status.success() {
                 Ok(())
