
Commit

update fluid kserve sample to use huggingface servingruntime (kserve#3907)

* update fluid kserve demo to use huggingface servingruntime and other model.

Signed-off-by: Lize Cai <lize.cai@sap.com>

* fix lint

Signed-off-by: Lize Cai <lize.cai@sap.com>

* explicitly set custom servingruntime, update devshm.

Signed-off-by: Lize Cai <lize.cai@sap.com>

* update the args in custom kserve hfserver

Signed-off-by: Lize Cai <lize.cai@sap.com>

* address comments

Signed-off-by: Lize Cai <lize.cai@sap.com>

* add return of line

Signed-off-by: Lize Cai <lize.cai@sap.com>

---------

Signed-off-by: Lize Cai <lize.cai@sap.com>
lizzzcai authored Sep 23, 2024
1 parent bf6fae8 commit 2376eeb
Showing 12 changed files with 252 additions and 356 deletions.
224 changes: 83 additions & 141 deletions docs/samples/fluid/README.md

Large diffs are not rendered by default.

18 changes: 9 additions & 9 deletions docs/samples/fluid/alluxio.yaml
@@ -4,9 +4,8 @@ metadata:
   name: s3-data
 spec:
   mounts:
-    - mountPoint: "s3://${bucket}/models/bloom-560m/"
-      name: bloom-560m
-      path: /bloom-560m
+    - mountPoint: "s3://${bucket}/models/meta-llama--Meta-Llama-3.1-8B-Instruct/"
+      name: llama-31-8b-instruct
       options:
         alluxio.underfs.s3.region: "eu-central-1"
         alluxio.underfs.s3.secure.http.enabled: "true"
@@ -30,15 +29,15 @@ spec:
           - key: node.kubernetes.io/instance-type
             operator: In
             values:
-              - "m5.xlarge"
+              - "m5n.xlarge"
   placement: "Shared"
 ---
 apiVersion: data.fluid.io/v1alpha1
 kind: AlluxioRuntime
 metadata:
   name: s3-data
 spec:
-  replicas: 2
+  replicas: 3
   properties:
     alluxio.user.ufs.block.read.location.policy: alluxio.client.block.policy.LocalFirstAvoidEvictionPolicy
     alluxio.user.block.size.bytes.default: 256MB
@@ -50,11 +49,12 @@ spec:
     levels:
       - mediumtype: SSD
         path: /mnt/ssd0/cache
-        quota: 10Gi
+        quota: 100Gi
         high: "0.95"
         low: "0.7"
   master:
     nodeSelector:
-      node.kubernetes.io/instance-type: m5.xlarge
-  fuse:
-    cleanPolicy: OnDemand
+      node.kubernetes.io/instance-type: m5n.xlarge
+  worker:
+    nodeSelector:
+      node.kubernetes.io/instance-type: m5n.xlarge
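
Note: once this updated Dataset and AlluxioRuntime pair is applied, the bind can be verified before any data is loaded. A minimal sketch, assuming the kserve-fluid-demo namespace used elsewhere in this sample and that the Fluid CRDs are installed:

# confirm the Dataset is Bound to the runtime and inspect cache capacity
kubectl get dataset s3-data -n kserve-fluid-demo
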
5 changes: 2 additions & 3 deletions docs/samples/fluid/dataload.yaml
@@ -8,6 +8,5 @@ spec:
     namespace: kserve-fluid-demo
   target:
     # # please update it accordingly
-    - path: /bloom-560m
-    # - path: /bloom-7b1
-      replicas: 2
+    - path: /llama-31-8b-instruct
+      replicas: 3
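
A hedged usage sketch for warming the cache with this DataLoad; the file name and namespace follow this sample, so adjust them to your setup:

# create the DataLoad, then watch it until its phase reaches Complete
kubectl apply -f dataload.yaml
kubectl get dataload -n kserve-fluid-demo -w
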
17 changes: 0 additions & 17 deletions docs/samples/fluid/docker/Dockerfile

This file was deleted.

104 changes: 0 additions & 104 deletions docs/samples/fluid/docker/models.py

This file was deleted.

6 changes: 0 additions & 6 deletions docs/samples/fluid/docker/requirements.txt

This file was deleted.

39 changes: 30 additions & 9 deletions docs/samples/fluid/download_model.py
@@ -13,19 +13,40 @@
     help="model name from huggingface",
 )
 parser.add_argument(
-    "--model_dir", default="models", help="dir to download the model"
+    "--model_dir",
+    default="./models",
+    help="dir to download the model",
+)
+parser.add_argument(
+    "--revision",
+    default="main",
+    help="revision of the model",
 )
-parser.add_argument("--revision", default="main", help="revision of the model")
 args = vars(parser.parse_args())
 
 model_name = args["model_name"]
 revision = args["revision"]
-out_dir = args["model_dir"]
 
 model_dir = Path(args["model_dir"])
 model_dir.mkdir(exist_ok=True)
 tmp_model_name = model_name.replace("/", "--")
 
-snapshot_download(repo_id=model_name, revision=revision, cache_dir=model_dir)
-model_dir = Path(out_dir, f"models--{tmp_model_name}", "snapshots", revision)
-
-# reference: https://aws.amazon.com/de/blogs/machine-learning/deploy-bloom-176b-and-opt-30b-on-amazon-sagemaker-with-large-model-inference-deep-learning-containers-and-deepspeed/ # noqa: E501
-output_dir = list(model_dir.glob("**/snapshots/*"))[0]
-print(f"export output_dir={output_dir}")
+# check the model repo and update it accordingly
+allow_patterns = ["*.json", "*.safetensors", "*.model"]
+# here safetensors is the preferred format.
+ignore_patterns = ["*.msgpack", "*.h5", "*.bin"]
+
+# set the path to download the model
+models_path = Path(model_dir)
+models_path.mkdir(parents=True, exist_ok=True)
+
+# download the snapshot
+output_dir = snapshot_download(
+    repo_id=model_name,
+    revision=revision,
+    allow_patterns=allow_patterns,
+    ignore_patterns=ignore_patterns,
+    local_dir=models_path,
+)
+print(output_dir)
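
For context, a hedged sketch of how the updated script feeds the Dataset mount; ${bucket} stays a placeholder, and the exact local layout under ./models may differ from this:

# download the snapshot locally, then sync it to the prefix the Dataset mounts
python download_model.py --model_name=meta-llama/Meta-Llama-3.1-8B-Instruct --model_dir=./models
aws s3 sync ./models s3://${bucket}/models/meta-llama--Meta-Llama-3.1-8B-Instruct/
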
54 changes: 25 additions & 29 deletions docs/samples/fluid/fluid-isvc.yaml
@@ -1,34 +1,30 @@
-apiVersion: "serving.kserve.io/v1beta1"
-kind: "InferenceService"
+apiVersion: serving.kserve.io/v1beta1
+kind: InferenceService
 metadata:
-  name: "fluid-bloom"
   labels:
     serverless.fluid.io/inject: "true"
+  name: llama-31-8b-instruct
 spec:
   predictor:
     terminationGracePeriodSeconds: 60
     timeout: 600
     minReplicas: 0
     nodeSelector:
-      node.kubernetes.io/instance-type: m5.4xlarge
-    containers:
-      - name: kserve-container
-        image: lizzzcai/kserve-fluid:bloom-gpu-v1
-        # # below are for running bloom-7b1 using cpu
-        # resources:
-        #   limits:
-        #     cpu: "12"
-        #     memory: 48Gi
-        #   requests:
-        #     cpu: "12"
-        #     memory: 48Gi
-        env:
-          - name: STORAGE_URI
-            # please update it accordingly
-            value: "pvc://s3-data/bloom-560m"
-            # value: "pvc://s3-data/bloom-7b1"
-          - name: MODEL_NAME
-            value: "bloom"
-          # set to "True" if you are using GPU, update the resources as well
-          - name: GPU_ENABLED
-            value: "False"
+      node.kubernetes.io/instance-type: g5.8xlarge
+    model:
+      runtime: custom-kserve-huggingfaceserver
+      modelFormat:
+        name: huggingface
+      storageUri: pvc://s3-data/llama-31-8b-instruct
+      args:
+        - --gpu-memory-utilization=0.95
+        - --max-model-len=1024
+        - --tensor-parallel-size=1
+        - --enforce-eager
+        - --disable-log-stats
+        - --disable-log-requests
+      resources:
+        limits:
+          cpu: "24"
+          memory: 48Gi
+          nvidia.com/gpu: "1"
+        requests:
+          cpu: "24"
+          memory: 48Gi
+          nvidia.com/gpu: "1"
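
A hedged sketch of querying the deployed InferenceService: the OpenAI-style completions route follows the KServe huggingface runtime's convention, while INGRESS_HOST, INGRESS_PORT, and the served model name matching the InferenceService name are assumptions about this setup:

SERVICE_HOSTNAME=$(kubectl get inferenceservice llama-31-8b-instruct -n kserve-fluid-demo -o jsonpath='{.status.url}' | cut -d "/" -f 3)
curl -H "Host: ${SERVICE_HOSTNAME}" -H "Content-Type: application/json" \
  http://${INGRESS_HOST}:${INGRESS_PORT}/openai/v1/completions \
  -d '{"model": "llama-31-8b-instruct", "prompt": "What is Fluid?", "max_tokens": 32}'
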
17 changes: 9 additions & 8 deletions docs/samples/fluid/jindo.yaml
@@ -4,9 +4,8 @@ metadata:
   name: s3-data
 spec:
   mounts:
-    - mountPoint: "s3://${bucket}/models/bloom-560m/"
-      name: bloom-560m
-      path: /bloom-560m
+    - mountPoint: "s3://${bucket}/models/meta-llama--Meta-Llama-3.1-8B-Instruct/"
+      name: llama-31-8b-instruct
       options:
         fs.s3.region: "eu-central-1"
         fs.s3.endpoint: "s3.eu-central-1.amazonaws.com"
@@ -30,25 +29,28 @@ spec:
           - key: node.kubernetes.io/instance-type
             operator: In
             values:
-              - "m5.xlarge"
+              - "m5n.xlarge"
   placement: "Shared"
 ---
 apiVersion: data.fluid.io/v1alpha1
 kind: JindoRuntime
 metadata:
   name: s3-data
 spec:
-  replicas: 2
+  replicas: 3
   tieredstore:
     levels:
       - mediumtype: SSD
         path: /mnt/ssd0/cache
-        quota: 50Gi
+        quota: 100Gi
         high: "0.95"
         low: "0.7"
   master:
     nodeSelector:
-      node.kubernetes.io/instance-type: m5.xlarge
+      node.kubernetes.io/instance-type: m5n.xlarge
+  worker:
+    nodeSelector:
+      node.kubernetes.io/instance-type: m5n.xlarge
   fuse:
     properties:
       fs.jindofsx.data.cache.enable: "true"
@@ -63,4 +65,3 @@ spec:
         - -oattr_timeout=7200
         - -oentry_timeout=7200
         - -ometrics_port=9089
-  cleanPolicy: OnDemand
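
Note: alluxio.yaml and jindo.yaml configure the same s3-data Dataset for two alternative cache runtimes, so presumably only one of the two is applied per cluster:

# pick one runtime flavor for the s3-data Dataset
kubectl apply -f alluxio.yaml
# or: kubectl apply -f jindo.yaml
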
