spaCy inference speed for single text #11554
marzooq-unbxd started this conversation in Help: Best practices
-
The easiest ways to improve processing speed are either to batch requests (reducing per-call setup/teardown costs) or to do less work (using a smaller architecture). If your API doesn't allow batching requests externally, you can batch them internally. A simple way to do that would be to have a separate thread with a queue that handles batching and calling `nlp.pipe`.
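A minimal sketch of that internal-batching idea, assuming threaded Flask/gunicorn workers and an installed `en_core_web_sm` model; the `BatchedNLP` class and its `max_batch_size`/`max_wait_s` parameters are illustrative names, not part of spaCy's API:

```python
# Sketch: request handlers enqueue texts, a single worker thread drains the
# queue and runs nlp.pipe() over the collected batch.
import queue
import threading

import spacy


class BatchedNLP:
    # Hypothetical helper, not a spaCy API.
    def __init__(self, nlp, max_batch_size=32, max_wait_s=0.01):
        self.nlp = nlp
        self.max_batch_size = max_batch_size
        self.max_wait_s = max_wait_s
        self._queue = queue.Queue()
        self._worker = threading.Thread(target=self._run, daemon=True)
        self._worker.start()

    def parse(self, text):
        """Called from request handlers; blocks until the Doc is ready."""
        done = threading.Event()
        item = {"text": text, "doc": None, "done": done}
        self._queue.put(item)
        done.wait()
        return item["doc"]

    def _run(self):
        while True:
            # Block for the first item, then gather more for a short window
            # or until the batch is full.
            batch = [self._queue.get()]
            while len(batch) < self.max_batch_size:
                try:
                    batch.append(self._queue.get(timeout=self.max_wait_s))
                except queue.Empty:
                    break
            texts = [item["text"] for item in batch]
            # nlp.pipe preserves input order, so zip pairs each Doc with
            # the request that submitted it.
            for item, doc in zip(batch, self.nlp.pipe(texts)):
                item["doc"] = doc
                item["done"].set()


# Usage: share one instance across request-handler threads.
batched = BatchedNLP(spacy.load("en_core_web_sm"))
doc = batched.parse("Apple is looking at buying a U.K. startup.")
print([(ent.text, ent.label_) for ent in doc.ents])
```

The trade-off is a small added wait (`max_wait_s`) per request in exchange for amortizing the pipeline's per-batch overhead whenever several requests arrive close together.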
-
I was trying to deploy my spaCy NER model to a Flask web app behind a gunicorn layer.
I read about improving inference speed by moving from `nlp(text)` to `nlp.pipe([list_of_texts])`, but for `nlp.pipe` to reuse resources, shouldn't `list_of_texts` have a length greater than 1? My service returns NER entities for every single request it receives (a list of length 1), and I was seeing high latency just from increasing the requests per second. Changing `n_process` wouldn't help according to #10087, and changing `n_threads` is no longer an option (it doesn't get around the GIL?).
Is the only way to solve this to reduce the model architecture size?
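For context, the two call patterns being compared look roughly like this (an illustrative sketch; the model name and texts are placeholders):

```python
import spacy

nlp = spacy.load("en_core_web_sm")  # placeholder model

# Per-request call: one Doc at a time, pipeline overhead paid per text.
doc = nlp("Unbxd is based in Bangalore.")
print([(ent.text, ent.label_) for ent in doc.ents])

# Batched call: nlp.pipe() processes an iterable of texts together, which
# only pays off when the iterable actually contains many texts.
texts = ["Unbxd is based in Bangalore.", "Apple acquired a startup."]
for doc in nlp.pipe(texts, batch_size=64):
    print([(ent.text, ent.label_) for ent in doc.ents])
```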