[V1] Various updates #28

njhill · 2024-11-06T20:47:32Z

See inline comments

njhill · 2024-11-06T20:49:31Z

vllm/v1/engine/__init__.py

+                       omit_defaults=True,
+                       gc=False):


These also have a performance benefit.

njhill · 2024-11-06T20:51:39Z

vllm/v1/engine/async_llm.py

+    def _finish_stream(self, request_id: str):
+        stream = self.request_streams.pop(request_id)
+        if stream is not None:
+            stream.finish()


We need to remove the stream whether it finishes because the request produced its final output or because the client cancelled.
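Both paths should end in the same cleanup. A minimal sketch, assuming `request_streams` is a plain dict from request id to stream; `AsyncStream`, `StreamRegistry`, `process_output` and `abort` are hypothetical stand-ins, not vLLM's classes. Note that `dict.pop(key)` without a default raises `KeyError` for a missing key instead of returning `None`, so the sketch passes `None` as the default; that makes the `is not None` check meaningful and repeated calls harmless.

```python
import asyncio
from typing import AsyncIterator, Dict, List, Tuple


class AsyncStream:
    """Hypothetical per-request output stream backed by an asyncio.Queue."""

    _DONE = object()  # sentinel that marks the end of the stream

    def __init__(self, request_id: str) -> None:
        self.request_id = request_id
        self._queue: asyncio.Queue = asyncio.Queue()
        self.finished = False

    def put(self, item: str) -> None:
        if not self.finished:
            self._queue.put_nowait(item)

    def finish(self) -> None:
        if not self.finished:
            self.finished = True
            self._queue.put_nowait(self._DONE)

    async def generator(self) -> AsyncIterator[str]:
        # Yield queued items until the end-of-stream sentinel arrives.
        while True:
            item = await self._queue.get()
            if item is self._DONE:
                return
            yield item


class StreamRegistry:
    """Hypothetical owner of the request_id -> stream mapping."""

    def __init__(self) -> None:
        self.request_streams: Dict[str, AsyncStream] = {}

    def add_request(self, request_id: str) -> AsyncStream:
        stream = AsyncStream(request_id)
        self.request_streams[request_id] = stream
        return stream

    def process_output(self, request_id: str, text: str, finished: bool) -> None:
        stream = self.request_streams.get(request_id)
        if stream is None:
            return  # stream already removed, e.g. the client cancelled
        stream.put(text)
        if finished:
            self._finish_stream(request_id)  # path 1: final output

    def abort(self, request_id: str) -> None:
        self._finish_stream(request_id)  # path 2: client cancellation

    def _finish_stream(self, request_id: str) -> None:
        # pop() with a default is a no-op for unknown ids; a bare
        # pop(request_id) would raise KeyError and never return None.
        stream = self.request_streams.pop(request_id, None)
        if stream is not None:
            stream.finish()


async def main() -> Tuple[List[str], List[str], Dict[str, AsyncStream]]:
    registry = StreamRegistry()
    a = registry.add_request("a")
    b = registry.add_request("b")
    registry.process_output("a", "hello", finished=True)
    registry.process_output("b", "partial", finished=False)
    registry.abort("b")
    registry.abort("b")  # a repeated abort is harmless
    out_a = [item async for item in a.generator()]
    out_b = [item async for item in b.generator()]
    return out_a, out_b, registry.request_streams


results = asyncio.run(main())
print(results)  # (['hello'], ['partial'], {})
```

Either path leaves the registry empty, so no stream outlives its request.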

njhill · 2024-11-06T20:52:38Z

vllm/v1/engine/async_llm.py

            await self.engine_core.abort_requests_async(request_ids)
-            self.detokenizer.abort_requests(request_ids)


This shouldn't be needed because the detokenizer removes finished requests itself.
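A minimal sketch of why a separate abort call on the detokenizer is redundant once it evicts finished requests on its own. `Detokenizer`, `RequestState`, `step` and the single stop-token finish rule are hypothetical simplifications; a real detokenizer also converts token ids to text and has other finish conditions.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class RequestState:
    request_id: str
    stop_token_id: int
    output_ids: List[int] = field(default_factory=list)


class Detokenizer:
    """Hypothetical detokenizer that drops per-request state on completion."""

    def __init__(self) -> None:
        self.request_states: Dict[str, RequestState] = {}

    def add_request(self, request_id: str, stop_token_id: int) -> None:
        self.request_states[request_id] = RequestState(request_id, stop_token_id)

    def step(self, new_tokens: Dict[str, int]) -> List[str]:
        """Append one token per request and return the ids that finished."""
        finished: List[str] = []
        for request_id, token_id in new_tokens.items():
            state = self.request_states.get(request_id)
            if state is None:
                continue  # unknown or already-removed request: ignore it
            state.output_ids.append(token_id)
            if token_id == state.stop_token_id:
                finished.append(request_id)
        # Finished requests are removed here, inside the detokenizer itself.
        for request_id in finished:
            del self.request_states[request_id]
        return finished


detok = Detokenizer()
detok.add_request("r1", stop_token_id=2)
detok.add_request("r2", stop_token_id=2)
print(detok.step({"r1": 5, "r2": 7}))  # []
print(detok.step({"r1": 2, "r2": 8}))  # ['r1']
print(sorted(detok.request_states))    # ['r2']
```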

njhill · 2024-11-06T20:53:47Z

vllm/v1/engine/core.py

+                encoder.encode_into(outputs, buffer)
+                socket.send_multipart((buffer, ),


This reuses the same buffer for every message rather than allocating a new one each time.

njhill force-pushed the v1/updates branch from f9a0f75 to 6998ef1 November 6, 2024 20:48

njhill commented Nov 6, 2024


njhill requested a review from robertgshaw2-redhat November 6, 2024 20:54

neuralmagic deleted a comment from github-actions bot Nov 6, 2024

Various updates (e5414b4)

njhill force-pushed the v1/updates branch from 6998ef1 to e5414b4 November 6, 2024 21:42

robertgshaw2-redhat approved these changes Nov 6, 2024


robertgshaw2-redhat merged commit 1591996 into neuralmagic:rework-rs-proto Nov 6, 2024

njhill deleted the v1/updates branch November 6, 2024 22:35
