
Fix pipeline handling in the copilot provider #547

Merged
merged 7 commits into stacklok:main on Jan 13, 2025

Conversation

jhrozek
Contributor

@jhrozek jhrozek commented Jan 10, 2025

  • Create the pipeline sensitive context when creating a pipeline instance, not on every processing
  • Create pipeline instance when creating the SequentialPipelineProcessor not for every process
  • Create the pipelines only once in the copilot provider
  • test me: is this needed

Fixes: #528
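The first two bullets can be illustrated with a minimal sketch (hypothetical class and attribute names, not the actual codegate code): per-request sensitive state is created once in the constructor of a pipeline instance, rather than on every processing call, so concurrent requests never share a session.

```python
import uuid
from dataclasses import dataclass


@dataclass
class PipelineSensitiveData:
    # Per-request sensitive state; must never be shared between requests.
    session_id: str


class PipelineInstance:
    """Sketch: per-request state is created once, in __init__."""

    def __init__(self):
        # Created when the instance is created, not on every process()
        # call, so each request gets its own session.
        self.sensitive = PipelineSensitiveData(session_id=str(uuid.uuid4()))

    def process(self, data: bytes) -> bytes:
        # All buffers belonging to this request reuse self.sensitive.
        return data
```

Two instances created for two requests then hold distinct session IDs, which is exactly the isolation the fix is after.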

@@ -275,6 +275,13 @@ def __init__(
self.secret_manager = secret_manager
self.is_fim = is_fim
self.context = PipelineContext()

# we create the sensitive context here so that it is not shared between individual requests
# TODO: could we get away with just generating the session ID for an instance?
Contributor

Assuming each request has exactly one instance of the pipeline and each pipeline has one instance of PipelineSensitiveData, the session ID can be generated for the pipeline instance and used in PipelineSensitiveData.

Contributor Author


yes, let's fix this in a follow-up, I will raise an issue

self.context_tracking: Optional[PipelineContext] = None

def _ensure_pipelines(self):
if not self.input_pipeline or not self.fim_pipeline:
Contributor


Do we need to instantiate both the pipelines here, or can we just create one conditionally based on if the request is chat or fim?

Contributor Author


we could, I'll add that to the issue I'll open
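The `_ensure_pipelines` pattern under discussion can be sketched like this (hypothetical provider shape; the stand-in `_create_pipeline` replaces the real `SequentialPipelineProcessor` construction): pipelines are built lazily on first use and then reused for every request-reply round trip on the connection.

```python
class CopilotProvider:
    """Sketch of once-per-connection pipeline creation (assumed shape)."""

    def __init__(self):
        # One provider instance per connection; pipelines start empty.
        self.input_pipeline = None
        self.fim_pipeline = None

    def _create_pipeline(self, is_fim: bool) -> dict:
        # Stand-in for the real SequentialPipelineProcessor construction.
        return {"is_fim": is_fim}

    def _ensure_pipelines(self) -> None:
        # Build both pipelines on first use, then reuse them; the review
        # suggestion is to build only the one the request actually needs.
        if self.input_pipeline is None or self.fim_pipeline is None:
            self.input_pipeline = self._create_pipeline(is_fim=False)
            self.fim_pipeline = self._create_pipeline(is_fim=True)
```

Calling `_ensure_pipelines()` a second time leaves the already-created pipelines untouched, which is what makes reuse across round trips safe.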

@@ -288,6 +295,8 @@ async def _forward_data_through_pipeline(self, data: bytes) -> Union[HttpRequest
http_request.headers,
http_request.body,
)
# TODO: it's weird that we're overwriting the context. Should we set the context once? Maybe when
# creating the pipeline instance?
Contributor


yes, that makes sense.

@jhrozek jhrozek marked this pull request as ready for review January 12, 2025 22:42
@jhrozek
Contributor Author

jhrozek commented Jan 12, 2025

No longer a draft and can be tested/reviewed, but only now did I notice @ptelang's comments (sorry, I was focused on getting the damn thing working first).
I'll fix them first thing in the morning.

@jhrozek jhrozek changed the title WIP: Fix pipeline handling in the copilot provider Fix pipeline handling in the copilot provider Jan 13, 2025
lukehinds
lukehinds previously approved these changes Jan 13, 2025
…ce, not on every processing

We used to create the pipeline context during pipeline processing which
means we cannot reuse the same pipeline for output that spans several
data buffers.
…r not for every process

A pipeline instance is what binds the pipeline steps with the context.
Create the instance sooner, not when processing the request.
Since the copilot provider class instance is created once per
connection, let's create the pipelines when establishing the connection
and reuse them.
…tput pipeline

Since we can reuse a single pipeline for multiple request-reply round
trips, we shouldn't flush the buffer and destroy the context.
We've seen instances where the incoming payload (typically a FIM one) contained
more than one request. Let's dispatch them individually. Also, let's not pass
self.buffer around into the tasks; pass the data as a parameter instead.
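A sketch of that dispatch change (the newline framing is a hypothetical stand-in for the real request boundary detection): split the buffer into individual requests and hand each one to the handler as a parameter, instead of having tasks read a shared self.buffer.

```python
import asyncio


async def dispatch_buffer(buffer: bytes, handle) -> None:
    # Hypothetical framing: assume requests in the buffer are
    # newline-delimited. Dispatch each one individually, passing the
    # data as a parameter rather than reading shared state.
    for request in buffer.split(b"\n"):
        if request:
            await handle(request)
```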
For some reason we coded up the SecretsManager so that it held only one
secret per session. Let's store a dict instead.
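The SecretsManager change can be sketched as follows (hypothetical method names, not the actual codegate API): keying a dict of secrets by session, so storing a second secret no longer overwrites the first.

```python
class SecretsManager:
    """Sketch: multiple secrets per session instead of a single one."""

    def __init__(self):
        # session_id -> {secret_name: secret_value}
        self._secrets: dict[str, dict[str, str]] = {}

    def store_secret(self, session_id: str, name: str, value: str) -> None:
        # A new name under the same session adds an entry; it does not
        # replace previously stored secrets.
        self._secrets.setdefault(session_id, {})[name] = value

    def get_secret(self, session_id: str, name: str):
        return self._secrets.get(session_id, {}).get(name)
```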
@lukehinds lukehinds merged commit c68ef71 into stacklok:main Jan 13, 2025
2 checks passed
Successfully merging this pull request may close these issues.

Secrets redaction response not working for Copilot in v0.1.5
3 participants