
Deserialization policy considers 304 as an error, leading to a telemetry exception being recorded #32454

Open
Apokalypt opened this issue Jan 7, 2025 · 10 comments
Labels: Client (data-plane), customer-reported, Monitor - Exporter (Monitor OpenTelemetry Exporter), question

Comments

@Apokalypt

  • Package Name: core-client
  • Package Version: 1.9.2
  • Operating system: Windows 10 + Darwin 24.1
  • nodejs
    • version: 18.20.4
  • browser
    • name/version:
  • typescript
    • version:
  • Is the bug related to documentation in

Describe the bug
The deserialization policy implemented in @azure/core-client considers any response with a status code outside the 200-299 range as an error, which seems like a reasonable default. However, because of this behaviour, every time an API answers 304 we see an exception in Application Insights, which is not the expected behavior since a 304 is clearly an anticipated response.

To Reproduce

  1. Set up instrumentation with @azure/monitor-opentelemetry and register the Azure SDK instrumentation
  2. Try to download a file from Blob Storage, providing the ifNoneMatch condition in order to receive a 304
  3. Look at the exceptions recorded in Application Insights (a minimal repro sketch follows these steps)
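
A minimal repro sketch along these lines (hedged: the account URL is a placeholder, APPLICATIONINSIGHTS_CONNECTION_STRING is assumed to be set, and the instrumentationOptions wiring follows the @azure/monitor-opentelemetry options as I understand them):

    import { useAzureMonitor } from "@azure/monitor-opentelemetry";
    import { BlobClient } from "@azure/storage-blob";

    // Enable the Azure SDK instrumentation explicitly in case it is not on by default.
    useAzureMonitor({ instrumentationOptions: { azureSdk: { enabled: true } } });

    async function main(): Promise<void> {
      // Placeholder URL; use a SAS URL or pass a credential for a real account.
      const blobClient = new BlobClient(
        "https://<account>.blob.core.windows.net/<container>/<blob>"
      );

      // Fetch the current etag, then re-request with ifNoneMatch so the service
      // answers 304 Not Modified and the SDK span records an exception event.
      const { etag } = await blobClient.getProperties();
      await blobClient.download(0, undefined, {
        conditions: { ifNoneMatch: etag },
      });
    }

    main().catch(console.error);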

Expected behavior
HTTP calls with a status code of 304 should not generate a telemetry exception

Screenshots

  • Screenshot from Application Insights showing the exception telemetry with the 304 status code [image]
  • Screenshot of the deserialization policy [image]

Additional context
N/A

github-actions bot added the customer-reported, needs-triage, and question labels Jan 7, 2025
@xirzec (Member) commented Jan 7, 2025

My understanding is that 304 gets modeled as an error since the spec for downloadBlob only specifies responses for 200 and 206: https://github.com/Azure/azure-rest-api-specs/blob/b478dcdabacb37945ff89daa0eda1ae1afc98db4/specification/storage/data-plane/Microsoft.BlobStorage/stable/2025-01-05/blob.json#L2698

Since the default response is a StorageError for all unspecified status codes, the deserialization policy throws a RestError. Because the client doesn't handle this, it forwards the error to the consumer.
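
A simplified illustration of that decision, not the actual @azure/core-client source:

    import { RestError } from "@azure/core-rest-pipeline";

    // Response specs declared by the operation, keyed by status code; anything
    // not listed falls back to the "default" (error) spec.
    type ResponseSpecs = Record<string, object | undefined>;

    function assertExpectedStatus(responses: ResponseSpecs, statusCode: number): void {
      // downloadBlob only declares "200" and "206", so a 304 has no entry and
      // ends up treated like the default StorageError response.
      if (responses[String(statusCode)] === undefined) {
        throw new RestError(`Unexpected status code: ${statusCode}`, { statusCode });
      }
    }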

@xirzec closed this as completed Jan 7, 2025
@xirzec reopened this Jan 7, 2025
@xirzec (Member) commented Jan 7, 2025

Sorry, GitHub helpfully sent my response while I was in the middle of typing it. I don't think we can change the API contract for Storage methods that use conditional requests, but I am curious how App Insights is hooking this; if it hooks into exceptions being created/thrown, that's going to cause problems whenever exceptions are used for control flow.

@xirzec added the Monitor - Exporter label Jan 7, 2025
@xirzec added the Client label Jan 7, 2025
github-actions bot removed the needs-triage label Jan 7, 2025
@JacksonWeber (Member) commented Jan 7, 2025

@xirzec @Apokalypt If a REST error is thrown and bubbles up to the top level of the application, I'd expect OpenTelemetry to create an exception record that we then export to Application Insights, the same way as any unhandled exception. That seems expected in the event of a "real" REST error, so apart from some special logic for 304s, I'm not sure this would be considered "unexpected" behavior. It should be solvable by catching 304s on the client side for now, but it is strange to have that response code count as an "exception" at all.

@Apokalypt (Author) commented Jan 8, 2025

@JacksonWeber when you talk about the "client side", do you mean the Azure SDK code that uses your internal library, or my own code where I use the library?

@JacksonWeber (Member) commented

@Apokalypt Just as an interim solution to avoid the unwanted exception telemetry, I was suggesting implementing a catch statement here in your code.
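
A hedged sketch of that interim workaround (the downloadIfChanged helper is illustrative; isRestError comes from @azure/core-rest-pipeline and the conditions option from @azure/storage-blob):

    import { isRestError } from "@azure/core-rest-pipeline";
    import type { BlobClient } from "@azure/storage-blob";

    // Treat a 304 on a conditional download as "not modified" instead of a failure.
    async function downloadIfChanged(blobClient: BlobClient, etag: string) {
      try {
        return await blobClient.download(0, undefined, {
          conditions: { ifNoneMatch: etag },
        });
      } catch (err) {
        if (isRestError(err) && err.statusCode === 304) {
          return undefined; // blob unchanged since the given etag
        }
        throw err;
      }
    }

As the next comment points out, this only stops the error from escaping application code; the exception event has already been recorded on the SDK span by then.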

@Apokalypt (Author) commented Jan 9, 2025

@Apokalypt Just as an interim solution to avoid the unwanted exception telemetry, I was suggesting implementing a catch statement here in your code.

This is exactly what is done, but in fact the exception is added to the span by the library code itself (before I can catch it).
For the moment, we've added a SpanProcessor to manually remove the "exception" event when the span is finished (sketched below), but this isn't really a viable solution.
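
A rough sketch of that workaround, assuming @opentelemetry/sdk-trace-base; matching on the exception message is a guess, so adjust the filter to however your spans expose the 304:

    import type { Context } from "@opentelemetry/api";
    import type { ReadableSpan, Span, SpanProcessor, TimedEvent } from "@opentelemetry/sdk-trace-base";

    // Strips "exception" events that look like 304 responses before the span is
    // exported. Mutating ReadableSpan.events relies on the SDK handing out the
    // underlying array, which is an implementation detail.
    class Drop304ExceptionProcessor implements SpanProcessor {
      onStart(_span: Span, _parentContext: Context): void {}

      onEnd(span: ReadableSpan): void {
        const events = span.events as TimedEvent[];
        for (let i = events.length - 1; i >= 0; i--) {
          const message = String(events[i].attributes?.["exception.message"] ?? "");
          if (events[i].name === "exception" && message.includes("304")) {
            events.splice(i, 1);
          }
        }
      }

      forceFlush(): Promise<void> {
        return Promise.resolve();
      }

      shutdown(): Promise<void> {
        return Promise.resolve();
      }
    }

It can be registered through the spanProcessors option of useAzureMonitor (if your @azure/monitor-opentelemetry version exposes it) or directly on the tracer provider.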

To add more detail to what's going on, because I was able to dig a little today ⤵️
When we call <BlobClient>.download, the internal download call is made inside a custom span (code). However, the internal download call (code) throws an error (because the status code is 304) without being caught. As a result, the withSpan method that created the context for this internal call considers that an unintentional error has been raised and therefore adds the exception to the created span (code).
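
In other words, a simplified illustration of the withSpan flow (not the actual @azure/core-tracing source):

    // Any error thrown by the wrapped callback, including the RestError produced
    // for the 304, is recorded as an error status on the span before rethrowing.
    async function withSpanSketch<T>(
      span: { setStatus(status: unknown): void; end(): void },
      callback: () => Promise<T>
    ): Promise<T> {
      try {
        const result = await callback();
        span.setStatus({ status: "success" });
        return result;
      } catch (error) {
        span.setStatus({ status: "error", error }); // the 304 RestError lands here
        throw error;
      } finally {
        span.end();
      }
    }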

@xirzec (Member) commented Jan 10, 2025

@maorleger Any thoughts on this? I think we ended up in this place because Storage didn't originally have support for conditional requests and when they added it after the fact, we didn't have a non-breaking way to describe the result without throwing an exception.

@maorleger (Member) commented Jan 13, 2025

@maorleger Any thoughts on this? I think we ended up in this place because Storage didn't originally have support for conditional requests and when they added it after the fact, we didn't have a non-breaking way to describe the result without throwing an exception.

I don't have a lot of context for the history here but if I understand correctly:

  • If-None-Match => possible 304 as an expected response
  • deserializationPolicy converts 304 to a thrown exception
  • core-tracing records exception on the span
  • storage-blob's SDK bubbles up the 304 to the client code

Is that right?

If so (and please correct anything I was mistaken on) I can think of a few options (just brainstorming any and all ideas no matter how silly):

  1. If the spec can be updated to add 304 as an expected response then deserializationPolicy will no longer throw I believe
  2. core-tracing could be updated to skip recording 304s as exceptions
  3. core-tracing could provide additional options to tune this behavior (skip errors or some flag)
  4. a custom pipeline policy could be added to avoid rethrowing 304s as exceptions
  5. the client code in storage-blob SDK can catch the exception and avoid bubbling it up
  6. the client code in storage-blob SDK can use the startSpan escape hatch to manage the span lifecycle itself

Out of these I think:

  1. ❓ Would that be considered a breaking change? And is a "breaking change" to the swagger acceptable when it helps more accurately describe the service and the reality?
  2. ❌ I would rather avoid special casing in core-tracing - if an exception comes in it gets recorded
  3. ❌ Again this feels orthogonal to core-tracing's purpose. If an unhandled exception bubbles up to core-tracing's withSpan handler it should be recorded
  4. ❓ I am not confident that would actually work; we'd need to spike out what that might look like
  5. ❌ Sounds like a breaking change in our SDK public API so I would assume this is a non-starter
  6. ✅ Feasible and would work - but my concern is that we're not solving this holistically (for example, what about @azure/app-configuration which also supports conditional request headers?)

IMO if this is truly a unique case due to needing to maintain backwards compatibility in storage-blob's download method, (6) would make sense to me. If we want to solve this more holistically we'd want to investigate another option and I'd need some help from storage-blob's domain experts. @xirzec thoughts?

@Apokalypt (Author) commented

@maorleger we do not have the same problem with @azure/app-configuration. From what I see in the SDK code, they catch the exception to handle the 304 case (code). Since the SDK catches the error and does not delegate error handling to the try/catch of the withSpan method, the exception is not generated.
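
For comparison, that pattern looks roughly like the following (names are illustrative, not the actual @azure/app-configuration internals):

    import { isRestError } from "@azure/core-rest-pipeline";

    // Inside the client method: translate a 304 into a "not modified" result so
    // no error ever reaches withSpan's catch block.
    async function sendOrNotModified<T>(send: () => Promise<T>): Promise<T | { statusCode: 304 }> {
      try {
        return await send();
      } catch (err) {
        if (isRestError(err) && err.statusCode === 304) {
          return { statusCode: 304 }; // conditional request: resource unchanged
        }
        throw err;
      }
    }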

Here is a screenshot of all the spans created by the Azure SDK (the last one is created by http-instrumentation) while trying to get a key from App Configuration with the @azure/app-configuration SDK: [image]
And here is our code, where you can see the conditional header: [image]

@maorleger (Member) commented Jan 17, 2025

Got it, thanks @Apokalypt !

In that case, what we can do in download is something along the lines of:

    const { span, updatedOptions } = tracingClient.startSpan("BlobClient-download", options);

    return tracingClient.withContext(updatedOptions.tracingOptions.tracingContext, async () => {
      try {
        const result = undefined as any; // TODO: to be replaced with the current `download` implementation 
        span.setStatus({ status: "success" });
        return result;
      } catch (err: any) {
        if (isActualError(err)) { // TODO: to be implemented
          span.setStatus({
            status: "error",
            error: err,
          });
        }
        throw err;
      } finally {
        span.end();
      }
    })

This allows some customization and control over the span's status. isActualError is just pseudocode here and can be replaced with whatever logic makes sense for this scenario.
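
One possible shape for it (an assumption, not a settled design):

    import { isRestError } from "@azure/core-rest-pipeline";

    // A 304 on a conditional request is an expected outcome rather than a failure
    // worth recording on the span; anything else stays an actual error.
    function isActualError(err: unknown): boolean {
      return !(isRestError(err) && err.statusCode === 304);
    }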

@xirzec curious if this resonates with you or if I missed something obvious? Are there any other places where 304s bubble up for backwards compat or other reasons? If we have a wider need for this customization, we can consider adding an onError callback to withSpan for library authors (in this case the owners of storage-blob) to use, but first I'd like to understand whether there's a wide need for it.
