Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(grpc-web): response contains two trailer chunks #11988

Merged
merged 6 commits into from
Feb 25, 2025

Conversation

bzp2010
Copy link
Contributor

@bzp2010 bzp2010 commented Feb 23, 2025

Description

This PR is intended to address the following issues:

The reason for this is easy to explain, the old code used status to assert whether the response body had finished.
But this is unreliable, and it can go wrong in the following cases:

Since the response is chunked, the trailer chunk from upstream may not be fully included in the first response block, so the upstream_trailer_grpc_message variable is not available in the first body_filter, which causes APISIX to trigger the body_filter call a second time and incorrectly add a duplicate trailer block.

RESP HEADER: balabala

--- first chunk, eof = false 
RESP BODY:
balabala
grpc-status: 0

--- second chunk, eof = true
RESP BODY:
grpc-message: "demo"

--- done

Based on the pseudo-code above, you'll see that when this happens, the upstream_trailer_grpc_status value is already available to us at the first body_filter call, which causes the plugin code to send a trailer chunk.
When the next upstream chunk is received, body_filter is called a second time, and upstream_trailer_grpc_status still has the value it had before, so the plugin code sends the trailer block again.

This results in multiple duplicate trailers and accidentally breaks the grpc-web js client in the browser.

This PR changes the end of the response asserted using the status value to use the ngx.arg[2] value, which will be the most reliable and trigger the code to send the trailer only on the last chunk.

Checklist

  • I have explained the need for this PR and the problem it solves
  • I have explained the changes or the new features added to this PR
  • I have added tests corresponding to this change
  • I have updated the documentation to reflect this change
  • I have verified that this change is backward compatible (If not, please discuss on the APISIX mailing list first)

The PR does have some compatibility changes, specifically:

  • When request HTTP method errors, it returns 405 Method not allowed instead of 400 Bad Request, which will be more semantic and have no effect on any official client.
  • When there are any request errors (e.g. wrong Content-Type, request body reading or decoding failure), it returns the HTML error page without base64 encoding, which will visually return the error.
    • Originally, this returned base64(html), and since base64 causes the body length to change, part of the response body will be cut off.

This has no effect on users who are already using the plugin correctly, so while it may introduce some behavioral changes, it is not a BREAK CHANGE.

In addition to that, I've noticed some behavior that doesn't quite fit the protocol reference implementation; as a grpc-web proxy, the HTTP trailer chunk should be removed and instead be represented as an encoded response body chunk, which APISIX fails to handle correctly. However, since browsers don't care about trailer chunks, this doesn't seem to cause an error, and I'll see how I can work around it and possibly fix it later.

@bzp2010 bzp2010 marked this pull request as ready for review February 24, 2025 15:14
@dosubot dosubot bot added size:L This PR changes 100-499 lines, ignoring generated files. bug Something isn't working labels Feb 24, 2025
moonming
moonming previously approved these changes Feb 25, 2025
membphis
membphis previously approved these changes Feb 25, 2025
nic-6443
nic-6443 previously approved these changes Feb 25, 2025
@bzp2010 bzp2010 dismissed stale reviews from nic-6443, membphis, and moonming via 08b163b February 25, 2025 02:07
Copy link
Member

@juzhiyuan juzhiyuan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have reviewed the test cases, LGTM

@juzhiyuan juzhiyuan merged commit 2a5425f into apache:master Feb 25, 2025
31 checks passed
end

if encoding == CONTENT_ENCODING_BASE64 then
body = decode_base64(body)
ngx.log(ngx.WARN, "DECODE BODY: ", body)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. why do we need warn log here?
  2. we can use core.log.warn instead

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Obviously here leaves out some unneeded code, which I will remove in another PR. 🫡

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is probably code for testing and debugging.

-- n bytes: trailer
trailer_buf = trailer_buf .. grpc_web_trailer

return trailer_buf
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
return trailer_buf
return trailer_buf

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is a good first issue, hopefully someone from the community can fix this.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working size:L This PR changes 100-499 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants