Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add methods Temporary and Recoverable to amqp.Error #265

Merged
merged 3 commits into from
Jun 6, 2024

Conversation

peczenyj
Copy link
Contributor

@peczenyj peczenyj commented Jun 2, 2024

Today if I need to check if the error from amqp library is retriable or not, I need to try convert from error to *amqp.Error and check fields such as Code or Recover via errors.As function (because the error can be wrapped).

However, I may have other kinds of error such context deadline, etc,

A better way is to follow some common interface like Timeout() bool or Temporary() bool

I choose the second one for amqp.Error -- I hope you will like it

_, err := someOperation(ctx, ...)
if isTemporary(err) { retry... }
else if err != nil { ops... }

func isTemporary(err error) bool {
  // matches context deadline
  // and *amqp.Error, etc
  var terr interface {
    Temporary() bool 
  }
  
  return errors.As(err, &terr) && terr.Temporary()
}

Temporary interface returns true for two specific error codes:

  • amqp.ContentTooLarge:

The client attempted to transfer content larger than the server could accept at the present time. The client may retry at a later time.

  • amqp.ConnectionForced:

An operator intervened to close the connection for some reason. The client may retry at some later date.

I choose this two error codes based on this disclaimer https://github.com/google/go-cloud/blob/e0e5901b220191edf32c42cabd87f799858830a0/pubsub/rabbitpubsub/rabbit.go#L497

amqp.Error has a Recover field which sounds like it should mean "retryable".
But it actually means "can be recovered by retrying later or with different
parameters," which is not what we want. The error codes for which Recover is
true, defined in the isSoftExceptionCode function of
https://github.com/rabbitmq/amqp091-go/blob/main/spec091.go, including things
like NotFound and AccessRefused, which require outside action.

The following are the codes which might be resolved by retry without external
action, according to the AMQP 0.91 spec
(https://www.rabbitmq.com/amqp-0-9-1-reference.html#constants). The quotations
are from that page.

Since this definition is very restrictive, I add a more simple method: Recoverable() bool thar returns the content of the field Recover

All temporary errors are recoverable, but not all recoverable errors are temporary. If an error code requires outside action it is not temporary in the sense of redoing the same action will end in the same error code, until something else is done.

If such definitions does not help, lets discuss a better solution.

BTW I just add a GoString method to return all field values when used the %#v option

Regards

@peczenyj peczenyj marked this pull request as draft June 3, 2024 01:02
@peczenyj peczenyj marked this pull request as ready for review June 3, 2024 06:20
@peczenyj peczenyj changed the title add method Temporary to amqp.Error add methods Temporary and Recoverable to amqp.Error Jun 3, 2024
@Zerpet
Copy link
Contributor

Zerpet commented Jun 3, 2024

Hey, thank you for taking the time to contribute this PR. I like the proposal and the code looks good. There's a test failure that seems related to the change on this PR, the test breaking is TestConcurrentClose, this is the failure snippet:

error.log
=== RUN   TestConcurrentClose
    connection_test.go:165: first concurrent close was successful
    connection_test.go:170: later concurrent close were successful and returned ErrClosed
    connection_test.go:179: unknown net.OpError during close, ignoring: write tcp 127.0.0.1:50852->127.0.0.1:5672: use of closed network connection
    connection_test.go:179: unknown net.OpError during close, ignoring: write tcp 127.0.0.1:50852->127.0.0.1:5672: use of closed network connection
    connection_test.go:179: unknown net.OpError during close, ignoring: write tcp 127.0.0.1:50852->127.0.0.1:5672: use of closed network connection
    connection_test.go:179: unknown net.OpError during close, ignoring: write tcp 127.0.0.1:50852->127.0.0.1:5672: use of closed network connection
    connection_test.go:179: unknown net.OpError during close, ignoring: write tcp 127.0.0.1:50852->127.0.0.1:5672: use of closed network connection
    connection_test.go:165: first concurrent close was successful
    connection_test.go:179: unknown net.OpError during close, ignoring: write tcp 127.0.0.1:50852->127.0.0.1:5672: use of closed network connection
    connection_test.go:179: unknown net.OpError during close, ignoring: write tcp 127.0.0.1:50852->127.0.0.1:5672: use of closed network connection
    connection_test.go:179: unknown net.OpError during close, ignoring: write tcp 127.0.0.1:50852->127.0.0.1:5672: use of closed network connection
    connection_test.go:179: unknown net.OpError during close, ignoring: write tcp 127.0.0.1:50852->127.0.0.1:5672: use of closed network connection
    connection_test.go:179: unknown net.OpError during close, ignoring: write tcp 127.0.0.1:50852->127.0.0.1:5672: use of closed network connection
    connection_test.go:179: unknown net.OpError during close, ignoring: write tcp 127.0.0.1:50852->127.0.0.1:5672: use of closed network connection
    connection_test.go:179: unknown net.OpError during close, ignoring: write tcp 127.0.0.1:50852->127.0.0.1:5672: use of closed network connection
    connection_test.go:179: unknown net.OpError during close, ignoring: write tcp 127.0.0.1:50852->127.0.0.1:5672: use of closed network connection
    connection_test.go:179: unknown net.OpError during close, ignoring: write tcp 127.0.0.1:50852->127.0.0.1:5672: use of closed network connection
    connection_test.go:179: unknown net.OpError during close, ignoring: write tcp 127.0.0.1:50852->127.0.0.1:5672: use of closed network connection
    connection_test.go:179: unknown net.OpError during close, ignoring: write tcp 127.0.0.1:50852->127.0.0.1:5672: use of closed network connection
    connection_test.go:179: unknown net.OpError during close, ignoring: write tcp [127](https://github.com/rabbitmq/amqp091-go/actions/runs/9345587094/job/25722709989?pr=265#step:5:128).0.0.1:50852->127.0.0.1:5672: use of closed network connection
    connection_test.go:179: unknown net.OpError during close, ignoring: write tcp 127.0.0.1:50852->127.0.0.1:5672: use of closed network connection
    connection_test.go:179: unknown net.OpError during close, ignoring: write tcp 127.0.0.1:50852->127.0.0.1:5672: use of closed network connection
    connection_test.go:179: unknown net.OpError during close, ignoring: write tcp 127.0.0.1:50852->127.0.0.1:5672: use of closed network connection
    connection_test.go:179: unknown net.OpError during close, ignoring: write tcp 127.0.0.1:50852->127.0.0.1:5672: use of closed network connection
    connection_test.go:179: unknown net.OpError during close, ignoring: write tcp 127.0.0.1:50852->127.0.0.1:5672: use of closed network connection
    connection_test.go:179: unknown net.OpError during close, ignoring: write tcp 127.0.0.1:50852->127.0.0.1:5672: use of closed network connection
    connection_test.go:179: unknown net.OpError during close, ignoring: write tcp 127.0.0.1:50852->127.0.0.1:5672: use of closed network connection
    connection_test.go:179: unknown net.OpError during close, ignoring: write tcp 127.0.0.1:50852->127.0.0.1:5672: use of closed network connection
    connection_test.go:179: unknown net.OpError during close, ignoring: write tcp 127.0.0.1:50852->127.0.0.1:5672: use of closed network connection
    connection_test.go:179: unknown net.OpError during close, ignoring: write tcp 127.0.0.1:50852->127.0.0.1:5672: use of closed network connection
    connection_test.go:179: unknown net.OpError during close, ignoring: write tcp 127.0.0.1:50852->127.0.0.1:5672: use of closed network connection
    connection_test.go:195: Expected no error, or ErrClosed, or a net.OpError from conn.Close(), got Exception=503, Reason="unexpected command received", Recover=false, Server=false (Exception (503) Reason: "unexpected command received") of type *amqp091.Error
--- FAIL: TestConcurrentClose (0.00s)

I'm happy to merge after all tests become green. IIRC this is an integration test, you will need a running rabbit to run the suite.

@Zerpet Zerpet self-assigned this Jun 3, 2024
@peczenyj
Copy link
Contributor Author

peczenyj commented Jun 3, 2024

hello @Zerpet

I start a local rabbitmq using docker and I run only this test on both main branch and my branch add-method... and I got similar results:

# starting image using the Makefile
$ make rabbitmq-server
docker run --detach --rm --name amqp091-go-rabbitmq \
	--publish 5672:5672 --publish 15672:15672 \
	--pull always rabbitmq:3-management
3-management: Pulling from library/rabbitmq
Digest: sha256:eee9afbc17c32424ba6309dfd2d9efc9b9b1863ffe231b3d2be2815758b0d649
Status: Image is up to date for rabbitmq:3-management
10281bc51862d2cf6decc05dd53378342ca5603dd9121e11832c4edd94d7d95f

# check if container is running
$ docker ps
CONTAINER ID   IMAGE                   COMMAND                  CREATED         STATUS         PORTS                                                                                                                                                 NAMES
10281bc51862   rabbitmq:3-management   "docker-entrypoint.s…"   4 seconds ago   Up 3 seconds   4369/tcp, 5671/tcp, 0.0.0.0:5672->5672/tcp, :::5672->5672/tcp, 15671/tcp, 15691-15692/tcp, 25672/tcp, 0.0.0.0:15672->15672/tcp, :::15672->15672/tcp   amqp091-go-rabbitmq

# run this test:
$ go test -v -tags=integration -timeout 30s -run ^TestConcurrentClose$ github.com/rabbitmq/amqp091-go
environment variable envAMQPURLName undefined or empty, using default: "amqp://guest:guest@127.0.0.1:5672/"
=== RUN   TestConcurrentClose
    connection_test.go:165: first concurrent close was successful
    connection_test.go:170: later concurrent close were successful and returned ErrClosed
    connection_test.go:170: later concurrent close were successful and returned ErrClosed
    connection_test.go:179: unknown net.OpError during close, ignoring: write tcp 127.0.0.1:52346->127.0.0.1:5672: use of closed network connection
    connection_test.go:179: unknown net.OpError during close, ignoring: write tcp 127.0.0.1:52346->127.0.0.1:5672: use of closed network connection
    connection_test.go:165: first concurrent close was successful
    connection_test.go:179: unknown net.OpError during close, ignoring: write tcp 127.0.0.1:52346->127.0.0.1:5672: use of closed network connection
    connection_test.go:165: first concurrent close was successful
    connection_test.go:165: first concurrent close was successful
    connection_test.go:179: unknown net.OpError during close, ignoring: write tcp 127.0.0.1:52346->127.0.0.1:5672: use of closed network connection
    connection_test.go:170: later concurrent close were successful and returned ErrClosed
    connection_test.go:179: unknown net.OpError during close, ignoring: write tcp 127.0.0.1:52346->127.0.0.1:5672: use of closed network connection
    connection_test.go:165: first concurrent close was successful
    connection_test.go:179: unknown net.OpError during close, ignoring: write tcp 127.0.0.1:52346->127.0.0.1:5672: use of closed network connection
    connection_test.go:170: later concurrent close were successful and returned ErrClosed
    connection_test.go:170: later concurrent close were successful and returned ErrClosed
    connection_test.go:170: later concurrent close were successful and returned ErrClosed
    connection_test.go:179: unknown net.OpError during close, ignoring: write tcp 127.0.0.1:52346->127.0.0.1:5672: use of closed network connection
    connection_test.go:170: later concurrent close were successful and returned ErrClosed
    connection_test.go:179: unknown net.OpError during close, ignoring: write tcp 127.0.0.1:52346->127.0.0.1:5672: use of closed network connection
    connection_test.go:165: first concurrent close was successful
    connection_test.go:170: later concurrent close were successful and returned ErrClosed
    connection_test.go:179: unknown net.OpError during close, ignoring: write tcp 127.0.0.1:52346->127.0.0.1:5672: use of closed network connection
    connection_test.go:179: unknown net.OpError during close, ignoring: write tcp 127.0.0.1:52346->127.0.0.1:5672: use of closed network connection
    connection_test.go:170: later concurrent close were successful and returned ErrClosed
    connection_test.go:179: unknown net.OpError during close, ignoring: write tcp 127.0.0.1:52346->127.0.0.1:5672: use of closed network connection
    connection_test.go:179: unknown net.OpError during close, ignoring: write tcp 127.0.0.1:52346->127.0.0.1:5672: use of closed network connection
    connection_test.go:170: later concurrent close were successful and returned ErrClosed
    connection_test.go:179: unknown net.OpError during close, ignoring: write tcp 127.0.0.1:52346->127.0.0.1:5672: use of closed network connection
    connection_test.go:170: later concurrent close were successful and returned ErrClosed
    connection_test.go:179: unknown net.OpError during close, ignoring: write tcp 127.0.0.1:52346->127.0.0.1:5672: use of closed network connection
    connection_test.go:179: unknown net.OpError during close, ignoring: write tcp 127.0.0.1:52346->127.0.0.1:5672: use of closed network connection
--- PASS: TestConcurrentClose (0.01s)
PASS
ok  	github.com/rabbitmq/amqp091-go	0.009s

I don't think my patch had any impact on it

@peczenyj
Copy link
Contributor Author

peczenyj commented Jun 3, 2024

I add some small changes in other aspects of the code and tests (fix lint issues, etc)

If needed I can cherry pick and submit another PR

@Zerpet
Copy link
Contributor

Zerpet commented Jun 4, 2024

I had another look at the error, and I agree, your patch doesn't seem to be the root cause. In fact, it seems specific to GitHub actions, and the fact that it runs on low CPU specs, and IIRC no more than 2-cores.

I add some small changes in other aspects of the code and tests (fix lint issues, etc)

If needed I can cherry pick and submit another PR

Thank you for your additional work on this client, much appreciated! I'll take the offer and kindly ask you to open a different PR with the code formatting and linter fixes. I'll be happy to merge both PRs when ready.

@peczenyj
Copy link
Contributor Author

peczenyj commented Jun 4, 2024

Sure, I will prepare a second PR right now

@peczenyj peczenyj force-pushed the add-method-temporary-on-amqp-error branch from f8f6669 to 3e232fa Compare June 4, 2024 13:53
@peczenyj
Copy link
Contributor Author

peczenyj commented Jun 4, 2024

@Zerpet done

@peczenyj
Copy link
Contributor Author

peczenyj commented Jun 4, 2024

and the other commits I move to #266

@Zerpet
Copy link
Contributor

Zerpet commented Jun 6, 2024

Waiting for CI to pass before merging

@Zerpet Zerpet merged commit 2d7a4f4 into rabbitmq:main Jun 6, 2024
12 of 13 checks passed
@Zerpet Zerpet added this to the 1.11.0 milestone Jun 6, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants