Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
Fix "Got packets our of order" errors on connect
When trying to upgrade to Trilogy 2.8.0, my application started raising errors when establishing connections to our database: ``` 1156: Got packets out of order ``` Bisecting the changes between 2.7.0 and 2.8.0 indicated that [the change][1] to remove sequence number hardcoding introduced the bug. Unfortunately, the issue was not easily reproducible locally, so I [added][1] a `printf` statement to try to understand what sequence number was being used instead of the previously hardcoded 1 (or 2). This log showed that in failure cases the sequence number was 0. After that, [adding more][3] `printf`s helped provide a clearer picture of the sequence of events when the incorrect sequence number is used: ``` trilogy=init seq=0 trilogy-pre-try_connect seq=0 trilogy-connect_recv seq_was=0 trilogy-connect_recv seq=0 trilogy-connect_recv-read_packet seq=0 trilogy-connect_recv seq_was=0 trilogy-connect_recv seq=0 trilogy-connect_recv-read_packet seq=1 trilogy-connect_recv-seq_was=1 trilogy-connect_recv seq=0 trilogy-connect_recv-read_packet seq=0 trilogy-connect_recv-parse_handshake seq=0 trilogy=auth_send use_ssl=0, seq=0 ``` My understanding of what happens here is that `trilogy_connect_recv` is getting called 3 times in the `while` loop inside `try_connect`. The first time, it sets the `packet_parser`'s `sequence_number` to 0 before it calls `read_packet` but only reads some amount up to the packet's `sequence_number` before exiting with `TRILOGY_AGAIN`. `trilogy_connect_recv` gets called a second time, and again it sets the `packet_parser`'s `sequence_number` to 0. This time `read_packet` consumes the packet's `sequence_number` and so `packet_parser`'s `sequence_number` get incremented. However, this iteration also exits with `TRILOGY_AGAIN` before the packet is finished. `trilogy_connect_recv` is then called a third time, and again it sets the `packet_parser`'s `sequence_number` to 0. `read_packet` finally finishes consuming the packet and this iteration is finally able to parse the handshake from the completed packet. Since the `sequence_number` is being reset on each loop, `auth_send` ends up incorrectly using a sequence number of 0 instead of 1 (or 2). Therefore it seems incorrect for each call to `trilogy_connect_recv` to reset the `sequence_number` since it is not guaranteed that the last call will be the one that consumes the handshake's `sequence_numer` and increments the value for use in `auth_send`. [1]: 922544c [2]: skipkayhil/trilogy@f5afac3 [3]: skipkayhil/trilogy@e052939
- Loading branch information