From cb788be650d2f85502f5728c5fb3b287eb1f9313 Mon Sep 17 00:00:00 2001 From: Hartley McGuire Date: Thu, 2 May 2024 14:37:17 -0400 Subject: [PATCH] Fix "Got packets our of order" errors on connect When trying to upgrade to Trilogy 2.8.0, my application started raising errors when establishing connections to our database: ``` 1156: Got packets out of order ``` Bisecting the changes between 2.7.0 and 2.8.0 indicated that [the change][1] to remove sequence number hardcoding introduced the bug. Unfortunately, the issue was not easily reproducible locally, so I [added][1] a `printf` statement to try to understand what sequence number was being used instead of the previously hardcoded 1 (or 2). This log showed that in failure cases the sequence number was 0. After that, [adding more][3] `printf`s helped provide a clearer picture of the sequence of events when the incorrect sequence number is used: ``` trilogy=init seq=0 trilogy-pre-try_connect seq=0 trilogy-connect_recv seq_was=0 trilogy-connect_recv seq=0 trilogy-connect_recv-read_packet seq=0 trilogy-connect_recv seq_was=0 trilogy-connect_recv seq=0 trilogy-connect_recv-read_packet seq=1 trilogy-connect_recv-seq_was=1 trilogy-connect_recv seq=0 trilogy-connect_recv-read_packet seq=0 trilogy-connect_recv-parse_handshake seq=0 trilogy=auth_send use_ssl=0, seq=0 ``` My understanding of what happens here is that `trilogy_connect_recv` is getting called 3 times in the `while` loop inside `try_connect`. The first time, it sets the `packet_parser`'s `sequence_number` to 0 before it calls `read_packet` but only reads some amount up to the packet's `sequence_number` before exiting with `TRILOGY_AGAIN`. `trilogy_connect_recv` gets called a second time, and again it sets the `packet_parser`'s `sequence_number` to 0. This time `read_packet` consumes the packet's `sequence_number` and so `packet_parser`'s `sequence_number` get incremented. However, this iteration also exits with `TRILOGY_AGAIN` before the packet is finished. `trilogy_connect_recv` is then called a third time, and again it sets the `packet_parser`'s `sequence_number` to 0. `read_packet` finally finishes consuming the packet and this iteration is finally able to parse the handshake from the completed packet. Since the `sequence_number` is being reset on each loop, `auth_send` ends up incorrectly using a sequence number of 0 instead of 1 (or 2). Therefore it seems incorrect for each call to `trilogy_connect_recv` to reset the `sequence_number` since it is not guaranteed that the last call will be the one that consumes the handshake's `sequence_numer` and increments the value for use in `auth_send`. [1]: trilogy-libraries/trilogy@922544cee89b7651039c92aadda13bfceb31f790 [2]: skipkayhil/trilogy@f5afac3f5f96f4ac6c268f6ad72fc5430079b694 [3]: skipkayhil/trilogy@e05293970ed91d4e9cbf19da380fe4fb357aa0f4 --- src/client.c | 3 --- 1 file changed, 3 deletions(-) diff --git a/src/client.c b/src/client.c index 53680620..0641022d 100644 --- a/src/client.c +++ b/src/client.c @@ -310,9 +310,6 @@ int trilogy_connect_send_socket(trilogy_conn_t *conn, trilogy_sock_t *sock) int trilogy_connect_recv(trilogy_conn_t *conn, trilogy_handshake_t *handshake_out) { - // reset the sequence number with each connect recv attempt - conn->packet_parser.sequence_number = 0; - int rc = read_packet(conn); if (rc < 0) {