Skip to content

Commit

Permalink
Fix "Got packets our of order" errors on connect
Browse files Browse the repository at this point in the history
When trying to upgrade to Trilogy 2.8.0, my application started raising
errors when establishing connections to our database:

```
1156: Got packets out of order
```

Bisecting the changes between 2.7.0 and 2.8.0 indicated that [the
change][1] to remove sequence number hardcoding introduced the bug.

Unfortunately, the issue was not easily reproducible locally, so I
[added][1] a `printf` statement to try to understand what sequence
number was being used instead of the previously hardcoded 1 (or 2). This
log showed that in failure cases the sequence number was 0.

After that, [adding more][3] `printf`s helped provide a clearer picture
of the sequence of events when the incorrect sequence number is used:

```
trilogy=init seq=0
trilogy-pre-try_connect seq=0
trilogy-connect_recv seq_was=0
trilogy-connect_recv seq=0
trilogy-connect_recv-read_packet seq=0
trilogy-connect_recv seq_was=0
trilogy-connect_recv seq=0
trilogy-connect_recv-read_packet seq=1
trilogy-connect_recv-seq_was=1
trilogy-connect_recv seq=0
trilogy-connect_recv-read_packet seq=0
trilogy-connect_recv-parse_handshake seq=0
trilogy=auth_send use_ssl=0, seq=0
```

My understanding of what happens here is that `trilogy_connect_recv` is
getting called 3 times in the `while` loop inside `try_connect`.

The first time, it sets the `packet_parser`'s `sequence_number` to 0
before it calls `read_packet` but only reads some amount up to the
packet's `sequence_number` before exiting with `TRILOGY_AGAIN`.

`trilogy_connect_recv` gets called a second time, and again it sets the
`packet_parser`'s `sequence_number` to 0. This time `read_packet`
consumes the packet's `sequence_number` and so `packet_parser`'s
`sequence_number` get incremented. However, this iteration also exits
with `TRILOGY_AGAIN` before the packet is finished.

`trilogy_connect_recv` is then called a third time, and again it sets
the `packet_parser`'s `sequence_number` to 0. `read_packet` finally
finishes consuming the packet and this iteration is finally able to
parse the handshake from the completed packet.

Since the `sequence_number` is being reset on each loop, `auth_send`
ends up incorrectly using a sequence number of 0 instead of 1 (or 2).
Therefore it seems incorrect for each call to `trilogy_connect_recv` to
reset the `sequence_number` since it is not guaranteed that the last
call will be the one that consumes the handshake's `sequence_numer` and
increments the value for use in `auth_send`.

[1]: 922544c
[2]: skipkayhil/trilogy@f5afac3
[3]: skipkayhil/trilogy@e052939
  • Loading branch information
skipkayhil committed May 2, 2024
1 parent 1b5c43b commit cb788be
Showing 1 changed file with 0 additions and 3 deletions.
3 changes: 0 additions & 3 deletions src/client.c
Original file line number Diff line number Diff line change
Expand Up @@ -310,9 +310,6 @@ int trilogy_connect_send_socket(trilogy_conn_t *conn, trilogy_sock_t *sock)

int trilogy_connect_recv(trilogy_conn_t *conn, trilogy_handshake_t *handshake_out)
{
// reset the sequence number with each connect recv attempt
conn->packet_parser.sequence_number = 0;

int rc = read_packet(conn);

if (rc < 0) {
Expand Down

0 comments on commit cb788be

Please sign in to comment.