Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[PLATFORM-2274]: Investigate possible DNS caching issues on our rabbitmq libraries #199

Conversation

cottinisimone
Copy link
Contributor

@cottinisimone cottinisimone commented Sep 27, 2024

https://prima-assicurazioni-spa.myjetbrains.com/youtrack/issue/PLATFORM-2274

This PR modifies the connection behavior of amqpx when connecting to a RabbitMQ cluster or host:

  • If the host field in the connection parameters or the RabbitMQ URL is an IP address, the library will establish a direct connection to that IP.
  • If the host field in the connection parameters or the RabbitMQ URL is a DNS record, the library will resolve the associated A records and attempt to connect to the first available IP address.

@cottinisimone cottinisimone requested a review from a team as a code owner September 27, 2024 14:09
Copy link
Contributor

@neslinesli93 neslinesli93 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nice work! a few comments:

  1. does it work only for the first connection opening, or also for re-connections after errors?
  2. do you think it's possible to add a test that actually reproduces the issue where the first resolution fails but the second succeeds? you know, actually testing the reduce_while behavior
  3. can you add a short description in the PR please please :c

@cottinisimone
Copy link
Contributor Author

nice work! a few comments:

  1. does it work only for the first connection opening, or also for re-connections after errors?
  2. do you think it's possible to add a test that actually reproduces the issue where the first resolution fails but the second succeeds? you know, actually testing the reduce_while behavior
  3. can you add a short description in the PR please please :c

Ehi @neslinesli93

  1. It only handles first connection opening. I thought about reconnection but i feel like the reconnection logic/strategy should be implemented by the client itself. Wdyt?
  2. You are right. I tested it locally and it requires something like a local dns server. I'll do another commit adding
  3. Yes lol

@neslinesli93
Copy link
Contributor

nice work! a few comments:

  1. does it work only for the first connection opening, or also for re-connections after errors?
  2. do you think it's possible to add a test that actually reproduces the issue where the first resolution fails but the second succeeds? you know, actually testing the reduce_while behavior
  3. can you add a short description in the PR please please :c

Ehi @neslinesli93

  1. It only handles first connection opening. I thought about reconnection but i feel like the reconnection logic/strategy should be implemented by the client itself. Wdyt?

if the library handles the reconnection itself (i don't remember tbh) then the process should be the same... otherwise yeah the clients should handle that

cpiemontese
cpiemontese previously approved these changes Sep 30, 2024
Copy link
Contributor

@cpiemontese cpiemontese left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice 👍

Tests of actual resolution would be really cool, not sure if modifying the /etc/hosts file or something like that could work 🤷

andreausu
andreausu previously approved these changes Sep 30, 2024
@cottinisimone cottinisimone force-pushed the PLATFORM-2274/task/investigate-possible-dns-caching-issues-on-our-rabbitmq-libraries branch from 0fc438a to 06ede36 Compare October 1, 2024 13:00
Copy link
Contributor

@cpiemontese cpiemontese left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mocking for the win

Copy link
Contributor

@neslinesli93 neslinesli93 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

❤️

@cottinisimone cottinisimone merged commit 8325dbb into master Oct 1, 2024
3 checks passed
@cottinisimone cottinisimone deleted the PLATFORM-2274/task/investigate-possible-dns-caching-issues-on-our-rabbitmq-libraries branch October 1, 2024 14:34
@francesco995
Copy link
Member

@andreausu @cottinisimone i'm worried about

If the host field in the connection parameters or the RabbitMQ URL is a DNS record, the library will resolve the associated A records and attempt to connect to the first available IP address.

don't we risk to, in case of normal operations, connect always to the first node?

@cottinisimone
Copy link
Contributor Author

@andreausu @cottinisimone i'm worried about

If the host field in the connection parameters or the RabbitMQ URL is a DNS record, the library will resolve the associated A records and attempt to connect to the first available IP address.

don't we risk to, in case of normal operations, connect always to the first node?

It seems is not. The load balancer is balancing the ips during the resolution too.

Here's the code i run to test it:

1..100
|> Enum.map(fn _ ->
    'loud-azure-centipede.in.rmq4.cloudamqp.com'
    |> Amqpx.DNS.resolve_ips()
    |> Enum.map(&to_string/1)
    |> Enum.join("-")
end)
|> Enum.group_by(fn v -> v end)
|> Enum.map(fn {k, v} -> {k, Enum.count(v)} end)

@andreausu
Copy link

@andreausu @cottinisimone i'm worried about

If the host field in the connection parameters or the RabbitMQ URL is a DNS record, the library will resolve the associated A records and attempt to connect to the first available IP address.

don't we risk to, in case of normal operations, connect always to the first node?

I've always witnessed DNS servers returning answers for A records with multiple IPs associated in random order, but I'm not sure if that's by RFC or just a common implementation.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants