Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

bug/regression: DELETE /relay/v1/subscriptions freezes in certain conditions #1034

Open
fbarbu15 opened this issue Feb 15, 2024 · 9 comments
Labels
bug Something isn't working

Comments

@fbarbu15
Copy link

Describe the bug
This seems as a regression as it started to happen on 2/9/2024.
It also happens only in certain conditions as described bellow

To Reproduce

  1. Have 2 nodes connected via relay protocol
  2. Subscribe both nodes to 5 topics. ex: POST /relay/v1/subscriptions with payload: ["/waku/2/rs/0/0", "/waku/2/rs/0/1", "/waku/2/rs/0/9", "/waku/2/rs/0/25", "/waku/2/rs/0/1000"]
  3. (optional) publish messages and check that the message relay works
  4. Delete all 5 subscriptions from the 1st node ex: DELETE /relay/v1/subscriptions with payload: ["/waku/2/rs/0/0", "/waku/2/rs/0/1", "/waku/2/rs/0/9", "/waku/2/rs/0/25", "/waku/2/rs/0/1000"]
  5. Try to delete the subscriptions from the 2nd node

Actual behavior
Request freezes and is only stops if the client times out

The issue only reproduces if node1 unsubscribes first. If we reverse the order it works fine
The issue only reproduces if node2 tries to unsubscribes from 4 or more topics. It work is it's bellow 4/

go-waku version/commit hash

harbor.status.im/wakuorg/go-waku:latest

node1.log
node2.log

node1 args ['--listen-address=0.0.0.0', '--rest=true', '--rest-admin=true', '--websocket-support=true', '--log-level=DEBUG', '--rest-relay-cache-capacity=100', '--websocket-port=53777', '--rest-port=53775', '--tcp-port=53776', '--discv5-udp-port=53778', '--rest-address=0.0.0.0', '--nat=extip:172.18.230.234', '--peer-exchange=true', '--discv5-discovery=true', '--cluster-id=0', '--min-relay-peers-to-publish=1', '--legacy-filter=false', '--rest-filter-cache-capacity=50', '--relay=true', '--nodekey=30348dd51465150e04a5d9d932c72864c8967f806cce60b5d26afeca1e77eb68']

node2 args ['--listen-address=0.0.0.0', '--rest=true', '--rest-admin=true', '--websocket-support=true', '--log-level=DEBUG', '--rest-relay-cache-capacity=100', '--websocket-port=5557', '--rest-port=5555', '--tcp-port=5556', '--discv5-udp-port=5558', '--rest-address=0.0.0.0', '--nat=extip:172.18.188.172', '--peer-exchange=true', '--discv5-discovery=true', '--cluster-id=0', '--min-relay-peers-to-publish=1', '--legacy-filter=false', '--rest-filter-cache-capacity=50', '--relay=true', '--discv5-bootstrap-node=enr:-Ly4QOJ7KFVvRp5EttMNtjmNbGtF-hYIG8l6RSTgj7Jb8K1HZUgHCA234U-gSw2TGgx3cYaRCRcTH2DUzW2O7bhtuv6GAY2r1df1gmlkgnY0gmlwhKwRAAKKbXVsdGlhZGRyc5gACgR_AAABBtIR3QMACgSsEQACBtIR3QOJc2VjcDI1NmsxoQM3Tqpf5eFn4Jztm4gB0Y0JVSJyxyZsW8QR-QU5DZb-PYN0Y3CC0hCDdWRwgtIShXdha3UyAQ']

@chaitanyaprem
Copy link
Collaborator

You are using cluster 0 which means the node will be subscribed to default pubsubTopic which doesn't follow sharding format.
Later you are subscribing the node to sharded topics. This is not recommended usage.
Can you rerun this test with cluster-ID other than 0 and confirm the behaviour.

@fbarbu15
Copy link
Author

I tried to do this but something is wrong, when I use a different cluster ID, go-waku doesn't receive the messages published from nwaku, not sure what am I doing wrong.
If I use nwaku -> nwaku everything is fine with any cluster ID (0 or >0)
So the problem is with go-waku or I'm missing something when I'm starting the nodes

I've created a raw script that reproduces this issue, can you please have a look? Thanks
gowaku_fails_to_fetch_messages.zip

@fbarbu15
Copy link
Author

I found why I had the issue from my last comment on clusters different than 0.
Seems for other clusters I need to pass the --pubsub-topic as a docker flag for this to work, it's not enough to do a relay/v1/subscriptions.
On nwaku side it works both ways to subscribe to a topic, via config(docker flag) or via API call.

@chaitanyaprem do you think this behavior is ok?

Also I can confirm that this bug with DELETE /relay/v1/subscriptions freeze disappears on other cluster IDs

@chaitanyaprem
Copy link
Collaborator

I found why I had the issue from my last comment on clusters different than 0. Seems for other clusters I need to pass the --pubsub-topic as a docker flag for this to work, it's not enough to do a relay/v1/subscriptions. On nwaku side it works both ways to subscribe to a topic, via config(docker flag) or via API call.

@chaitanyaprem do you think this behavior is ok?

Also I can confirm that this bug with DELETE /relay/v1/subscriptions freeze disappears on other cluster IDs

That is odd, subscribing to a pubsubTopic via REST API should also work. Can you share debug logs when you invoke this command, would like to see what could be the issue?

@fbarbu15
Copy link
Author

Logs:
logs_when_pubsub_topic_is_passed.zip
logs_when_pubusub_topic_is_not_passed.zip

Node start:

docker run -i -t -p 9503:9503 -p 9504:9504 -p 9505:9505 -p 9506:9506 -p 9507:9507 harbor.status.im/wakuorg/nwaku:latest --listen-address=0.0.0.0 --rest=true --rest-admin=true --websocket-support=true --log-level=TRACE --rest-relay-cache-capacity=100 --websocket-port=9505 --rest-port=9503 --tcp-port=9504 --discv5-udp-port=9506 --rest-address=0.0.0.0 --nat=extip:172.18.239.110 --peer-exchange=true --discv5-discovery=true --cluster-id=2 --metrics-server=true --metrics-server-address=0.0.0.0 --metrics-server-port=9507 --metrics-logging=true --relay=true --nodekey=30348dd51465150e04a5d9d932c72864c8967f806cce60b5d26afeca1e77eb68 
docker run -i -t -p 27921:27921 -p 27922:27922 -p 27923:27923 -p 27924:27924 -p 27925:27925 harbor.status.im/wakuorg/go-waku:latest --listen-address=0.0.0.0 --rest=true --rest-admin=true --websocket-support=true --log-level=DEBUG --rest-relay-cache-capacity=100 --websocket-port=27923 --rest-port=27921 --tcp-port=27922 --discv5-udp-port=27924 --rest-address=0.0.0.0 --nat=extip:172.18.133.187 --peer-exchange=true --discv5-discovery=true --cluster-id=2 --min-relay-peers-to-publish=1 --rest-filter-cache-capacity=50 --relay=true --discv5-bootstrap-node=enr:-LO4QJVGXwyXhEyIRPgpgp8x_0-jI95hMl6U5HYM-QQbA1H9Acaov_FI3Y-KQ33tTaST4Tsr29Au2K1_bH7bkQQT8H0BgmlkgnY0gmlwhKwS726KbXVsdGlhZGRyc4wACgSsEu9uBiUh3QOCcnOFAAIBAACJc2VjcDI1NmsxoQM3Tqpf5eFn4Jztm4gB0Y0JVSJyxyZsW8QR-QU5DZb-PYN0Y3CCJSCDdWRwgiUihXdha3UyAQ

docker network connect --ip 172.18.239.110 waku 770e6f73565eb80511dab2cc8c7c6e43abf03a8ee79245e5bf39901f2217dba6
docker network connect --ip 172.18.133.187 waku 45e51f2a4ecadec78cc8033d5528ac90d612fbbe6fc9098124083d4c4965e029

Subscribe requests:

curl -v -X POST "http://127.0.0.1:9503/relay/v1/subscriptions" -H "Content-Type: application/json" -d '["/waku/2/rs/2/0"]'
curl -v -X POST "http://127.0.0.1:27921/relay/v1/subscriptions" -H "Content-Type: application/json" -d '["/waku/2/rs/2/0"]'

Nwaku publish:

curl -v -X POST "http://127.0.0.1:9503/relay/v1/messages/%2Fwaku%2F2%2Frs%2F2%2F0" -H "Content-Type: application/json" -d '{"payload": "UmVsYXkgd29ya3MhIQ==", "contentTopic": "/test/1/waku-relay/proto", "timestamp": '$(date +%s%N)'}'

Go-waku get message:

curl -v -X GET "http://127.0.0.1:27921/relay/v1/messages/%2Fwaku%2F2%2Frs%2F2%2F0" -H "Content-Type: application/json"

@chaitanyaprem chaitanyaprem moved this to In Progress in Waku Mar 25, 2024
@chaitanyaprem
Copy link
Collaborator

I found why I had the issue from my last comment on clusters different than 0. Seems for other clusters I need to pass the --pubsub-topic as a docker flag for this to work, it's not enough to do a relay/v1/subscriptions. On nwaku side it works both ways to subscribe to a topic, via config(docker flag) or via API call.

Went through the logs and in all cases i see 200 OK returned for POST subscription.
So, i am not able to understand where the issue is .
Can you elaborate more on how you think subscription is failing when pubsub topic is not passed in the arguments and only REST API is used?

@fbarbu15
Copy link
Author

Sure, here's the steps:

  • start NODE1 and NODE2 on cluster id 2 without any --pubsub-topic flag and connect them via discv5
  • subscribe both nodes via API to /waku/2/rs/2/0
  • wait for some time to make sure the node connect via discv5
  • send/relay one message via API on NODE1
  • check via API that the above message reaches NODE2

Issue happens only if NODE1 is nwaku and NODE2 is go-waku
Issue doesn't happen if:

  • we pass --pubsub-topic=/waku/2/rs/2/0 to the docker command
  • NODE2 is also nwaku

Here's a shell script that you can run directly to reproduce the issue:

#!/bin/bash
printf "\nAssuming you already have a docker network called waku\n"
# if not something like this should create it: docker network create --driver bridge --subnet 172.18.0.0/16 --gateway 172.18.0.1 waku

cluster_id=2
pubsub_topic="/waku/2/rs/$cluster_id/0"
encoded_pubsub_topic=$(echo "$pubsub_topic" | sed 's:/:%2F:g')
node_1=harbor.status.im/wakuorg/nwaku:latest
node_2=harbor.status.im/wakuorg/go-waku:latest

printf "\nStarting containers\n"

container_id1=$(docker run -d -i -t -p 12297:12297 -p 12298:12298 -p 12299:12299 -p 12300:12300 -p 12301:12301 $node_1 --listen-address=0.0.0.0 --rest=true --rest-admin=true --websocket-support=true --log-level=DEBUG --rest-relay-cache-capacity=100 --websocket-port=12299 --rest-port=12297 --tcp-port=12298 --discv5-udp-port=12300 --rest-address=0.0.0.0 --nat=extip:172.18.45.95 --peer-exchange=true --discv5-discovery=true --cluster-id=$cluster_id --relay=true --nodekey=30348dd51465150e04a5d9d932c72864c8967f806cce60b5d26afeca1e77eb68)
docker network connect --ip 172.18.45.95 waku $container_id1

printf "\nSleeping 2 seconds\n"
sleep 2

response=$(curl -X GET "http://127.0.0.1:12297/debug/v1/info" -H "accept: application/json")
enrUri=$(echo $response | jq -r '.enrUri')

container_id2=$(docker run -d -i -t -p 8158:8158 -p 8159:8159 -p 8160:8160 -p 8161:8161 -p 8162:8162 $node_2 --listen-address=0.0.0.0 --rest=true --rest-admin=true --websocket-support=true --log-level=DEBUG --rest-relay-cache-capacity=100 --websocket-port=8160 --rest-port=8158 --tcp-port=8159 --discv5-udp-port=8161 --rest-address=0.0.0.0 --nat=extip:172.18.207.159 --peer-exchange=true --discv5-discovery=true --cluster-id=$cluster_id --relay=true --discv5-bootstrap-node=$enrUri)

docker network connect --ip 172.18.207.159 waku $container_id2

printf "\nSleeping 1 seconds\n"
sleep 1

printf "\nSubscribe\n"
curl -X POST "http://127.0.0.1:12297/relay/v1/subscriptions" -H "Content-Type: application/json" -d "[\"$pubsub_topic\"]"
curl -X POST "http://127.0.0.1:8158/relay/v1/subscriptions" -H "Content-Type: application/json" -d "[\"$pubsub_topic\"]"


printf "\nSleeping 70 seconds to be sure that the nodes connect via discv5\n"
sleep 70

printf "\nRelay from NODE 1\n"                            
curl -X POST "http://127.0.0.1:12297/relay/v1/messages/$encoded_pubsub_topic" -H "Content-Type: application/json" -d '{"payload": "UmVsYXkgd29ya3MhIQ==", "contentTopic": "/test/1/waku-relay/proto", "timestamp": '$(date +%s%N)'}'

printf "\nSleeping 1 seconds\n"
sleep 1

printf "\nCheck message in NODE 2\n"
response=$(curl -X GET "http://127.0.0.1:8158/relay/v1/messages/$encoded_pubsub_topic" -H "Content-Type: application/json")

printf "\nResponse: $response\n"

if [ "$response" == "no subscription found for pubsubTopic" ] || [ "$response" == "[]" ] || [ -z "$response" ] || [ "$response" == "null" ]; then
    printf "\nError: NODE 2 didn't find the message, the response is null, or the response contains the text 'null'\n"
    exit 1
else
    printf "\nSuccess: NODE 2 received the message\n"
fi

@fbarbu15
Copy link
Author

Update, I see nwaku was also changed to work as go-waku :)
So with the latest version of nwaku if we don't pass --pubsub-topic the peer cannot retrieve the published messages

@fbarbu15
Copy link
Author

Found that the bug (DELETE /relay/v1/subscriptions freezes in certain conditions) actually still reproduces on cluster id 2 but not always

@chaitanyaprem chaitanyaprem moved this from In Progress to To Do in Waku Apr 30, 2024
@chaitanyaprem chaitanyaprem removed their assignment Oct 9, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
Status: To Do
Development

No branches or pull requests

2 participants