The consistency check routine occasionally recreates a flow when Kytos is under heavy load (many requests in a short period of time), which points to a possible race condition.
More specifically, the following end-to-end test sometimes passes and sometimes fails:
```
_______________________________ TestE2EMefEline.test_070_repeat_creation_and_removal_of_ten_circuits_ten_times ________________________________

self = <tests.test_e2e_10_mef_eline.TestE2EMefEline instance at 0x7f757b5bd5f0>

    def test_070_repeat_creation_and_removal_of_ten_circuits_ten_times(self):
        for x in range(0, 10):
            evcs = {}
            for i in range(400, 410):
                payload = {
                    "name": "evc_%s" % i,
                    "enabled": True,
                    "dynamic_backup_path": True,
                    "uni_a": {
                        "interface_id": "00:00:00:00:00:00:00:01:1",
                        "tag": {"tag_type": 1, "value": i}
                    },
                    "uni_z": {
                        "interface_id": "00:00:00:00:00:00:00:02:1",
                        "tag": {"tag_type": 1, "value": i}
                    }
                }
                api_url = KYTOS_API + '/mef_eline/v2/evc/'
                response = requests.post(api_url, data=json.dumps(payload), headers={'Content-type': 'application/json'})
                assert response.status_code == 201
                data = response.json()
                assert 'circuit_id' in data
                evcs[i] = data['circuit_id']

            time.sleep(10)

            # make sure the evcs are active and the flows were created
            s1, s2 = self.net.net.get('s1', 's2')
            flows_s1 = s1.dpctl('dump-flows')
            flows_s2 = s2.dpctl('dump-flows')
            for vid in evcs:
                evc_id = evcs[vid]
                api_url = KYTOS_API + '/mef_eline/v2/evc/' + evc_id
                response = requests.get(api_url)
                assert response.status_code == 200
                evc = response.json()
                # should be active
                assert evc["active"] is True
                # search for the vlan id
                assert "dl_vlan=%s" % vid in flows_s1
                assert "dl_vlan=%s" % vid in flows_s2
                # search for the cookie, should have two flows
                assert len(re.findall(evc['id'], flows_s1, flags=re.IGNORECASE)) == 2
                assert len(re.findall(evc['id'], flows_s2, flags=re.IGNORECASE)) == 2

            # Delete the circuits
            for vid in evcs:
                evc_id = evcs[vid]
                api_url = KYTOS_API + '/mef_eline/v2/evc/' + evc_id
                response = requests.delete(api_url)
                assert response.status_code == 200

            time.sleep(10)

            # make sure the circuits were deleted
            api_url = KYTOS_API + '/mef_eline/v2/evc/'
            response = requests.get(api_url)
            assert response.status_code == 200
            assert response.json() == {}

            flows_s1 = s1.dpctl('dump-flows')
            flows_s2 = s2.dpctl('dump-flows')
>           assert len(flows_s1.split('\r\n ')) == 1, "round=%d - should have only 1 flow but had: \n%s" % (x, flows_s1)
E           AssertionError: round=8 - should have only 1 flow but had:
E             cookie=0xaa28bc57b4ba8f4a, duration=8.110s, table=0, n_packets=0, n_bytes=0, in_port="s1-eth1",dl_vlan=405 actions=strip_vlan,mod_vlan_vid:3461,output:"s1-eth3"
E             cookie=0xaa28bc57b4ba8f4a, duration=8.079s, table=0, n_packets=0, n_bytes=0, in_port="s1-eth3",dl_vlan=3461 actions=strip_vlan,mod_vlan_vid:405,output:"s1-eth1"
E             cookie=0x0, duration=220.814s, table=0, n_packets=144, n_bytes=6048, priority=1000,dl_vlan=3799,dl_type=0x88cc actions=CONTROLLER:65535
E
E           assert 3 == 1
E           + where 3 = len([' cookie=0xaa28bc57b4ba8f4a, duration=8.110s, table=0, n_packets=0, n_bytes=0, in_port="s1-eth1",dl_vlan=405 actions=...20.814s, table=0, n_packets=144, n_bytes=6048, priority=1000,dl_vlan=3799,dl_type=0x88cc actions=CONTROLLER:65535\r\n'])
E           + where [' cookie=0xaa28bc57b4ba8f4a, duration=8.110s, table=0, n_packets=0, n_bytes=0, in_port="s1-eth1",dl_vlan=405 actions=...20.814s, table=0, n_packets=144, n_bytes=6048, priority=1000,dl_vlan=3799,dl_type=0x88cc actions=CONTROLLER:65535\r\n'] = <built-in method split of str object at 0x564093f85fd0>('\r\n ')
E           + where <built-in method split of str object at 0x564093f85fd0> = ' cookie=0xaa28bc57b4ba8f4a, duration=8.110s, table=0, n_packets=0, n_bytes=0, in_port="s1-eth1",dl_vlan=405 actions=s...220.814s, table=0, n_packets=144, n_bytes=6048, priority=1000,dl_vlan=3799,dl_type=0x88cc actions=CONTROLLER:65535\r\n'.split
```
As noted, the test basically creates 10 EVCs, waits a few seconds, makes sure the flows were created, removes the 10 EVCs, waits a few seconds, makes sure the flows were removed, and so on. We repeat this procedure 10 times.
This test was run using the AmLight Kytos image (https://hub.docker.com/r/amlight/kytos), which is basically a daily build of the master branch with some PRs applied (e.g., #117 and others: https://github.com/amlight/kytos-docker/tree/master/patches). We also change the default STATS_INTERVAL setting to 3s and leave CONSISTENCY_INTERVAL=0 (which means the consistency check will run every 3s). LINK_UP_TIMER is also changed to 1s.
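For reference, the overrides described above would look roughly like this; the settings.py locations are my assumption based on the standard kytos NApp layout, so adjust them to wherever these constants actually live in your build:

```python
# napps/kytos/of_core/settings.py (assumed location)
STATS_INTERVAL = 3  # poll flow stats every 3 seconds

# napps/kytos/flow_manager/settings.py (assumed location)
CONSISTENCY_INTERVAL = 0  # 0 => consistency check runs at every stats cycle, i.e. every 3s here

# napps/kytos/topology/settings.py (assumed location)
LINK_UP_TIMER = 1  # seconds a link must stay up before the link-up event fires
```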
This problem seems to be a race condition: while the consistency check is running, requests to remove EVCs arrive for different switches, and each thread appears to change self.stored_flows (the shared resource) in an uncontrolled manner, as sketched below.
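To make the suspected interleaving concrete, here is a minimal sketch, not flow_manager's actual code: the `stored_flows` attribute and `_store_changed_flows` name mirror the ones mentioned above; everything else is hypothetical. Two threads performing unsynchronized read-modify-write cycles on the shared dict can lose each other's updates, while a `threading.Lock` around the whole cycle would serialize them:

```python
import threading


class FlowStoreSketch:
    """Hypothetical sketch of a shared flow store touched by several threads."""

    def __init__(self):
        self.stored_flows = {}          # dpid -> list of stored flow dicts
        self._lock = threading.Lock()   # serializes read-modify-write cycles

    def _store_changed_flows_racy(self, dpid, flows):
        # Unsafe: thread A and thread B can both read the same snapshot,
        # apply their own change, and write back; whichever writes last
        # silently discards the other's update (a "lost update").
        snapshot = dict(self.stored_flows)
        snapshot[dpid] = flows
        self.stored_flows = snapshot

    def _store_changed_flows_safe(self, dpid, flows):
        # Safe: holding the lock for the whole read-modify-write cycle
        # makes it atomic with respect to other writers.
        with self._lock:
            snapshot = dict(self.stored_flows)
            snapshot[dpid] = flows
            self.stored_flows = snapshot
```

Note that even though the EVC removals target different switches, the racy path rewrites the whole dict, so an update for one dpid can erase the update another thread just made for a different dpid, which matches the behavior observed here.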
I've enabled some logging, and this is what it looks like (logs for the above test; cookie 0xaa28bc57b4ba8f4a == 12261257070396477258 ==> EVC ID 28bc57b4ba8f4a). In between those events, I could see that the function _store_changed_flows() sometimes overwrites content that was previously changed. The full log is attached: syslog-2021-05-10.log.gz
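A side note for reading the cookies in the log and in the dump-flows output above: the cookie appears to be the EVC id prefixed with a 0xaa byte (this prefix interpretation is my assumption, based only on the values quoted in this report), so the conversion can be checked directly:

```python
>>> cookie = 12261257070396477258
>>> hex(cookie)
'0xaa28bc57b4ba8f4a'
>>> format(cookie & ((1 << 56) - 1), 'x')  # drop the top (prefix) byte -> EVC id
'28bc57b4ba8f4a'
```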
I've run this test 10 times and none of the runs finished with all 100 flows created: most of the time, around 80 flows were created. In fact, the flows are created in the first place; however, the storehouse data is overwritten due to the race condition and some of the flows are lost from the storehouse. Thus, on the next run of the consistency check routine, those flows are removed.
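A quick way to quantify this from the dump-flows output is to count the entries carrying that cookie prefix; a hypothetical helper (the function name and the reliance on the 0xaa prefix are mine, not from the test suite):

```python
import re


def count_evc_flows(dump_flows_output: str) -> int:
    """Count flow entries whose cookie carries the 0xaa prefix seen above."""
    return len(re.findall(r"cookie=0xaa", dump_flows_output))

# With 10 EVCs fully installed, each switch should report 20 such flows
# (the test above asserts two flows per EVC per switch).
```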
Original issue opened by @italovalcy at kytos#128.