This repository has been archived by the owner on Mar 12, 2019. It is now read-only.

Restore collecting all errors from all supervisors #395

Open
jhakala opened this issue Apr 3, 2018 · 1 comment

@jhakala
Member

jhakala commented Apr 3, 2018

Martin tested by deliberately causing errors that would appear in multiple supervisors, and reported:

Well, no. LV1 received the first notification and decided to goToError,
so it stopped listening to further notifications from the LV2s.
I think we have been doing this since we started using the notification system.

Today we had an issue that affected all of the HCAL partitions, but the error that appeared made it look as if the problem were localized to HO. We should restore the old behavior: before we used the notification system to propagate supervisor errors, the SUPERVISOR_ERROR parameter and the error message sent to level0 collected the errors from all the supervisors.
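A minimal sketch of the requested behavior, assuming LV1 receives one notification per supervisor error: the first error still triggers the transition to Error, but later notifications are recorded instead of dropped, so the combined message names every affected partition. `ErrorCollector`, `on_notification`, `go_to_error` and `supervisor_error` are hypothetical names for illustration, not the function manager's actual API.

```python
from dataclasses import dataclass, field


@dataclass
class ErrorCollector:
    """Hypothetical LV1-side collector that keeps errors from every LV2."""

    errors: dict = field(default_factory=dict)  # source -> list of messages
    in_error: bool = False
    transitions: int = 0  # how many times the Error transition was triggered

    def on_notification(self, source: str, message: str) -> None:
        # Record every error, even after entering Error, so that
        # notifications from other supervisors are not lost.
        self.errors.setdefault(source, []).append(message)
        # Only the first error triggers the state transition.
        if not self.in_error:
            self.in_error = True
            self.go_to_error()

    def go_to_error(self) -> None:
        # Placeholder for the real goToError transition (assumption).
        self.transitions += 1

    def supervisor_error(self) -> str:
        # Combined text for a SUPERVISOR_ERROR-like parameter, in arrival order.
        return "; ".join(
            f"{src}: {msg}" for src, msgs in self.errors.items() for msg in msgs
        )
```

For example, with hypothetical inputs, two notifications from HO and HF yield a single transition and a message that mentions both partitions, rather than only the first one.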

@kakwok
Collaborator

kakwok commented Apr 11, 2018

The behaviour changed when we stopped polling the LV2s' health from LV1 (#106).
LV1 could probably still collect the error messages from the LV2s while LV1 is in error.
I am less sure about LV2, though.
