Do not run control plane testing while openstackclient is updated.
To test the control plane during update we leverage the
openstackclient pod.

While the control plane is updated, the openstackclient pod is updated
as well. That means that during a short period of time (less than 2
seconds in the tests we have run) the openstackclient might be
unavailable.

To prevent false positives, we:
1. check whether the error is linked to the openstackclient shutting
down (exit code 137),
2. or check whether the error message matches the one expected for such
an error.

In both cases the command is retried instead of being reported as a failure.

We prevent the test from always succeeding by enabling pipefail when
running the command, as the trailing tee would otherwise mask the failure.
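
The effect of pipefail on a pipeline that ends in `tee` can be checked with a small sketch. It is not part of the commit: `false` stands in for a failing `oc rsh` call, and the variable names `rc_without` and `rc_with` are made up for the illustration. Each case runs in its own `bash -c` so the option does not leak into the calling shell.

```shell
# Without pipefail, the pipeline status is the status of the last
# command (tee), so the failure of `false` is hidden.
rc_without=$(bash -c 'false | tee /dev/null; echo $?')

# With pipefail, the pipeline status is the status of the last command
# that exited non-zero, so the failure of `false` is reported.
rc_with=$(bash -c 'set -o pipefail; false | tee /dev/null; echo $?')

echo "without pipefail: ${rc_without}"
echo "with pipefail: ${rc_with}"
```

The first value is the exit status of `tee` and the second is the exit status of `false`, which is why the script enables pipefail only around the pipeline it needs to check.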

Closes: https://issues.redhat.com/browse/OSPRH-10546
sathlan authored and openshift-merge-bot[bot] committed Oct 23, 2024
1 parent a1d6c0c commit c6b994f
Showing 3 changed files with 52 additions and 3 deletions.
2 changes: 2 additions & 0 deletions roles/update/README.md
@@ -12,4 +12,6 @@ Role to run update
* `cifmw_update_ping_loss_second` : (Integer) Number of seconds that the ping test is allowed to fail. Defaults to `0`. Note that one lost packet is always accepted to avoid false positives.
* `cifmw_update_ping_loss_percent` : (Integer) Maximum percentage of ping loss accepted. Defaults to `0`. Only relevant when `cifmw_update_ping_loss_second` is not 0.
* `cifmw_update_control_plane_check`: (Boolean) Activate continuous control plane testing. Defaults to `False`.
* `cifmw_update_openstackclient_pod_timeout`: (Integer) Maximum number of seconds to wait for the openstackclient Pod to be available during control plane testing, as it is restarted during the update. Defaults to `10` seconds.

## Examples
2 changes: 2 additions & 0 deletions roles/update/defaults/main.yml
@@ -44,4 +44,6 @@ cifmw_update_create_volume: false
cifmw_update_ping_loss_second: 0
cifmw_update_ping_loss_percent: 0

# Control plane Testing
cifmw_update_control_plane_check: false
cifmw_update_openstackclient_pod_timeout: 10 # in seconds.
51 changes: 48 additions & 3 deletions roles/update/templates/workload_launch_k8s.sh.j2
@@ -1,8 +1,53 @@
#!/usr/bin/bash

set +x

export KUBECONFIG="{{ cifmw_openshift_kubeconfig }}"
export PATH="{{ cifmw_path }}"

cat "{{ cifmw_update_artifacts_basedir }}/workload_launch.sh" | \
oc rsh -n openstack openstackclient env WKL_MODE=sanityfast bash
OS_POD_TIMEOUT={{ cifmw_update_openstackclient_pod_timeout }}
WAIT=0

# Temporary file where to put the error message, if any.
ERROR_FILE=/tmp/cifmw_update_ctl_testing_current_ouput.txt
rm -f "${ERROR_FILE}"

while [ $((WAIT++)) -lt ${OS_POD_TIMEOUT} ]; do
    set -o pipefail             # Make sure we get the failure, as tee
                                # will always succeed.
    cat "{{ cifmw_update_artifacts_basedir }}/workload_launch.sh" | \
        oc rsh -n openstack openstackclient env WKL_MODE=sanityfast bash 2>&1 | tee "${ERROR_FILE}"
    RC=$?
    set +o pipefail
    if [ "${RC}" -eq 137 ]; then
        # When the command is interrupted by the restart of the
        # openstackclient, we get this return code. We just retry.
        sleep 1
        continue
    fi
    # If there's an error and the error file was created we check for
    # the error message.
    if [ "${RC}" -ne 0 ]; then
        if [ ! -e "${ERROR_FILE}" ]; then
            # No error file, rethrow the error.
            exit ${RC}
        fi
        # Fragile as it depends on the exact output message.
        if grep -F 'error: unable to upgrade connection: container not found' \
                "${ERROR_FILE}"; then
            # Openstackclient was not able to start as it's being
            # restarted, retry.
            sleep 1
            continue
        fi
        # Error is not related to the openstackclient not being
        # available. We rethrow it.
        exit ${RC}
    fi
    # No error.
    exit 0
done

# We only reach this code if we hit the timeout while retrying to
# trigger the openstackclient.
echo "OpenstackClient Pod unavailable, giving up after ${OS_POD_TIMEOUT} seconds" >&2
exit 127
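
The retry decision made inside the loop above can be isolated and exercised without a cluster. The following is a minimal sketch, not code from the commit: the function name `is_transient_failure`, the temporary file, and the sample error lines are assumptions made for the illustration. Only the exit code 137 and the matched message come from the script.

```shell
# Hypothetical helper: returns 0 when a failure looks like the
# openstackclient pod being restarted (retry), 1 otherwise (rethrow).
is_transient_failure() {
    rc="$1"
    output_file="$2"
    # 137 = 128 + 9: the command was killed by SIGKILL.
    if [ "${rc}" -eq 137 ]; then
        return 0
    fi
    # Same fixed-string match as the script; -q keeps grep silent.
    if [ -e "${output_file}" ] && grep -qF \
            'error: unable to upgrade connection: container not found' \
            "${output_file}"; then
        return 0
    fi
    return 1
}

# Illustrative inputs only; no oc command is run here.
sample_file=$(mktemp)
echo 'error: unable to upgrade connection: container not found' > "${sample_file}"
if is_transient_failure 1 "${sample_file}"; then
    echo "container not found: retry"
fi
echo 'some unrelated error' > "${sample_file}"
if is_transient_failure 1 "${sample_file}"; then
    echo "unrelated error: retry"
else
    echo "unrelated error: rethrow"
fi
rm -f "${sample_file}"
```

Keeping the classification in one function makes the fragile string match easy to test on its own, which matters because the match depends on the exact wording of the error output.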
