-
Notifications
You must be signed in to change notification settings - Fork 28
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix container logs #68
Fix container logs #68
Conversation
In this patch we replace the static namespaces `openstack` and `openstack-operators` with environmental variables `OSP_NS` and `OSP_OPERATORS_NS` that have those values as default. This is necessary because deployments could use a different namespace and then we wouldn't gather the right information.
For the time being go back to getting the pod logs using the OpenShift API. This is a partial revert of commit 4baaa87. This resolves a number of existing limitations: - Not following the namespaces/<NS>/pods/<podname>/logs/ structure - Not having logs directly accessible on the CI jobs web server - Not having logs if SOS report gathering fails - Not having operator logs if the OSP services have not been deployed yet. But it introduces duplication of gathered logs and it's less efficient.
To remove the duplicated logs, using `oc logs` and via sos reports, this patch changes how sos reports gather logs and limits them to only rotated logs. We are forced to ignore tar errors because a file can be removed while processing it due to the log rotation mechanism. The code only adds the logs if there are any rotated logs available.
Our current `gather_run` script doesn't wait until all background tasks have completed when the database is not being gathered. This is caused by the `gather_db` script using `exit 0` instead of `return` when sourced. This patch makes that small change and now the code will wait for all background operations as intended. This can be problematic in some cases because we can find the must-gather tar file missing some of the data that we want collected.
We currently have 2 locations that have container logs: - `must-gather/namespaces/*/pods/*/logs/` - `must-gather/sos-reports/_all_nodes/*/var/log/pods/*/...` This means that we could inadvertently miss some of the files if we don't know of the two locations. To alleviate this situation this patch builds symbolic links to the rotated log files in the `namespaces` logs location. Because there can be 2 log files with the same name on 2 different nodes, the code renames them.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/lgtm
@@ -2,10 +2,19 @@ | |||
|
|||
source "${DIR_NAME}/bg.sh" | |||
|
|||
export OSP_NS="${OSP_NS-openstack}" | |||
export OSP_OPERATORS_NS="${OSP_OPERATORS_NS-openstack-operators}" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1 this is a good idea. Originally DEFAULT_NAMESPACES
was a collection of default
values that might or might not be present in the cluster, and this array was supposed to be grown by passing ADDITIONAL_NAMESPACES
, but this improves that logic, and I like it.
# describe pod | ||
run_bg oc -n "$NS" describe pod "$pod" '>' "${pods_dir}/${pod}-describe" | ||
# We make a single request to get lines in the form <pod> <container> <crash_status> | ||
data=$(oc -n "$NS" get pod -o go-template='{{range $indexp,$pod := .items}}{{range $index,$element := $pod.status.containerStatuses}}{{printf "%s %s" $pod.metadata.name $element.name}} {{ if ne $element.lastState.terminated nil }}{{ printf "%s" $element.lastState.terminated }}{{ end }}{{ printf "\n"}}{{end}}{{end}}') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1 here, I re-tested locally (just because CI is weird, it returns "forbidden" and we can't actually inspect the logs :/) and works as expected.
@@ -25,7 +25,7 @@ if [ "${OPENSTACK_DATABASES-unset}" = "unset" ] || [[ -z "${OPENSTACK_DATABASES} | |||
# If no databases options are passed, skip the database dump | |||
echo "Skip Database dump: an empty list is provided" | |||
[[ $CALLED -eq 1 ]] && exit 0 | |||
exit 0 | |||
return |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1 thanks, good catch on this
# For visual comfort when it's not a gz file we'll use a different pattern | ||
# [<container_name>.]<filename>.<logfile_extension>.<nodename> | ||
# That way we'll end up with: | ||
# manager.1.log.20240624-185849.sosreport-crc-vlf7c-master-0.gz |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for this comment, now I understand now better the logic of building the file name.
[APPROVALNOTIFIER] This PR is APPROVED Approval requirements bypassed by manually added approval. This pull-request has been approved by: The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
3d15255
into
openstack-k8s-operators:main
Testing the collection of the rotated logs in openstack-k8s-operators/nova-operator#804 |
cool. The ocp sosreport now only contains the rotated logs, and the namespaces/pods tree contains the non-rotated one. |
This PR improves the log gathering in the following way:
openstack
andopenstack-operators