K8SPSMDB-1205: Allow backups in unmanaged clusters #1715

Merged
merged 3 commits into main from K8SPSMDB-1205 on Nov 13, 2024
Conversation

@egegunes (Contributor) commented on Nov 11, 2024

K8SPSMDB-1205

CHANGE DESCRIPTION

Problem:
We don't allow users to run backups in unmanaged clusters.

Cause:
We deliberately added this limitation to avoid confusing users. If you run a backup in an unmanaged (and secondary) cluster, the backup object will be created in that unmanaged cluster, but you won't be able to restore it in the same cluster.

Solution:
We're removing the limitation. Technically, we don't need to do much, thanks to the distributed architecture of PBM. Users can now run backups in either managed or unmanaged clusters. pbm-agent instances will select among themselves a node to take the backup from.
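
For illustration, a minimal sketch of what starting an on-demand backup against an unmanaged cluster could look like after this change; the resource, cluster, and storage names below are placeholders (not taken from this PR), and the exact fields should be checked against the PerconaServerMongoDBBackup CRD.

# Hypothetical example: start a backup against an unmanaged (secondary) cluster.
# "cluster2", "backup-on-replica" and "s3-us-west" are made-up names.
kubectl apply -f - <<EOF
apiVersion: psmdb.percona.com/v1
kind: PerconaServerMongoDBBackup
metadata:
  name: backup-on-replica
spec:
  clusterName: cluster2      # name of the unmanaged PerconaServerMongoDB cluster
  storageName: s3-us-west    # a storage defined in that cluster's spec.backup.storages
EOF

# pbm-agent instances choose the source node among themselves; watch the status:
kubectl get psmdb-backup backup-on-replica -w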

Warning

There are some caveats which might be confusing:

  1. Even if the backup is started in the primary (managed) cluster, it will most likely be taken from a secondary, even one in a separate cluster. This is because PBM automatically assigns a lower priority to the primary member so that the backup has no negative effect on write performance. Users can provide a custom PBM configuration to change this behavior (see the sketch after this list).
  2. Even though you can run backups in unmanaged clusters, you can't run restores on them.
  3. PBM configuration is shared across all clusters, since it is stored in MongoDB. The operator reconfigures PBM every time it runs a backup, and setting a PBM configuration option in one cluster affects the others too. For example, if you set oplogSpanMin to 2 in a secondary cluster, it will be applied to the primary cluster as well (see the sketch after this list).
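
To make caveats 1 and 3 concrete, here is a hedged sketch of overriding the shared PBM configuration from inside a cluster; the pod name, hostnames, and file path are placeholders, and the operator may rewrite parts of this configuration the next time it runs a backup.

# Hypothetical PBM configuration override. backup.priority steers backups toward
# specific members; pitr.oplogSpanMin changes oplog slicing. Because PBM stores its
# configuration inside MongoDB, these values apply to every cluster sharing the replica set.
cat <<EOF > pbm-config.yaml
backup:
  priority:
    "cluster2-rs0-0.cluster2-rs0.psmdb.svc.cluster.local:27017": 2.5
    "cluster1-rs0-0.cluster1-rs0.psmdb.svc.cluster.local:27017": 0.5
pitr:
  oplogSpanMin: 2
EOF

# Apply it from any pbm-agent container (pod and container names are placeholders):
kubectl cp pbm-config.yaml cluster2-rs0-0:/tmp/pbm-config.yaml -c backup-agent
kubectl exec cluster2-rs0-0 -c backup-agent -- pbm config --file /tmp/pbm-config.yaml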

CHECKLIST

Jira

  • Is the Jira ticket created and referenced properly?
  • Does the Jira ticket have the proper statuses for documentation (Needs Doc) and QA (Needs QA)?
  • Does the Jira ticket link to the proper milestone (Fix Version field)?

Tests

  • Is an E2E test/test case added for the new feature/change?
  • Are unit tests added where appropriate?
  • Are OpenShift compare files changed for E2E tests (compare/*-oc.yml)?

Config/Logging/Testability

  • Are all needed new/changed options added to default YAML files?
  • Are all needed new/changed options added to the Helm Chart?
  • Did we add proper logging messages for operator actions?
  • Did we ensure compatibility with the previous version or cluster upgrade process?
  • Does the change support oldest and newest supported MongoDB version?
  • Does the change support oldest and newest supported Kubernetes version?

@pull-request-size bot added the size/M (30-99 lines) label on Nov 11, 2024
@JNKPercona (Collaborator)

Test name Status
arbiter passed
balancer passed
custom-replset-name passed
custom-tls passed
custom-users-roles passed
custom-users-roles-sharded passed
cross-site-sharded passed
data-at-rest-encryption passed
data-sharded passed
demand-backup passed
demand-backup-fs passed
demand-backup-eks-credentials passed
demand-backup-physical passed
demand-backup-physical-sharded passed
demand-backup-sharded passed
expose-sharded passed
ignore-labels-annotations passed
init-deploy passed
finalizer passed
ldap passed
ldap-tls passed
limits passed
liveness passed
mongod-major-upgrade passed
mongod-major-upgrade-sharded passed
monitoring-2-0 passed
multi-cluster-service passed
non-voting passed
one-pod passed
operator-self-healing-chaos passed
pitr passed
pitr-sharded passed
pitr-physical passed
pvc-resize passed
recover-no-primary passed
replset-overrides passed
rs-shard-migration passed
scaling passed
scheduled-backup passed
security-context passed
self-healing-chaos passed
service-per-pod passed
serviceless-external-nodes passed
smart-update passed
split-horizon passed
storage passed
tls-issue-cert-manager passed
upgrade passed
upgrade-consistency passed
upgrade-consistency-sharded-tls passed
upgrade-sharded passed
users passed
version-service passed
We ran 53 out of 53 tests.

commit: 6bc38f5
image: perconalab/percona-server-mongodb-operator:PR-1715-6bc38f55

@percona deleted a comment from the github-actions bot on Nov 13, 2024
@hors merged commit 40ec6d1 into main on Nov 13, 2024
10 of 13 checks passed
@hors deleted the K8SPSMDB-1205 branch on November 13, 2024 21:06
Comment on lines 17 to 35
  local endpoint="$1"
  local rsName="$2"
- local nodes_amount=0
- until [[ ${nodes_amount} == 6 ]]; do
- 	nodes_amount=$(run_mongos 'rs.conf().members.length' "clusterAdmin:clusterAdmin123456@$endpoint" "mongodb" ":27017" \
+ local target_count=$3
+
+ local nodes_count=0
+ until [[ ${nodes_count} == ${target_count} ]]; do
+ 	nodes_count=$(run_mongos 'rs.conf().members.length' "clusterAdmin:clusterAdmin123456@$endpoint" "mongodb" ":27017" \
  		| egrep -v 'I NETWORK|W NETWORK|Error saving history file|Percona Server for MongoDB|connecting to:|Unable to reach primary for set|Implicit session:|versions do not match|Error saving history file:|bye' \
  		| $sed -re 's/ObjectId\("[0-9a-f]+"\)//; s/-[0-9]+.svc/-xxx.svc/')

- 	echo "waiting for all members to be configured in ${rsName}"
+ 	echo -n "waiting for all members to be configured in ${rsName}"
  	let retry+=1
  	if [ $retry -ge 15 ]; then
- 		echo "Max retry count $retry reached. something went wrong with mongo cluster. Config for endpoint $endpoint has $nodes_amount but expected 6."
+ 		echo "Max retry count ${retry} reached. something went wrong with mongo cluster. Config for endpoint ${endpoint} has ${nodes_count} but expected ${target_count}."
  		exit 1
  	fi
- 	echo -n .
+ 	echo .
  	sleep 10
  done

[shfmt] reported by reviewdog 🐶

Suggested change
	local endpoint="$1"
	local rsName="$2"
	local target_count=$3
	local nodes_count=0
	until [[ ${nodes_count} == ${target_count} ]]; do
		nodes_count=$(run_mongos 'rs.conf().members.length' "clusterAdmin:clusterAdmin123456@$endpoint" "mongodb" ":27017" \
			| egrep -v 'I NETWORK|W NETWORK|Error saving history file|Percona Server for MongoDB|connecting to:|Unable to reach primary for set|Implicit session:|versions do not match|Error saving history file:|bye' \
			| $sed -re 's/ObjectId\("[0-9a-f]+"\)//; s/-[0-9]+.svc/-xxx.svc/')
		echo -n "waiting for all members to be configured in ${rsName}"
		let retry+=1
		if [ $retry -ge 15 ]; then
			echo "Max retry count ${retry} reached. something went wrong with mongo cluster. Config for endpoint ${endpoint} has ${nodes_count} but expected ${target_count}."
			exit 1
		fi
		echo .
		sleep 10
	done

@egegunes added this to the v1.19.0 milestone on Nov 25, 2024
Labels
size/M 30-99 lines
4 participants