K8SPSMDB-1205: Allow backups in unmanaged clusters #1715

Merged
merged 3 commits into main from K8SPSMDB-1205 on Nov 13, 2024
Conversation

@egegunes (Contributor) commented on Nov 11, 2024

K8SPSMDB-1205

CHANGE DESCRIPTION

Problem:
We don't allow users to run backups in unmanaged clusters.

Cause:
We deliberately added this limitation to avoid confusing users. If you run a backup in an unmanaged (and secondary) cluster, the backup object will be created in that unmanaged cluster, but you won't be able to restore it in the same cluster.

Solution:
We're removing the limitation. Technically, we don't need to do much, thanks to the distributed architecture of PBM. Users can now run backups in either managed or unmanaged clusters. pbm-agent instances will select among themselves a node to take the backup from.
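
For illustration, a minimal sketch of what starting an on-demand backup against an unmanaged cluster could look like after this change; the resource, cluster, and storage names below are placeholders (not taken from this PR), and the exact fields should be checked against the PerconaServerMongoDBBackup CRD.

# Hypothetical example: start a backup against an unmanaged (secondary) cluster.
# "cluster2", "backup-on-replica" and "s3-us-west" are made-up names.
kubectl apply -f - <<EOF
apiVersion: psmdb.percona.com/v1
kind: PerconaServerMongoDBBackup
metadata:
  name: backup-on-replica
spec:
  clusterName: cluster2      # name of the unmanaged PerconaServerMongoDB cluster
  storageName: s3-us-west    # a storage defined in that cluster's spec.backup.storages
EOF

# pbm-agent instances choose the source node among themselves; watch the status:
kubectl get psmdb-backup backup-on-replica -w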

Warning

There are some caveats which might be confusing:

  1. Even if the backup is started in the primary (managed) cluster, it will most likely be taken from a secondary, even one in a separate cluster. This is because PBM automatically assigns a lower priority to the primary member so that the backup has no negative effect on write performance. Users can provide a custom PBM configuration to change this behavior (see the sketch after this list).
  2. Even though you can run backups in unmanaged clusters, you can't run restores on them.
  3. PBM configuration is shared across all clusters, since it is stored in MongoDB. The operator reconfigures PBM every time it runs a backup, and setting a PBM configuration option in one cluster affects the others too. For example, if you set oplogSpanMin to 2 in a secondary cluster, it will be applied to the primary cluster as well (see the sketch after this list).
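
To make caveats 1 and 3 concrete, here is a hedged sketch of overriding the shared PBM configuration from inside a cluster; the pod name, hostnames, and file path are placeholders, and the operator may rewrite parts of this configuration the next time it runs a backup.

# Hypothetical PBM configuration override. backup.priority steers backups toward
# specific members; pitr.oplogSpanMin changes oplog slicing. Because PBM stores its
# configuration inside MongoDB, these values apply to every cluster sharing the replica set.
cat <<EOF > pbm-config.yaml
backup:
  priority:
    "cluster2-rs0-0.cluster2-rs0.psmdb.svc.cluster.local:27017": 2.5
    "cluster1-rs0-0.cluster1-rs0.psmdb.svc.cluster.local:27017": 0.5
pitr:
  oplogSpanMin: 2
EOF

# Apply it from any pbm-agent container (pod and container names are placeholders):
kubectl cp pbm-config.yaml cluster2-rs0-0:/tmp/pbm-config.yaml -c backup-agent
kubectl exec cluster2-rs0-0 -c backup-agent -- pbm config --file /tmp/pbm-config.yaml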

CHECKLIST

Jira

  • Is the Jira ticket created and referenced properly?
  • Does the Jira ticket have the proper statuses for documentation (Needs Doc) and QA (Needs QA)?
  • Does the Jira ticket link to the proper milestone (Fix Version field)?

Tests

  • Is an E2E test/test case added for the new feature/change?
  • Are unit tests added where appropriate?
  • Are OpenShift compare files changed for E2E tests (compare/*-oc.yml)?

Config/Logging/Testability

  • Are all needed new/changed options added to default YAML files?
  • Are all needed new/changed options added to the Helm Chart?
  • Did we add proper logging messages for operator actions?
  • Did we ensure compatibility with the previous version or cluster upgrade process?
  • Does the change support oldest and newest supported MongoDB version?
  • Does the change support oldest and newest supported Kubernetes version?

@pull-request-size bot added the size/M (30-99 lines) label on Nov 11, 2024
@JNKPercona (Collaborator)

Test name Status
arbiter passed
balancer passed
custom-replset-name passed
custom-tls passed
custom-users-roles passed
custom-users-roles-sharded passed
cross-site-sharded passed
data-at-rest-encryption passed
data-sharded passed
demand-backup passed
demand-backup-fs passed
demand-backup-eks-credentials passed
demand-backup-physical passed
demand-backup-physical-sharded passed
demand-backup-sharded passed
expose-sharded passed
ignore-labels-annotations passed
init-deploy passed
finalizer passed
ldap passed
ldap-tls passed
limits passed
liveness passed
mongod-major-upgrade passed
mongod-major-upgrade-sharded passed
monitoring-2-0 passed
multi-cluster-service passed
non-voting passed
one-pod passed
operator-self-healing-chaos passed
pitr passed
pitr-sharded passed
pitr-physical passed
pvc-resize passed
recover-no-primary passed
replset-overrides passed
rs-shard-migration passed
scaling passed
scheduled-backup passed
security-context passed
self-healing-chaos passed
service-per-pod passed
serviceless-external-nodes passed
smart-update passed
split-horizon passed
storage passed
tls-issue-cert-manager passed
upgrade passed
upgrade-consistency passed
upgrade-consistency-sharded-tls passed
upgrade-sharded passed
users passed
version-service passed
We ran 53 out of 53 tests.

commit: 6bc38f5
image: perconalab/percona-server-mongodb-operator:PR-1715-6bc38f55

@percona deleted a comment from the github-actions bot on Nov 13, 2024
@hors merged commit 40ec6d1 into main on Nov 13, 2024
10 of 13 checks passed
@hors deleted the K8SPSMDB-1205 branch on November 13, 2024 21:06
Comment on lines 17 to 35
  local endpoint="$1"
  local rsName="$2"
- local nodes_amount=0
- until [[ ${nodes_amount} == 6 ]]; do
- 	nodes_amount=$(run_mongos 'rs.conf().members.length' "clusterAdmin:clusterAdmin123456@$endpoint" "mongodb" ":27017" \
+ local target_count=$3
+
+ local nodes_count=0
+ until [[ ${nodes_count} == ${target_count} ]]; do
+ 	nodes_count=$(run_mongos 'rs.conf().members.length' "clusterAdmin:clusterAdmin123456@$endpoint" "mongodb" ":27017" \
  		| egrep -v 'I NETWORK|W NETWORK|Error saving history file|Percona Server for MongoDB|connecting to:|Unable to reach primary for set|Implicit session:|versions do not match|Error saving history file:|bye' \
  		| $sed -re 's/ObjectId\("[0-9a-f]+"\)//; s/-[0-9]+.svc/-xxx.svc/')

- 	echo "waiting for all members to be configured in ${rsName}"
+ 	echo -n "waiting for all members to be configured in ${rsName}"
  	let retry+=1
  	if [ $retry -ge 15 ]; then
- 		echo "Max retry count $retry reached. something went wrong with mongo cluster. Config for endpoint $endpoint has $nodes_amount but expected 6."
+ 		echo "Max retry count ${retry} reached. something went wrong with mongo cluster. Config for endpoint ${endpoint} has ${nodes_count} but expected ${target_count}."
  		exit 1
  	fi
- 	echo -n .
+ 	echo .
  	sleep 10
  done

[shfmt] reported by reviewdog 🐶

Suggested change
	local endpoint="$1"
	local rsName="$2"
	local target_count=$3
	local nodes_count=0
	until [[ ${nodes_count} == ${target_count} ]]; do
		nodes_count=$(run_mongos 'rs.conf().members.length' "clusterAdmin:clusterAdmin123456@$endpoint" "mongodb" ":27017" \
			| egrep -v 'I NETWORK|W NETWORK|Error saving history file|Percona Server for MongoDB|connecting to:|Unable to reach primary for set|Implicit session:|versions do not match|Error saving history file:|bye' \
			| $sed -re 's/ObjectId\("[0-9a-f]+"\)//; s/-[0-9]+.svc/-xxx.svc/')
		echo -n "waiting for all members to be configured in ${rsName}"
		let retry+=1
		if [ $retry -ge 15 ]; then
			echo "Max retry count ${retry} reached. something went wrong with mongo cluster. Config for endpoint ${endpoint} has ${nodes_count} but expected ${target_count}."
			exit 1
		fi
		echo .
		sleep 10
	done

@egegunes added this to the v1.19.0 milestone on Nov 25, 2024
Labels
size/M 30-99 lines
4 participants