Arktos Scalability 430 2021 Tracks
Ying Huang edited this page Mar 31, 2021
- Multiple resource partitions for pod scheduling (2+ for 430) - primary goal
- A tenant's pods can be physically located in 2 different RPs after scheduling
- The scheduler in a tenant partition should be able to listen to multiple API servers, one belonging to each RP
- Performance test for 50K hollow nodes. (2 TP, 2~3 RP)
- Performance test runs with SSL enabled
- QPS optimization
- Daemon set handling in RP - non-primary goal
- Remove from TP
- Support daemon set in RP
- Load test
- Dynamic add/delete TP/RP design - TBD
- For design purposes only, not for implementation - the aim is to avoid hardcoding the partition topology in many places
- Quality bar only
- Dynamically discover new tenant partitions based on CRD objects in its resource manager
- System partition pod handling - TBD
- API gateway
- 10Kx2 cluster: 1 resource partition supporting 20K hosts; 2 tenant partitions, each supporting 300K pods
- Density test passed
- 10Kx1 cluster: 1 resource partition supporting 10K hosts; 1 tenant partition supporting 300K pods
- Density test passed
- Load test completed (with known failures)
- Single cluster
- 8K cluster passed density test, load test completed (with known failures)
- 10K cluster density test completed with etcd "too many requests" errors, load test completed (with known failures)
- Code change for 2TPx1RP mostly completed and merged into master (v0.7.0)
- Enable SSL in performance test - WIP (Yunwen)
- Use insecure mode in local cluster setting for POC (Agreed on 3/1/2021)
- Kubelet
- Use a dedicated kube-client to talk to the resource manager.
- Use multiple kube-clients to connect to multiple tenant partitions.
- Track the mapping between tenant ID and kube-clients.
- Use the right kube-client to do CRUD for all objects (To verify)
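The kubelet changes above boil down to a tenant-ID-to-client lookup plus a dedicated RP client. A minimal Go sketch of that mapping, where `KubeClient` and the server URLs are hypothetical stand-ins for the real client-go clientsets:

```go
package main

import "fmt"

// KubeClient stands in for a kube-client bound to one API server; the real
// kubelet would hold client-go clientsets here (hypothetical type).
type KubeClient struct {
	Server string
}

// ClientManager tracks the mapping between tenant ID and kube-client, plus
// the dedicated client for the resource manager.
type ClientManager struct {
	rpClient  *KubeClient            // dedicated client for the resource manager
	tpClients map[string]*KubeClient // tenant ID -> kube-client for that TP
}

func NewClientManager(rpServer string, tpServers map[string]string) *ClientManager {
	m := &ClientManager{
		rpClient:  &KubeClient{Server: rpServer},
		tpClients: make(map[string]*KubeClient),
	}
	for tenant, server := range tpServers {
		m.tpClients[tenant] = &KubeClient{Server: server}
	}
	return m
}

// ForTenant picks the right kube-client for CRUD on a tenant-owned object.
func (m *ClientManager) ForTenant(tenantID string) (*KubeClient, error) {
	c, ok := m.tpClients[tenantID]
	if !ok {
		return nil, fmt.Errorf("no tenant partition registered for tenant %q", tenantID)
	}
	return c, nil
}

// ForNodes returns the dedicated client used for node objects in the RP.
func (m *ClientManager) ForNodes() *KubeClient { return m.rpClient }

func main() {
	m := NewClientManager("https://rp-1", map[string]string{
		"tenant-a": "https://tp-1",
		"tenant-b": "https://tp-2",
	})
	c, _ := m.ForTenant("tenant-b")
	fmt.Println(c.Server)            // https://tp-2
	fmt.Println(m.ForNodes().Server) // https://rp-1
}
```

The "To verify" item above would then amount to checking that every object CRUD path routes through `ForTenant` (or `ForNodes`) rather than a single default client.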
- Controllers
- Node controllers (in resource partition)
- Use a dedicated kube-client to talk to the resource manager.
- Use multiple kube-clients to talk to multiple tenant partitions.
- Other controllers (in tenant partition)
- If the controller list/watches node objects, it needs to use multiple kube-clients to access multiple resource managers.
- DaemonSet controller (Service/PV/AttachDetach/Garbage)
- [ ] Move TTL/DaemonSet controllers to RP
- Disable in TP, enable in RP
- Identify resources that belong to the RP only
- Further perf and scalability improvements (TBD, currently non goal)
- Decide whether to partition the node-object cache or keep all node objects cached in a single process.
- Node controllers (in resource partition)
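As a rough illustration of the node controller (in the RP) fanning a node update out to every tenant partition, here is a minimal Go sketch; the `TPClient` type and `ReportNode` method are invented placeholders for real client-go calls against each TP's API server:

```go
package main

import "fmt"

// TPClient stands in for a kube-client to one tenant partition (hypothetical).
type TPClient struct {
	Name string
}

// ReportNode simulates pushing a node update to one tenant partition; the
// real node controller would call the TP's API server via client-go.
func (c *TPClient) ReportNode(node string) string {
	return fmt.Sprintf("%s <- %s", c.Name, node)
}

// broadcastNode fans a node update out to every tenant partition, since each
// TP needs a view of the nodes hosted in this resource partition.
func broadcastNode(tps []*TPClient, node string) []string {
	results := make([]string, 0, len(tps))
	for _, tp := range tps {
		results = append(results, tp.ReportNode(node))
	}
	return results
}

func main() {
	tps := []*TPClient{{Name: "tp-1"}, {Name: "tp-2"}}
	for _, r := range broadcastNode(tps, "node-42") {
		fmt.Println(r)
	}
}
```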
- Scheduler
- Use a dedicated kube-client to talk to its tenant partition.
- Use multiple kube-clients to connect to multiple resource managers, list/watching nodes from all resource managers.
- [?] Use the right kube-client to update node objects.
- Further perf and scalability improvements (TBD)
- Improve scheduling algorithm to reduce the possibility of scheduling conflicts.
- Improve scheduler sampling algorithm to reduce scheduling time.
- API server - TBD
- No areas needing change have been identified so far
- Proxy
- Working on a design that will evaluate proxy vs. code changes in each component (TBD)
- Performance test tools
- Cluster loader
- How to talk to node in perf test (Hongwei)
- Kubemark
- Support 2 TP scale out cluster set up, insecure (0.7)
- Support 2 TP scale out cluster set up, secure mode
- Support 2 TP, 2 RP scale out cluster set up, secure mode
- Kube-up
- Support for scale out (currently only kubemark supports scale out)
- Performance test
- Single RP capacity test (>= 25K, preparing for 25Kx2 goal)
- QPS optimization (x2, x3, x4, etc. in density test)
- Regular density test for 10K single cluster, 10Kx2. Each will be done after 500 node test
- 2TP (10K), 1RP (20K), 20K density test, secure mode
- Dev tools
- One box setup script for 2 TP, 1 RP (Peng, Ying)
- One box setup script for 2 TP, 2 RP (Ying)
- 1.18 Changes
- Complete golang 1.13.9 migration (Sonya)
- Metrics platform migration (YingH)
- Migrated from metrics server to Prometheus
- Get correct API responsiveness data
- Support multiple RPs
- Density test in 500 nodes (General guidelines): 2TP/2RP, scale up - 3/27
- 2TP/2RP 2x5K density test (Sonya) - 3/30
- Scale up 500 density test (Sonya) - 3/30
- 2TP/2RP 2x10K density test (Sonya) - started on 3/31
- 2TP/2RP 2x10K density test with double RS QPS (Sonya) - planned for 4/1
- Perf test for SSL mode
- 1TP/1RP 500 nodes (Sonya) - done
- 2TP/1RP 500 nodes (Sonya, Yunwen) - 3/30
- 1TP/1RP 10K nodes (Sonya) - parked due to 1.4
- 1TP/1RP limit test (Sonya, YingH)
- 15K density test in SSL mode
- QPS tuning (YingH, Sonya)
- Increase replicaset controller QPS - test
- Issue tracker
- Check node authorizer in secure mode (Yunwen, YingH)
- haproxy SSL check causes api server "TLS handshake error" - Yunwen master PR 1060 Issue 1048
- Kubelet failed to upload events due to authorization error - Yunwen Issue 1046
- KCM (deployment controller) on TP cluster failed to sync up deployment with its token - Yunwen master Issue 1039
- KCM on TP cluster didn't get nodes in RP cluster(s) - Yunwen master PR 1040 Issue 1038
- Failed to change ttl annotation for hollow-node - Yunwen Issue 1054
- TP2: Unable to authenticate the request due to an error: invalid bearer token - Yunwen Issue 1055
- RP server failed to collect pprof files - Sonya PR 1058 Issue 1057
- Change scheduler PVC binder code to support multiple RPs - Hongwei Issue 1059
- Kubeup/Kubemark improvement
- Reduce 2TP/2RP cluster start up time (Sonya, Hongwei)
- Multiple resource partition design - decided to continue with multiple client connections in all components for multiple RPs for now. Will re-design if issues are encountered with the current approach. (2/17)
- Setup local cluster for multiple TPs, RPs (Done - 2/24)
- Component code changes
- TP components connect to RP directly (Done - 3/15)
- RP components connect to TP directly (Done - 3/12)
- Disable/Enable controllers in TP/RP
- Support multiple RPs in kube-up (Done)
- Enable SSL in performance test
- Code change (3/12) RP 1001
- 1TP/1RP 500 nodes perf test (3/12)
- Perf test code changes (Done)
- Perf test changes needed for multiple RPs (3/18)
- Disable DaemonSet test in load (3/25) PR 1050
- Performance test (WIP)
- Test single RP limit
- 1TP/1RP achieved 40K hollow nodes (3/3). RP CPU ~44%
- Get more resources in GCP (80K US central 3/8)
- 10K density test insecure mode - benchmark (3/18)
- Multiple TPs/RPs density test
- 2TP/2RP 2x500 passed (3/27)
- Test single RP limit
- QPS tuning
- Complete golang 1.13.9 migration (Done - 3/12)
- Kube-openapi upgrade Issue 923
- Add and verify import-alias (2/10) PR 965
- Add hack/arktos_cherrypick.sh (2/19) PR 990
- Promote admission webhook API to v1. Arktos only supports v1beta1 now (2/20) PR 981
- Promote AdmissionReview to v1. Arktos only supports v1beta1 now (2/25) PR 998
- Promote CRD to v1 - (3/3) PR 1004
- Bump kube-openapi to 20200410 version and SMD to V3 (3/12) PR 1010
- Regression fix
- Failed to collect profiling of ETCD (3/11) Issue 1008 PR 1009
- Static pods being recycled on TP cluster Issue 1006 (Yunwen/Verifying)
- ETCD object counts issue in 3/10 run (3/16) PR 1027 Issue 1023
- Metrics platform migration (YingH)
- Regression fix
- 500 nodes load run finished with error: DaemonSets timeout Issue 1007
- System partition pod - how to handle when HA proxy is removed (TBD)
- Density test should be OK
- Issues
- GC controller queries its own master node's lease info and causes a 404 error in haproxy Issue 1047 - appears to be in master only. Fixed in POC. Parked until POC changes are ported back to master.
- [Scale out POC] pod scheduler reported bound successfully but not appear in local Issue 1049 - related to system tenant design. Post 430
- [Scale out POC] secret not found in kubelet Issue 1052 - related to system tenant design. Post 430
- Tenant zeta request was not redirected to TP2 master correctly Issue 1056 - current proxy limitation
- Static pods being recycled on TP cluster (fixed in POC) PR 1044 Issue 1006
- Controllers on TP should union the nodes from RP cluster and local cluster - fixed in POC PR 1044 Issue 1042
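The last fix above (Issue 1042) amounts to a de-duplicating union of node lists from the local TP cluster and the RP clusters. A minimal Go sketch, with node names and list shapes chosen purely for illustration:

```go
package main

import (
	"fmt"
	"sort"
)

// unionNodes merges node-name lists from the local (TP) cluster and the RP
// clusters, dropping duplicates, so TP controllers see every node exactly once.
func unionNodes(lists ...[]string) []string {
	seen := make(map[string]bool)
	var out []string
	for _, list := range lists {
		for _, n := range list {
			if !seen[n] {
				seen[n] = true
				out = append(out, n)
			}
		}
	}
	sort.Strings(out) // deterministic order for comparison/logging
	return out
}

func main() {
	local := []string{"tp-master"}
	rp1 := []string{"node-1", "node-2"}
	rp2 := []string{"node-2", "node-3"}
	fmt.Println(unionNodes(local, rp1, rp2)) // [node-1 node-2 node-3 tp-master]
}
```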