Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add tls for kine #9572

Merged
merged 1 commit into from
Mar 28, 2024
Merged

Add tls for kine #9572

merged 1 commit into from
Mar 28, 2024

Conversation

vitorsavian
Copy link
Member

@vitorsavian vitorsavian commented Feb 26, 2024

Proposed Changes

  • Set kine to run with TLS with the certs from the server/tls folder

Types of Changes

  • New Feature

Verification

  • You can set kine to use tls with only the --kine-tls
k3s server --kine-tls

Testing

Linked Issues

User-Facing Change

Kine is now able to use TLS

Further Comments

This feature depends on a PR from kine

Also there is another PR that will help in the verification steps

Copy link

codecov bot commented Feb 29, 2024

Codecov Report

Attention: Patch coverage is 57.14286% with 12 lines in your changes are missing coverage. Please review.

Project coverage is 26.87%. Comparing base (3f649e3) to head (f2c948f).

❗ Current head f2c948f differs from pull request most recent head 942f0fc. Consider uploading reports for the commit 942f0fc to get more accurate results

Files Patch % Lines
pkg/cluster/storage.go 8.33% 9 Missing and 2 partials ⚠️
pkg/cluster/cluster.go 90.90% 0 Missing and 1 partial ⚠️
Additional details and impacted files
@@             Coverage Diff             @@
##           master    #9572       +/-   ##
===========================================
- Coverage   53.04%   26.87%   -26.17%     
===========================================
  Files         155      154        -1     
  Lines       13597    13614       +17     
===========================================
- Hits         7212     3659     -3553     
- Misses       5018     9215     +4197     
+ Partials     1367      740      -627     
Flag Coverage Δ
e2etests ?
inttests 21.28% <57.14%> (-18.17%) ⬇️
unittests 16.17% <11.53%> (-0.24%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@vitorsavian vitorsavian force-pushed the kine-tls-k3s branch 2 times, most recently from a10a428 to 6fbfb32 Compare March 1, 2024 00:15
@vitorsavian vitorsavian changed the title [WIP] Add tls for kine Add tls for kine Mar 1, 2024
@vitorsavian vitorsavian marked this pull request as ready for review March 1, 2024 01:47
@vitorsavian vitorsavian requested a review from a team as a code owner March 1, 2024 01:47
pkg/cli/server/server.go Outdated Show resolved Hide resolved
brandond
brandond previously approved these changes Mar 5, 2024
go.mod Outdated Show resolved Hide resolved
@vitorsavian vitorsavian force-pushed the kine-tls-k3s branch 2 times, most recently from bf950f1 to f79ef25 Compare March 5, 2024 13:39
@vitorsavian vitorsavian requested review from brandond and a team March 5, 2024 13:40
brandond
brandond previously approved these changes Mar 5, 2024
@vitorsavian vitorsavian requested a review from a team March 5, 2024 18:54
dereknola
dereknola previously approved these changes Mar 5, 2024
@brandond
Copy link
Member

brandond commented Mar 7, 2024

/ # etcdctl endpoint status --cacert=/var/lib/rancher/k3s/server/tls/etcd/server-ca.crt --endpoints=unixs:///var/lib/rancher/k3s/server/kine.sock -w table
+-----------------------------------------------+----+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
|                   ENDPOINT                    | ID | VERSION | DB SIZE | IS LEADER | IS LEARNER | RAFT TERM | RAFT INDEX | RAFT APPLIED INDEX | ERRORS |
+-----------------------------------------------+----+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
| unixs:///var/lib/rancher/k3s/server/kine.sock |  0 |         |  4.0 MB |      true |      false |         0 |          0 |                  0 |        |
+-----------------------------------------------+----+---------+---------+-----------+------------+-----------+------------+--------------------+--------+

I do see that kine is started twice:

INFO[0000] Starting k3s v1.29.2+k3s-9e1a5d73 (9e1a5d73)
INFO[0000] Configuring sqlite3 database connection pooling: maxIdleConns=2, maxOpenConns=0, connMaxLifetime=0s
INFO[0000] Configuring database table schema and indexes, this may take a moment...
INFO[0000] Database tables and indexes are up to date
INFO[0000] Kine available at unix://kine.sock
INFO[0000] Reconciling bootstrap data between datastore and disk
INFO[0000] Configuring sqlite3 database connection pooling: maxIdleConns=2, maxOpenConns=0, connMaxLifetime=0s
INFO[0000] Configuring database table schema and indexes, this may take a moment...
INFO[0000] Database tables and indexes are up to date
INFO[0000] Kine available at unixs://kine.sock

However, it looks like both copies of kine are still running; we get duplicate/conflicting compaction messages at 5 minute intervals showing that they are both still active:

INFO[0300] COMPACT revision 0 has already been compacted
INFO[0300] COMPACT revision 0 has already been compacted
INFO[0600] COMPACT revision 0 has already been compacted
INFO[0600] COMPACT revision 0 has already been compacted
INFO[0900] COMPACT compactRev=0 targetCompactRev=21 currentRev=1021
INFO[0900] COMPACT deleted 1 rows from 21 revisions in 503.848µs - compacted to 21/1021
INFO[0900] COMPACT compact revision changed since last iteration: 0 => 21

Are we not shutting down the initial non-tls instance of kine? When we start up the temporary etcd for bootstrapping, we do so with a bootstrap context that is cancelled once the data is extracted, so that etcd shuts down cleanly:

reconcileCtx, cancel := context.WithCancel(ctx)
defer func() {
cancel()
c.config.Runtime.EtcdConfig = originalConfig
}()
e := etcd.NewETCD()
if err := e.SetControlConfig(c.config); err != nil {
return err
}
if err := e.StartEmbeddedTemporary(reconcileCtx); err != nil {
return err
}

We need to do the same for kine so that the initial one doesn't remain running after bootstrapping is complete.

pkg/cluster/storage.go Outdated Show resolved Hide resolved
@brandond
Copy link
Member

brandond commented Mar 8, 2024

With the added context cancellation it looks like the extra kine is getting shut down properly, but the early shutdown does cause some additional errors to be emitted by kine:

INFO[0000] Starting k3s v1.29.2+k3s-413c7e68 (413c7e68)
INFO[0000] Starting temporary kine to reconcile with datastore
INFO[0000] Configuring sqlite3 database connection pooling: maxIdleConns=2, maxOpenConns=0, connMaxLifetime=0s
INFO[0000] Configuring database table schema and indexes, this may take a moment...
INFO[0000] Database tables and indexes are up to date
INFO[0000] Kine available at unix://kine.sock
ERRO[0000] Failed to list / for revision 1: context canceled
INFO[0000] Starting k3s v1.29.2+k3s-413c7e68 (413c7e68)
INFO[0000] Starting temporary kine to reconcile with datastore
INFO[0000] Configuring sqlite3 database connection pooling: maxIdleConns=2, maxOpenConns=0, connMaxLifetime=0s
INFO[0000] Configuring database table schema and indexes, this may take a moment...
INFO[0000] Database tables and indexes are up to date
INFO[0000] Kine available at unix://kine.sock
INFO[0000] Reconciling bootstrap data between datastore and disk
ERRO[0000] TTL event watch failed to get start revision
INFO[0000] TTL events work queue has shut down

Its possible that adding a short sleep before shutting down kine would prevent at least some of these errors from popping up? We do something similar when terminating etcd after a restore - although we probably don't need to wait 5 seconds as we do there.

k3s/pkg/etcd/etcd.go

Lines 340 to 343 in fe2ca9e

// Cancel the etcd server context and allow it time to shutdown cleanly.
// Ideally we would use a waitgroup and properly sequence shutdown of the various components.
e.cancel()
time.Sleep(time.Second * 5)

It is also possible that you'd need to make changes on the kine side to not log errors if the err is due to context cancellation.

@vitorsavian vitorsavian force-pushed the kine-tls-k3s branch 2 times, most recently from 62515c1 to aa4bec7 Compare March 13, 2024 16:55
pkg/cluster/storage.go Outdated Show resolved Hide resolved
brandond
brandond previously approved these changes Mar 14, 2024
Copy link
Member

@brandond brandond left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm!

pkg/etcd/etcd.go Outdated Show resolved Hide resolved
@vitorsavian vitorsavian force-pushed the kine-tls-k3s branch 3 times, most recently from 396958c to f2c948f Compare March 25, 2024 12:55
@vitorsavian vitorsavian requested a review from brandond March 25, 2024 12:56
Signed-off-by: Vitor Savian <vitor.savian@suse.com>

Bump kine

Signed-off-by: Vitor Savian <vitor.savian@suse.com>

Add integration tests for kine with tls

Signed-off-by: Vitor Savian <vitor.savian@suse.com>
@vitorsavian vitorsavian requested a review from a team March 27, 2024 16:39
@vitorsavian vitorsavian merged commit 5d69d6e into k3s-io:master Mar 28, 2024
25 checks passed
@vitorsavian vitorsavian deleted the kine-tls-k3s branch May 20, 2024 16:37
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants