Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

HPCC-30411 Add support for dynamically updating TLS config #17851

Merged
merged 1 commit into from
Nov 15, 2023

Conversation

ghalliday
Copy link
Member

@ghalliday ghalliday commented Oct 2, 2023

Type of change:

  • This change is a bug fix (non-breaking change which fixes an issue).
  • This change is a new feature (non-breaking change which adds functionality).
  • This change improves the code (refactor or other change that does not change the functionality)
  • This change fixes warnings (the fix does not alter the functionality or the generated code)
  • This change is a breaking change (fix or feature that will cause existing behavior to change).
  • This change alters the query API (existing queries will have to be recompiled)

Checklist:

  • My code follows the code style of this project.
    • My code does not create any new warnings from compiler, build system, or lint.
  • The commit message is properly formatted and free of typos.
    • The commit message title makes sense in a changelog, by itself.
    • The commit is signed.
  • My change requires a change to the documentation.
    • I have updated the documentation accordingly, or...
    • I have created a JIRA ticket to update the documentation.
    • Any new interfaces or exported functions are appropriately commented.
  • I have read the CONTRIBUTORS document.
  • The change has been fully tested:
    • I have added tests to cover my changes.
    • All new and existing tests passed.
    • I have checked that this change does not introduce memory leaks.
    • I have used Valgrind or similar tools to check for potential issues.
  • I have given due consideration to all of the following potential concerns:
    • Scalability
    • Performance
    • Security
    • Thread-safety
    • Cloud-compatibility
    • Premature optimization
    • Existing deployed queries will not be broken
    • This change fixes the problem, not just the symptom
    • The target branch of this pull request is appropriate for such a change.
  • There are no similar instances of the same problem that should be addressed
    • I have addressed them here
    • I have raised JIRA issues to address them separately
  • This is a user interface / front-end modification
    • I have tested my changes in multiple modern browsers
    • The component(s) render as expected

Smoketest:

  • Send notifications about my Pull Request position in Smoketest queue.
  • Test my draft Pull Request.

Testing:

@ghalliday ghalliday changed the base branch from master to candidate-9.4.x October 2, 2023 10:30
@github-actions
Copy link

github-actions bot commented Oct 2, 2023

@@ -1349,18 +1402,10 @@ IPropertyTree *createIssuerTlsClientConfig(const char *issuer, bool acceptSelfSi
return info.getClear();
}

IPropertyTree *getIssuerTlsServerConfig(const char *name)
static ISyncedPropertyTree * createIssuerTlsServerConfig(const char *name)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know it's existing code (but since you are changing these).. I've been slowly changing the "name" parameter on these routines to "issuer" to be more self documenting. Or if you prefer "issuerName".

The issuer which is part of the path to the secret act like categories for the various trust zones: public, local, remote, etc.

if (!isEmptyString(passPhrase))
info->setProp("passphrase", passPhrase); // encrypted

Owned<ISyncedPropertyTree> entry = createSyncedPropertyTree(info);
Copy link
Member

@afishbeck afishbeck Oct 2, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not really new, although changing, but there may be a timing issue here because the meaning of "local" is being overridden.

In practical terms, looking up "local" means, "get tls configuration based on the secret at '/opt/HPCCsystems/secrets/certificates/local' ".

The various ways of configuring bare metal need consideration.
One way of configuring bare metal could be to put TLS info into the secret locations we use in kubernetes.
The other way of configuring bare-metal would be to fill in the custom TLS configuration values. queryHPCCPKIKeyFiles(&cert, &pubKey, &privKey, &passPhrase) above.

Here, a tree based on the custom bare-metal configuration values are turned into a tree that is cached to override the contents at the secret location.

If we ever (accidently?) use both the tree retrieved from cache would be whichever came first.
So I guess we need to make sure if we are going to override "local" we always override "local"? Maybe init this entry at bare-metal startup if the config parameters are used?

The alternative would be to cache under a different name, a name implying "bare-metal-mtls". But we'd have to track down what was currently depending on it being called "local".

config.setown(createPTree(issuer));

//GH-Tony should this condition be removed?
if (strieq(issuer, "remote")||strieq(issuer, "local"))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like you've mixed some "ClientConfig" lines in with "ServerConfig" lines.
When inside ClientConfig these lines imply that we don't currently setup issuer client certificates for issuers other than remote and local.
Using this logic on the ServerConfig won't work, because server side we always want the certs included.

extern jlib_decl unsigned getPropertyTreeHash(const IPropertyTree & source, unsigned hashcode);

//Interface for encapsulating an IPropertyTree that can be atomically updated. The result of getTree() is guaranteed
//to not be modified and to remain vali and consistent until it is released.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"valid"

@afishbeck
Copy link
Member

@ghalliday I've added some comments, but the general direction looks good.

Next steps, I think we want to:

  1. Have getIssuerTlsConfig() and getIssuerTlsServerConfigWithTrustedPeers() return ISyncedPropertyTrees instead of regular IPropertyTrees.
  2. We want to change CSecureSocketContext(const IPropertyTree* config, SecureSocketType sockettype) to take an ISyncedProtpertyTree instead of IPropertyTree... and to update the certs, keys, and CACerts if anything changes before a secure accept is performed.
  3. Then clean up any pathway between those two functions.. or any calls to the CSecureSocketContext(IPropertyTree *) constructor that don't come from secrets (for example a PTree is built from bare-metal config).

When those code paths are cleaned up we should start to see any complications that might be there.

Be careful with the idea of caching the trusted_peers because it is not tied directly to the issuer so needs another level of cache if cached.

For example, roxie can have multiple services defined and theoretically could have:

roxie:
service1 - issuer=public, trusted_peers=client1,client2
service1 - issuer=public, trusted_peers=client2,client3
service1 - issuer=local, trusted_peers=client4

The trusted config with trusted peers probably doesn't need caching at all, it can just be an object held by the secure context and one level away from the secret that is checked for being stale. But then refreshing needs to preserve the trusted peers. Not sure if that explanation made sense, but it will come out in the wash once you dive in.

@ghalliday
Copy link
Member Author

Almost complete, but one remaining issue in dafsserver.cpp(~150) createSecureSocket overriding an option needs some thought

@ghalliday ghalliday marked this pull request as ready for review October 25, 2023 16:11
@ghalliday
Copy link
Member Author

@afishbeck @mckellyln @jakesmith I think this is now ready for review for logical flaws etc. (Jake for the dafilesrv change.)
It has not been tested at all, and I plan to review it myself tomorrow before starting testing.

@ghalliday ghalliday requested a review from mckellyln October 25, 2023 16:13
Copy link
Member

@jakesmith jakesmith left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@afishbeck @mckellyln @jakesmith I think this is now ready for review for logical flaws etc. (Jake for the dafilesrv change.) It has not been tested at all, and I plan to review it myself tomorrow before starting testing.

@ghalliday - I have only looked at the dafilesrv changes, but they look fine.

@jakesmith jakesmith self-requested a review October 25, 2023 18:36
@jakesmith
Copy link
Member

@ghalliday - however, there's a build failure relating to an unreferenced symbol 'createSecureSocketConfig' building dafsclint

usr/bin/ld: CMakeFiles/dafsclient.dir/rmtclient.cpp.o: in function `_securitySettingsClient::getSecureConfig()':
/home/runner/work/HPCC-Platform/HPCC-Platform/src/fs/dafsclient/rmtclient.cpp:118: undefined reference to `createSecureSocketConfig'
collect2: error: ld returned 1 exit status

Copy link
Member Author

@ghalliday ghalliday left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One bug a few typos, and some code that needs to used synced trees.

//Return a sequence number which changes whenever the secret actually changes - so that a caller can determine
//whether it needs to reload the certificates.
virtual unsigned getVersion() const = 0;
//An indication that the property tree may be out of date because it couldn'tt be resynchronized.
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

couldn'tt

{
return secretHash;
}
virtual bool isValid() const override
{
CriticalBlock block(secretCs);
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is the critical section really needed? Should secretHash be atomic? Revisit when background updating is implemented.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the point of an isValid() function like this? As soon as you release the critsec, the value you are returning may no longer be correct...

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is typically called at startup to check that the configuration has been set up. The calling code is cleaner since it avoids the problem with freeing the returning the linked object.

if (!checkFileExists(filepath))
return nullptr;
//If information that is combined with the secret (e.g. trusted peers) can also change dynamically this would
//need to be a separate hash calculated from the condig tree
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

condig->config

IPropertyTree *verify = ensurePTree(tlsConfig, "verify");
verify->setProp("trusted_peers", trusted_peers);
return tlsConfig.getClear();
const IPropertyTree *getIssuerTlsServerConfigWithTrustedPeers(const char *issuer, const char *trusted_peers)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should return a synced tree and fix the caller - then roxie server certificates will not expire.

throw makeStringException(-100, "TLS secure communication requested but not configured");

Owned<const ISyncedPropertyTree> info = getIssuerTlsSyncedConfig(issuer);
if (!!info->isValid())
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oops double negative!!

{
if (requireMtlsFlag && !queryMtls())
throw makeStringException(-100, "TLS secure communication requested but not configured");
if (ssf == nullptr || !ssf->queryTlsConfig())
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could remove the second condition, since it will fall through the the same code below.

@@ -112,6 +112,12 @@ static class _securitySettingsClient
}
}

const IPropertyTree * getSecureConfig()
{
//Later: return a synced tree...
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When? Is that for a different PR? Is there a Jira open to revisit if so?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

HPCC-30753

//---------------------------------------------------------------------------------------------------------------------

//More: Should this move inside PTree? It may allow a more efficient hash caclulaton.
unsigned getPropertyTreeHash(const IPropertyTree & source, unsigned hashcode)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Feels like this will be an expensive calculation on a large tree, that would be avoided if you could store hash of each component when it was created. But maybe it's not going to be used on large trees?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Created HPCC-30752 to address optimizing it. Currently it will be used on small trees.

Copy link
Member

@richardkchapman richardkchapman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some minor comments. A bit too large to review properly

Copy link
Member

@jakesmith jakesmith left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ghalliday - some minor comments/questions.

: CSyncedCertificateBase(_issuer), trustedPeers(_trustedPeers), isClientConnection(_isClientConnection), acceptSelfSigned(_acceptSelfSigned), addCACert(_addCACert), disableMTLS(_disableMTLS)
{
secret.setown(resolveSecret("certificates", issuer, nullptr, nullptr));
createConfig();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks like createConfig (and other methods isStale/isValid) depend on 'secret' being instantiated, but resolveSecret can return null afaics.
Should the ctor error out if it can't resolve the secret, or should isValid() check if secret for null (and other methods)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

resolve secret can never return null - it always returns an object which can be used to find out if the value is defined, and may be updated behind the scenes.

return hashcode;
}

class UnchangingPropertyTree : extends CInterfaceOf<ISyncedPropertyTree>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not that keen on the class name.
Maybe SyncedIPTWrapper or SyncedPropertyTreeWrapper

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree, changed

virtual bool getProp(MemoryBuffer & result, const char * xpath) const = 0;
virtual bool getProp(StringBuffer & result, const char * xpath) const = 0;
//Return a sequence number which changes whenever the secret actually changes - so that a caller can determine
//whether it needs to reload the certificates.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not necessarily 'secret/certificate specific.
Change comment to say when the property tree changes etc. ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cleaned up the comments to reflect the more general applicability

virtual unsigned getVersion() const = 0;
//An indication that the property tree may be out of date because it couldn't be resynchronized.
virtual bool isStale() const = 0;
virtual bool isValid() const = 0; // Was an entry found for this secret?
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not necessarily 'secret/certificate specific.


IPropertyTree *verify = config->queryPropTree("verify");
//For now only the "public" issuer implies client certificates are not required
verify->setPropBool("@enable", !disableMTLS && (isClientConnection || !strieq(issuer, "public")));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is 'verify' guaranteed to exist by this state? does it need to be null checked.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes - I will add an assert for safety

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes meaning it is guaranteed to exist

: CSyncedCertificateBase(nullptr), addCACert(_addCACert)
{
secret.setown(resolveSecret(_category, _secretName, nullptr, nullptr));
if (!secret->isValid())
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it looks like 'secret' could be null?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no it can't - see above.

@@ -1268,9 +1268,10 @@ static bool setVerifyCertsPEMBuffer(SSL_CTX *ctx, const char *caCertBuf, int caC
return true;
}

class CSecureSocketContext : public CInterfaceOf<ISecureSocketContext>
class CSecureSocketContext : implements ISecureSocketContext, implements ISecureSocketContextCallback, public CInterface
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

use a CInterfaceOf< to remove 1 vmt?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I could but the IMPLEMENT_USING gets a bit ugly - not enough instances to worry either way.

#else
privateKeyFName = environment->getPrivateKeyPath(keyPairName);
const char *privateKeyFName = privateKeyFName = environment->getPrivateKeyPath(keyPairName);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

const char *privateKeyFName = privateKeyFName ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interesting. I wonder how that compiles - because the declaration comes into effect before the value is evaluated.

Copy link
Contributor

@mckellyln mckellyln left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, one question posted.
Not 100% clear to me about isValid() meaning a prop tree exists and/or a secret (if not nullptr) is present.


//---------------------------------------------------------------------------------------------------------------------

//More: Should this move inside PTree? It may allow a more efficient hash caclulaton.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Putting this in PTree could definitely avoid the binary copies. I also always think that source.getElements("*"); looks slow when walking full PTrees. If internally you could get away with something more like this->checkChildren()->getIterator(); would it be much faster?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Created a jira to make it internal. I'm not too concerned about the efficiency at the moment.

virtual const IPropertyTree * getTree() const = 0;
virtual bool getProp(MemoryBuffer & result, const char * xpath) const = 0;
virtual bool getProp(StringBuffer & result, const char * xpath) const = 0;
//Return a sequence number which changes whenever the secret actually changes - so that a caller can determine
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"sequence number", hash, or just unique identifier?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Clarified.

common/thorhelper/thorsoapcall.cpp Show resolved Hide resolved
if (!isClientConnection || !strieq(issuer, "public"))
updateCertificateFromSecret(secretInfo);

if (!isClientConnection || addCACert) // same condition as before, but is it correct?
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Assuming addCACert gets set the same as before, the reason this should be correct is that addCACert is usually true. A client hitting a public issuer is the case where we don't want the cacert defined. Otherwise, for MTLS we want control over our CACert using addCACert. When hitting public services using public certificate authorities we want the well known (browser compatible) CA list installed on the system instead.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Moved your comment above into the source.

if (serviceTLS)
{
protocol = "ssl";
#ifdef _USE_OPENSSL
if (isContainerized())
{
//MORE: Why is this containerized only? Should it check for this in both, and fall back to the
//old config if none found and non-containerized?
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, absolutely agreed, that was the ultimate goal. Bare-metal and fusion should be able to treat certificate config as issuer secrets if they want, and if that isn't present fall back to the old style config.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

created hpcc-30754

@ghalliday
Copy link
Member Author

rebased and another commit added to respond to review comments.

#ifdef _USE_OPENSSL
if (isContainerized())
protocol = "ssl";
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This roxie config change looks good... but may want to fix line wrapping in commit message?

// addCACert is usually true. A client hitting a public issuer is the case where we don't want the ca cert
// defined. Otherwise, for MTLS we want control over our CACert using addCACert. When hitting public services
// using public certificate authorities we want the well known (browser compatible) CA list installed on the
// system instead.
Copy link
Member

@afishbeck afishbeck Nov 7, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a note, to clarify a little further: caCert prevents a secure client from trying to use the issuer secret at all. This is necessary because internal components don't get a public issuer secret, they don't need one. It's ok because the OS level installed CA certs should work for public services.

When setting up a secure client to hit services in the current environment/namespace, config options like issuer, caCert, and selfSigned come from the list of services in our global config, for example:

    global:
      services:
      - name: eclqueries
        type: eclqueries
        caCert: false
        class: esp
        issuer: public
        port: 28002
        public: true
        selfSigned: true
        tls: true
      - name: roxie
        type: roxie 
        caCert: true
        class: roxie
        issuer: remote
        port: 29876
        public: true
        selfSigned: false
        target: roxie
        tls: true

@ghalliday ghalliday requested a review from afishbeck November 8, 2023 11:50
@ghalliday
Copy link
Member Author

@afishbeck I have traced through the execution and fixed a few problems. Also added some options to roxie to allow issuer in bare-metal and disable mtls. Please take a look at the follow on commits (I think you have reviewed one of them)

@ghalliday
Copy link
Member Author

NOTE: This code has a problem- that the expiry date is stored as a property in the secret tree - so that even if the contents of the secret is identical the hash will change and the config will be updated. Since this is once every 10 minute I think that is acceptable as a short term solution, but @afishbeck please let me know if you think otherwise.

@afishbeck
Copy link
Member

NOTE: This code has a problem- that the expiry date is stored as a property in the secret tree - so that even if the contents of the secret is identical the hash will change and the config will be updated. Since this is once every 10 minute I think that is acceptable as a short term solution, but @afishbeck please let me know if you think otherwise.

As we discussed offline, I don't think its a problem. It will just be a thorough test of your secret renewal code. Later I would optimize this away though so that the renewal only happens when something really has changed.

HPCC-30754 Allow roxie to use issuer based tls in bare-metal configuration

Signed-off-by: Gavin Halliday <gavin.halliday@lexisnexis.com>
@ghalliday ghalliday merged commit 20afbe8 into hpcc-systems:candidate-9.4.x Nov 15, 2023
47 of 50 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants