Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

HPCC-33018 Expose chosen OpenSSL cryptography capabilities via a plugin #19343

Merged
merged 1 commit into from
Jan 10, 2025

Conversation

jackdelv
Copy link
Contributor

@jackdelv jackdelv commented Dec 5, 2024

Type of change:

  • This change is a bug fix (non-breaking change which fixes an issue).
  • This change is a new feature (non-breaking change which adds functionality).
  • This change improves the code (refactor or other change that does not change the functionality)
  • This change fixes warnings (the fix does not alter the functionality or the generated code)
  • This change is a breaking change (fix or feature that will cause existing behavior to change).
  • This change alters the query API (existing queries will have to be recompiled)

Checklist:

  • My code follows the code style of this project.
    • My code does not create any new warnings from compiler, build system, or lint.
  • The commit message is properly formatted and free of typos.
    • The commit message title makes sense in a changelog, by itself.
    • The commit is signed.
  • My change requires a change to the documentation.
    • I have updated the documentation accordingly, or...
    • I have created a JIRA ticket to update the documentation.
    • Any new interfaces or exported functions are appropriately commented.
  • I have read the CONTRIBUTORS document.
  • The change has been fully tested:
    • I have added tests to cover my changes.
    • All new and existing tests passed.
    • I have checked that this change does not introduce memory leaks.
    • I have used Valgrind or similar tools to check for potential issues.
  • I have given due consideration to all of the following potential concerns:
    • Scalability
    • Performance
    • Security
    • Thread-safety
    • Cloud-compatibility
    • Premature optimization
    • Existing deployed queries will not be broken
    • This change fixes the problem, not just the symptom
    • The target branch of this pull request is appropriate for such a change.
  • There are no similar instances of the same problem that should be addressed
    • I have addressed them here
    • I have raised JIRA issues to address them separately
  • This is a user interface / front-end modification
    • I have tested my changes in multiple modern browsers
    • The component(s) render as expected

Smoketest:

  • Send notifications about my Pull Request position in Smoketest queue.
  • Test my draft Pull Request.

Testing:

@jackdelv jackdelv requested a review from dcamper December 5, 2024 18:51
Copy link

github-actions bot commented Dec 5, 2024

Jira Issue: https://hpccsystems.atlassian.net//browse/HPCC-33018

Jirabot Action Result:
Workflow Transition To: Merge Pending
Updated PR

Copy link
Contributor

@dcamper dcamper left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I stopped reviewing after I realized that there may be an issue with supporting passphrases for private keys.

Your RSA.Sign() test uses both the private key and a passphrase, and you indicate that it works. That implies that the key requires that passphrase. But, that key is used elsewhere without the passphrase, and it also appears to work. That isn't right; something should be failing.

Take a look at that, please.

ecllibrary/std/OpenSSL.ecl Show resolved Hide resolved
ecllibrary/std/OpenSSL.ecl Show resolved Hide resolved
*
* @see AvailableAlgorithms()
*/
EXPORT DATA Hash(DATA _indata, VARSTRING _hash_name) := lib_openssl.OpenSSL.digesthash(_indata, _hash_name);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Make the ECL version of the arguments friendly: indata andhash_name (no leading underscores).

Thought: Would it be worthwhile to settle on an argument name of "algorithm_name" (or something, as long as it is consistent) everywhere that argument is used? Thinking of matching the "AvailableAlgorithms()" naming....

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it is worthwhile to make the argument name consistent everywhere. I have set it to "algorithm_name."

* it must be of the expected size for the given
* algorithm; OPTIONAL, defaults to creating a
* random value
* @param salt TCURRENT_OPENSSL_VERSIONencryption; if not set
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like a 'replace all' event went wrong here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed.

ecllibrary/std/OpenSSL.ecl Show resolved Hide resolved
*
* @see Encrypt()
*/
EXPORT DATA Decrypt(DATA ciphertext, STRING pem_private_key) := lib_openssl.OpenSSL.rsaDecrypt(ciphertext, pem_private_key);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This function should permit passing a passphrase for the private key, like RSA.Sign().

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added passphrase parameter.

* @see Seal()
* Ciphers.AvailableAlgorithms()
*/
EXPORT DATA Unseal(DATA ciphertext, STRING pem_private_key, VARSTRING symmetric_algorithm = 'aes-256-cbc') := lib_openssl.OpenSSL.rsaUnseal(ciphertext, pem_private_key, symmetric_algorithm);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This function should permit passing a passphrase for the private key, like RSA.Sign().

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added passphrase parameter.

* @see Digest.AvailableAlgorithms()
* Sign()
*/
EXPORT BOOLEAN VerifySignature(DATA signature, DATA signedData, DATA passphrase, STRING pem_public_key, VARSTRING hash_name) := lib_openssl.OpenSSL.rsaVerifySignature(signature, signedData, passphrase, pem_public_key, hash_name);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Public keys don't have passphrases.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed passphrase parameter from VerifySignature.

EXPORT encrypt_default_iv_default_salt := Std.OpenSSL.Ciphers.Encrypt((DATA)PLAINTEXT, CIPHERS_CIPHER, (DATA)PASSPHRASE);
EXPORT encrypt_rsa := Std.OpenSSL.RSA.Encrypt((DATA)PLAINTEXT, RSA_PUBLIC_1);
EXPORT seal_rsa := Std.OpenSSL.RSA.Seal((DATA)PLAINTEXT, [RSA_PUBLIC_1, RSA_PUBLIC_2]);
EXPORT signed_rsa_sha256 := Std.OpenSSL.RSA.Sign((DATA)PLAINTEXT, (DATA)PASSPHRASE, RSA_PRIVATE_1, 'sha256');
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Was RSA_PRIVATE_1 created with a non-empty passphrase? If not then I don't see how this could work.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It was accepting the passphrase parameter, but not doing anything with it. The passphrase is now used and I have added RSA_PRIVATE_2 that was created with a non-empty passphrase to test.

@jackdelv jackdelv requested a review from dcamper December 6, 2024 13:56
Copy link
Contributor

@dcamper dcamper left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking pretty good! Just a few changes for this round.

ecllibrary/std/OpenSSL.ecl Show resolved Hide resolved
* @see Digest.AvailableAlgorithms()
* VerifySignature()
*/
EXPORT DATA Sign(DATA plaintext, DATA passphrase, STRING pem_private_key, VARSTRING algorithm_name) := lib_openssl.OpenSSL.rsaSign(plaintext, passphrase, pem_private_key, algorithm_name);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's make algorithm_name default to 'sha256'. That is the most common hash algorithm for signing. Remember to adjust the @param doc accordingly.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed default to value to sha256.

* @see Digest.AvailableAlgorithms()
* Sign()
*/
EXPORT BOOLEAN VerifySignature(DATA signature, DATA signedData, STRING pem_public_key, VARSTRING algorithm_name) := lib_openssl.OpenSSL.rsaVerifySignature(signature, signedData, pem_public_key, algorithm_name);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as with Sign(): Set the default value of algorithm_name to 'sha256' and adjust the @param doc.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed.

EXPORT STRING CIPHERS_CIPHER := 'aes-256-cbc';
EXPORT STRING PASSPHRASE := 'mypassphrase';

EXPORT STRING RSA_PUBLIC_1 :=
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Question: Why not use the triple-apostrophe multi-line quote scheme for these big PEM keys? It's fine the way it is, just asking.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When using the triple-apostrophe syntax, the regression test would fail inside PEM_read_bio_PrivateKey with "Error: -1: Error within loading a pkey: error:1E08010C:DECODER routines::unsupported", but worked just fine using the other syntax.

EXPORT seal_rsa := Std.OpenSSL.RSA.Seal((DATA)PLAINTEXT, [RSA_PUBLIC_1, RSA_PUBLIC_2]);
EXPORT signed_rsa_sha256 := Std.OpenSSL.RSA.Sign((DATA)PLAINTEXT, (DATA)'', RSA_PRIVATE_1, 'sha256');
EXPORT signed_rsa_sha256_passphrase := Std.OpenSSL.RSA.Sign((DATA)PLAINTEXT, (DATA)PASSPHRASE, RSA_PRIVATE_2, 'SHA256');
// EXPORT signed_rsa_sha256_wrong_passphrase := Std.OpenSSL.RSA.Sign((DATA)PLAINTEXT, (DATA)'notmypassphrase', RSA_PRIVATE_2, 'SHA256'); Fails with Error: -1: Error within loading a pkey: error:1C800064:Provider routines::bad decrypt
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've asked Gavin for recommendations on how to test this. Trapping rtlFail() can be difficult, if not impossible.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It appears that it isn't possible to cleanly capture a failure that throws an exception. My advice is to leave the failure tests in the file, commented-out, along with a comment describing why they are commented-out.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added an explanation to the test.

EXPORT DATA Decrypt(DATA ciphertext, VARSTRING algorithm_name, DATA passphrase, DATA iv = (DATA)'', DATA salt = (DATA)'') := lib_openssl.OpenSSL.cipherDecrypt(ciphertext, algorithm_name, passphrase, iv, salt);
END; // Ciphers

EXPORT RSA := MODULE
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just realized that the MODULEs I defined need the best. Existing structure is:

OpenSSL
    Digest
        AvailableAlgorithms()
        Hash()
    Ciphers
        AvailableAlgorithms()
        IVSize()
        SaltSize()
        Encrypt()
        Decrypt()
    RSA
        Seal()
        Unseal()
        Encrypt()
        Decrypt()
        Sign()
        VerifySignature()

New structure should be:

OpenSSL
    Digest
        AvailableAlgorithms()
        Hash()
    Cipher *
        AvailableAlgorithms()
        IVSize()
        SaltSize()
        Encrypt()
        Decrypt()
    PublicKey *
        RSASeal() *
        RSAUnseal() *
        Encrypt()
        Decrypt()
        Sign()
        VerifySignature()

Asterisks indicate changes.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed.

failOpenSSLError("creating buffer for EVP_PKEY");

EVP_PKEY * pkey;
if (startsWith(key, "-----BEGIN RSA PUBLIC KEY-----"))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This public key may not be RSA; it could be ECC. I think you should check for 'contains "PUBLIC KEY-----"' (not sure about the dashes).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added a function isPublicKey to check the string for "PUBLIC KEY-----".

// Create and initialize the message digest context
mdCtx = EVP_MD_CTX_new();
if (!mdCtx)
failOpenSSLError("EVP_MD_CTX_new (rsaSign)");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The 'rsaSign' is a misnomer, because you can sign with other public key types. This may need alteration elsewhere in the code as well.

Technically, there is a subset of public key types that can be used for signing. We can let OpenSSL generate an error message if necessary; no need to explicitly check. For reference:

Only EVP_PKEY types that support signing can be used with these functions. This includes MAC algorithms where the MAC generation is considered as a form of "signing". Built-in EVP_PKEY types supported by these functions are CMAC, Poly1305, DSA, ECDSA, HMAC, RSA, SipHash, Ed25519 and Ed448.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed RSA naming from Sign and VerifySignature and changed the modulename to PublicKey

}
}

OPENSSL_API void OPENSSL_CALL rsaSign(ICodeContext *ctx, size32_t & __lenResult, void * & __result, size32_t len_plaintext, const void * _plaintext, size32_t len_passphrase, const void * _passphrase, size32_t len_pem_private_key, const char * _pem_private_key, const char * _algorithm_name)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggest renaming this function because it is not RSA-specific.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed to pkSign as part of the Public Key module.

}
}

OPENSSL_API bool OPENSSL_CALL rsaVerifySignature(ICodeContext *ctx, size32_t len_signature, const void * _signature, size32_t len_signedData, const void * _signedData, size32_t len_pem_public_key, const char * _pem_public_key, const char * _algorithm_name)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggest renaming this function because it is not RSA-specific.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed to pkVerifySignature

@jackdelv jackdelv requested a review from dcamper December 10, 2024 19:48
@jackdelv jackdelv marked this pull request as ready for review December 10, 2024 19:58
@jackdelv jackdelv requested a review from ghalliday December 10, 2024 19:59
Copy link
Contributor

@dcamper dcamper left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good. Approving, but I would ask that you make the trivial changes I noted.

*
* @return The encrypted ciphertext.
*
* @see Unseal()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We did some renaming, so this Unseal() was changed to RSAUnseal(). Check all other @see functions as well.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed the @see functions in the comments.

OpenSSL::SSL
OpenSSL::Crypto
)
endif()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like you need a final linefeed in the file.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added a linefeed.

plugins/openssl/openssl.cpp Outdated Show resolved Hide resolved
Copy link
Member

@ghalliday ghalliday left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Generally looks good. A comment about the name of the plugin, an object leak and a few scattered comments about style.

# Cmake Input File for openssl
#############################################################

project(openssl)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suspect a different name would be better, otherwise it is confusing whether it means the real openssl library. sslservices?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed to sslservices. The toplevel std/OpenSSL.ecl file is still named OpenSSl. Should that be renamed as well e.g. SSL.ecl?

#define _OPENSSL_INCL

#ifdef _WIN32
#define OPENSSL_CALL _cdecl
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this needed? Do the regression tests work on windows?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure if it is required. I added it to stay consistent with cryptolib and fileservices.

I also don't think it works on windows because it expects 'ecl' to be installed. I haven't tried doing a windows build though.

}
}
misses++;
const T * newObj = getObjectByName(algorithm_name);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think these will never be freed. Should this be using a unique_ptr in the tuple?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The algorithms and digests are allocated by OPENSSL_init_crypto, and generally, they are freed when the application exits. They can be explicitly freed by a call to OPENSSL_cleanup, but that is discouraged in the documentation. I think it is currently correct.

failOpenSSLError("adding new object to cache");

return newObj;
};
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

minor: ; not needed after a function definition (and in other places)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed.

void clear()
{
for (auto& c : cache)
EVP_PKEY_free(std::get<1>(c));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can use unique_ptr to avoid having to explicitly delete the associated items. (Or create a wrapper class.)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed to use unique_ptr.


EXPORT STRING RSA_PUBLIC_1 :=
'-----BEGIN RSA PUBLIC KEY-----' + '\n' +
'MIICCgKCAgEAuLYYTmPMzp13999s7bUAaJZVON+9k/2PDBF5AfHtQ40FmxwRdG3d' + '\n' +
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This may trigger warnings from scanning rules about private keys being exported in public repositories.... I'm not quite sure what the best solution idea is.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The TestCrypto_PKE.ecl Includes a private key the same way for testing. I'm not sure if there is a better way either.

{
for (size_t x = 0; x < publicKeys.size(); x++)
{
if (encryptedKeys[x])
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

minor: A general point, for free, delete and rtlFree it is legal to pass a null ptr, and it is defined to do nothing - so no need to check first.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed checks around rtlFree.


// Allocate memory for encrypted keys
size_t keyCount = publicKeys.size();
encryptedKeys = static_cast<byte **>(rtlMalloc(sizeof(byte *) * keyCount));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

minor: You could use
enryptedKeys = new byte * [keyCount];
and
delete [] encryptedKeys;

rtlMalloc and rtlFree need to be used for allocated data that is returned from the function. The reason is to avoid conflicts between the debug and release versions of the c++ library.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed to use new/delete syntax instead of rtlMalloc/rtlFree.

failOpenSSLError("EVP_SealFinal");
ciphertextLen += len;

int totalKeyLen = 0;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

general: Use size32_t for the sizes of objects that are never going to be ridiculously large, size_t for full compatibility. Same in various other places.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed to use size32_t instead of int in most places.

// Keys; each is (size_t) + <content>
for (size_t x = 0; x < keyCount; x++)
{
size_t keyLen = keyLens[x];
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be size32_t rather than size_t? All ecl structures are 32 bit lengths. Another alternative if it is not read from ecl, you could use appendPacked()

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this should be size32_t.

@jackdelv jackdelv requested a review from ghalliday December 12, 2024 21:11
Copy link
Member

@ghalliday ghalliday left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks jack, a few more comments.

{

void failOpenSSLError(const std::string& context)
{
unsigned long errCode = 0;
char buffer[120];
size_t errCode = 0;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this is used

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed unused variable.

@@ -77,7 +77,7 @@ void failOpenSSLError(const std::string& context)
int passphraseCB(char *passPhraseBuf, int passPhraseBufSize, int rwflag, void *pPassPhraseMB)
{
size32_t len = ((MemoryBuffer*)pPassPhraseMB)->length();
if (passPhraseBufSize >= (int)len)
if (passPhraseBufSize >= len)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

may need to cast passPhraseBufSize to size32_t to avoid complains about mixing signed and unsigned in a comparison

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added cast to size32_t for passPhraseBufSize.


private:
int hits;
int misses;
size32_t hits;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this should be unsigned. size32_t is only used for numbers of bytes - not arbitrary counts.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed to unsigned.

int hits;
int misses;
std::list<std::tuple<unsigned, EVP_PKEY *>> cache;
size32_t hits;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same comment about unsigned here

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed. Also reverted the hash to an unsigned since it is not a number of bytes.

std::list<std::tuple<unsigned, EVP_PKEY *>> cache;
size32_t hits;
size32_t misses;
std::list<std::tuple<size32_t, std::unique_ptr<EVP_PKEY, decltype(&EVP_PKEY_free)>>> cache;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cleanup: you can use a typedef/using to declare a type for a unique Pkey, which simplifies the code when adding to the cache

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added typedef.

cache.pop_back();
}
}
else
failOpenSSLError("loading a pkey");

return pkey;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

multi-threading: What happens if there are lots of calls to resolve items in the cache from other threads - so this item is evicted (and deleted) as soon as the function returns. May require a shared pointer.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think that is an issue since the caches are initialized with a thread_local storage specifier. Should the caches be shared among multiple threads?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this needs to be clearly documented as a comment on the class. The cached results cannot be relied on if it is called from multiple threads.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added note to caches.


//--------------------------------------------------------------------------
// Advertised Entry Point Functions
// Advertised Entry Posize32_t Functions
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Slightly too enthusiastic replacement of int!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed.

@@ -407,17 +398,17 @@ OPENSSL_API void OPENSSL_CALL cipherEncrypt(ICodeContext *ctx, size32_t & __lenR

try
{
int len = 0;
int ciphertextLen = 0;
size32_t len = 0;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a case where using int is probably correct - because that is the type required by the api you are calling. (technically the reinterpret cast is undefined behaviour.)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed cases where the type is required by the api to use that type.

for (size_t x = 0; x < publicKeys.size(); x++)
rtlFree(encryptedKeys[x]);
rtlFree(encryptedKeys);
delete [] encryptedKeys;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You still need to iterate and delete the encryptedKeys

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Only fixed in the catch{} block - it needs similar code here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in try block.

plaintext.ensureCapacity(plaintextLen);

int len = 0;
if (EVP_OpenUpdate(decryptCtx, static_cast<byte *>(plaintext.bufferBase()), &len, newCipherText, newCipherTextLen) != 1)
size32_t len = 0;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same comment about int being correct for this call/instance

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed.

Copy link
Contributor Author

@jackdelv jackdelv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ghalliday Fixed those issues and had one question about the storage class of the caches.

{

void failOpenSSLError(const std::string& context)
{
unsigned long errCode = 0;
char buffer[120];
size_t errCode = 0;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed unused variable.

@@ -77,7 +77,7 @@ void failOpenSSLError(const std::string& context)
int passphraseCB(char *passPhraseBuf, int passPhraseBufSize, int rwflag, void *pPassPhraseMB)
{
size32_t len = ((MemoryBuffer*)pPassPhraseMB)->length();
if (passPhraseBufSize >= (int)len)
if (passPhraseBufSize >= len)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added cast to size32_t for passPhraseBufSize.


private:
int hits;
int misses;
size32_t hits;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed to unsigned.

int hits;
int misses;
std::list<std::tuple<unsigned, EVP_PKEY *>> cache;
size32_t hits;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed. Also reverted the hash to an unsigned since it is not a number of bytes.

std::list<std::tuple<unsigned, EVP_PKEY *>> cache;
size32_t hits;
size32_t misses;
std::list<std::tuple<size32_t, std::unique_ptr<EVP_PKEY, decltype(&EVP_PKEY_free)>>> cache;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added typedef.


//--------------------------------------------------------------------------
// Advertised Entry Point Functions
// Advertised Entry Posize32_t Functions
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed.

@@ -407,17 +398,17 @@ OPENSSL_API void OPENSSL_CALL cipherEncrypt(ICodeContext *ctx, size32_t & __lenR

try
{
int len = 0;
int ciphertextLen = 0;
size32_t len = 0;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed cases where the type is required by the api to use that type.

for (size_t x = 0; x < publicKeys.size(); x++)
rtlFree(encryptedKeys[x]);
rtlFree(encryptedKeys);
delete [] encryptedKeys;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed.

plaintext.ensureCapacity(plaintextLen);

int len = 0;
if (EVP_OpenUpdate(decryptCtx, static_cast<byte *>(plaintext.bufferBase()), &len, newCipherText, newCipherTextLen) != 1)
size32_t len = 0;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed.

cache.pop_back();
}
}
else
failOpenSSLError("loading a pkey");

return pkey;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think that is an issue since the caches are initialized with a thread_local storage specifier. Should the caches be shared among multiple threads?

@jackdelv jackdelv requested a review from ghalliday December 20, 2024 16:16
Copy link
Member

@ghalliday ghalliday left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few comments about the thread safety of the caches. Something to discuss with Dan whether it is best to have per thread caches, or one shared cache.
If they are per thread, then the initialisation all needs to happen in the constructor, and clean up in the destructor. If one shared, then the returned object's lifetime needs more thought.

void AlgorithmCache<EVP_MD>::setCacheName() {cacheName = "DIGEST";}

template <>
const EVP_CIPHER * AlgorithmCache<EVP_CIPHER>::getObjectByName(const char * name) { return EVP_get_cipherbyname(name); }
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do these objects need cleaning up? I suspect not, but please confirm.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, the lists of ciphers/digests are populated at initialization and cleaned up automatically by OpenSSL.

void init()
{
setCacheName();
hits = 0;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

minor: Safer/cleaner to have default initialisers on the member variables. They should be initialised in the (implicit or explicit) constructor

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added default initializer to cache member variables.


MODULE_INIT(INIT_PRIORITY_STANDARD)
{
pkeyCache.init();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If these are thread-local variables you cannot rely on calling init.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed init functions and moved logic to constructor and added default initializers.

return newObj;
}

void printStatistics() {DBGLOG("SSLSERVICES %s CACHE STATS: HITS = %d, MISSES = %d", cacheName.c_str(), hits, misses);}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This would be better logged as a metric, with text that is machine readable. Something along the lines of ...:

LOG(MCmonitorMetric, "{ \"type\": \"metric\", \"name\": \"sslServiceCache%s\", \"hits\": \"%u\", \"misses\": \"%u\" }", cacheName.c_str(), hits, misses);

There is only one other place that currently does this, but it would be good to switch all existing equivalents to a similar pattern. It would be worth rationalising the format.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed printStatistics functions and change to LOG call in the destructor.

const T * getObjectByName(const char * name);
};

template <>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interesting, I didn't know you could do this...

for (size_t x = 0; x < publicKeys.size(); x++)
rtlFree(encryptedKeys[x]);
rtlFree(encryptedKeys);
delete [] encryptedKeys;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Only fixed in the catch{} block - it needs similar code here.

size32_t keySize = 0;
memcpy(&keySize, inPtr, sizeof(keySize));
inPtr += sizeof(keySize);
encryptedKeys.emplace_back(reinterpret_cast<const char *>(inPtr), keySize);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same potential problem with large numbers of keys, and cache size.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this is the same issue because there is only a single key checked out from the cache to compare to the encrypted public keys stored in the _ciphertext.

__lenResult = plaintextLen;
MemoryBuffer resultBuffer(__lenResult);
resultBuffer.append(__lenResult, plaintext.bufferBase());
__result = resultBuffer.detachOwn();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you directly return from plaintext.detachOwn() and avoid a clone?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed and removed clone MemoryBuffer.

cache.pop_back();
}
}
else
failOpenSSLError("loading a pkey");

return pkey;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this needs to be clearly documented as a comment on the class. The cached results cannot be relied on if it is called from multiple threads.

{
if(PRINT_STATS)
{
pkeyCache.printStatistics();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is only going to work for hthor - because the caches are threaded local. Even then there are situations where it will display nothing.

@dcamper
Copy link
Contributor

dcamper commented Jan 2, 2025

A few comments about the thread safety of the caches. Something to discuss with Dan whether it is best to have per thread caches, or one shared cache. If they are per thread, then the initialisation all needs to happen in the constructor, and clean up in the destructor. If one shared, then the returned object's lifetime needs more thought.

I had originally suggested that Jack implement thread-local caches. This is one of the (few) cases where I think the overhead of thread-safety via locks outweighs the savings of globally caching constructed objects. Considerations included: cached objects are usually quite small; there will typically be very few objects in any particular run (there won't be much switching between algorithms or keys during a job, for instance); and crypto calls will generally be applied en masse to large data in a TRANSFORM (as opposed to one-off calls at the top level of a BWR).

@jackdelv
Copy link
Contributor Author

jackdelv commented Jan 3, 2025

@ghalliday I had a couple questions that I left for you in the comments. Back to you.

@jackdelv jackdelv requested a review from ghalliday January 3, 2025 17:28
Copy link
Member

@ghalliday ghalliday left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jackdelv looks good. Please can you squash ready for merging.

}
}

MODULE_INIT(INIT_PRIORITY_STANDARD)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

trivial: could now be deleted.

@ghalliday ghalliday closed this Jan 8, 2025
@ghalliday ghalliday reopened this Jan 8, 2025
Copy link

github-actions bot commented Jan 8, 2025

Jira Issue: https://hpccsystems.atlassian.net//browse/HPCC-33018

Jirabot Action Result:
Workflow Transition To: Merge Pending

Signed-off-by: Jack Del Vecchio <jack.delvecchio@lexisnexisrisk.com>
@jackdelv
Copy link
Contributor Author

jackdelv commented Jan 8, 2025

@ghalliday Squashed.

@ghalliday ghalliday merged commit 908fd65 into hpcc-systems:master Jan 10, 2025
50 of 53 checks passed
Copy link

Jirabot Action Result:
Added fix version: 9.10.0
Workflow Transition: 'Resolve issue'

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants