This document specifies the format of the crypto-auditing primary event log.
To cover practical use cases, the proposed format is designed to capture events at multiple abstraction levels, from invocations of low-level cryptographic primitives to TLS session establishment.
- Contextual format: the logging format should be able to represent both high-level and low-level events, grouped by contexts
- Log truncation tolerance: the log file can be truncated at an arbitrary record boundary
- Human-readable format is not a goal: separate processes will consume the log and convert it to application-specific representations
The general structure of the logging format is a stream of structured event entries.
Events are classified into two categories: context mapping events and data events. The former establish the contexts with which the latter are associated. The latter carry the events themselves in the form of key-value pairs.
A TLS handshake consists of several cryptographic operations, such as digital-signature verification and key derivation. For simplicity, let's assume only digital-signature operations are involved.
There will be two contexts: the entire TLS handshake, and the digital-signature verification or creation. Each is associated with data events that describe its details, e.g., the TLS protocol version and the digital-signature algorithm used.
Contexts are identified by unique 16-byte values, which are included in every event. If the events were represented as a JSON array, the log file would conceptually look like the following, though a more efficient (binary) format will be used in practical deployments.
[
{
"type": "new_context",
"context": "00..01", // start of context 00..01
"parent": "00..00"
},
{
"type": "string_data",
"context": "00..01",
"name": "tls::handshake_client"
},
{
"type": "word_data",
"context": "00..01",
"tls::protocol_version": 0x0304
},
{
"type": "new_context",
"context": "00..02", // start of context 00..02
"parent": "00..01"
},
{
"type": "string_data",
"context": "00..02",
"name": "tls::certificate_verify"
},
{
"type": "word_data",
"context": "00..02",
"tls::signature_algorithm": 0x0804 // rsa_pss_rsae_sha256
},
{
"type": "word_data",
"context": "00..02",
"pk::bits": 3072
}
]
This can be conceptually represented as a tree of events:
- tls::handshake_client (00..01)
  - tls::protocol_version = 0x0304
  - tls::certificate_verify (00..02)
    - tls::signature_algorithm = 0x0804
    - pk::bits = 3072
Since the agent can monitor multiple processes, event sequences could be interleaved with each other. In that situation, context IDs help to recover the original event sequences.
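To illustrate how context IDs let a consumer untangle interleaved events, the following Python sketch rebuilds the tree from a flat event list shaped like the JSON example above. The function names `build_tree` and `render`, the dict layout, and the treatment of "00..00" as the root parent are assumptions made for illustration; they are not part of the format.

```python
from collections import defaultdict

ROOT = "00..00"  # assumption: the all-zero parent marks a top-level context


def build_tree(events):
    """Group a flat, possibly interleaved event list by context ID.

    Returns (children, data): children maps a context ID to the IDs of its
    child contexts; data maps a context ID to its key-value pairs.
    """
    children = defaultdict(list)
    data = defaultdict(dict)
    for event in events:
        ctx = event["context"]
        if event["type"] == "new_context":
            children[event["parent"]].append(ctx)
        else:
            # Every field other than "type" and "context" is event data.
            for key, value in event.items():
                if key not in ("type", "context"):
                    data[ctx][key] = value
    return children, data


def render(children, data, ctx=ROOT, depth=0):
    """Return the tree as indented text lines, one context or value per line."""
    lines = []
    for child in children.get(ctx, []):
        pad = "  " * depth
        lines.append(f"{pad}{data[child].get('name', '?')} ({child})")
        for key, value in data[child].items():
            if key != "name":
                lines.append(f"{pad}  {key} = {value}")
        lines.extend(render(children, data, child, depth + 1))
    return lines


# Events of two contexts, deliberately interleaved.
events = [
    {"type": "new_context", "context": "00..01", "parent": "00..00"},
    {"type": "string_data", "context": "00..01", "name": "tls::handshake_client"},
    {"type": "new_context", "context": "00..02", "parent": "00..01"},
    {"type": "word_data", "context": "00..01", "tls::protocol_version": 0x0304},
    {"type": "string_data", "context": "00..02", "name": "tls::certificate_verify"},
    {"type": "word_data", "context": "00..02", "pk::bits": 3072},
]
children, data = build_tree(events)
print("\n".join(render(children, data)))
```

The order in which events of different contexts arrive does not matter here, as long as each event carries its context ID.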
For security and privacy reasons, context IDs should be constructed so that they do not reveal the internal state of target programs, e.g., a PID or memory address, although such information may be used as input to the construction algorithm. The recommended way of constructing a context ID is as follows:
- The agent initializes an encryption key used with AES-ECB at startup
- An 8-byte context ID and an 8-byte PID/TGID of the target program are concatenated to construct a 16-byte input (i.e., a single block of AES-ECB)
- Encrypt the 16-byte input with AES-ECB using the key created above
This is inspired by similar mechanisms in QUIC (header protection) and DTLS 1.3 (record number encryption). With the AES-NI instruction set enabled, this procedure consumes up to 15 cycles.
The agent may periodically rotate the key.
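The steps above can be sketched in Python as follows. This is a minimal illustration, not the agent's implementation: the function names, the big-endian byte order, and the `encrypt_block` callable (a stand-in for AES-ECB under the agent's startup key) are all assumptions, since the text fixes only the field sizes and the cipher.

```python
import struct


def pack_context_input(local_id: int, tgid: int) -> bytes:
    """Concatenate an 8-byte context ID and an 8-byte PID/TGID into 16 bytes.

    Big-endian unsigned packing is an assumption made for this sketch.
    """
    return struct.pack(">QQ", local_id, tgid)


def make_context_id(encrypt_block, local_id: int, tgid: int) -> bytes:
    """Encrypt the 16-byte input with the supplied single-block cipher.

    encrypt_block is a hypothetical callable mapping 16 bytes to 16 bytes,
    standing in for AES-ECB with the key created at agent startup.
    """
    block = pack_context_input(local_id, tgid)
    out = encrypt_block(block)
    if len(out) != 16:
        raise ValueError("block cipher must return exactly 16 bytes")
    return out
```

With the third-party `cryptography` package, one option for `encrypt_block` would be `Cipher(algorithms.AES(key), modes.ECB()).encryptor().update`. Because the input is exactly one AES block, no padding is involved.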
When multiple events are sent within a single context, the same context ID is written into the log file repeatedly, which unnecessarily consumes disk space. Therefore, the log format supports compression by grouping consecutive events that share the same context ID within a certain time window. With the compression enabled, the above example would look like the following, preserving the same semantics:
[
{
"context": "00..01", // start of context 00..01
"events": [
{
"type": "new_context",
"parent": "00..00"
},
{
"type": "string_data",
"name": "tls::handshake_client"
},
{
"type": "word_data",
"tls::protocol_version": 0x0304
}
]
},
{
"context": "00..02", // start of context 00..02
"events": [
{
"type": "new_context",
"parent": "00..01"
},
{
"type": "string_data",
"name": "tls::certificate_verify"
},
{
"type": "word_data",
"tls::signature_algorithm": 0x0804 // rsa_pss_rsae_sha256
},
{
"type": "word_data",
"pk::bits": 3072
}
]
}
]
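As a rough illustration of this grouping, the Python sketch below merges consecutive events that share a context ID into one group, starting a new group when the context changes or the time window is exceeded. The `time` field on each input event, the `window` parameter, and the function name are hypothetical; the text above does not specify how the window is chosen.

```python
def group_events(events, window):
    """Merge consecutive events that share a context ID into groups.

    events: dicts with "context" and "time" keys plus the event fields;
            "time" is a hypothetical numeric timestamp added for this sketch.
    window: maximum span, in the same unit, that a single group may cover.
    """
    groups = []
    for event in events:
        # The context ID moves to the group, so drop it from the event body.
        body = {k: v for k, v in event.items() if k not in ("context", "time")}
        last = groups[-1] if groups else None
        if (
            last is not None
            and last["context"] == event["context"]
            and event["time"] - last["start"] <= window
        ):
            last["events"].append(body)
            last["end"] = event["time"]
        else:
            groups.append({
                "context": event["context"],
                "start": event["time"],
                "end": event["time"],
                "events": [body],
            })
    return groups
```

Only adjacent events are merged, so the relative order of events in the log is preserved.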
While the keys can be arbitrary, this section provides guidance on how to construct them. There are two types of keys: generic keys and scoped keys. Generic keys consist of only alphanumeric characters and underscores, while scoped keys can have a prefix ending with "::". In the previous example, `name` is a generic key, while `tls::protocol_version` is a scoped key. More strictly, they are written in ABNF as follows:
name = ALPHA *(ALPHA / DIGIT / "_")
generic_key = name
scoped_key = name "::" name
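The grammar can be checked mechanically. The Python sketch below is a literal translation of the three ABNF rules into regular expressions; the function name `classify_key` and the "invalid" result are illustrative choices, not part of the specification.

```python
import re

# name = ALPHA *(ALPHA / DIGIT / "_")
NAME = r"[A-Za-z][A-Za-z0-9_]*"
GENERIC_KEY = re.compile(NAME)
SCOPED_KEY = re.compile(rf"{NAME}::{NAME}")


def classify_key(key: str) -> str:
    """Return "generic", "scoped", or "invalid" per the ABNF as written."""
    if GENERIC_KEY.fullmatch(key):
        return "generic"
    if SCOPED_KEY.fullmatch(key):
        return "scoped"
    return "invalid"


print(classify_key("name"))
print(classify_key("tls::protocol_version"))
```

Read literally, the `scoped_key` rule allows exactly one prefix, so a key with two prefixes such as `tls::ext::extended_master_secret` is reported as "invalid" by this sketch. Since keys can be arbitrary, a consumer would probably treat this check as advisory.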
Keys are also used to determine value types. For example, `name` takes a string value, while `tls::protocol_version` takes a 16-bit integer that corresponds to [ProtocolVersion][protocol-version] in TLS.
The registry of those key names should be maintained in a separate document. The following section defines a few generic probe points and TLS probe points.
key | value type | description |
---|---|---|
`name` | string | the name of current context (available names are defined below) |

name | description |
---|---|
`tls::handshake_client` | TLS handshake for client |
`tls::handshake_server` | TLS handshake for server |
`tls::certificate_sign` | Digital signature is created using certificate in TLS handshake |
`tls::certificate_verify` | Digital signature is verified using certificate in TLS handshake |
`tls::key_exchange` | Shared secret derivation in TLS handshake |

key | value type | description |
---|---|---|
`tls::protocol_version` | uint16 | Negotiated TLS version |
`tls::ciphersuite` | uint16 | Negotiated ciphersuite (as in IANA registry) |
`tls::signature_algorithm` | uint16 | Signature algorithm used in the handshake (as in IANA registry) |
`tls::key_exchange_algorithm` | uint16 | Key exchange mode: ECDHE(0), DHE(1), PSK(2), ECDHE-PSK(3), DHE-PSK(4) |
`tls::group` | uint16 | Groups used in the handshake (as in IANA registry) |
`tls::ext::extended_master_secret` | word (ignored) | Present when extended_master_secret extension is negotiated |
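
Since keys determine value types, a consumer could keep a small registry and check values against it. The Python sketch below does this for the uint16 TLS keys in the table above and decodes the `tls::key_exchange_algorithm` modes listed there. The names `TLS_UINT16_KEYS`, `validate_tls_event`, and `key_exchange_mode` are hypothetical, and the presence-only key `tls::ext::extended_master_secret` is left out because its value is ignored.

```python
# Keys from the table above whose value type is uint16.
TLS_UINT16_KEYS = {
    "tls::protocol_version",
    "tls::ciphersuite",
    "tls::signature_algorithm",
    "tls::key_exchange_algorithm",
    "tls::group",
}

# Modes as listed for tls::key_exchange_algorithm.
KEY_EXCHANGE_MODES = {0: "ECDHE", 1: "DHE", 2: "PSK", 3: "ECDHE-PSK", 4: "DHE-PSK"}


def validate_tls_event(key, value):
    """Return value if it fits the registered uint16 type, else raise."""
    if key not in TLS_UINT16_KEYS:
        raise KeyError(f"unregistered key: {key}")
    if isinstance(value, bool) or not isinstance(value, int):
        raise ValueError(f"{key} expects an integer")
    if not 0 <= value <= 0xFFFF:
        raise ValueError(f"{key} does not fit in 16 bits: {value}")
    return value


def key_exchange_mode(value):
    """Map a tls::key_exchange_algorithm value to its mode name."""
    return KEY_EXCHANGE_MODES.get(value, f"unknown({value})")


print(validate_tls_event("tls::protocol_version", 0x0304))
print(key_exchange_mode(0))
```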

name | description |
---|---|
`ssh::handshake_client` | SSH handshake for client |
`ssh::handshake_server` | SSH handshake for server |
`ssh::client_key` | SSH client key signature/verification |
`ssh::server_key` | SSH server key signature/verification |
`ssh::key_exchange` | SSH key exchange |
All the keys except `ssh::rsa_bits` have string type.
We distinguish server and client values by the context we are in. We log all relevant events in both contexts.
key | description | example |
---|---|---|
`ssh::ident_string` | Software identification string | SSH-2.0-OpenSSH_8.8 |
`ssh::peer_ident_string` | Peer software identification string | SSH-2.0-OpenSSH_8.8 |
`ssh::key_algorithm` | Key used in handshake/key ownership proof | ssh-ed25519 |
`ssh::rsa_bits` | Key bits (RSA only) | 2048 |
`ssh::cert_signature_algorithm` | If cert is used, signature algorithm of the cert | ecdsa-sha2-nistp521 |
`ssh::kex_algorithm` | Negotiated key exchange algorithm | curve25519-sha256 |
`ssh::kex_group` | Group used for key exchange | moduli+bits or group name |
`ssh::c2s_cipher` | Data cipher algorithm (client to server) | aes256-gcm@openssh.com |
`ssh::s2c_cipher` | Data cipher algorithm (server to client) | aes256-gcm@openssh.com |
`ssh::c2s_mac` | Data integrity algorithm (client to server), omitted for implicit | umac-128-etm@openssh.com |
`ssh::s2c_mac` | Data integrity algorithm (server to client), omitted for implicit | umac-128-etm@openssh.com |
`ssh::c2s_compression` | Data compression algorithm (client to server), omitted for none | zlib@openssh.com |
`ssh::s2c_compression` | Data compression algorithm (server to client), omitted for none | zlib@openssh.com |

- ssh::handshake_client
  - ssh::ident_string = SSH-2.0-OpenSSH_8.8
  - ssh::peer_ident_string = SSH-2.0-OpenSSH_8.8
  - ssh::key_exchange
    - ssh::kex_algorithm = curve25519-sha256
    - ssh::key_algorithm = ssh-ed25519
    - ssh::s2c_cipher = aes256-gcm@openssh.com
    - ssh::c2s_cipher = aes256-gcm@openssh.com
  - ssh::server_key
    - ssh::key_algorithm = ssh-ed25519
  - ssh::client_key
    - ssh::key_algorithm = ssh-ed25519
- ssh::server_key
  - ssh::key_algorithm = rsa-sha2-256
  - ssh::rsa_bits = 2048
- ssh::server_key
  - ssh::key_algorithm = ecdsa-sha2-nistp256

The `pk` contexts listed below are only useful when a public key operation cannot be determined from the outer context. If it is obvious from the outer context, the probe point provider may choose not to create a new context. For example, when the parent context is `tls::certificate_verify`, there is no need to create a new context with `pk::verify`.
name | description |
---|---|
`pk::sign` | A digital signature is created |
`pk::verify` | A digital signature is verified |
`pk::encrypt` | Encryption is performed |
`pk::decrypt` | Decryption is performed |
`pk::encapsulate` | A session key is encapsulated |
`pk::decapsulate` | A session key is decapsulated |
`pk::generate` | A private key is generated |
`pk::derive` | A shared secret is generated |
The event keys defined here can be attached to any context, not limited to the `pk` contexts defined above.
These event keys are only useful when public key algorithm parameters cannot be determined from the outer context. If all the parameters are obvious from the outer context, the probe point provider may choose not to emit the `pk` events. For example, when the parent context has `tls::signature_algorithm`, there is no need to emit `pk::algorithm`.
All the keys except `pk::bits` and `pk::static` have string type. The values can be arbitrary, and it is the responsibility of the data consumers to correlate them.
key | value type | description |
---|---|---|
`pk::algorithm` | string | Used algorithm name |
`pk::curve` | string | Elliptic curve name |
`pk::group` | string | FFDH group name |
`pk::bits` | uint16 | Key strength in bits |
`pk::hash` | string | Hash algorithm used for signing or encryption (for prehashed or parametrized schemes such as ECDSA, RSA-PSS, and RSA-OAEP) |
`pk::static` | word (ignored) | Present when pk::derive takes place with reused keys |
The recommended format of storing events is to use a sequence of CBOR (Concise Binary Object Representation) objects. The following is the formal definition in CDDL (Concise Data Definition Language):
LogEntry = EventGroup
EventGroup = {
context: ContextID
start: time
end: time
events: [+ Event]
}
Event = NewContext / Data
ContextID = bstr .size 16
NewContext = {
NewContext: {
parent: ContextID
}
}
Data = {
Data: {
key: tstr
value: uint .size 8 / tstr / bstr
}
}
The log consists of a series of `EventGroup` objects, each of which groups events in a given time window from `start` to `end`. Timestamps are represented as a monotonic duration from the kernel boot time. `ContextID` is an encrypted 16-byte context.
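To make the wire format concrete, the following Python sketch hand-encodes one `EventGroup` as CBOR using only the standard library. It is an illustration rather than a reference encoder: the `Tagged` helper, the sample timestamps, and the choice to emit `start` and `end` with CBOR tag 1 (the CDDL prelude definition of `time`) are assumptions, and only the data types that appear in the CDDL above are supported.

```python
import struct


class Tagged:
    """A CBOR tagged value (major type 6)."""

    def __init__(self, tag, value):
        self.tag = tag
        self.value = value


def _head(major: int, n: int) -> bytes:
    """Encode the initial byte and the shortest argument for n."""
    if n < 24:
        return bytes([major << 5 | n])
    for info, fmt, limit in ((24, ">B", 1 << 8), (25, ">H", 1 << 16),
                             (26, ">I", 1 << 32), (27, ">Q", 1 << 64)):
        if n < limit:
            return bytes([major << 5 | info]) + struct.pack(fmt, n)
    raise ValueError("integer too large for CBOR")


def encode(obj) -> bytes:
    """Encode unsigned ints, byte/text strings, lists, dicts, and tags."""
    if isinstance(obj, bool):
        raise TypeError("booleans are not used by this log format")
    if isinstance(obj, int):
        if obj < 0:
            raise ValueError("only unsigned integers are supported")
        return _head(0, obj)
    if isinstance(obj, bytes):
        return _head(2, len(obj)) + obj
    if isinstance(obj, str):
        raw = obj.encode("utf-8")
        return _head(3, len(raw)) + raw
    if isinstance(obj, list):
        return _head(4, len(obj)) + b"".join(encode(x) for x in obj)
    if isinstance(obj, dict):
        return _head(5, len(obj)) + b"".join(
            encode(k) + encode(v) for k, v in obj.items())
    if isinstance(obj, Tagged):
        return _head(6, obj.tag) + encode(obj.value)
    raise TypeError(f"unsupported type: {type(obj).__name__}")


# One EventGroup following the CDDL; IDs and timestamps are made-up samples.
group = {
    "context": bytes(15) + b"\x01",
    "start": Tagged(1, 1000),
    "end": Tagged(1, 2000),
    "events": [
        {"NewContext": {"parent": bytes(16)}},
        {"Data": {"key": "name", "value": "tls::handshake_client"}},
        {"Data": {"key": "tls::protocol_version", "value": 0x0304}},
    ],
}
encoded = encode(group)
print(encoded.hex())
```

Because each `EventGroup` is a self-delimiting CBOR object, a reader can consume the log one object at a time and stop at the last complete one, which fits the truncation-tolerance requirement.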
- How are algorithm identifiers represented in the log format and the protocol? A string representation would require memory allocation at the BPF level, which might not be ideal. An integer representation would push translation onto the consumer components later in the pipeline. In both cases we need a registry to standardize known algorithm identifiers.
- Distributed tracing in microservices is a pattern that makes it easy to track end-to-end requests, by associating contexts to durations ("spans") of each service processing the requests (explainer blog article, another blog article)
- KEP-3077: contextual logging is a proposal to add contextual logging to Kubernetes, by allowing the context to be swapped
- The SSLKEYLOGFILE Format, which uses the 32-byte value of the Random field from the ClientHello message to distinguish TLS connections