Remove non portable use of pthread_t #563

Merged: 6 commits, Jan 3, 2020

nchong-at-aws
Contributor

Issue #, if available: #562

Description of changes:

Add platform-specific typedef aws_thread_id and portable printing/comparison functions.
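
A minimal sketch of what such a typedef and its helpers might look like on a POSIX platform. The `example_` names are hypothetical and are not the library's API; the hex-dump formatting is an assumption chosen because `pthread_t` is not guaranteed to be an integer type, so it cannot be passed to `printf` portably.

```c
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical names for illustration only; POSIX variant of the typedef. */
typedef pthread_t example_thread_id;

/* Two hex digits per byte of the id, plus the terminating NUL. */
#define EXAMPLE_THREAD_ID_REPR_LEN (sizeof(example_thread_id) * 2 + 1)

/* pthread_t may be a struct on some platforms, so compare with
 * pthread_equal() rather than ==. */
static bool example_thread_id_equal(example_thread_id a, example_thread_id b) {
    return pthread_equal(a, b) != 0;
}

/* Writes the raw bytes of the id as hex, for display only.
 * Returns false if the buffer is missing or too small. */
static bool example_thread_id_to_string(example_thread_id id, char *buf, size_t len) {
    if (buf == NULL || len < EXAMPLE_THREAD_ID_REPR_LEN) {
        return false;
    }
    const unsigned char *bytes = (const unsigned char *)&id;
    for (size_t i = 0; i < sizeof(id); ++i) {
        /* Each call writes two hex digits and a NUL that the next call overwrites. */
        snprintf(buf + 2 * i, 3, "%02x", bytes[i]);
    }
    return true;
}
```

Callers would size their buffers with `EXAMPLE_THREAD_ID_REPR_LEN` and never compare ids with `==`.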

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

- ASSERT_INT_EQUALS(
- test_data.thread_id,
- aws_thread_get_id(&thread),
+ ASSERT_TRUE(
Contributor

Maybe have an assert false test here too?

Contributor Author

To do that, we would need to manufacture an aws_thread_id value that we know is not the current thread. One way would be to create a new thread and then check against that.
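
The idea can be sketched with plain pthreads, assuming a POSIX platform. The helper name below is hypothetical; it spawns a thread and checks that the new thread's id does not compare equal to the caller's.

```c
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Thread body that does nothing; we only need the thread to exist. */
static void *example_noop(void *arg) {
    (void)arg;
    return NULL;
}

/* Hypothetical helper: returns true if a freshly created thread has an id
 * that differs from the calling thread's id. Also returns false if the
 * thread could not be created. */
static bool example_new_thread_id_differs(void) {
    pthread_t other;
    if (pthread_create(&other, NULL, example_noop, NULL) != 0) {
        return false;
    }
    /* Compare before joining: the id stays valid until the thread is joined,
     * even if the thread body has already returned. */
    bool differs = pthread_equal(other, pthread_self()) == 0;
    pthread_join(other, NULL);
    return differs;
}
```

A negative test could then assert that this helper returns true, alongside the existing positive comparison.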

};

static void s_error_thread_test_thread_local_cb(int err, void *ctx) {
struct error_thread_test_data *cb_data = (struct error_thread_test_data *)ctx;

- uint64_t thread_id = aws_thread_current_thread_id();
+ aws_thread_id thread_id = aws_thread_current_thread_id();
Contributor

Is this used again? If not, maybe move it inside the test?

- uint64_t current_thread_id = aws_thread_current_thread_id();
+ aws_thread_id current_thread_id = aws_thread_current_thread_id();
+ char repr[AWS_THREAD_ID_REPR_LEN];
+ if (aws_thread_id_to_string(current_thread_id, repr, AWS_THREAD_ID_REPR_LEN)) {
Contributor

The logger is pretty slow naturally, but I'm still a little worried about potential overhead here for Trace situations. The value is const, so what about adding the thread_id's string value as a member of aws_thread and a getter for that value rather than doing the to-string loop with every log call?

Contributor Author

Yes, I see... except, I think this wouldn't work for logging in the main procedure since that isn't tied to an aws_thread instance. Instead, I made a thread-local copy of the string representation so the cost is at most once per thread.
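
A rough sketch of that once-per-thread caching, assuming a POSIX platform and C11 `_Thread_local` in place of the project's `AWS_THREAD_LOCAL` macro. The `example_` names and the hex formatting are illustrative assumptions, not the code from this pull request.

```c
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Two hex digits per byte of pthread_t, plus the terminating NUL. */
#define EXAMPLE_REPR_LEN (sizeof(pthread_t) * 2 + 1)

/* One copy per thread; zero-initialized, so is_valid starts out false. */
static _Thread_local struct {
    bool is_valid;
    char repr[EXAMPLE_REPR_LEN];
} tl_example_repr;

/* Returns the calling thread's id as a string. The formatting loop runs only
 * on the first call made by each thread. The returned pointer refers to
 * thread-local storage and is only meaningful while that thread is alive. */
static const char *example_current_thread_repr(void) {
    if (!tl_example_repr.is_valid) {
        pthread_t self = pthread_self();
        const unsigned char *bytes = (const unsigned char *)&self;
        for (size_t i = 0; i < sizeof(self); ++i) {
            snprintf(tl_example_repr.repr + 2 * i, 3, "%02x", bytes[i]);
        }
        tl_example_repr.is_valid = true;
    }
    return tl_example_repr.repr;
}
```

Because the cache is keyed by thread rather than by an `aws_thread` instance, it also covers the main thread.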

/* Thread-local string representation of current thread id */
AWS_THREAD_LOCAL struct {
bool is_valid;
char repr[AWS_THREAD_ID_REPR_LEN];
Contributor

If you also store the actual thread_id here, then this becomes the thing you take the address of in aws-c-io.

Contributor Author

That could work, but it would couple aws-c-io event loops to aws-c-common logging. Instead, how about changing the API for getting the current thread id to return a pointer to a thread-local aws_thread_id variable? The string repr can still be kept in logging, which is the only place it is needed.

// in posix/thread.c
AWS_THREAD_LOCAL struct {
    bool is_valid;
    aws_thread_id thread_id;
} tl_current_thread = {.is_valid = false};

const aws_thread_id *aws_thread_current_thread_id(void) {
    if (!tl_current_thread.is_valid) {
        tl_current_thread.thread_id = pthread_self();
        tl_current_thread.is_valid = true;
    }
    return &tl_current_thread.thread_id;
}

Contributor

Two things:

  1. "Many systems impose restrictions on the size of the thread-local memory block, in fact often rather tight limits." (Wikipedia) So it's not a great idea to make a bunch of thread-local storage variables, each of which caches one tiny thing.

  2. Is there a chance of us trying to query tl_current_thread after the thread is gone? I know the logger usually uses one logging thread. Other threads add their statements to a queue, and the logging thread drains the queue. If a thread logs something right before it exits, then the logging thread is going to try to get its name a little bit later and it will already be gone.
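
One common way to avoid the lifetime hazard in point 2 is to copy the id by value into the queued record on the producing thread, so the consumer never dereferences another thread's thread-local storage. The sketch below assumes a POSIX platform; the record layout and names are hypothetical and do not describe how the library's logger is implemented.

```c
#include <pthread.h>
#include <stdio.h>

#define EXAMPLE_MSG_LEN 128

/* Hypothetical queued log record. The id is stored by value, so the record
 * remains usable after the producing thread has exited. */
struct example_log_record {
    pthread_t thread_id;
    char message[EXAMPLE_MSG_LEN];
};

/* Must be called on the thread that produced the message, before the record
 * is handed to the logging thread. Long messages are truncated. */
static void example_log_record_init(struct example_log_record *rec, const char *msg) {
    rec->thread_id = pthread_self();
    snprintf(rec->message, sizeof(rec->message), "%s", msg);
}
```

The trade-off is one `pthread_t` copy per record instead of a pointer, which keeps the record self-contained.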

Contributor

danielsn commented Dec 21, 2019 via email

Contributor

graebm left a comment

I'm questioning whether aws_thread_id_t should be a custom type.
Facts:

  • It was originally uint64_t because that was big enough on all platforms. Not a bad design.
  • BUT aws/common/atomics.h only currently operates on size_t and void *.
  • AND there was seemingly bad code in aws-c-io that stored the uint64_t in a size_t atomic.
    • In theory, this won't work on 32-bit platforms because size_t is narrower than uint64_t there.
    • However, if we look at the platform-specific thread-ids, they happen to be 32-bit on 32-bit platforms and 64-bit on 64-bit platforms.
    • So this certainly looks broken, but if you spend a long time looking into it you eventually realize that it actually happens to work. But we should fix it because this is awfully confusing, hard-to-follow garbage.

So:
What if, instead of declaring aws_thread_id_t, we just used void * or size_t as the thread-id type? The advantage of doing this is:

  1. our existing atomic operations can handle it
  2. printf("%p" or "%zu") can handle it, so no need for a to_string function

The disadvantage would be if we ever discovered a platform whose thread-id was larger than its pointer type. But that seems... unlikely, right? I guess POSIX doesn't explicitly state that pthread_t even be a scalar type, but we assumed it was when we made it uint64_t. On a theoretical system where these rules are broken, we could always have a thread-local variable and return the pointer to that.

The advantage of void * over size_t is that NULL looks like an invalid thread-id. BUT it's apparently not explicitly stated that pthread_t==0 means "invalid thread-id". (It is on Darwin, because it's a pointer under the hood, and on Windows 0 is never a valid thread id.)

So I guess I'd mildly favor size_t over void *, and just not have a concept of "invalid thread id"
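
The width concern raised above (a 64-bit id stored in a `size_t` slot) can be made explicit with a round-trip check. This is an illustrative sketch with hypothetical names; it uses a plain `size_t` slot rather than the atomics in aws/common/atomics.h.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* True if v survives a round trip through size_t without losing bits.
 * Always true where size_t is 64 bits wide; can be false on 32-bit targets. */
static bool example_fits_in_size_t(uint64_t v) {
    return (uint64_t)(size_t)v == v;
}

/* Stores a 64-bit id into a size_t slot, refusing to truncate silently.
 * Returns false and leaves the slot untouched if the id does not fit. */
static bool example_store_id(size_t *slot, uint64_t id) {
    if (!example_fits_in_size_t(id)) {
        return false;
    }
    *slot = (size_t)id;
    return true;
}
```

On the platforms discussed in this thread the check would always pass, which matches the observation that the existing code happens to work.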

Contributor

graebm left a comment

Never mind my essay there.
The motivation is to have users use an explicit compare function.
Using size_t/void* is not going to help with that.

nchong-at-aws
Contributor Author

Thanks all. Changes to downstream dependencies: I think only aws-c-io needs updating. I did a quick check (grep for thread) of aws-checksums, aws-c-event-stream, aws-c-compression, aws-c-cal, aws-c-mqtt, aws-c-http and aws-c-auth.

nchong-at-aws merged commit 6bda6f9 into awslabs:master on Jan 3, 2020