
do not skip init() in realloc() #220

Merged
merged 1 commit into from
Sep 26, 2023
Conversation

cgzones
Contributor

@cgzones cgzones commented Sep 26, 2023

If N_ARENA is greater than 1, `thread_arena` is initially set to N_ARENA, which is an invalid index into `ro.size_class_metadata[]`.

The arena actually used is computed in init().

Ensure init() is called if a new thread is only using realloc(), to avoid undefined behavior; e.g. pthread_mutex_lock() might crash due to the memory not holding an initialized mutex.

Affects mesa 23.2.0~rc4.

Example backtrace using glmark2 (note arena=4 with the default N_ARENA being 4):

Program terminated with signal SIGSEGV, Segmentation fault.
#0  ___pthread_mutex_lock (mutex=0x7edff8d3f200) at ./nptl/pthread_mutex_lock.c:80
        type = <optimized out>
        __PRETTY_FUNCTION__ = "___pthread_mutex_lock"
        id = <optimized out>
#1  0x00007f0ab62091a6 in mutex_lock (m=0x7edff8d3f200) at ./mutex.h:21
No locals.
#2  0x00007f0ab620c9b5 in allocate_small (arena=4, requested_size=24) at h_malloc.c:517
        info = {size = 32, class = 2}
        size = 32
        c = 0x7edff8d3f200
        slots = 128
        slab_size = 4096
        metadata = 0x0
        slot = 0
        slab = 0x0
        p = 0x0
#3  0x00007f0ab6209809 in allocate (arena=4, size=24) at h_malloc.c:1252
No locals.
#4  0x00007f0ab6208e26 in realloc (old=0x72b138199120, size=24) at h_malloc.c:1499
        vma_merging_reliable = false
        old_size = 16
        new = 0x0
        copy_size = 139683981990973
#5  0x00007299f919e556 in attach_shader (ctx=0x7299e9ef9000, shProg=0x7370c9277d30, sh=0x7370c9278230) at ../src/mesa/main/shaderapi.c:336
        n = 1
#6  0x00007299f904223e in _mesa_unmarshal_AttachShader (ctx=<optimized out>, cmd=<optimized out>) at src/mapi/glapi/gen/marshal_generated2.c:1539
        program = <optimized out>
        shader = <optimized out>
        cmd_size = 2
#7  0x00007299f8f2e3b2 in glthread_unmarshal_batch (job=job@entry=0x7299e9ef9168, gdata=gdata@entry=0x0, thread_index=thread_index@entry=0) at ../src/mesa/main/glthread.c:139
        cmd = 0x7299e9ef9180
        batch = 0x7299e9ef9168
        ctx = 0x7299e9ef9000
        pos = 0
        used = 3
        buffer = 0x7299e9ef9180
        shared = <optimized out>
        lock_mutexes = <optimized out>
        batch_index = <optimized out>
#8  0x00007299f8ecc2d9 in util_queue_thread_func (input=input@entry=0x72c1160e5580) at ../src/util/u_queue.c:309
        job = {job = 0x7299e9ef9168, global_data = 0x0, job_size = 0, fence = 0x7299e9ef9168, execute = <optimized out>, cleanup = <optimized out>}
        queue = 0x7299e9ef9058
        thread_index = 0
#9  0x00007299f8f1bcbb in impl_thrd_routine (p=<optimized out>) at ../src/c11/impl/threads_posix.c:67
        pack = {func = 0x7299f8ecc190 <util_queue_thread_func>, arg = 0x72c1160e5580}
#10 0x00007f0ab5aa63ec in start_thread (arg=<optimized out>) at ./nptl/pthread_create.c:444
        ret = <optimized out>
        pd = <optimized out>
        out = <optimized out>
        unwind_buf = {cancel_jmp_buf = {{jmp_buf = {139683974242608, 2767510063778797177, -168, 11, 140727286820160, 126005371879424, -4369625917767903623, -2847048016936659335}, mask_was_saved = 0}}, priv = {pad = {0x0, 0x0, 0x0,
          0x0}, data = {prev = 0x0, cleanup = 0x0, canceltype = 0}}}
        not_first_call = <optimized out>
#11 0x00007f0ab5b26a2c in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:81

@thestinger thestinger closed this Sep 26, 2023
@thestinger thestinger reopened this Sep 26, 2023
@thestinger thestinger merged commit ede3179 into GrapheneOS:main Sep 26, 2023
8 checks passed
@thestinger
Member

thestinger commented Sep 26, 2023

Thanks. I initially misunderstood the issue you were fixing. This happens in a case where hardened_malloc is already globally initialized, the process obtains a small (slab) allocation from it and then calls realloc on that in a thread which hasn't made an allocation with malloc yet.

This was overlooked when adding support for arenas. GrapheneOS currently doesn't use arenas with hardened_malloc since they use too much memory for our usage when slab allocation quarantines are enabled. hardened_malloc uses a per-arena-per-size-class quarantine and sets the quarantine size based on the largest slab allocation size class, so having extended size classes enabled greatly increases the memory dedicated to quarantines. Providing more configuration for slab allocation size classes, and potentially optimizing their substantial impact on performance and memory usage, is becoming a priority.

@thestinger
Member

We'll tag a new release of the standalone hardened_malloc soon to get the fix to people using the stable releases. We would have done it already, but things are not going particularly well in terms of the ongoing harassment, fabricated stories, swatting attacks, etc., and it's very difficult to get basic things done.
