Consider an ABI extension to define metadata for binary analysis #297

Open
smithp35 opened this issue Nov 13, 2024 · 7 comments

Comments

@smithp35
Contributor

With increasing adoption of tools like BOLT and use of binary analysis by the Linux kernel, there may be demand for additional metadata to aid control flow discovery.

Examples include:

  • Identifying static linker generated stubs/veneers/thunks.
  • Identifying function pointer destinations.

If there are to be metadata added to toolchains such as LLVM and GCC, it would be useful to document these in the ABI to help with interoperability of tools.

This issue is a placeholder for further discussion.

@kbeyls
Contributor

kbeyls commented Nov 14, 2024

Other useful pieces of info for binary analysis reconstruction of control flow graphs include:

@peterwaller-arm
Contributor

peterwaller-arm commented Nov 14, 2024

  • [BOLT] GOT array pointer incorrectly rewritten llvm/llvm-project#100096
    It's tricky in general to determine whether a GOT entry points to data or code, and there is at least one case where the two can alias through a pointer to the end of an array. This can probably be solved well enough heuristically for the specific case seen there (a static glibc binary linked with bfd crashing on startup), but I wonder if the problem could exist more generally.

@maksfb

maksfb commented Nov 14, 2024

  • More architectures are adding jump table information into ELF (Jump table annotations for Linux llvm/llvm-project#112606). It would be great to have a standard ELF extension that covers them all. From the BOLT perspective, we need to know jump table boundaries, the instructions involved in forming the jump table address, and the indirect jump location(s). Note that it's possible for jump tables to overlap. Additionally, we can expect the jump table address to be stored to and loaded from the stack.
  • Indirect/computed goto extension for C/C++ also uses indirect jumps and the mechanism deviates from a typical switch/jump table implementation.
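
To make the BOLT requirements above concrete, here is a minimal sketch of what a per-jump-table record could carry, plus a check for overlapping tables. The `JumpTableInfo` type, its field names, and `overlapping_tables` are hypothetical and only illustrate the kind of information requested; they do not describe any existing or proposed ELF format.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class JumpTableInfo:
    """Hypothetical record for one jump table (not an existing ELF format)."""
    table_addr: int                   # start address of the table
    entry_size: int                   # size of one entry in bytes
    num_entries: int                  # number of entries in the table
    addr_insns: Tuple[int, ...] = ()  # addresses of instructions forming the table address
    jump_sites: Tuple[int, ...] = ()  # addresses of the indirect jump(s) using the table

    @property
    def end_addr(self) -> int:
        """Exclusive end address, giving the boundary [table_addr, end_addr)."""
        return self.table_addr + self.entry_size * self.num_entries


def overlapping_tables(
    tables: List[JumpTableInfo],
) -> List[Tuple[JumpTableInfo, JumpTableInfo]]:
    """Return all pairs of tables whose byte ranges intersect."""
    ordered = sorted(tables, key=lambda t: (t.table_addr, t.end_addr))
    pairs = []
    for i, first in enumerate(ordered):
        for second in ordered[i + 1:]:
            if second.table_addr >= first.end_addr:
                break  # sorted by start address: nothing later can overlap `first`
            pairs.append((first, second))
    return pairs
```

A consumer could use such a check to decide whether two tables need to be handled as a unit rather than split at the reported boundaries.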

@alekuz01

alekuz01 commented Nov 18, 2024

One extension that could be used today to identify basic blocks and functions on the LLVM side is the Basic Block Address Map:

https://llvm.org/docs/Extensions.html#sht-llvm-bb-addr-map-section-basic-block-address-map

The basic block address map has been used in tooling similar to BOLT (working at the compiler/linker level rather than the binary level) to correctly map sampled profile information to functions and basic blocks.
The format of this map and the data it adds to the binary should not have a significant impact on size or performance.
It could also be used as a hint. I have been thinking about making BOLT use it when the section '.llvm_bb_addr_map' is present in the binary.
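
As an illustration of the "use it if the section is present" idea, here is a minimal sketch that lists the section names of an ELF image and checks for '.llvm_bb_addr_map'. It assumes a little-endian ELF64 file and ignores extended section numbering; the helper names `section_names` and `has_bb_addr_map` are hypothetical and are not part of BOLT or LLVM.

```python
import struct

# Standard ELF64 layouts, little-endian.
ELF64_EHDR = struct.Struct("<16sHHIQQQIHHHHHH")  # file header (64 bytes)
ELF64_SHDR = struct.Struct("<IIQQQQIIQQ")        # section header (64 bytes)


def section_names(image: bytes) -> list:
    """Return the section names of a little-endian ELF64 image."""
    if len(image) < ELF64_EHDR.size:
        raise ValueError("image too small to be an ELF64 file")
    ident = image[:16]
    # EI_CLASS == 2 means ELF64, EI_DATA == 1 means little-endian.
    if ident[:4] != b"\x7fELF" or ident[4] != 2 or ident[5] != 1:
        raise ValueError("expected a little-endian ELF64 image")
    fields = ELF64_EHDR.unpack_from(image, 0)
    e_shoff, e_shentsize, e_shnum, e_shstrndx = (
        fields[6], fields[11], fields[12], fields[13])
    if e_shnum == 0:
        return []  # no section headers (extended numbering is not handled)
    headers = [ELF64_SHDR.unpack_from(image, e_shoff + i * e_shentsize)
               for i in range(e_shnum)]
    # sh_offset and sh_size of the section name string table.
    strtab_off, strtab_size = headers[e_shstrndx][4], headers[e_shstrndx][5]
    strtab = image[strtab_off:strtab_off + strtab_size]
    names = []
    for hdr in headers:
        start = hdr[0]                    # sh_name: offset into the string table
        end = strtab.index(b"\0", start)  # names are NUL-terminated
        names.append(strtab[start:end].decode("ascii"))
    return names


def has_bb_addr_map(image: bytes) -> bool:
    """True if the image has a section named '.llvm_bb_addr_map'."""
    return ".llvm_bb_addr_map" in section_names(image)
```

A tool could call `has_bb_addr_map(open(path, "rb").read())` and fall back to its normal disassembly-based discovery when the section is absent.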

The disadvantage is that BBAddrMap is currently only available with LLVM/Clang.

Regarding jump tables, maybe consider something like: https://llvm.org/docs/Extensions.html#sht-llvm-jt-sizes-section-jump-table-addresses-and-sizes

@ilinpv

ilinpv commented Nov 18, 2024

A slightly different topic, but it also seems related: an ABI-like agreement on what BOLT binary rewriting and stripping tools can expect of each other. Issues reported on that:
llvm/llvm-project#56738
llvm/llvm-project#89336
llvm/llvm-project#85796
There is also an RFC effort to address some of the issues where new sections at the end and old sections left in place confuse stripping tools: https://discourse.llvm.org/t/bolt-rfc-a-new-mode-to-rewrite-entire-binary/68674

@smithp35
Contributor Author

Thank you for all the comments and suggestions. It looks like there is sufficient interest to go forward with this, most likely in the form of an ABI extension that can be worked on incrementally alongside an implementation. We'll have more to say next year.

@appujee

appujee commented Dec 11, 2024

Another reference we can take ideas from: google/android-riscv64#68
