This repository contains the companion compiler/analyzer for the orbit abstractions of the Obi-wan project. The tool analyzes a user-level program and instruments the program to optimize the usability and performance of orbit tasks for developers. The compiler is built on top of the LLVM framework.
- LLVM 5.0.x
mkdir build
cd build
cmake ..
make -j4
We include a set of simple test programs for testing the orbit compiler during
development. For them to be used by the tool, we need to compile these test
programs into bitcode files using clang, e.g.,
clang -c -emit-llvm test/loop1.c -o test/loop1.bc
For convenience, you can compile the bitcode files for all test programs with:
cd test
make
Invoke the analysis either using the LLVM opt tool with the analysis library in
the lib directory or using the executable in the tools directory.
cd build
opt -load lib/libLLVMDefUse.so -mydefuse < ../test/loop1.bc > /dev/null
For large software, we need to modify its build system to use clang for compilation. The most systematic way is to use the WLLVM wrapper.
$ pip install wllvm
The basic idea is to replace the regular C/C++ compiler call (e.g., gcc/g++)
with the wllvm/wllvm++ wrapper call, which will take care of the details for using
LLVM and clang to produce the bitcode.
Large systems usually use Makefile or CMake as the build system. In these cases,
it is fairly simple: define environment variables CC (and CXX) to be wllvm
(and wllvm++), without changing the Makefiles. The compilation will be
transparent. In the end, invoke the extract-bc command on the built
executable, which will produce the bitcode file.
Below is an example of compiling MySQL for analysis.
A. Download MySQL source code:
Replace 5.5.59 with 5.7.31 for MySQL 5.7.31
$ mkdir -p target-sys/mysql-build
$ cd target-sys
$ wget -nc https://downloads.mysql.com/archives/mysql-5.5/mysql-5.5.59.tar.gz
$ tar xzvf mysql-5.5.59.tar.gz
B. Compile with wllvm:
You may also need to install the Boost libraries and then point the build at them with the -DWITH_BOOST=<directory> flag.
$ cd mysql-build
$ export LLVM_COMPILER=clang
$ CC=wllvm CXX=wllvm++ cmake ../mysql-5.5.59 -DCMAKE_BUILD_TYPE=Debug -DCMAKE_C_FLAGS_DEBUG="-g -O0 -fno-inline-functions" -DCMAKE_CXX_FLAGS_DEBUG="-g -O0 -fno-inline-functions" -DMYSQL_MAINTAINER_MODE=false
$ make -j$(nproc)
$ extract-bc sql/mysqld
You should see a mysqld.bc in the sql directory (where the normal mysqld
executable resides). This bitcode file will be the target file for analysis
and instrumentation.
$ cd test && make
$ cd ../build
$ ../scripts/instrument-compile.sh --printf ../test/alloc.bc
This will instrument all the malloc calls in the alloc.c test program
and insert a printf to output the malloc size and return pointer address.
The resulting instrumented executable is alloc-instrumented, which can be
directly executed.
$ ./alloc-instrumented
orbit alloc: 16 => 0x12f1260
orbit alloc: 16 => 0x12f1690
orbit alloc: 16 => 0x12f16b0
orbit alloc: 4 => 0x12f16d0
foo(5)=30
orbit alloc: 16 => 0x12f16b0
orbit alloc: 16 => 0x12f1690
orbit alloc: 16 => 0x12f1260
orbit alloc: 4 => 0x12f16d0
foo(15)=90
orbit alloc: 16 => 0x12f1260
orbit alloc: 16 => 0x12f1690
orbit alloc: 16 => 0x12f16b0
orbit alloc: 4 => 0x12f16d0
foo(20)=120
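To make the effect of the --printf mode concrete, the following is a minimal source-level sketch of what an instrumented malloc call behaves like. It is not the code the pass emits (the pass works on LLVM IR); the helper names format_alloc and traced_malloc are hypothetical and exist only for illustration. The message format is taken from the output shown above.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <cstdlib>
#include <string>

// Hypothetical helper: builds a record in the same shape as the sample
// output, "orbit alloc: <size> => <pointer>".
std::string format_alloc(std::size_t size, void *ptr) {
    char buf[64];
    int n = std::snprintf(buf, sizeof buf, "orbit alloc: %zu => %p", size, ptr);
    assert(n > 0 && static_cast<std::size_t>(n) < sizeof buf);
    return std::string(buf);
}

// Hypothetical helper: source-level equivalent of "call malloc, then print
// the requested size and the returned address".
void *traced_malloc(std::size_t size) {
    void *ptr = std::malloc(size);
    std::puts(format_alloc(size, ptr).c_str());
    return ptr;
}
```

Calling traced_malloc(16) prints one line in the style of the trace above; the exact pointer text depends on the platform's %p formatting.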
A more complex instrumentation inserts calls to a runtime
library (OrbitTracker) in the runtime directory. This decouples the
instrumentation logic from the pass by moving it into the library; doing
everything in raw LLVM IR would be tedious and error-prone.
$ ../scripts/instrument-compile.sh ../test/alloc.bc
This will also produce alloc-instrumented. But the difference is that
this instrumented binary will call our custom tracking function void __orbit_track_gobj(char *addr, size_t size)
in the runtime library (instead of simple printf), which is linked with the executable.
Now try running this instrumented binary:
$ ./alloc-instrumented
opening orbit tracker output file orbit_gobj_pid_985.dat
foo(5)=30
foo(15)=90
foo(20)=120
$ cat orbit_gobj_pid_985.dat
16 => 0x15b9490
16 => 0x15ba4c0
16 => 0x15ba4e0
4 => 0x15ba500
16 => 0x15ba4e0
16 => 0x15ba4c0
16 => 0x15b9490
4 => 0x15ba500
16 => 0x15b9490
16 => 0x15ba4c0
16 => 0x15ba4e0
4 => 0x15ba500
As shown above, the runtime tracking library saves the information to a file
as the program runs. Note that the last part of the trace file name is the PID
(orbit_gobj_pid_xxx.dat), which changes from run to run.
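As a rough illustration of how such a tracking hook could work, here is a minimal sketch that mirrors the signature of __orbit_track_gobj and the file naming and record format shown above. It is an assumption about the general shape, not the actual OrbitTracker implementation; the names tracker_file_name and sketch_track_gobj are hypothetical, and getpid assumes a POSIX system.

```cpp
#include <cstddef>
#include <cstdio>
#include <string>
#include <unistd.h>  // getpid (POSIX)

// Hypothetical helper: per-process trace file name,
// e.g. orbit_gobj_pid_985.dat for PID 985.
std::string tracker_file_name(long pid) {
    return "orbit_gobj_pid_" + std::to_string(pid) + ".dat";
}

// Hypothetical stand-in with the same parameters as the runtime hook.
// The trace file is opened lazily on the first call and kept open.
void sketch_track_gobj(char *addr, std::size_t size) {
    static std::FILE *out = nullptr;
    if (out == nullptr) {
        const std::string name = tracker_file_name(static_cast<long>(getpid()));
        std::printf("opening orbit tracker output file %s\n", name.c_str());
        out = std::fopen(name.c_str(), "w");
        if (out == nullptr) {
            return;  // could not open the file; drop this record
        }
    }
    // One record per allocation: "<size> => <address>".
    std::fprintf(out, "%zu => %p\n", size, static_cast<void *>(addr));
    std::fflush(out);  // keep the file usable even if the program aborts
}
```

Keeping the logic in an ordinary function like this is what lets the pass emit a single call instruction per allocation instead of building file I/O in IR.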
The output of the LLVM pass is a list of heap allocation functions that can reach the target function (check_and_resolve), along with the path taken:
$ opt -load lib/libObiWanAnalysisPass.so -obi-wan-analysis -target-functions DeadlockChecker::check_and_resolve < ../target-sys/mysql-build/sql/mysqld.bc > /dev/null
$ clang test-instrumented.bc -o test-instrumented -L /home/ubuntu/orbit-compiler-temp/build/runtime -l:libOrbitTracker.a -lstdc++
$ ./test-instrumented
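The "allocation site reaches target, with path" output of the analysis pass can be pictured as a search over a directed graph. The sketch below is a simplified model, not the pass itself: it uses plain strings for nodes instead of LLVM values, and a breadth-first search that records predecessors so the path can be reconstructed. The Graph alias, find_path, and all node names are hypothetical.

```cpp
#include <algorithm>
#include <map>
#include <queue>
#include <string>
#include <vector>

// Hypothetical model: each node maps to the nodes that use its result.
using Graph = std::map<std::string, std::vector<std::string>>;

// Returns the nodes on a shortest path from `from` to `to`,
// or an empty vector when `to` is unreachable.
std::vector<std::string> find_path(const Graph &g, const std::string &from,
                                   const std::string &to) {
    std::map<std::string, std::string> parent;  // node -> BFS predecessor
    std::queue<std::string> work;
    parent[from] = from;
    work.push(from);
    while (!work.empty()) {
        const std::string cur = work.front();
        work.pop();
        if (cur == to) {
            // Walk predecessors back to the start, then reverse.
            std::vector<std::string> path;
            for (std::string n = to; n != from; n = parent[n]) {
                path.push_back(n);
            }
            path.push_back(from);
            std::reverse(path.begin(), path.end());
            return path;
        }
        const auto it = g.find(cur);
        if (it == g.end()) {
            continue;  // node has no users
        }
        for (const std::string &next : it->second) {
            // emplace succeeds only for nodes not visited yet
            if (parent.emplace(next, cur).second) {
                work.push(next);
            }
        }
    }
    return {};
}
```

For example, with edges alloc_site -> helper -> target, find_path(g, "alloc_site", "target") yields the three nodes in that order.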
- The target function check_and_resolve is provided as a user input. Ideally, the developer can specify the target through the use of the attribute annotate. There is some preprocessing required, however, before this annotation can be read directly in LLVM. This preprocessing is already performed in the function addFunctionAttributes in ObiWanAnalysisPass.
- Similarly, the heap allocation functions are currently manually specified. A better approach would be to use the annotate attribute with a different string.
- Class Member Analysis: In certain cases, we have to chase the users of class member variables. This is more involved since LLVM does not support this out of the box. One approach this analysis takes is based on the fact that LLVM implements classes as structs. Struct variables are accessed via the getElementPtr instruction. Access to the same struct variables means that the getElementPtr instructions are similar (they may not be identical since the base address may not be the same). This is reflected in the function isAccessingSameStructVar in LLVM.cpp. This is currently only enabled for identifying the trx_t variable heap point since this extended analysis may be incorrect in some cases. I have not tested this extensively and it may produce incorrect results.
- In certain cases, the path cannot be printed out even if the analysis discovers a path. This is slightly tricky since there may be sub-problems to tackle:
  - Global Variables: Global variables have users which span across functions, and thus the current directed graph approach to find the path would not work.
  - Class Member Analysis: Currently class member analysis is performed by identifying access to the same class fields via the getElementPtr instruction. However, this analysis does not work for the directed graph search.
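The idea behind the class member analysis, treating two getElementPtr instructions as "similar" when they address the same field regardless of base pointer, can be sketched with a simplified model. This does not use the LLVM API and is not the code of isAccessingSameStructVar; GepModel, accesses_same_field, and the example type name are hypothetical stand-ins for the struct type, constant indices, and base operand of a real instruction.

```cpp
#include <string>
#include <vector>

// Hypothetical, simplified model of a getElementPtr instruction.
struct GepModel {
    std::string struct_type;        // source element type, e.g. "struct.trx_t"
    std::vector<unsigned> indices;  // constant indices selecting the field
    int base_id;                    // identity of the base pointer operand
};

// Two accesses are treated as touching the same struct field when the
// struct type and the index list match. The base pointer is deliberately
// ignored, since different instances (or different loads of the same
// pointer) give different bases.
bool accesses_same_field(const GepModel &a, const GepModel &b) {
    return a.struct_type == b.struct_type && a.indices == b.indices;
}
```

Ignoring the base is also why this kind of matching can be wrong in some cases: two unrelated objects of the same type will match on the same field.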
I referred to the following for understanding and implementation:
- https://releases.llvm.org/5.0.1/docs/LangRef.html
- https://releases.llvm.org/5.0.1/docs/ProgrammersManual.html
- https://mapping-high-level-constructs-to-llvm-ir.readthedocs.io/en/latest/README.html#
- https://llvm.org/docs/GetElementPtr.html
- https://blog.yossarian.net/2020/09/19/LLVMs-getelementptr-by-example
- LLVM 5.0 Doxygen: This is not hosted but it can be downloaded from https://releases.llvm.org/download.html
Please refer to the code style for the coding convention and practice.