[SERVE][CPP][Android] add native executable program to benchmark models #2987
Hello,
I have modified and put together some code to run an LLM in adb shell or a Linux shell via MLC-LLM (by the way, many thanks to the authors and contributors) as a binary executable program.
I'm not an expert in C++, so the code isn't perfect (actually it is tinkered and glued together from the outputs of ChatGPT, Claude and my dog), but I think it's easy to read, understand and run.
How to set up:
0. Set up MLC-LLM and a virtualenv (install dependencies, TVM, etc.).
1. Use `build-aarch64-opencl` as the build directory. Run all of the following commands from this directory.
2. Configure the build:

        cmake \
          -DCMAKE_BUILD_TYPE=Release \
          -DCMAKE_TOOLCHAIN_FILE=/home/piotr/android/sdk/ndk/26.1.10909125/build/cmake/android.toolchain.cmake \
          -DCMAKE_INSTALL_PREFIX=. \
          -DCMAKE_CXX_FLAGS="-O3" \
          -DANDROID_ABI=arm64-v8a \
          -DANDROID_NATIVE_API_LEVEL=android-31 \
          -DANDROID_PLATFORM=android-31 \
          -DCMAKE_FIND_ROOT_PATH_MODE_PACKAGE=ON \
          -DANDROID_STL=c++_static \
          -DUSE_HEXAGON_SDK=OFF \
          -DMLC_LLM_INSTALL_STATIC_LIB=ON \
          -DCMAKE_SKIP_INSTALL_ALL_DEPENDENCY=ON \
          -DUSE_OPENCL=ON \
          -DUSE_OPENCL_ENABLE_HOST_PTR=ON \
          -DUSE_CUSTOM_LOGGING=OFF \
          ..

3. Run `make -j 8`. Now you should have `libmlc_llm_module.so`, `tvm/libtvm.so` and `llm_benchmark`.
4. Open `adb shell`.
5. Run the following commands:

`4` means OpenCL (alternatives are described in the source code); the 5th argument is the execution timeout in seconds; the 6th is the maximum number of tokens; the 7th is the prompt; the 8th is the number of executions (if it is 1, the generated text is printed).
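To make the argument description above more concrete, here is a minimal C++ sketch of how such a command line could be parsed and used to drive repeated runs. It is not the code from this PR: `BenchmarkArgs`, `ParseArgs`, `RunBenchmark` and `ShouldPrintText` are hypothetical names, the `generate` callback stands in for whatever actually calls the model, and the sketch assumes the device type sits in the 4th position and that the timeout is a total budget checked between runs. Arguments 1 to 3 are not described above, so the sketch ignores them.

```cpp
#include <chrono>
#include <functional>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical holder for the arguments documented in the description.
struct BenchmarkArgs {
    int device_type;      // assumed 4th argument; 4 is stated to mean OpenCL
    int timeout_seconds;  // 5th argument
    int max_tokens;       // 6th argument
    std::string prompt;   // 7th argument
    int executions;       // 8th argument
};

// Parses an argv-style vector (index 0 is the program name).
// Arguments 1-3 are not described in the text and are skipped here.
BenchmarkArgs ParseArgs(const std::vector<std::string>& argv) {
    if (argv.size() < 9) {
        throw std::invalid_argument("expected 8 arguments after the program name");
    }
    BenchmarkArgs args;
    args.device_type = std::stoi(argv[4]);
    args.timeout_seconds = std::stoi(argv[5]);
    args.max_tokens = std::stoi(argv[6]);
    args.prompt = argv[7];
    args.executions = std::stoi(argv[8]);
    if (args.executions < 1) {
        throw std::invalid_argument("number of executions must be at least 1");
    }
    return args;
}

// The description says the generated text is printed when there is one execution.
bool ShouldPrintText(const BenchmarkArgs& args) { return args.executions == 1; }

struct BenchmarkResult {
    int completed_runs;
    double mean_seconds;
    std::string last_text;
};

// Runs `generate` up to `executions` times and averages the wall-clock time.
// Assumption: the timeout is a total budget checked between runs; a run that
// is already in progress is not interrupted.
BenchmarkResult RunBenchmark(
    const BenchmarkArgs& args,
    const std::function<std::string(const std::string&, int)>& generate) {
    using Clock = std::chrono::steady_clock;
    BenchmarkResult result{0, 0.0, ""};
    double total_seconds = 0.0;
    for (int i = 0; i < args.executions; ++i) {
        const auto start = Clock::now();
        result.last_text = generate(args.prompt, args.max_tokens);
        const std::chrono::duration<double> elapsed = Clock::now() - start;
        total_seconds += elapsed.count();
        ++result.completed_runs;
        if (total_seconds > args.timeout_seconds) break;  // budget exhausted
    }
    // executions >= 1 is enforced by ParseArgs, so at least one run completed.
    result.mean_seconds = total_seconds / result.completed_runs;
    return result;
}
```

Passing the generation step in as a callback keeps the timing loop independent of the MLC-LLM API, so the loop can be exercised with a stub before it is wired to the real engine.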