Retsnoop improvements for LBR, stitched stacks, and interim stacks #72
Merged
Better handle cases when a ksym can't be found (print the address instead of nothing at all). Also make sure that stack trace symbolization only considers function symbols, not data symbols.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
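The intended fallback behavior can be sketched like this (a minimal sketch; `struct ksym`, `print_frame`, and the symbol-type convention are illustrative stand-ins, not retsnoop's actual code):

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical ksym entry; retsnoop's real structures differ. */
struct ksym {
	const char *name;
	unsigned long addr;
	char sym_type; /* 't'/'T' mark function symbols, as in /proc/kallsyms */
};

/* Symbolize one stack frame address: use a function symbol if we have
 * one, but fall back to printing the raw address instead of nothing. */
static void print_frame(const struct ksym *sym, unsigned long addr,
			char *buf, size_t sz)
{
	/* only function symbols are valid symbolization targets */
	if (sym && (sym->sym_type == 't' || sym->sym_type == 'T'))
		snprintf(buf, sz, "%s+0x%lx", sym->name, addr - sym->addr);
	else
		snprintf(buf, sz, "0x%lx", addr); /* better than nothing at all */
}
```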
If some deep function errored out, but then that error was ignored, the entire stack isn't really an error stack, so mark it as such. One simple way to produce such a situation is by tracing with retsnoop:

    sudo retsnoop -e bpf_prog_load -a pop_stack -T -E -n simfail

and triggering a successful BPF program load with simfail:

    sudo simfail bpf-simple-obj

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
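The "ignored error" rule can be sketched as follows (a simplified model; `struct saved_frame` and `is_error_stack` are hypothetical names, not retsnoop's internals):

```c
#include <stdbool.h>
#include <stddef.h>

/* Simplified view of a captured session: one saved return per exited
 * stack depth, with frames[0] the outermost traced function and
 * frames[cnt-1] the deepest one. */
struct saved_frame {
	long ret;    /* function's return value */
	bool is_err; /* classified as an error return? */
};

/* The stack is an error stack only if the deepest error kept
 * propagating all the way up; if some caller swallowed it (returned
 * success above an erroring callee), it's not an error stack. */
static bool is_error_stack(const struct saved_frame *f, size_t cnt)
{
	bool err_below = false;

	/* walk from the deepest exited frame towards the entry function */
	for (size_t i = cnt; i-- > 0; ) {
		if (err_below && !f[i].is_err)
			return false; /* this caller ignored the error below it */
		if (f[i].is_err)
			err_below = true;
	}
	return err_below; /* did an error actually reach the entry function? */
}
```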
Retsnoop uses a bunch of special Unicode characters; extract all of them into named #defines.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
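For illustration, such #defines might look like this (macro names are hypothetical, not retsnoop's actual ones; the glyphs themselves are the ones visible in the trace output below):

```c
#include <stdio.h>

/* Named constants for the special Unicode glyphs used when rendering
 * function traces and stack traces (names are illustrative only). */
#define U_CALL   "\u2192" /* → function was entered */
#define U_RET    "\u2190" /* ← function returned */
#define U_INOUT  "\u2194" /* ↔ entry and exit collapsed into one line */
#define U_STITCH "\u2307" /* ⌇ marks stitched portions of a stack trace */

/* example use: render a function-entry trace line */
static void emit_entry(const char *func, char *buf, size_t sz)
{
	snprintf(buf, sz, "%s %s", U_CALL, func);
}
```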
Logging from the BPF side shouldn't use explicit newlines in string literals, as newlines are appended automatically.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
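The convention can be sketched with a hypothetical logging macro (not retsnoop's actual API) that appends the newline itself, so call sites must not embed "\n" in their format strings:

```c
#include <stdio.h>

/* Hypothetical sketch of a logging helper that appends "\n" itself;
 * formats into a buffer here so the behavior is easy to inspect.
 * Uses the GNU ##__VA_ARGS__ extension, as kernel-adjacent code does. */
#define vlog(buf, sz, fmt, ...) \
	snprintf(buf, sz, fmt "\n", ##__VA_ARGS__)
```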
When we stitch together a call stack due to intervening non-failing traced functions, we should preserve the deepest nested stack trace, as that gives us the most information. The difference can be easily seen with the following trigger (using veristat on one of the failing BPF selftest programs):

    sudo veristat tools/testing/selftests/bpf/test_global_func10.bpf.o -f global_func10

Now, tracing the following with retsnoop (no intervening functions, no stitching required):

    sudo retsnoop -T -e 'bpf_prog_load' -n veristat -E \
        -a do_check_common -a do_check -a check_func_call \
        -a btf_check_subprog_call -a check_stack_access_within_bounds \
        -a check_stack_range_initialized -a check_helper_mem_access

...gives the following useful output:

    20:36:15.718917 -> 20:36:15.718991 TID/PID 3782767/3782767 (veristat/veristat):

    FUNCTION CALLS                                                     RESULT    DURATION
    ------------------------------------------------------------------ --------- --------
    → bpf_prog_load
        → bpf_check
            → do_check_common
                → do_check
                    ↔ check_stack_access_within_bounds [0] 2.701us
                    → check_func_call
                        → btf_check_subprog_call
                            → check_helper_mem_access
                                → check_stack_range_initialized
                                    ↔ check_stack_access_within_bounds [-EINVAL] 2.035us
                                ← check_stack_range_initialized [-EINVAL] 13.428us
                            ← check_helper_mem_access [-EINVAL] 18.536us
                        ← btf_check_subprog_call [-EINVAL] 21.193us
                    ← check_func_call [-EINVAL] 23.204us
                ← do_check [-EINVAL] 32.541us
            ← do_check_common [-EINVAL] 37.686us
        ← bpf_check [-EINVAL] 59.636us
    ← bpf_prog_load [-EINVAL] 71.932us

    entry_SYSCALL_64_after_hwframe+0x6c (entry_SYSCALL_64 @ arch/x86/entry/entry_64.S:130)
    do_syscall_64+0x58 (arch/x86/entry/common.c:83)
    . do_syscall_x64 (arch/x86/entry/common.c:52)
    __x64_sys_bpf+0x18 (kernel/bpf/syscall.c:5765)
    . __se_sys_bpf (kernel/bpf/syscall.c:5765)
    . __do_sys_bpf (kernel/bpf/syscall.c:5767)
    __sys_bpf+0x33d (kernel/bpf/syscall.c:0)
    71us [-EINVAL] bpf_prog_load+0x59a (kernel/bpf/syscall.c:2895)
    59us [-EINVAL] bpf_check+0x894 (kernel/bpf/verifier.c:21307)
    . do_check_main (kernel/bpf/verifier.c:20637)
    37us [-EINVAL] do_check_common+0x38a (kernel/bpf/verifier.c:20546)
    32us [-EINVAL] do_check+0x1b03 (kernel/bpf/verifier.c:0)
    23us [-EINVAL] check_func_call+0x96 (kernel/bpf/verifier.c:0)
    21us [-EINVAL] btf_check_subprog_call+0x191 (kernel/bpf/verifier.c:9459)
    . btf_check_func_arg_match (kernel/bpf/verifier.c:0)
      check_mem_reg+0x127 (kernel/bpf/verifier.c:7421)
    18us [-EINVAL] check_helper_mem_access+0x142 (kernel/bpf/verifier.c:0)
    13us [-EINVAL] check_stack_range_initialized+0xaf (kernel/bpf/verifier.c:7122)
    ! 2us [-EINVAL] check_stack_access_within_bounds

Now, if we add `-a verbose`, verbose screws up the actual stack trace (note: no offsets, no file:line info, etc.):

    sudo retsnoop -T -e 'bpf_prog_load' -n veristat -E \
        -a do_check_common -a do_check -a check_func_call \
        -a btf_check_subprog_call -a check_stack_access_within_bounds \
        -a check_stack_range_initialized -a check_helper_mem_access \
        -a verbose

    20:37:21.703351 -> 20:37:21.703440 TID/PID 3785005/3785005 (veristat/veristat):

    FUNCTION CALLS                                                     RESULT    DURATION
    ------------------------------------------------------------------ --------- --------
    → bpf_prog_load
        → bpf_check
            → do_check_common
                → do_check
                    ↔ check_stack_access_within_bounds [0] 2.617us
                    → check_func_call
                        → btf_check_subprog_call
                            → check_helper_mem_access
                                → check_stack_range_initialized
                                    → check_stack_access_within_bounds
                                        ↔ verbose [void] 1.985us
                                    ← check_stack_access_within_bounds [-EINVAL] 4.568us
                                ← check_stack_range_initialized [-EINVAL] 13.933us
                            ← check_helper_mem_access [-EINVAL] 19.404us
                        ← btf_check_subprog_call [-EINVAL] 21.383us
                        ↔ verbose [void] 1.276us
                    ← check_func_call [-EINVAL] 24.489us
                ← do_check [-EINVAL] 33.878us
            ← do_check_common [-EINVAL] 39.250us
            ↔ verbose [void] 1.196us
            ↔ verbose [void] 0.903us
            ↔ verbose [void] 0.872us
            ↔ verbose [void] 0.875us
            ↔ verbose [void] 0.735us
            ↔ verbose [void] 0.794us
            ↔ verbose [void] 2.842us
        ← bpf_check [-EINVAL] 72.025us
    ← bpf_prog_load [-EINVAL] 85.944us

    entry_SYSCALL_64_after_hwframe+0x6c (entry_SYSCALL_64 @ arch/x86/entry/entry_64.S:130)
    do_syscall_64+0x58 (arch/x86/entry/common.c:83)
    . do_syscall_x64 (arch/x86/entry/common.c:52)
    __x64_sys_bpf+0x18 (kernel/bpf/syscall.c:5765)
    . __se_sys_bpf (kernel/bpf/syscall.c:5765)
    . __do_sys_bpf (kernel/bpf/syscall.c:5767)
    __sys_bpf+0x33d (kernel/bpf/syscall.c:0)
    85us [-EINVAL] bpf_prog_load+0x59a (kernel/bpf/syscall.c:2895)
    ! 72us [-EINVAL] bpf_check
    !⌇ 39us [-EINVAL] do_check_common
    !⌇ 33us [-EINVAL] do_check
    !⌇ 24us [-EINVAL] check_func_call
    !⌇ 21us [-EINVAL] btf_check_subprog_call
    !⌇ 19us [-EINVAL] check_helper_mem_access
    !⌇ 13us [-EINVAL] check_stack_range_initialized
    !⌇ 4us [-EINVAL] check_stack_access_within_bounds

With the fixes in this patch, we get much more complete and useful output:

    20:39:59.010233 -> 20:39:59.010330 TID/PID 3796075/3796075 (veristat/veristat):

    FUNCTION CALLS                                                     RESULT    DURATION
    ------------------------------------------------------------------ --------- --------
    → bpf_prog_load
        → bpf_check
            → do_check_common
                → do_check
                    ↔ check_stack_access_within_bounds [0] 2.279us
                    → check_func_call
                        → btf_check_subprog_call
                            → check_helper_mem_access
                                → check_stack_range_initialized
                                    → check_stack_access_within_bounds
                                        ↔ verbose [void] 2.304us
                                    ← check_stack_access_within_bounds [-EINVAL] 5.028us
                                ← check_stack_range_initialized [-EINVAL] 15.827us
                            ← check_helper_mem_access [-EINVAL] 20.187us
                        ← btf_check_subprog_call [-EINVAL] 22.995us
                        ↔ verbose [void] 1.437us
                    ← check_func_call [-EINVAL] 26.565us
                ← do_check [-EINVAL] 37.165us
            ← do_check_common [-EINVAL] 43.172us
            ↔ verbose [void] 1.237us
            ↔ verbose [void] 0.973us
            ↔ verbose [void] 0.959us
            ↔ verbose [void] 0.914us
            ↔ verbose [void] 0.784us
            ↔ verbose [void] 1.858us
            ↔ verbose [void] 2.434us
        ← bpf_check [-EINVAL] 77.849us
    ← bpf_prog_load [-EINVAL] 93.830us

    entry_SYSCALL_64_after_hwframe+0x6c (entry_SYSCALL_64 @ arch/x86/entry/entry_64.S:130)
    do_syscall_64+0x58 (arch/x86/entry/common.c:83)
    . do_syscall_x64 (arch/x86/entry/common.c:52)
    __x64_sys_bpf+0x18 (kernel/bpf/syscall.c:5765)
    . __se_sys_bpf (kernel/bpf/syscall.c:5765)
    . __do_sys_bpf (kernel/bpf/syscall.c:5767)
    __sys_bpf+0x33d (kernel/bpf/syscall.c:0)
    93us [-EINVAL] bpf_prog_load+0x59a (kernel/bpf/syscall.c:2895)
    77us [-EINVAL] bpf_check+0x894 (kernel/bpf/verifier.c:21307)
    . do_check_main (kernel/bpf/verifier.c:20637)
    ⌇ 43us [-EINVAL] do_check_common+0x38a (kernel/bpf/verifier.c:20546)
    ⌇ 37us [-EINVAL] do_check+0x1b03 (kernel/bpf/verifier.c:0)
    ⌇ 26us [-EINVAL] check_func_call+0x96 (kernel/bpf/verifier.c:0)
    ⌇ 22us [-EINVAL] btf_check_subprog_call+0x191 (kernel/bpf/verifier.c:9459)
    ⌇ . btf_check_func_arg_match (kernel/bpf/verifier.c:0)
      check_mem_reg+0x127 (kernel/bpf/verifier.c:7421)
    ⌇ 20us [-EINVAL] check_helper_mem_access+0x142 (kernel/bpf/verifier.c:0)
    ⌇ 15us [-EINVAL] check_stack_range_initialized+0xaf (kernel/bpf/verifier.c:7122)
    !⌇ 5us [-EINVAL] check_stack_access_within_bounds

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
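The core "keep the deepest capture" rule from this patch can be sketched like so (a simplified model; the structure name, field names, and frame limit are illustrative, not retsnoop's actual data layout):

```c
#include <string.h>

/* Simplified model of one captured kernel stack snapshot. */
struct stack_snapshot {
	unsigned long addrs[128];
	int depth; /* number of captured frames */
};

/* When stitching across an intervening non-failing traced function,
 * keep whichever snapshot is deeper: the deepest nested stack trace
 * carries the most information about the failing call chain. */
static void stitch_stack(struct stack_snapshot *saved,
			 const struct stack_snapshot *cur)
{
	if (cur->depth > saved->depth)
		memcpy(saved, cur, sizeof(*saved));
}
```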
Similarly to how we preserve the deepest kernel stack trace for stitched stacks, do the same with LBRs, which should give the user more useful information about the deepest nested failing function.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
This also enables sending less data over the BPF ringbuf. BPF logic will now concatenate stitched parts to form one logical stitched stack; the user-space part only needs to know at which point the stitching happened (for visualization), which is provided through the stitched_pos field.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
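A rough sketch of the idea (struct layout, field names, and the rendering helper are hypothetical, not retsnoop's actual ringbuf record): user space renders the already-concatenated stack and only consults stitched_pos to decide where to draw the ⌇ marker.

```c
#include <stdio.h>

/* Illustrative model of a record carrying one logical stitched stack,
 * with frames[0] the outermost and deeper frames at higher indices. */
struct stack_record {
	unsigned long addrs[64];
	int depth;        /* total frames in the concatenated stack */
	int stitched_pos; /* frames at index >= stitched_pos were stitched in */
};

/* Render one line per frame, prefixing the stitched-in (deeper)
 * portion with the ⌇ marker. */
static void render(const struct stack_record *r, char *out, size_t sz)
{
	size_t off = 0;

	for (int i = 0; i < r->depth && off < sz; i++) {
		const char *mark = i >= r->stitched_pos ? "\u2307 " : "";

		off += snprintf(out + off, sz - off, "%s0x%lx\n", mark, r->addrs[i]);
	}
}
```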
retsnoop used to support --intermediate-stacks (-A) before function args capture was added. It's still a useful feature for some deeper and more advanced debugging scenarios, so add it back, taking into account all the session concepts added since. Now it's "interim stacks", controlled by the --interim-stacks (-I) flag. Retsnoop will attempt to minimize the number of interim stacks reported: whenever stack stitching preserves all the captured information for later, we'll postpone submitting an interim stack. Only when data loss is occurring (i.e., we are overwriting some of the captured function call information) will retsnoop emit interim (incomplete, in-progress) trace and call stack information (together with its LBR data, of course). Note: previously we wouldn't emit function trace data for interim stacks. We change that now and emit the relevant portions of it, as it significantly helps debugging by augmenting the call stack with extra information.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
This is a nice way to get LBR and stack traces deep inside kernel functions, if necessary. Signed-off-by: Andrii Nakryiko <andrii@kernel.org>