-
-
Notifications
You must be signed in to change notification settings - Fork 905
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
added fuzzing support #3007
added fuzzing support #3007
Conversation
thoughts @flavorjones ? |
@fuzzy-boiii23a Thank you for opening this! I'm traveling this week but will make time to take a look. |
SanityCheckPointers (and as an aside, I've been trying to move away from such terminology) doesn't have any side effects and it doesn't return any values. 1. Have you checked that it doesn't get dead stripped? 2. How will you ensure that it doesn't get dead stripped in the future? The else will definitely get pruned so if you're hoping that will check if the text pointer is accessible, I don't think that will do it. There are also pointers for attributes that probably need to be checked as well. |
hi @stevecheckoway , thanks for the heads up! i've made some changes and tested them using -03 to ensure i make it into each if else statement and i do and i also added checking attributes |
Hey @flavorjones hope you're going well and had a great trip, can you think of anything else that I could improve to make it easier to use? |
@fuzzy-boiii23a I've not been able to spend much time on OSS over the past few weeks, I appreciate your patience. |
Absolutely no stress at all :) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you again for doing this work!! I really appreciate it, and I'm sorry it's taken me so long to review (I've been traveling a lot).
Can you also add a (very brief) summary of how to run the fuzzer to CONTRIBUTING.md
? I've only ever used a fuzzer in one other project, and the configuration was much different from this.
- What commands should I run?
- What does output look like if no issues are found?
- What does output look like if an issue is found?
Added those changes https://github.com/fuzzy-boiii23a/nokogiri/blob/fuzzing_support/CONTRIBUTING.md#fuzzing-your-gumbo-parser-changes hope you like them and no need to be sorry thanks for checking out the changes! hope your travel went well :) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the quick response. I made a few small requests.
Also, I noticed that the user account associated with the last few commits was different, wanted to ask if that was intentional?
Addressed those changes and the user account was unintentional that's a personal account I use for non fuzzing related projects and forgot to update config :) |
and - set some bash safety flags - make it runnable from any directory - simplify the makefile execution
use separate compilation directories to avoid the jobs clobbering each other if users run make with a `-j` flag
and rename the top-level target from `fuzzing` to `fuzzers`
999176c
to
ea7154c
Compare
@fuzzy-boiii23a I added a few commits primarily to clean up the build.sh script and allow building the fuzzers in parallel. At this point, I'm able to run the fuzzers, and I've even found a couple of (very small) memory leaks in gumbo! (See #3036) I will want to do some follow-up work to assemble a corpus that's useful and make that scriptable. What else do you think needs to be done? Anything? Are you OK if I squash-merge this? |
Thanks for that and that's great to hear!, I'm happy to assemble a corpus and a dictionary but for now I think everything is fine to be merged. I will get around to doing that next week. |
Alright so i added a dictionary and corpus and added how to use it in CONTRIBUTING.md which greatly improves coverage so i cannot think of any further improvements to make, looks to to merge to me thanks for the support @flavorjones |
**What problem is this PR intended to solve?** The fuzzer in #3007 found a memory leak when incomplete tags are encountered. The repro added to the memory leak test suite is: ```html <asdfasdfasdfasdfasdfasdfasdfasdfasdfasdf foo="bar ``` which will leak the tag name without this fix. cc @stevecheckoway **Have you included adequate test coverage?** Yes, but added it to the memory leak suite which can only be run manually today (see #1603 for context on plans to improve it). **Does this change affect the behavior of either the C or the Java implementations?** No functional change.
@fuzzy-boiii23a Thank you! Can you help me understand what's in the |
No problem, I generated that corpus by running the fuzzer overnight with a html corpus I grabbed from https://github.com/dvyukov/go-fuzz-corpus along with some scraping of html via the web and after running for 24 hours I performed corpus minimization (https://llvm.org/docs/LibFuzzer.html#id25) which will run each file through the fuzzer and if for example 10 files execute the same amount of code in the target, libfuzzer will discard 9 files and keep 1 that exercises the same functionality. You could improve the corpus by collecting heaps of html files and then performing minimization of these inputs using https://llvm.org/docs/LibFuzzer.html#id25 |
@flavorjones I had a look for the public corpus that google currently uses for fuzzing nokogiri's implementation of gumbo and it can be downloaded from https://storage.googleapis.com/nokogiri-backup.clusterfuzz-external.appspot.com/corpus/libFuzzer/nokogiri_parse_fuzzer/public.zip I'd say that this is the best one to use as the corpus is updated and maintained across their daily run against gumbo which is about a billion executions on average |
Hey @flavorjones hope you're doing well, can we please finalise and merge this one to close it off? |
@fuzzy-boiii23a Thanks for the nudge. The only other thing I need you to educate me on is how to manage the oss-fuzz emails I've been receiving to my personal email. In #2992 we talked about using the google group I created, https://groups.google.com/g/nokogiri-oss-fuzz. How can I configure the oss-fuzz settings? |
Hi @flavorjones , you can manage oss fuzz by going to https://oss-fuzz.com/ and logging in using the google email you're receiving alerts on however you can have a look at crashes and how to reproduce them by checking out https://oss-fuzz.com/testcases?project=nokogiri&open=yes whilst logged in, unfortunately google group cannot be used as you won't be able to access the oss fuzz dashboard as mentioned in google/oss-fuzz#11004 (comment) |
@fuzzy-boiii23a Ah, thank you! The pointer to google/oss-fuzz#11004 is what I was missing (I had no idea how it was configured). Really very appreciative of your work here!!! ❤️ |
This just adds fuzzing support such that if any changes are made to libgumbo it's easy to fuzz these changes before merging something into the main branch.