fix: gumbo memory leak on abandoned tags #3036

Merged 2 commits on Nov 21, 2023.

gumbo-parser/src/tokenizer.c (1 change: 1 addition, 0 deletions)
@@ -506,6 +506,7 @@ static void abandon_current_tag(GumboParser* parser) {
   for (unsigned int i = 0; i < tag_state->_attributes.length; ++i) {
     gumbo_destroy_attribute(tag_state->_attributes.data[i]);
   }
+  gumbo_free(tag_state->_name);
   gumbo_free(tag_state->_attributes.data);
   mark_tag_state_as_empty(tag_state);
   gumbo_string_buffer_destroy(&tag_state->_buffer);
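The fix applies a simple ownership rule: when the tokenizer abandons a half-built tag (in the test, presumably because input ends inside the unterminated `foo="bar` attribute value), it must free every heap allocation the tag state owns. Before this change, `abandon_current_tag` released the attributes but not `tag_state->_name`. The C sketch below is a hypothetical, simplified model, not gumbo's code: `TagState`, `counted_alloc`, `counted_free`, `start_tag`, `abandon_tag`, and `run_abandon_cycles` are illustrative names, and the counting allocator stands in for the `gumbo_free` calls in the diff so that a leak shows up as a nonzero count.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical counting allocator; a leak shows up as live_allocations > 0. */
static long live_allocations = 0;

static void* counted_alloc(size_t size) {
  void* p = malloc(size);
  if (p == NULL) {
    perror("malloc");
    exit(EXIT_FAILURE);
  }
  live_allocations++;
  return p;
}

static void counted_free(void* p) {
  if (p != NULL) {
    live_allocations--;
    free(p);
  }
}

static char* counted_strdup(const char* s) {
  size_t n = strlen(s) + 1;
  char* copy = counted_alloc(n);
  memcpy(copy, s, n);
  return copy;
}

/* Hypothetical, simplified model of a tokenizer's in-progress tag. */
typedef struct {
  char* name;
  char** attributes;
  size_t attribute_count;
} TagState;

/* Begin a tag that owns its name, an attribute array, and one attribute. */
static void start_tag(TagState* tag, const char* name, const char* attribute) {
  tag->name = counted_strdup(name);
  tag->attributes = counted_alloc(sizeof(char*));
  tag->attributes[0] = counted_strdup(attribute);
  tag->attribute_count = 1;
}

/* Release everything the tag owns, including the name (the analogue of the
   gumbo_free(tag_state->_name) line this PR adds), then mark it empty. */
static void abandon_tag(TagState* tag) {
  for (size_t i = 0; i < tag->attribute_count; ++i) {
    counted_free(tag->attributes[i]);
  }
  counted_free(tag->name);
  counted_free(tag->attributes);
  tag->name = NULL;
  tag->attributes = NULL;
  tag->attribute_count = 0;
}

/* Start and abandon a tag repeatedly; returns allocations still live afterwards. */
long run_abandon_cycles(int iterations) {
  TagState tag = {0};
  for (int i = 0; i < iterations; ++i) {
    start_tag(&tag, "asdfasdfasdfasdfasdfasdfasdfasdfasdfasdf", "foo=bar");
    abandon_tag(&tag);
  }
  return live_allocations;
}
```

With the `counted_free(tag->name)` call removed, `run_abandon_cycles(n)` returns `n`: one leaked name per abandoned tag. That per-parse growth is what the Ruby test below watches for in RSS; counting allocations just makes it deterministic.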

test/test_memory_leak.rb (17 changes: 16 additions, 1 deletion)
@@ -270,6 +270,20 @@ def test_leaking_dtd_nodes_after_internal_subset_removal
         puts
       end
     end
+
+    describe "libgumbo abandoned tag" do
+      it "should not leak the tag name" do
+        html = <<~HTML
+          <asdfasdfasdfasdfasdfasdfasdfasdfasdfasdf foo="bar
+        HTML
+        # Memory (RSS) should increase over the first 200_000 iterations (general parsing overhead)
+        # but then flatten out; on my machine it flattens at about 169k.
+        1_000_000.times do |j|
+          Nokogiri::HTML5::Document.parse(html)
+          printf "%s::%s: (iter %d) %d\n", self.class, __method__, j, MemInfo.rss if j % 20_000 == 0
+        end
+      end
+    end
   end # if NOKOGIRI_GC
 
   def test_object_space_memsize_of
@@ -336,7 +350,8 @@ module MemInfo
   rescue
     4096
   end
-  STATM_PATH = "/proc/#{Process.pid}/statm"
+
+  STATM_PATH = "/proc/self/statm"
   STATM_FOUND = File.exist?(STATM_PATH)
 
   def self.rss
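The `MemInfo` change replaces a path built from `Process.pid` when the module loads with `/proc/self/statm`. On Linux, `/proc/self` resolves to whichever process opens it, so the path stays correct even if it is read from a different process, such as a forked child. The PR does not state its motivation, so treat that as one plausible benefit. The C sketch below is hypothetical (the name `rss_bytes` is illustrative, and it is not a translation of `MemInfo.rss`, whose body is not shown in this diff). It reads the second `statm` field, resident pages, and converts it to bytes, falling back to a 4096-byte page size like the `rescue` clause visible in the hunk.

```c
#include <stdio.h>
#include <unistd.h>

/* Returns the resident set size in bytes, or -1 if /proc/self/statm cannot be
   read (for example, on non-Linux systems). statm fields are counted in pages:
   total program size, then resident pages, then shared, text, and so on. */
long rss_bytes(void) {
  FILE* f = fopen("/proc/self/statm", "r");
  if (f == NULL) return -1;

  long size_pages = 0;
  long resident_pages = 0;
  int matched = fscanf(f, "%ld %ld", &size_pages, &resident_pages);
  fclose(f);
  if (matched != 2) return -1;

  long page_size = sysconf(_SC_PAGESIZE);
  if (page_size <= 0) page_size = 4096; /* fallback if sysconf fails */
  return resident_pages * page_size;
}
```

Sampling this every N iterations of a parse loop mirrors the `printf` in the Ruby test; on systems without `/proc` it returns -1 rather than a misleading value.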