Accessing DocumentNode.OuterHtml Causes Stack Overflow Exception On Demand #103
Comments
Hello @blankers, Thank you for reporting. We will look at this issue soon. Best Regards, Jonathan |
Hello @blankers, Just to let you know, we recently took some time to investigate this, but unfortunately we have not been able to find the cause. We will investigate it again once my new developer is more comfortable with this library. Best Regards, Jonathan |
Please find attached a project that also produces a stack overflow when getting the OuterHtml of a node. The code in the project reads some HTML, modifies it a bit, then tries to access the OuterHtml of the document node. I have not taken the time to investigate whether the modifications are necessary to reproduce the problem. When the relevant code runs in the context of an ASP.NET Core web site, the behaviour is different. If the code is running under the debugger, the debugger closes with no user interaction. Setting a breakpoint at the line that calls the OuterHtml getter and hovering over it causes a popup to appear, as seen in the screengrab. Searching for the error code 0xc0000005 suggests that it indicates an access violation. |
Further to the above: the failure does not occur (in either of its forms) if the line node.Attributes.RemoveAll(); is commented out. |
Workaround
|
This is an old issue, but I hit it too. I'm debugging the code now and can reproduce the stack overflow (call stack from bottom to top):
|
Further investigation (I don't quite understand the code yet, but...):
|
More info. The problem happens if I
|
Hello @alexbk66, Do you think you could reproduce the issue in a Fiddle? I'm not sure it will get fixed, but we can certainly look at it. Here is a working fiddle with your example: https://dotnetfiddle.net/ImPNc1 Best Regards, Jon |
Hi Jon, I copied my HTML here: https://dotnetfiddle.net/LQ5nAB There it doesn't cause a stack overflow, but it still fails because of the <style> tag:
In Visual Studio, the stack overflow also happens at this tag. But if I add spaces around the tag, it works.
|
Hello @alexbk66, It currently fails because the end tag is badly formatted. Therefore the following line ... I will wait for your investigation to reproduce the stack overflow issue. |
The spaces were inserted by .NET Fiddle for some reason when I copied the code. You are right, though: if I remove the spaces, it works. |
I've encountered a strange situation with the HTML source from http://portalamis.org.br/?secao=noticias
See the raw HTML in the attached file:
http-portalamis.org.br-secao-noticias.html.txt
Here's my code:
public HtmlAgilityPack.HtmlDocument document { get; private set; }
....
....
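encoding = Encoding.UTF8;
this.document = new HtmlAgilityPack.HtmlDocument();
// Partially fix nesting errors in LI, TR, TH and TD tags
this.document.OptionFixNestedTags = true;
// Close nodes that are still open at the end of the document rather than in place
this.document.OptionAutoCloseOnEnd = true;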
this.document.OptionDefaultStreamEncoding = encoding;
this.document.LoadHtml(htmlContent);
Then simply accessing
this.document.DocumentNode.OuterHtml
causes a stack overflow on demand.