Delphi HTML Parser
April 28, 2022 Forked from https://github.com/edwinyzh/htmlparser Translated Chinese strings/comments to English via Google Translate
The code is modified from HtmlParser of the original wr960204. Because of my own needs, I need to modify the html, but only support the read operation, so on this basis Modified and named HtmlParserEx.pas to distinguish it.
// Load example from file
procedure Test;
var
LHtml: IHtmlElement;
LList: IHtmlElementList;
LStrStream: TStringStream;
begin
LStrStream := TStringStream.Create('', TEncoding.UTF8);
try
LStrStream.LoadFromFile('view-source_https___github.com_ying32_htmlparser.html');
LHtml := ParserHTML(LStrStream.DataString);
if LHtml <> nil then
begin
LList := LHtml.SimpleCSSSelector('a');
for LHtml in LList do
Writeln('url:', lhtml.Attributes['href']);
end;
finally
LStrStream.Free;
end;
end;
ying32 modification record: Email:1444386932@qq.com
May 4, 2017
- Remove the reference to the RegularExpressions unit, no longer use TRegEx and use TPerlRegEx in the RegularExpressionsCore unit
April 19, 2017
- Add the compilation instruction "UseXPath" that uses the XPath function. By default, XPath is not used. Personally, it is useless.
November 23, 2016
- Simple support for XPath, simple, use xpath to css selector, hey The code of xpath conversion is changed from python version
IHtmlElement
LHtml.FindX('/html/head/title').Each(
procedure(AIndex: Integer; AEl: IHtmlElement)
begin
Writeln('xpath index=', AIndex, ', a=', AEl.Text);
end
);
November 15, 2016
Changes to IHtmlElement and THtmlElement:
- Add the Set method to the Attributes attribute
- Add Set method to TagName property
- Add Parent property
- Add RemoveAttr method
- Add Remove method
- Add RemoveChild method
- Add the Find method, which is another name for SimpleCSSSelector
- _GetHtml no longer directly attaches the value of the FORignal attribute, but uses GetSelfHtml to re-assign the modified element and update the value of FORignal
- Add the Text property
- Modify InnerText and Text properties to add write function
- Add the AppliedChild method
Changes to IHtmlElementList and THtmlElementList:
- Add RemoveAll method
- Add the Remove method
- Add the Each method
- Add the Text property
IHtmlElement
// modify properties
EL.Attributes['class'] := 'xxxx';
// modify the tag
EL.TagName = 'a';
// remove myself
EL.Remove;
// remove child nodes
EL.RemoveChild(El2);
// css selector search, simplified use
El.Find('a');
// append a new element
el2 := El.AppendChild('a');
IHtmlElementList
// remove the selected element
LHtml.Find('a').RemoveAll;
// find and iterate
LHtml.Find('a').Each(
procedure(AIndex: Integer; AEl: IHtmlElement)
begin
Writeln('Index=', AIndex, ', href=', AEl.Attributes['href']);
end);
// Direct output, only the first element selected
Writeln(LHtml.Find('title').Text);