doc: tidy up HTML5::Document docstrings
flavorjones committed Dec 6, 2024
1 parent df70011 commit 2cf4f4a
Showing 2 changed files with 52 additions and 32 deletions.
28 changes: 17 additions & 11 deletions lib/nokogiri/html5.rb
@@ -46,16 +46,22 @@ def self.HTML5(...)
# The document and fragment parsing methods support options that are different from
# Nokogiri::HTML4::Document or Nokogiri::XML::Document.
#
# - <tt>Nokogiri.HTML5(html, url = nil, encoding = nil, **options)</tt>
# - <tt>Nokogiri::HTML5.parse(html, url = nil, encoding = nil, **options)</tt>
# - <tt>Nokogiri::HTML5::Document.parse(html, url = nil, encoding = nil, **options)</tt>
# - <tt>Nokogiri::HTML5.fragment(html, encoding = nil, **options)</tt>
# - <tt>Nokogiri::HTML5::DocumentFragment.parse(html, encoding = nil, **options)</tt>
# - <tt>Nokogiri.HTML5(html, url:, encoding:, **parse_options)</tt>
# - <tt>Nokogiri::HTML5.parse(html, url:, encoding:, **parse_options)</tt>
# - <tt>Nokogiri::HTML5::Document.parse(html, url:, encoding:, **parse_options)</tt>
# - <tt>Nokogiri::HTML5.fragment(html, encoding = nil, **parse_options)</tt>
# - <tt>Nokogiri::HTML5::DocumentFragment.parse(html, encoding = nil, **parse_options)</tt>
#
# The four currently supported options are +:max_errors+, +:max_tree_depth+, +:max_attributes+,
# and +:parse_noscript_content_as_text+ described below.
# The four currently supported parse options are:
#
# === Error reporting
# - +max_errors:+ (Integer, default 0) Maximum number of parse errors to report in HTML5::Document#errors.
# - +max_tree_depth:+ (Integer, default +Nokogiri::Gumbo::DEFAULT_MAX_TREE_DEPTH+) Maximum tree depth to parse.
# - +max_attributes:+ (Integer, default +Nokogiri::Gumbo::DEFAULT_MAX_ATTRIBUTES+) Maximum number of attributes to parse per element.
# - +parse_noscript_content_as_text:+ (Boolean, default false) When enabled, parse +noscript+ tag content as text, mimicking the behavior of web browsers.
#
# These options are explained in the following sections.
#
# === Error reporting: +max_errors:+
#
# Nokogiri contains an experimental HTML5 parse error reporting facility. By default, no parse
# errors are reported but this can be configured by passing the +:max_errors+ option to
@@ -112,7 +118,7 @@ def self.HTML5(...)
# are not part of Nokogiri's public API. That is, these are subject to change without Nokogiri's
# major version number changing. These may be stabilized in the future.
#
# === Maximum tree depth
# === Maximum tree depth: +max_tree_depth:+
#
# The maximum depth of the DOM tree parsed by the various parsing methods is configurable by the
# +:max_tree_depth+ option. If the depth of the tree would exceed this limit, then an
@@ -126,7 +132,7 @@ def self.HTML5(...)
# # raises ArgumentError: Document tree depth limit exceeded
# doc = Nokogiri.HTML5(html, max_tree_depth: -1)
#
# === Attribute limit per element
# === Attribute limit per element: +max_attributes:+
#
# The maximum number of attributes per DOM element is configurable by the +:max_attributes+
# option. If a given element would exceed this limit, then an +ArgumentError+ is thrown.
@@ -142,7 +148,7 @@ def self.HTML5(...)
# doc = Nokogiri.HTML5(html, max_attributes: -1)
# # parses successfully
#
# === Parse +noscript+ elements' content as text
# === Parse +noscript+ elements' content as text: +parse_noscript_content_as_text:+
#
# By default, the content of +noscript+ elements is parsed as HTML elements. Browsers that
# support scripting parse the content of +noscript+ elements as raw text.
56 changes: 35 additions & 21 deletions lib/nokogiri/html5/document.rb
@@ -43,41 +43,54 @@ class Document < Nokogiri::HTML4::Document

# Get the parser's quirks mode value. See HTML5::QuirksMode.
#
# This method returns `nil` if the parser was not invoked (e.g., `Nokogiri::HTML5::Document.new`).
# This method returns +nil+ if the parser was not invoked (e.g., Nokogiri::HTML5::Document.new).
#
# Since v1.14.0
attr_reader :quirks_mode

class << self
# :call-seq:
# parse(input)
# parse(input, url=nil, encoding=nil, **options)
# parse(input, url=nil, encoding=nil) { |options| ... }
# parse(input) { |parse_options| ... }
# parse(input, url:, encoding:, **parse_options)
#
# Parse HTML5 input.
# Parse \HTML input with a parser compliant with the HTML5 spec. This method uses the
# encoding of +input+ if it can be determined, or else falls back to the +encoding:+
# parameter.
#
# [Parameters]
# - +input+ may be a String, or any object that responds to _read_ and _close_ such as an
# IO, or StringIO.
# [Required Parameters]
# - +input+ (String | IO) the \HTML content to be parsed.
#
# - +url+ (optional) is a String indicating the canonical URI where this document is located.
# [Optional Parameters]
# - +url:+ (String) the base URI of the document.
# - +encoding+ (Encoding) The encoding that should be used when processing the
# document. This option is only used as a fallback when the encoding of +input+ cannot be
# determined.
# - +parse_options+ (Hash) represents keyword arguments that control the behavior of the
# parser. See rdoc-ref:HTML5@Parsing+options for a list of available options.
#
# - +encoding+ (optional) is the encoding that should be used when processing
# the document.
# [Yields]
# If present, the block will be passed a Hash object to modify with parse options before the
# input is parsed. See rdoc-ref:HTML5@Parsing+options for a list of available options.
#
# - +options+ (optional) is a configuration Hash (or keyword arguments) to set options
# during parsing. The three currently supported options are +:max_errors+,
# +:max_tree_depth+ and +:max_attributes+, described at Nokogiri::HTML5.
# ⚠ Note that +url:+ and +encoding:+ cannot be set by the configuration block.
#
# ⚠ Note that these options are different than those made available by
# Nokogiri::XML::Document and Nokogiri::HTML4::Document.
# [Returns] Nokogiri::HTML5::Document
#
# - +block+ (optional) is passed a configuration Hash on which parse options may be set. See
# Nokogiri::HTML5 for more information and usage.
# *Example:* Parse input from an IO with a specific encoding and custom max errors limit.
#
# [Returns] Nokogiri::HTML5::Document
# Nokogiri::HTML5::Document.parse(socket, encoding: "ISO-8859-1", max_errors: 10)
#
# *Example:* Parse a string setting the +:parse_noscript_content_as_text+ option using the
# configuration block parameter.
#
# Nokogiri::HTML5::Document.parse(input) { |c| c[:parse_noscript_content_as_text] = true }
#
def parse(string_or_io, url_ = nil, encoding_ = nil, url: url_, encoding: encoding_, **options, &block)
def parse(
string_or_io,
url_ = nil, encoding_ = nil,
url: url_, encoding: encoding_,
**options, &block
)
yield options if block
string_or_io = "" unless string_or_io

@@ -144,7 +157,8 @@ def initialize(*args) # :nodoc:
# - +markup+ (String) The HTML5 markup fragment to be parsed
#
# [Returns]
# Nokogiri::HTML5::DocumentFragment. This object's children will be empty if `markup` is not passed, is empty, or is `nil`.
# Nokogiri::HTML5::DocumentFragment. This object's children will be empty if +markup+ is not
# passed, is empty, or is +nil+.
#
def fragment(markup = nil)
DocumentFragment.new(self, markup)
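
The `parse` signature in the document.rb hunk accepts `url` and `encoding` either positionally or as keywords, and yields the remaining options to an optional block. The following is a minimal sketch of that argument-binding pattern in plain Ruby. `parse_sketch` is a hypothetical stand-in, not part of Nokogiri: it does no parsing and only returns a Hash describing how its arguments were bound.

```ruby
# Hypothetical stand-in that mirrors the parameter list of HTML5::Document.parse
# from the diff. It parses nothing; it only reports how the arguments were bound.
def parse_sketch(input, url_ = nil, encoding_ = nil, url: url_, encoding: encoding_, **options, &block)
  # The block, if given, receives the options Hash and may mutate it.
  yield options if block
  input = "" unless input

  { input: input, url: url, encoding: encoding, options: options }
end

# Legacy positional form: the keyword defaults pick up the positional values.
positional = parse_sketch("<p>hi</p>", "http://example.com/", "UTF-8")

# Keyword form: everything other than url: and encoding: lands in the options Hash.
keyword = parse_sketch("<p>hi</p>", encoding: "ISO-8859-1", max_errors: 10)

# Block form: the block can add parse options to the Hash it is handed.
with_block = parse_sketch("<p>hi</p>") { |opts| opts[:parse_noscript_content_as_text] = true }
```

In this sketch the block only ever sees the `options` Hash, and `url` and `encoding` are bound before the block runs, which is consistent with the docstring's warning that `url:` and `encoding:` cannot be set from the configuration block.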
