Stars: 231
Forks: 94
Pull Requests: 11
Issues: 22
Watchers: 26
Last Updated: 2020-09-15 04:14:54
HTML parser for PHP
License: MIT License
Languages: PHP, HTML
Attention: The new version may break backward compatibility. If it does, use the previous version from the v1.0 branch or tag, which supports PHP 5.4+.
The \nokogiri class is kept for backward compatibility.
This library is a fast HTML parser that tolerates invalid markup (parse errors are ignored).
It uses LibXML under the hood.
The input can be an HTML string in UTF-8 encoding or a DOMDocument.
Elements are queried with CSS selectors, which are converted to XPath expressions internally.
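To illustrate what tolerating invalid markup looks like at the LibXML level, here is a minimal sketch that uses only PHP's bundled DOM extension, not this library. The sample markup and the XML-declaration prefix used to force UTF-8 are assumptions made for the example; the library's own handling may differ.

```php
<?php
// Collect libxml parse errors internally instead of emitting PHP warnings.
$previous = libxml_use_internal_errors(true);

// Deliberately invalid markup: the <p> and <b> tags are never closed.
$html = '<div id="sidebar"><p>Привет, <b>world</div>';

$dom = new DOMDocument();
// DOMDocument::loadHTML() does not assume UTF-8 by default; prefixing an
// XML declaration is a common workaround (an assumption of this sketch).
$dom->loadHTML('<?xml encoding="UTF-8">' . $html);

// Discard the collected errors and restore the previous error mode.
libxml_clear_errors();
libxml_use_internal_errors($previous);

// The parser closed the open tags itself, so the tree is usable.
$div = $dom->getElementsByTagName('div')->item(0);
echo $div->textContent, "\n";
```

A DOMDocument prepared this way is the kind of object that can be used as input in place of an HTML string.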
HTML errors are ignored
$saw = new \nokogiri($html);
$saw = \nokogiri::fromHtml($html);
$saw = new \nokogiri($dom);
$saw = \nokogiri::fromDom($dom);
Each element of $cssSelector has the following format:
tagName[attribute=value]#elementId.className:pseudoSelector(expression)
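The sketch below shows how selectors of this shape can map to XPath. The XPath expressions are hand-written equivalents evaluated with PHP's built-in DOMXPath; they are assumptions for illustration, and the expressions the library generates internally may look different.

```php
<?php
$html = '<div id="sidebar"><a class="topic new" href="/a">First</a>'
      . '<a href="/b">Second</a></div>'
      . '<div><a rel="bookmark" href="/c">Third</a></div>';

libxml_use_internal_errors(true);
$dom = new DOMDocument();
$dom->loadHTML($html);
libxml_clear_errors();
$xpath = new DOMXPath($dom);

// CSS selector => hand-written XPath equivalent (not the library's output).
$equivalents = [
    // Child combinator plus attribute test.
    'div > a[rel=bookmark]' => "//div/a[@rel='bookmark']",
    // ID, descendant combinator, and a whole-word class match.
    '#sidebar a.topic' => "//*[@id='sidebar']//a[contains("
        . "concat(' ', normalize-space(@class), ' '), ' topic ')]",
];

$matches = [];
foreach ($equivalents as $css => $expression) {
    $texts = [];
    foreach ($xpath->query($expression) as $node) {
        $texts[] = $node->textContent;
    }
    $matches[$css] = $texts;
    echo $css, ' => ', implode(', ', $texts), "\n";
}
```

The class test pads the attribute value with spaces so that `topic` matches as a whole word and does not match a class such as `topics`.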
$saw->get('div > a[rel=bookmark]')->toArray();
Returns the underlying DOM structure as an array.
Attributes are stored under their names, text content under the #text key, and child elements under numeric keys.
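The array shape described above can be sketched with plain DOM code. The `elementToArray()` helper below is hypothetical and is not part of the library; it only mirrors the description (attributes by name, text under `#text`, children under numeric keys), and the library's real output may differ in detail.

```php
<?php
// Hypothetical helper: converts a DOMElement into the array shape described
// in this README. It is an illustration, not the library's implementation.
function elementToArray(DOMElement $element): array
{
    $result = [];

    // Attributes become string keys holding their values.
    foreach ($element->attributes as $attribute) {
        $result[$attribute->name] = $attribute->value;
    }

    $text = '';
    foreach ($element->childNodes as $child) {
        if ($child instanceof DOMText) {
            // Direct text content is concatenated under '#text'.
            $text .= $child->nodeValue;
        } elseif ($child instanceof DOMElement) {
            // Child elements are appended under numeric keys 0, 1, 2, ...
            $result[] = elementToArray($child);
        }
    }
    if ($text !== '') {
        $result['#text'] = $text;
    }

    return $result;
}

libxml_use_internal_errors(true);
$dom = new DOMDocument();
$dom->loadHTML('<a rel="bookmark" href="/post"><b>Read</b> more</a>');
libxml_clear_errors();

$array = elementToArray($dom->getElementsByTagName('a')->item(0));
print_r($array);
```

With this shape, reading `$array['#text']` gives the direct text of an element, and iterating over the numeric keys walks its child elements.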
Returns an HTML string.
Returns a DOMDocument. When true is passed as the first argument, it can also return a DOMNodeList or DOMElement.
foreach ($saw->get('#sidebar a.topic') as $link) {
    var_dump($link['#text']);
}
MIT