PHP DOMDocument: notes on manipulating DOM documents

2021-12-01

Differences between the DOMDocument loading methods

public load(string $filename, int $options = 0): DOMDocument|bool // load XML from a file

public loadXML(string $source, int $options = 0): DOMDocument|bool // load XML from a string

public loadHTML(string $source, int $options = 0): DOMDocument|bool // load HTML from a string

public loadHTMLFile(string $filename, int $options = 0): DOMDocument|bool // load HTML from a file
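The practical difference between the XML and HTML loaders is how strictly they parse. The following is a minimal sketch, using made-up sample strings, that contrasts loadXML (which rejects markup that is not well-formed) with loadHTML (which repairs it):

```php
<?php
// Collect libxml parse errors instead of emitting PHP warnings.
libxml_use_internal_errors(true);

// loadXML() needs well-formed XML.
$xml_doc = new DOMDocument();
$xml_ok = $xml_doc->loadXML('<root><item id="1">a</item></root>');

// A mismatched closing tag is a fatal XML error, so loadXML() returns false.
$bad_doc = new DOMDocument();
$xml_bad = $bad_doc->loadXML('<root><item>a</root>');

// loadHTML() is lenient: the missing </p> is repaired and
// <html>/<body> wrappers are added around the fragment.
$html_doc = new DOMDocument();
$html_ok = $html_doc->loadHTML('<p>hello');
$p_text = $html_doc->getElementsByTagName('p')->item(0)->textContent;

libxml_clear_errors();
```

load and loadHTMLFile behave the same way as their string counterparts, only reading the markup from a file path instead.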

Garbled characters (encoding problems)

When the input carries no charset declaration, DOMDocument::loadHTML treats it as ISO-8859-1, so UTF-8 text must be converted with mb_convert_encoding before loading:

  $html_doc->loadHTML(mb_convert_encoding($htmlStr, 'HTML-ENTITIES', 'UTF-8'));
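The 'HTML-ENTITIES' target of mb_convert_encoding is deprecated as of PHP 8.2. One possible replacement, sketched below under the assumption that the mbstring extension is available, is mb_encode_numericentity, which turns every non-ASCII character into a numeric entity before parsing. The sample string and the $convmap name are illustrative only:

```php
<?php
libxml_use_internal_errors(true);

$htmlStr = '<p>中文 café</p>'; // hypothetical UTF-8 input

// Map every code point from U+0080 up to U+10FFFF to a numeric entity (&#NNNN;).
// Map layout: [start, end, offset, mask].
$convmap = [0x80, 0x10FFFF, 0, 0x1FFFFF];
$ascii_html = mb_encode_numericentity($htmlStr, $convmap, 'UTF-8');

$html_doc = new DOMDocument();
$html_doc->loadHTML($ascii_html);

// DOM strings are UTF-8 internally, so the text comes back intact.
$text = $html_doc->getElementsByTagName('p')->item(0)->textContent;
```

Because the string passed to loadHTML is pure ASCII, the parser's default encoding no longer matters.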

Examples of modifying the DOM with DOMDocument

Reading node values (iterate over the pre elements in the document, find those whose data-language is xml, and extract their XML text)
$html_doc = new DOMDocument();
$html_doc->loadHTML(mb_convert_encoding($htmlStr, 'HTML-ENTITIES', 'UTF-8'));
$html_doc->normalizeDocument();
$xml_array = array();
$pres = $html_doc->getElementsByTagName('pre');
foreach ($pres as $pre) {
    // getAttribute() returns '' when the attribute is missing,
    // so a separate hasAttributes() check is not needed
    if ($pre->getAttribute('data-language') === 'xml') {
        $xml_array[] = $pre->nodeValue; // nodeValue returns the text inside the element
    }
}
$new_html = $html_doc->saveHTML();
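The same selection can also be expressed as a single XPath query instead of a loop with an attribute check. This is a sketch of that alternative using DOMXPath, with hypothetical sample markup:

```php
<?php
libxml_use_internal_errors(true);

// Hypothetical sample input: two xml blocks and one php block.
$htmlStr = '<pre data-language="xml">&lt;a/&gt;</pre>'
         . '<pre data-language="php">echo 1;</pre>'
         . '<pre data-language="xml">&lt;b/&gt;</pre>';

$html_doc = new DOMDocument();
$html_doc->loadHTML($htmlStr);

// Select only the <pre> elements whose data-language attribute equals "xml".
$xpath = new DOMXPath($html_doc);
$xml_array = [];
foreach ($xpath->query('//pre[@data-language="xml"]') as $pre) {
    $xml_array[] = $pre->textContent; // entities are decoded in the text
}
```

Pushing the filter into the query keeps the loop body down to the extraction itself.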

Creating a new element node and adding an attribute
$html_doc = new DOMDocument();
$html_doc->loadHTML(mb_convert_encoding($htmlStr, 'HTML-ENTITIES', 'UTF-8'));
$html_doc->normalizeDocument();
$node = $html_doc->createElement("div");
// appendChild() on the document adds the div as a sibling of <html>, not inside <body>
$new_node = $html_doc->appendChild($node);
$new_node->setAttribute("data-test", 'hello_world');
$new_html = $html_doc->saveHTML();
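If the new element is meant to appear inside the page rather than at the document level, it can be appended to an existing element instead. A minimal sketch, assuming the parsed document has a body element (loadHTML adds one by default) and using made-up input:

```php
<?php
libxml_use_internal_errors(true);

$htmlStr = '<p>existing</p>'; // hypothetical input

$html_doc = new DOMDocument();
$html_doc->loadHTML($htmlStr);

// loadHTML() wraps fragments in <html><body> by default, so body should exist.
$body = $html_doc->getElementsByTagName('body')->item(0);

// Create the element with text content, set the attribute, then attach it.
$node = $html_doc->createElement('div', 'hello');
$node->setAttribute('data-test', 'hello_world');
$body->appendChild($node);

$new_html = $html_doc->saveHTML();
```

Passing the body node to saveHTML, as in $html_doc->saveHTML($body), serializes only that subtree.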

Deleting DOM elements (iterate over the pre elements in the document and remove those whose data-language is xml)
$html_doc = new DOMDocument();
$html_doc->loadHTML(mb_convert_encoding($htmlStr, 'HTML-ENTITIES', 'UTF-8'));
$html_doc->normalizeDocument();
$pres = $html_doc->getElementsByTagName('pre');
// The node list is live: removing nodes while iterating over it directly
// can skip elements, so iterate over a snapshot array instead
foreach (iterator_to_array($pres) as $pre) {
    if ($pre->getAttribute('data-language') === 'xml') {
        $pre->parentNode->removeChild($pre);
    }
}
$new_html = $html_doc->saveHTML();
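The list returned by getElementsByTagName is live, so it shrinks as matching nodes are removed. Another way to remove nodes safely is to walk the list from the end by index. A sketch with hypothetical sample markup:

```php
<?php
libxml_use_internal_errors(true);

// Hypothetical sample input: two adjacent xml blocks and one php block.
$htmlStr = '<pre data-language="xml">a</pre>'
         . '<pre data-language="xml">b</pre>'
         . '<pre data-language="php">c</pre>';

$html_doc = new DOMDocument();
$html_doc->loadHTML($htmlStr);

$pres = $html_doc->getElementsByTagName('pre');
// Walk from the end so removals never shift the indexes still to be visited.
for ($i = $pres->length - 1; $i >= 0; $i--) {
    $pre = $pres->item($i);
    if ($pre->getAttribute('data-language') === 'xml') {
        $pre->parentNode->removeChild($pre);
    }
}
$remaining = $html_doc->getElementsByTagName('pre')->length;
```

Here both xml blocks are removed even though they sit next to each other, and only the php block is left.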

Related links

https://www.php.net/manual/en/class.domdocument.php

https://www.php.net/manual/en/class.domnodelist.php

This article was synced from the Yuque cloud knowledge base via YUQUE WORDPRESS.
