Code: Select all
Pros and Cons of a Website
(this is a sample of what it uses as line breaks. Take note of the tag).
A SAMPLE TEXT
...same pattern in div 1
...same...
Code: Select all
...A SAMPLE TEXT
...same pattern in div 1
...same...
Code: Select all
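// Parse the markup without adding an implied <html>/<body> wrapper or a default doctype.
$dom = new DOMDocument;
$dom->loadHTML($filecontent, LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);
$xpath = new DOMXPath($dom);
$body = $xpath->query('//html/body');
// getElementsByTagName() returns a live list, so copy it before removing nodes.
$nodes = iterator_to_array($body->item(0)->getElementsByTagName('*'));
foreach ($nodes as $node) {
    // Drop <script> elements entirely and move on to the next node.
    if ($node->tagName == 'script') {
        $node->parentNode->removeChild($node);
        continue;
    }
    // Leave links untouched so they keep their href.
    if ($node->tagName == 'a') continue;
    // Strip every attribute from all other elements.
    $attrs = $xpath->query('@*', $node);
    foreach ($attrs as $attr) {
        $node->removeAttribute($attr->nodeName);
    }
}
// Print the body's markup without the enclosing <body> tags.
echo str_ireplace(['<body>', '</body>'], '', $dom->saveHTML($body->item(0)));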
Code: Select all
Pros and Cons of a Website
A SAMPLE TEXT
...same pattern in div 1
...same...
Code: Select all
if($node->tagName=='script' || $node->tagName=='h1') $node->parentNode->removeChild($node);
Code: Select all
becomes
- How do I get the innerHTML of a DOMNode? (Haim Evgi's answer, which I don't know how to implement properly, and Keyacom's answer likewise.) Marco Marsala's answer comes closest, but the divs all kept their classes.
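For reference, here is a minimal sketch of how the "innerHTML of a DOMNode" idea could be combined with attribute stripping so that the divs lose their classes. The helper names `innerHTML()` and `stripAttributes()` are hypothetical, not taken from any of the answers mentioned above, and the sample markup is made up because the original tags are not visible in this post. It assumes the input contains a `<body>` element and that only `<a>` tags should keep their attributes.

```php
<?php
// Hypothetical helper: build the inner HTML of an element by serializing
// each of its child nodes with the owning document's saveHTML().
function innerHTML(DOMNode $node): string
{
    $html = '';
    foreach ($node->childNodes as $child) {
        $html .= $node->ownerDocument->saveHTML($child);
    }
    return $html;
}

// Hypothetical helper: remove all attributes from every descendant element,
// except for tags listed in $keep (assumed here to be just <a>).
function stripAttributes(DOMElement $root, array $keep = ['a']): void
{
    // Copy the live node list first so the loop works on a fixed snapshot.
    foreach (iterator_to_array($root->getElementsByTagName('*')) as $el) {
        if (in_array(strtolower($el->tagName), $keep, true)) {
            continue;
        }
        // Snapshot the attributes too, since removing them mutates the map.
        foreach (iterator_to_array($el->attributes, false) as $attr) {
            $el->removeAttributeNode($attr);
        }
    }
}

// Made-up sample input, for illustration only.
$sample = '<html><body><div class="box" id="d1">'
        . '<p style="color:red">A SAMPLE TEXT</p>'
        . '<a href="/x" class="l">link</a>'
        . '</div></body></html>';

$dom = new DOMDocument();
$dom->loadHTML($sample);
$body = $dom->getElementsByTagName('body')->item(0);

stripAttributes($body);
$result = innerHTML($body);
echo $result;
```

Because `innerHTML()` serializes only the children, there is no need to remove the `<body>` tags with `str_ireplace()` afterwards. The exact whitespace in the output can vary between libxml versions, so it is safer to compare results loosely rather than byte for byte.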