Simple HTML Dom: How to Remove Elements

Simple HTML Dom: How to remove elements?

There are no dedicated methods for removing elements. Instead, find the elements you want to remove (here, all the img elements) and set the outertext of each one to an empty string:

$e->outertext = '';
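
For context, here is a minimal sketch of that approach from start to finish. It assumes simple_html_dom.php is in the same directory as the script, and the sample markup is made up purely for illustration.

```php
<?php
// Assumption: simple_html_dom.php sits next to this script.
include('simple_html_dom.php');

// Hypothetical sample markup, used only for illustration.
$html = str_get_html('<div><p>Hello</p><img src="a.png"><img src="b.png"></div>');

// "Remove" every image by blanking its outer HTML.
foreach ($html->find('img') as $e) {
    $e->outertext = '';
}

// The serialized document no longer contains the <img> tags.
echo $html;
```

As far as I can tell, this only changes what gets printed: the nodes stay in the parsed tree, so a later find('img') on the same object may still return them. If you need to query the cleaned markup again, re-parsing the output with str_get_html($html->save()) should do it.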

PHP simple html DOM remove all attributes from an html tag

When I use your code and example HTML, it does remove all the attributes from all the <p> tags, even the ones inside <font>, so I'm not sure why yours isn't working.

But it looks like simplehtmldom has methods that specifically deal with attributes so you don't have to use string functions:

$html = file_get_html('page.php');

foreach ($html->find('p') as $p) {
    foreach ($p->getAllAttributes() as $attr => $val) {
        $p->removeAttribute($attr);
    }
}
echo $html->innertext;

Hopefully that will be more effective.
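
If you want to check this without a real page, the sketch below runs the same loop on an inline string. The markup is hypothetical (one plain <p> and one nested inside <font>), and it again assumes simple_html_dom.php is in the same directory.

```php
<?php
// Assumption: simple_html_dom.php sits next to this script.
include('simple_html_dom.php');

// Hypothetical markup: one top-level <p> and one nested inside <font>.
$html = str_get_html(
    '<p class="a" style="color:red">one</p>' .
    '<font size="2"><p id="x" align="center">two</p></font>'
);

// Strip every attribute from every <p>, wherever it is nested.
foreach ($html->find('p') as $p) {
    foreach (array_keys($p->getAllAttributes()) as $attr) {
        $p->removeAttribute($attr);
    }
}

// The <font> tag keeps its size attribute; only the <p> tags are touched.
echo $html;
```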

Using Simple Html Dom to remove some elements

Here's a solution I found. Suggestions for improving the code would be appreciated.

<h1>Scraper Noticias</h1>

<?php

include('simple_html_dom.php');

// Simple value object holding the fields scraped for one news item.
class News {
    public $image;
    public $fechanoticia;
    public $title;
    public $description;
    public $sourceurl;

    function get_image() {
        return $this->image;
    }

    function set_image($new_image) {
        $this->image = $new_image;
    }

    function get_fechanoticia() {
        return $this->fechanoticia;
    }

    function set_fechanoticia($new_fechanoticia) {
        $this->fechanoticia = $new_fechanoticia;
    }

    function get_title() {
        return $this->title;
    }

    function set_title($new_title) {
        $this->title = $new_title;
    }

    function get_description() {
        return $this->description;
    }

    function set_description($new_description) {
        $this->description = $new_description;
    }

    function get_sourceurl() {
        return $this->sourceurl;
    }

    function set_sourceurl($new_sourceurl) {
        $this->sourceurl = $new_sourceurl;
    }
}

// Create DOM from URL or file
$html = file_get_html('http://www.uvm.cl/noticias_mas.shtml');

$parsedNews = array();

// Find all news items.
foreach ($html->find('#cont2 p') as $element) {

    $newItem = new News;

    // Parse the news item's thumbnail image.
    foreach ($element->find('img') as $image) {
        $newItem->set_image($image->src);
    }

    // Parse the news item's post date.
    foreach ($element->find('span.fechanoticia') as $fecha) {
        $newItem->set_fechanoticia($fecha->innertext);
    }

    // Parse the news item's title and source URL from the same link.
    foreach ($element->find('a') as $link) {
        $newItem->set_title($link->innertext);
        $newItem->set_sourceurl("http://www.uvm.cl/" . $link->href);
    }

    // Blank out the links, spans and images so that only the
    // description text is left inside the paragraph.
    foreach ($element->find('a, span, img') as $child) {
        $child->outertext = '';
    }

    // What remains is the description; store the finished item.
    $newItem->set_description($element->innertext);
    $parsedNews[] = $newItem;

    echo $element->innertext;

}

?>

Remove all href of page by simple html dom

Update: Do you get the result you want if you only check the anchors?

foreach ($html->find('a') as $element) {
    if (isset($element->href)) {
        $element->href = null;
    }
}
echo $html;

