HTML Purifier: Removing an Element Conditionally Based on Its Attributes

Success! Thanks to Ambush Commander and mcgrailm in another question, I am now using a hilariously simple solution:

// a bit of context
$htmlDef = $this->configuration->getHTMLDefinition(true);
$anchor = $htmlDef->addBlankElement('a');

// HTMLPurifier_AttrTransform_RemoveLoneHttp strips 'href="http:/"' from
// all anchor tags (see first post for class detail)
$anchor->attr_transform_post[] = new HTMLPurifier_AttrTransform_RemoveLoneHttp();

// this is the magic! We're making 'href' a required attribute (note the
// asterisk) - now HTML Purifier removes <a></a>, as well as
// <a href="http:/"></a> after HTMLPurifier_AttrTransform_RemoveLoneHttp
// is through with it!
$htmlDef->addAttribute('a', 'href*', new HTMLPurifier_AttrDef_URI());

It works, it works! * manic laughter, gurgling noises, keels over with a smile on her face *
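The transform class itself is only shown in the original question, not here. As a rough idea of what such a class might look like, here is a minimal sketch. The body is an assumption based purely on the comment above (it drops an href whose entire value is "http:/"); the author's real class may well differ.

```php
<?php
// Hypothetical reconstruction, NOT the author's actual class.
// Assumes the HTML Purifier library has already been loaded.
class HTMLPurifier_AttrTransform_RemoveLoneHttp extends HTMLPurifier_AttrTransform
{
    public function transform($attr, $config, $context)
    {
        // Drop href when it is nothing but the lone "http:/" stub
        if (isset($attr['href']) && $attr['href'] === 'http:/') {
            unset($attr['href']);
        }

        return $attr;
    }
}
```

Once the href is gone, the required-attribute rule ('href*') is what lets HTML Purifier discard the now attribute-less anchor.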

HTMLPurifier removing <a name="#someanchorname"></a> - how to stop this from happening?

HTML Purifier strips id attributes by default, because the Attr.EnableID directive is off unless you turn it on. (And it looks like name attributes are stripped as well.)
http://htmlpurifier.org/live/configdoc/plain.html#HTML.EnableAttrID

Why it happens is explained here: http://htmlpurifier.org/docs/enduser-id.html
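If you decide the trade-offs described on that page are acceptable, turning the directive on is a one-line change. A minimal sketch, assuming the library is loaded through HTMLPurifier.auto.php (adjust the include for your install, e.g. a Composer autoloader):

```php
<?php
require_once 'HTMLPurifier.auto.php'; // assumed include path

$config = HTMLPurifier_Config::createDefault();
// Allow id attributes (and, it seems, name attributes) through the filter
$config->set('Attr.EnableID', true);

$purifier = new HTMLPurifier($config);
echo $purifier->purify('<a name="someanchorname"></a>');
```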

Modify all links in HTML Purifier

Actually, I found a partial solution through one of the links on the forum.

This is what I need to do:

$config->set('HTML.Nofollow', true);
$config->set('HTML.TargetBlank', true);

So the full thing looks like this:

$config = HTMLPurifier_Config::createDefault();
$config->set('HTML.Nofollow', true);
$config->set('HTML.TargetBlank', true);
$config->set('HTML.Allowed', 'a,b,strong,i,em,u');
$purifier = new HTMLPurifier($config);
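As far as I can tell from the configuration docs, both directives apply to outgoing links only, which may be why this is just a partial solution for modifying all links. A sketch of how that plays out, assuming your own site is example.com (the host name and the include path are placeholders):

```php
<?php
require_once 'HTMLPurifier.auto.php'; // assumed include path

$config = HTMLPurifier_Config::createDefault();
$config->set('HTML.Nofollow', true);
$config->set('HTML.TargetBlank', true);
$config->set('HTML.Allowed', 'a[href],b,strong,i,em,u');
// Assumption: links to this host should be treated as local, not outgoing
$config->set('URI.Host', 'example.com');

$purifier = new HTMLPurifier($config);
echo $purifier->purify(
    '<a href="http://example.com/about">internal</a> ' .
    '<a href="http://elsewhere.example/">external</a>'
);
```

Comparing the two anchors in the output shows which links HTML Purifier considers outgoing.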

Whitelist element with class of, using htmlpurifier

OK, so based on Ambush Commander's suggestion I was able to remove all spans that do not have a specific class. The idea is that if the class is required and the element doesn't have that class, the element will be removed.

I did some research and found the HTML Purifier customize page, which explains how to add an attribute. Following their instructions, I only needed an additional four lines of code, so here is how I did it:

 // more configuration stuff up here
$config->set('HTML.DefinitionID', 'enduser-customize.html editor');
$config->set('HTML.DefinitionRev', 1);
$def = $config->getHTMLDefinition(true);
$def->addAttribute('span', 'class*', new HTMLPurifier_AttrDef_Enum(
array('allowed')
));
// purify down here

The * in class* makes the class attribute required, and because we only allow the "allowed" class, everything else gets stripped.
Now, there is one caveat to doing it this way: if someone puts that class on their span, it will be allowed. In my case I'm not really using "allowed"; I'm using something else that will be replaced by HTML Purifier.

Hope this helps someone else.

And thanks to Ambush and pinkgothic for all their help!
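The customize page also describes a cache-friendly way to apply the same change. A sketch, assuming a reasonably recent HTML Purifier release where maybeGetRawHTMLDefinition() is available, and an include path that you will need to adjust:

```php
<?php
require_once 'HTMLPurifier.auto.php'; // assumed include path

$config = HTMLPurifier_Config::createDefault();
$config->set('HTML.DefinitionID', 'enduser-customize.html editor');
$config->set('HTML.DefinitionRev', 1); // bump this whenever the customization changes

// Returns null when a cached definition can be used,
// so the customization only runs when the definition must be rebuilt
if ($def = $config->maybeGetRawHTMLDefinition()) {
    $def->addAttribute('span', 'class*', new HTMLPurifier_AttrDef_Enum(
        array('allowed')
    ));
}

$purifier = new HTMLPurifier($config);
echo $purifier->purify('<span class="allowed">with class</span><span>no class</span>');
```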

htmlpurifier allow scheme for specific tags

Here's my solution.

Filters:

class ImgSrcTransform extends HTMLPurifier_AttrTransform
{
    protected $parser;

    public function __construct()
    {
        $this->parser = new HTMLPurifier_URIParser();
    }

    // Drops src from img tags when it points at a remote http(s) resource
    public function transform($attr, $config, $context)
    {
        if (!isset($attr['src'])) {
            return $attr;
        }

        // Guard against parse() returning false for an unparseable value
        $url = $this->parser->parse($attr['src']);
        if ($url && ($url->scheme == 'http' || $url->scheme == 'https')) {
            unset($attr['src']);
        }

        return $attr;
    }
}

class LinkHrefTransform extends HTMLPurifier_AttrTransform
{
    protected $parser;

    public function __construct()
    {
        $this->parser = new HTMLPurifier_URIParser();
    }

    // Drops href from anchors when it uses the data: scheme
    public function transform($attr, $config, $context)
    {
        if (!isset($attr['href'])) {
            return $attr;
        }

        // Guard against parse() returning false for an unparseable value
        $url = $this->parser->parse($attr['href']);
        if ($url && $url->scheme == 'data') {
            unset($attr['href']);
        }

        return $attr;
    }
}

Using the filters:

$config = HTMLPurifier_Config::createDefault();
$config->set('URI.AllowedSchemes', array('data' => true, 'http' => true, 'https' => true));
$config->set('HTML.AllowedElements', $elements);
$config->set('HTML.AllowedAttributes', $attributes);

$htmlDef = $config->getHTMLDefinition(true);

$img = $htmlDef->addBlankElement('img');
$img->attr_transform_pre[] = new ImgSrcTransform();

$anchor = $htmlDef->addBlankElement('a');
$anchor->attr_transform_pre[] = new LinkHrefTransform();

$purifier = new HTMLPurifier($config);

Allow classes with a certain prefix in HTMLPurifier

Using the example from the link in the question as a base, I managed to hack together this solution. It seems to work like a charm. It is limited to a per-element basis, but that's actually great for my purposes.

I don't know if this is the best solution, but it works for me.

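class CustomClassDef extends HTMLPurifier_AttrDef {
    private $classes, $prefixes ;

    public function __construct($classes, $prefixes) {
        $this->classes = $classes ;
        // Escape each prefix so regex metacharacters are matched literally
        $this->prefixes = join('|', array_map(function ($prefix) {
            return preg_quote($prefix, '/') ;
        }, (array) $prefixes)) ;
    }

    public function validate($string, $config, $context) {
        $classes = preg_split('/\s+/', trim($string)) ;
        $validclasses = array() ;

        foreach ($classes as $class) {
            // An empty prefix list must not match everything
            if (in_array($class, $this->classes) or
                ($this->prefixes !== '' and preg_match("/^({$this->prefixes})/i", $class))) {

                $validclasses[] = $class ;
            }
        }

        // Returning false makes HTML Purifier drop the attribute entirely
        return $validclasses ? join(' ', $validclasses) : false ;
    }
}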

$config = HTMLPurifier_Config::createDefault() ;
// Allow no classes by default
$config->set('Attr.AllowedClasses', array()) ;

$def = $config->getHTMLDefinition(true) ;
// Allow the class 'fa', and classes prefixed 'fa-' or 'foo-', on i tags
$def->addAttribute('i', 'class', new CustomClassDef(array('fa'), array('fa-', 'foo-'))) ;

// Allow classes prefixed 'language-' on code tags
$def->addAttribute('code', 'class', new CustomClassDef(array(), 'language-')) ;

$purifier = new HTMLPurifier($config) ;

