HTML Purifier: Removing an element conditionally based on its attributes
Success! Thanks to Ambush Commander and mcgrailm in another question, I am now using a hilariously simple solution:
// a bit of context
$htmlDef = $this->configuration->getHTMLDefinition(true);
$anchor = $htmlDef->addBlankElement('a');
// HTMLPurifier_AttrTransform_RemoveLoneHttp strips 'href="http:/"' from
// all anchor tags (see first post for class detail)
$anchor->attr_transform_post[] = new HTMLPurifier_AttrTransform_RemoveLoneHttp();
// this is the magic! We're making 'href' a required attribute (note the
// asterisk) - now HTML Purifier removes <a></a>, as well as
// <a href="http:/"></a> after HTMLPurifier_AttrTransform_RemoveLoneHttp
// is through with it!
$htmlDef->addAttribute('a', 'href*', new HTMLPurifier_AttrDef_URI());
It works, it works, bahahahaHAHAHA...! * manic laughter, gurgling noises, keels over with a smile on her face *
HTMLPurifier removing <a name=#someanchorname></a> - how to stop this from happening?
By default, HTML Purifier removes HTML id attributes unless the Attr.EnableID directive is turned on. (And it looks like name attributes are removed as well.)
http://htmlpurifier.org/live/configdoc/plain.html#HTML.EnableAttrID
Why it happens is explained here: http://htmlpurifier.org/docs/enduser-id.html
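As a minimal sketch of what turning the directive on might look like (the include path and the sample markup are assumptions for illustration; whether the name attribute survives also depends on the doctype in use):

```php
<?php
// Assumption: HTML Purifier is installed and reachable on the include path.
require_once 'HTMLPurifier.auto.php';

$config = HTMLPurifier_Config::createDefault();
// Keep id attributes instead of stripping them (the directive is off by default).
$config->set('Attr.EnableID', true);

$purifier = new HTMLPurifier($config);

// Hypothetical input: a named anchor plus an element carrying an id.
$dirty = '<a name="someanchorname"></a><p id="intro">Hello</p>';
echo $purifier->purify($dirty);
```

The enduser-id document linked above explains the risks of allowing user-supplied ids, so it is worth reading before enabling this on a public site.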
Modify all links in HTML Purifier
Actually, I found a partial solution through one of the links on the forum.
This is what I needed to do:
$config->set('HTML.Nofollow', true);
$config->set('HTML.TargetBlank', true);
So the full thing looks like this:
$config = HTMLPurifier_Config::createDefault();
$config->set('HTML.Nofollow', true);
$config->set('HTML.TargetBlank', true);
$config->set('HTML.Allowed', 'a,b,strong,i,em,u');
$purifier = new HTMLPurifier($config);
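To check the effect, the configured purifier can be run over a sample link. This is only a sketch: the include path and the sample URL are assumptions, and the allowed list here is written as a[href] because, as far as I understand HTML.Allowed, an element listed without attributes does not get to keep its href.

```php
<?php
// Assumption: HTML Purifier is installed and reachable on the include path.
require_once 'HTMLPurifier.auto.php';

$config = HTMLPurifier_Config::createDefault();
$config->set('HTML.Nofollow', true);     // add rel="nofollow" to external links
$config->set('HTML.TargetBlank', true);  // add target="_blank" to external links
// Assumption: href is listed explicitly so that links keep their target URL.
$config->set('HTML.Allowed', 'a[href],b,strong,i,em,u');

$purifier = new HTMLPurifier($config);

// Hypothetical input; inspect the output to confirm rel and target were added.
$dirty = 'See <a href="http://example.com/">this <b>site</b></a>';
echo $purifier->purify($dirty);
```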
Whitelist element with class of, using htmlpurifier
OK, so based on Ambush Commander's suggestion, I was able to remove all spans that did not have a specific class. The idea is that if the class is required and the element doesn't have that class, the element will be removed.
I did some research and found the HTML Purifier customize page, which explains how to add an attribute. Following their instructions, I only needed an additional four lines of code, so here is how I did it:
// more configuration stuff up here
$config->set('HTML.DefinitionID', 'enduser-customize.html editor');
$config->set('HTML.DefinitionRev', 1);
$def = $config->getHTMLDefinition(true);
$def->addAttribute('span', 'class*', new HTMLPurifier_AttrDef_Enum(
array('allowed')
));
// purify down here
The * in class* makes the class attribute required, and because we only allow the "allowed" class, everything else gets stripped.
Now, there is one caveat to doing it this way: if someone puts that class on their span, it will be allowed. In my case I'm not really using "allowed"; I'm using something else that will be replaced by HTML Purifier.
Hope this helps someone else,
and thanks to Ambush and pinkgothic for all their help!
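A quick way to try the required-class idea is a small standalone script. This is a sketch under assumptions: the include path and the sample markup are made up, and the DefinitionID and DefinitionRev settings are left out so that nothing is cached while experimenting.

```php
<?php
// Assumption: HTML Purifier is installed and reachable on the include path.
require_once 'HTMLPurifier.auto.php';

$config = HTMLPurifier_Config::createDefault();
$def = $config->getHTMLDefinition(true);
// 'class*' marks class as required; only the value "allowed" passes validation.
$def->addAttribute('span', 'class*', new HTMLPurifier_AttrDef_Enum(
    array('allowed')
));

$purifier = new HTMLPurifier($config);

// Hypothetical input: per the explanation above, only the first span
// should keep its tags; the other two should lose theirs.
$dirty = '<span class="allowed">one</span> '
       . '<span class="other">two</span> '
       . '<span>three</span>';
echo $purifier->purify($dirty);
```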
htmlpurifier allow scheme for specific tags
Here's my solution.
Filters:
class ImgSrcTransform extends HTMLPurifier_AttrTransform
{
    protected $parser;

    public function __construct()
    {
        $this->parser = new HTMLPurifier_URIParser();
    }

    public function transform($attr, $config, $context)
    {
        if (!isset($attr['src'])) {
            return $attr;
        }
        // Drop src when it uses the http or https scheme
        $url = $this->parser->parse($attr['src']);
        if ($url->scheme == 'http' || $url->scheme == 'https') {
            unset($attr['src']);
        }
        return $attr;
    }
}
class LinkHrefTransform extends HTMLPurifier_AttrTransform
{
    protected $parser;

    public function __construct()
    {
        $this->parser = new HTMLPurifier_URIParser();
    }

    public function transform($attr, $config, $context)
    {
        if (!isset($attr['href'])) {
            return $attr;
        }
        // Drop href when it uses the data scheme
        $url = $this->parser->parse($attr['href']);
        if ($url->scheme == 'data') {
            unset($attr['href']);
        }
        return $attr;
    }
}
Using the filters:
$config = HTMLPurifier_Config::createDefault();
$config->set('URI.AllowedSchemes', array('data' => true, 'http' => true, 'https' => true));
$config->set('HTML.AllowedElements', $elements);
$config->set('HTML.AllowedAttributes', $attributes);
$htmlDef = $config->getHTMLDefinition(true);
$img = $htmlDef->addBlankElement('img');
$img->attr_transform_pre[] = new ImgSrcTransform();
$anchor = $htmlDef->addBlankElement('a');
$anchor->attr_transform_pre[] = new LinkHrefTransform();
$purifier = new HTMLPurifier($config);
Allow classes with a certain prefix in HTMLPurifier
Using the example from the link in the question as a base, I managed to hack together this solution. It seems to work like a charm. It is limited to a per-element basis, but that's actually great for my purposes.
I don't know if this is the best solution, but it works for me.
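class CustomClassDef extends HTMLPurifier_AttrDef {
    private $classes, $prefixes ;

    public function __construct($classes, $prefixes) {
        $this->classes = $classes ;
        // Quote each prefix so that regex metacharacters are matched literally
        $quoted = array() ;
        foreach ((array) $prefixes as $prefix) {
            $quoted[] = preg_quote($prefix, '/') ;
        }
        $this->prefixes = join('|', $quoted) ;
    }

    public function validate($string, $config, $context) {
        $classes = preg_split('/\s+/', trim($string)) ;
        $validclasses = array() ;
        foreach ($classes as $class) {
            // Skip the prefix test when no prefix was given; an empty
            // pattern would otherwise match every class
            if (in_array($class, $this->classes) or
                ($this->prefixes !== '' and preg_match("/^({$this->prefixes})/i", $class))) {
                $validclasses[] = $class ;
            }
        }
        // Returning false makes HTML Purifier drop the attribute instead of leaving class=""
        return $validclasses ? join(' ', $validclasses) : false ;
    }
}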
$config = HTMLPurifier_Config::createDefault() ;
// Allow no classes by default
$config->set('Attr.AllowedClasses', array()) ;
$def = $config->getHTMLDefinition(true) ;
// Allow the class 'fa', and classes prefixed 'fa-' or 'foo-', on i tags
$def->addAttribute('i', 'class', new CustomClassDef(array('fa'), array('fa-', 'foo-'))) ;
// Allow classes prefixed 'language-' on code tags
$def->addAttribute('code', 'class', new CustomClassDef(array(), 'language-')) ;
$purifier = new HTMLPurifier($config) ;
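The matching rule itself can be tried without HTML Purifier installed. The sketch below uses a hypothetical helper, filter_classes (not part of the library or of the class above), that keeps a class when it is listed exactly or starts with one of the prefixes, compared case-insensitively like the /i flag in the regex.

```php
<?php
// Hypothetical helper, not part of HTML Purifier: returns the space-separated
// classes that are listed exactly or that start with one of the prefixes.
function filter_classes($value, array $exact, array $prefixes)
{
    $kept = array();
    foreach (preg_split('/\s+/', trim($value)) as $class) {
        if ($class === '') {
            continue; // empty input yields one empty piece; ignore it
        }
        $ok = in_array($class, $exact, true);
        foreach ($prefixes as $prefix) {
            // Case-insensitive "starts with" check
            if ($prefix !== '' && stripos($class, $prefix) === 0) {
                $ok = true;
            }
        }
        if ($ok) {
            $kept[] = $class;
        }
    }
    return implode(' ', $kept);
}

// Prints "fa fa-home FOO-bar": "evil" matches neither the list nor a prefix.
echo filter_classes('fa fa-home evil FOO-bar', array('fa'), array('fa-', 'foo-')), "\n";
```

Using stripos instead of a regular expression sidesteps the need to escape prefixes, at the cost of one loop per class.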