Remove a HTML tag but keep the innerHtml
$('b').contents().unwrap();
This selects all <b> elements, uses .contents() to target the child nodes (including text nodes) of each <b>, then calls .unwrap() to remove their parent <b> element.
For the greatest performance, always go native:
var b = document.getElementsByTagName('b');
while (b.length) {
    var parent = b[ 0 ].parentNode;
    while ( b[ 0 ].firstChild ) {
        parent.insertBefore( b[ 0 ].firstChild, b[ 0 ] );
    }
    parent.removeChild( b[ 0 ] );
}
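This will generally be faster than the jQuery solution above, because it avoids the library's overhead. It works because getElementsByTagName() returns a live collection: each removed <b> drops out of b, so the loop ends once none remain.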
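The same move-the-children-then-remove pattern can be sketched outside the browser. This is a minimal Python illustration using the standard library's xml.dom.minidom, whose node methods mirror the DOM calls above. The unwrap helper and the sample markup are hypothetical and not part of the original answer.

```python
from xml.dom import minidom

def unwrap(doc, tag_name):
    """Replace every <tag_name> element with its own child nodes."""
    # Unlike the browser's live collection, minidom returns a static list,
    # so a plain for-loop is safe while the tree is being modified.
    for node in doc.getElementsByTagName(tag_name):
        parent = node.parentNode
        # Move each child in front of the element, preserving order.
        while node.firstChild is not None:
            parent.insertBefore(node.firstChild, node)
        # The element is now empty and can be dropped.
        parent.removeChild(node)

doc = minidom.parseString("<p>Hello <b>bold <i>and italic</i></b> world</p>")
unwrap(doc, "b")
result = doc.documentElement.toxml()
print(result)  # <p>Hello bold <i>and italic</i> world</p>
```

Nested markup inside the removed element (the <i> here) survives, because whole child nodes are moved rather than just their text.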
Remove tag if id matches but keep text
If I understand you correctly, you wish to turn this:
<a id="searchword1" class="searchword" style="background-color: yellow; text-decoration: none; color: black;">my text</a>
into this:
my text
If that's the case, then it's very easy.
As it stands, it looks like you're asking for a child of the element you showed, but that element has no children other than the text node. I expect your script fails at line 2, when it tries to get a non-existent child.
//1. Get the element containing the text (IDs are case-sensitive)
var element = document.getElementById('searchword1');
//2. Get the text it contains (textContent avoids re-escaping entities)
var highlightedText = element.textContent;
//3. Get the highlighted element's parent
var parent = element.parentNode;
//4. Create a text node:
var newNode = document.createTextNode(highlightedText);
//5. Insert it into the document before the link
parent.insertBefore(newNode, element);
//6. Remove the link element
parent.removeChild(element);
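For comparison, the steps above can be sketched in Python with the standard library's xml.dom.minidom. This is only an illustration under assumptions: minidom does not resolve ids without a DTD, so the hypothetical replace_with_text helper scans for the id attribute itself, and it collapses steps 5 and 6 into one replaceChild call.

```python
from xml.dom import minidom

def text_of(node):
    """Concatenate the text of a node and all of its descendants."""
    if node.nodeType == node.TEXT_NODE:
        return node.data
    return "".join(text_of(child) for child in node.childNodes)

def replace_with_text(doc, element_id):
    """Swap the element whose id attribute equals element_id for a text node."""
    for element in doc.getElementsByTagName("*"):
        if element.getAttribute("id") == element_id:
            text_node = doc.createTextNode(text_of(element))
            element.parentNode.replaceChild(text_node, element)
            return True
    return False  # nothing carries that id (the comparison is case-sensitive)

doc = minidom.parseString(
    '<p>find <a id="searchword1" class="searchword">my text</a> here</p>'
)
missed = replace_with_text(doc, "searchWord1")    # wrong case: no match
replaced = replace_with_text(doc, "searchword1")  # exact id: replaced
result = doc.documentElement.toxml()
print(missed, replaced, result)
```

The first call shows why the spelling of the id matters: a lookup with different capitalisation finds nothing and leaves the document untouched.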
XML : remove tag but keep text
You're almost there. To get your output, try:
# assumes: import xml.etree.ElementTree as ET
# and that root is your already-parsed tree
for d in root.findall(".//dialogue"):
    for s in d.findall('.//sentence'):
        if s.text:
            new_t = s.text.strip()
            d.remove(s)
            d.text = new_t
print(ET.tostring(root).decode())
And that should output what you need.
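One thing to watch with this approach is that any text before or after the removed element is overwritten or lost. The following is a minimal sketch of a variant that splices the text back in place, assuming the removed elements are direct children and contain only text. The unwrap_children helper and the sample XML are hypothetical.

```python
import xml.etree.ElementTree as ET

def unwrap_children(parent, tag):
    """Remove direct children named `tag`, keeping their text and tail."""
    previous = None  # last sibling that was kept
    for child in list(parent):  # iterate over a copy while removing
        if child.tag != tag:
            previous = child
            continue
        # Text inside the element plus the text that followed it.
        merged = (child.text or "") + (child.tail or "")
        if previous is None:
            parent.text = (parent.text or "") + merged
        else:
            previous.tail = (previous.tail or "") + merged
        parent.remove(child)

root = ET.fromstring(
    "<root><dialogue>Hi, <sentence>how are you</sentence>?</dialogue></root>"
)
for dialogue in root.iter("dialogue"):
    unwrap_children(dialogue, "sentence")
result = ET.tostring(root).decode()
print(result)  # <root><dialogue>Hi, how are you?</dialogue></root>
```

ElementTree stores the text that follows an element in that element's tail attribute, which is why the tail has to be merged before the element is removed.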
Remove a tag but keep the text
Here is what I would do:
require 'nokogiri'
doc = Nokogiri::HTML.parse <<-eot
<a href="/www.somethinggggg.com">Something 123</a>
eot
node = doc.at("a")
node.replace(node.text)
puts doc.to_html
Output:
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd">
<html>
<body>Something 123</body>
</html>
Update
What if I have an array that holds content with links?
Hint
require 'nokogiri'
doc = Nokogiri::HTML.parse <<-eot
<a href="/www.foo.com">foo</a>
<a href="/www.bar.com">bar</a>
<a href="/www.baz.com">baz</a>
eot
arr = %w(foo bar baz)
nodes = doc.search("a")
nodes.each {|node| node.replace(node.content) if arr.include?(node.content) }
puts doc.to_html
Output:
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd">
<html>
<body>foo
bar
baz
</body>
</html>
JS - Remove a tag without deleting content
jQuery has easier ways:
var spans = $('span');
spans.contents().unwrap();
With different selector methods, it is possible to remove deeply nested spans or just direct children spans of an element.
How to remove unwanted HTML tags from user input but keep text inside the tags in PHP using DOMDocument
It seems this problem needs to be broken down into two smaller steps in order to generalize the solution.
First, Walking the DOM Tree
To get to a working solution, I found I needed a sensible way to traverse every node in the DOM tree and inspect it to determine whether it should be kept as-is or modified.
So I wrote the following method as a simple generator on a class extending DOMDocument.
class HTMLFixer extends DOMDocument {
    public function walk(DOMNode $node, $skipParent = false) {
        if (!$skipParent) {
            yield $node;
        }
        if ($node->hasChildNodes()) {
            foreach ($node->childNodes as $n) {
                yield from $this->walk($n);
            }
        }
    }
}
This way, doing something like foreach($dom->walk($dom) as $node) gives me a simple loop to traverse the entire tree. Of course, this only works on PHP 7 or later because of the yield from syntax, but I'm OK with that.
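The same recursive-generator idea can be sketched in Python, which also has a yield from statement. This is an illustration using the standard library's xml.dom.minidom rather than PHP's DOMDocument. The walk function and its skip_parent parameter simply mirror the names used above.

```python
from xml.dom import minidom

def walk(node, skip_parent=False):
    """Yield a node and then all of its descendants, depth first."""
    if not skip_parent:
        yield node
    for child in node.childNodes:
        # Delegate to the recursive generator for each subtree.
        yield from walk(child)

doc = minidom.parseString("<div><p>Hi <em>there</em></p></div>")
names = [n.nodeName for n in walk(doc)]
without_root = [n.nodeName for n in walk(doc, skip_parent=True)]
print(names)  # ['#document', 'div', 'p', '#text', 'em', '#text']
```

Because the generator is lazy, the caller sees each node as it is reached, which is exactly why modifying the tree inside the loop is risky.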
Second, Removing Tags but Keeping their Text
The tricky part was figuring out how to keep the text and not the tag while making modifications inside the loop. So after struggling with a few different approaches, I found the simplest way was to build a list of tags to be removed from inside the loop and then remove them later, using DOMNode::insertBefore() to move the text nodes up the tree. That way, removing those nodes later has no side effects.
So I added another generalized stripTags method to this child class of DOMDocument.
public function stripTags(DOMNode $node) {
    $change = $remove = [];

    /* Walk the entire tree to build a list of things that need to be removed */
    foreach($this->walk($node) as $n) {
        if ($n instanceof DOMText || $n instanceof DOMDocument) {
            continue;
        }
        $this->stripAttributes($n); // strips all node attributes not allowed
        $this->forceAttributes($n); // forces any required attributes
        if (!in_array($n->nodeName, $this->allowedTags, true)) {
            // track the disallowed node for removal
            $remove[] = $n;
            // we take all of its child nodes for modification later
            foreach($n->childNodes as $child) {
                $change[] = [$child, $n];
            }
        }
    }

    /* Go through the list of changes first so we don't break the
       referential integrity of the tree */
    foreach($change as list($a, $b)) {
        $b->parentNode->insertBefore($a, $b);
    }

    /* Now we can safely remove the old nodes */
    foreach($remove as $a) {
        if ($a->parentNode) {
            $a->parentNode->removeChild($a);
        }
    }
}
The trick here is that because we use insertBefore on the child nodes (e.g. text nodes) of the disallowed tags to move them up to the parent tag, we could easily break the tree if we did it while still walking it (insertBefore moves the node rather than copying it). This confused me a lot at first, but looking at the way the method works, it makes sense. Deferring the move makes sure we don't break the parentNode reference when, for example, the deeper node is allowed but its parent is not in the allowed tags list.
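The collect-first, modify-later idea can be sketched in Python as follows. This is a simplified illustration using xml.dom.minidom, not a port of the PHP class: it leaves attributes alone, and it assumes the root element passed in is itself allowed. The strip_tags helper and sample markup are hypothetical.

```python
from xml.dom import minidom

def walk(node):
    """Yield a node and then all of its descendants, depth first."""
    yield node
    for child in node.childNodes:
        yield from walk(child)

def strip_tags(root, allowed):
    """Remove elements not in `allowed`, keeping their children in place."""
    moves, removals = [], []
    # Phase 1: read-only walk that records what has to change.
    for node in walk(root):
        if node.nodeType != node.ELEMENT_NODE:
            continue
        if node.tagName not in allowed:
            removals.append(node)
            for child in node.childNodes:
                moves.append((child, node))
    # Phase 2: move each recorded child in front of its disallowed parent.
    for child, old_parent in moves:
        old_parent.parentNode.insertBefore(child, old_parent)
    # Phase 3: the disallowed elements are now empty and can be removed.
    for node in removals:
        if node.parentNode is not None:
            node.parentNode.removeChild(node)

doc = minidom.parseString(
    "<body><div>Some text <p>Hello <span>P<em>H</em>P</span>!</p></div></body>"
)
strip_tags(doc.documentElement, {"body", "p", "em"})
result = doc.documentElement.toxml()
print(result)  # <body>Some text <p>Hello P<em>H</em>P!</p></body>
```

Keeping the walk read-only means the traversal never sees a half-modified tree, and processing the moves in document order lets nested disallowed elements unwrap correctly.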
Complete Solution
Here's the complete solution I came up with to solve this problem more generally. I'll include it in my answer, since I struggled to find many of the edge cases of doing this with DOMDocument covered elsewhere. It allows you to specify which tags to allow, and all other tags are removed. It also allows you to specify which attributes are allowed, so that all other attributes are removed, and even to force certain attributes on certain tags.
class HTMLFixer extends DOMDocument {
    protected static $defaultAllowedTags = [
        'p',
        'h1',
        'h2',
        'h3',
        'h4',
        'h5',
        'h6',
        'pre',
        'code',
        'blockquote',
        'q',
        'strong',
        'em',
        'del',
        'img',
        'a',
        'table',
        'thead',
        'tbody',
        'tfoot',
        'tr',
        'th',
        'td',
        'ul',
        'ol',
        'li',
    ];
    protected static $defaultAllowedAttributes = [
        'a'   => ['href'],
        'img' => ['src'],
        'pre' => ['class'],
    ];
    protected static $defaultForceAttributes = [
        'a' => ['target' => '_blank'],
    ];

    protected $allowedTags       = [];
    protected $allowedAttributes = [];
    protected $forceAttributes   = [];

    public function __construct($version = null, $encoding = null, $allowedTags = [],
                                $allowedAttributes = [], $forceAttributes = []) {
        $this->setAllowedTags($allowedTags ?: static::$defaultAllowedTags);
        $this->setAllowedAttributes($allowedAttributes ?: static::$defaultAllowedAttributes);
        $this->setForceAttributes($forceAttributes ?: static::$defaultForceAttributes);
        parent::__construct($version, $encoding);
    }

    public function setAllowedTags(Array $tags) {
        $this->allowedTags = $tags;
    }

    public function setAllowedAttributes(Array $attributes) {
        $this->allowedAttributes = $attributes;
    }

    public function setForceAttributes(Array $attributes) {
        $this->forceAttributes = $attributes;
    }

    public function getAllowedTags() {
        return $this->allowedTags;
    }

    public function getAllowedAttributes() {
        return $this->allowedAttributes;
    }

    public function getForceAttributes() {
        return $this->forceAttributes;
    }

    public function saveHTML(DOMNode $node = null) {
        if (!$node) {
            $node = $this;
        }
        $this->stripTags($node);
        return parent::saveHTML($node);
    }

    protected function stripTags(DOMNode $node) {
        $change = $remove = [];
        foreach($this->walk($node) as $n) {
            if ($n instanceof DOMText || $n instanceof DOMDocument) {
                continue;
            }
            $this->stripAttributes($n);
            $this->forceAttributes($n);
            if (!in_array($n->nodeName, $this->allowedTags, true)) {
                $remove[] = $n;
                foreach($n->childNodes as $child) {
                    $change[] = [$child, $n];
                }
            }
        }
        foreach($change as list($a, $b)) {
            $b->parentNode->insertBefore($a, $b);
        }
        foreach($remove as $a) {
            if ($a->parentNode) {
                $a->parentNode->removeChild($a);
            }
        }
    }

    protected function stripAttributes(DOMNode $node) {
        $attributes = $node->attributes;
        $len = $attributes->length;
        for ($i = $len - 1; $i >= 0; $i--) {
            $attr = $attributes->item($i);
            if (!isset($this->allowedAttributes[$node->nodeName]) ||
                !in_array($attr->name, $this->allowedAttributes[$node->nodeName], true)) {
                $node->removeAttributeNode($attr);
            }
        }
    }

    protected function forceAttributes(DOMNode $node) {
        if (isset($this->forceAttributes[$node->nodeName])) {
            foreach ($this->forceAttributes[$node->nodeName] as $attribute => $value) {
                $node->setAttribute($attribute, $value);
            }
        }
    }

    protected function walk(DOMNode $node, $skipParent = false) {
        if (!$skipParent) {
            yield $node;
        }
        if ($node->hasChildNodes()) {
            foreach ($node->childNodes as $n) {
                yield from $this->walk($n);
            }
        }
    }
}
So if we have the following HTML
<div id="content">
Some text...
<p class="someclass">Hello <span style="color: purple;">P<em>H</em>P</span>!</p>
</div>
And we only want to allow <p> and <em>.
$html = <<<'HTML'
<div id="content">
Some text...
<p class="someclass">Hello <span style="color: purple;">P<em>H</em>P</span>!</p>
</div>
HTML;
$dom = new HTMLFixer(null, null, ['p', 'em']);
$dom->loadHTML($html, LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);
echo $dom->saveHTML($dom);
We'd get something like this...
Some text...
<p>Hello P<em>H</em>P!</p>
Since you can also limit this to a specific subtree of the DOM, the solution could be generalized even more.
removing tag name but keep tag
getElementsByTagName() returns a live NodeList. So when you replace a tag, the indexes of all the following elements shift down, and the code fails when you have more than one <strong> tag in the same paragraph. As a result, it will skip some tags.
The solution is to convert the NodeList to an array so it doesn't change while you're looping.
Another problem in your real page that isn't in the snippet is that the <strong> tags can be nested deeply within the <p>. You should use strongs[j].parentElement to get its direct parent, rather than assuming that p[i] is the parent.
var p = document.getElementsByTagName("p");
for (var i = 0; i < p.length; i++) {
    var strongs = Array.from(p[i].getElementsByTagName("strong"));
    for (var j = 0; j < strongs.length; j++) {
        strongs[j].parentElement.replaceChild(document.createTextNode(strongs[j].innerText), strongs[j]);
    }
}
<html>
<body>
    <p>aaa <Strong>bbbbb</Strong> - <strong>12345</strong></p>
    <p>acccaa <span><Strong>ddddd</Strong> x</span></p>
    <p>eeee <Strong>ffff</Strong> </p>
</body>
</html>
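The skipping behaviour is not specific to the DOM. It shows up whenever a collection shrinks while you index through it. The following Python sketch is only an analogy: a plain list stands in for the live NodeList, and the function names are hypothetical.

```python
def remove_while_indexing(live):
    """Mimic looping by index over a live collection while removing entries."""
    removed = []
    i = 0
    while i < len(live):
        removed.append(live.pop(i))  # later entries shift down by one
        i += 1                       # so the entry that slid into slot i is skipped
    return removed

def remove_from_snapshot(live):
    """Iterate over a copy, comparable to Array.from(nodeList) in the answer."""
    removed = []
    for item in list(live):
        live.remove(item)
        removed.append(item)
    return removed

tags = ["strong#1", "strong#2", "strong#3"]
skipped = remove_while_indexing(list(tags))
complete = remove_from_snapshot(list(tags))
print(skipped)   # ['strong#1', 'strong#3']
print(complete)  # ['strong#1', 'strong#2', 'strong#3']
```

The first loop misses the second entry, which matches the symptom described above when a paragraph contains more than one <strong> tag.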
Remove anchor tags from a paragraph but keep the text using Javascript
Try this:
var yourHtml= `This is a <a href="http://www.link1">link</a> and so is <a href="http://www.link1">this</a>. This is also another <a href="http://www.link2">boring link</a>.`;
var parser = new DOMParser();
var htmlDoc = parser.parseFromString(yourHtml, 'text/html');
var text = htmlDoc.body.innerText;
console.log(text); // Returns: "This is a link and so is this. This is also another boring link."
This converts your HTML string into a DOM and uses .innerText to remove all HTML elements from your string, leaving only the text.
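The parse-then-read-the-text idea carries over to other environments. This is a rough Python counterpart using the standard library's html.parser, offered as a sketch only. The TextExtractor class and text_from_html function are hypothetical names, and unlike .innerText this version does not skip the contents of <script> or <style> elements.

```python
from html.parser import HTMLParser

class TextExtractor(HTMLParser):
    """Collect only the text between tags."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        # Called for every run of text; tags themselves never reach here.
        self.parts.append(data)

def text_from_html(markup):
    parser = TextExtractor()
    parser.feed(markup)
    parser.close()  # flush any buffered trailing text
    return "".join(parser.parts)

sample = 'This is a <a href="http://www.link1">link</a> and so is <a href="http://www.link1">this</a>.'
text = text_from_html(sample)
print(text)  # This is a link and so is this.
```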
Update:
Created this simple function that returns text, and only requires the HTML string:
function textFromHTML(str) {
var parser = new DOMParser();
var htmlDoc = parser.parseFromString(str, 'text/html');
return htmlDoc.body.innerText;
}
/* --- Usage --- */
var yourHtml= `This is a <a href="http://www.link1">link</a> and so is <a href="http://www.link1">this</a>. This is also another <a href="http://www.link2">boring link</a>.`;
var text = textFromHTML(yourHtml);
console.log(text); // Returns text
Update 2 (RegEx):
Final version, but it uses RegExp instead of DOMParser():
function textFromHTML(str) {
return str.replace(new RegExp("<.*?>", "g"), "");
}
/* --- Usage --- */
var text = textFromHTML("Hello <span>World!</span> This string is HTML!");
console.log(text); // Returns: "Hello World! This string is HTML!"
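The regular expression is compact, but it treats anything between a < and the next > as a tag. The following Python sketch uses the same pattern with the standard re module to show one case where it works and one where it does not. The function name and sample strings are hypothetical.

```python
import re

# Same pattern as above: a "<", then as few characters as possible, then ">".
TAG = re.compile(r"<.*?>")

def strip_tags_regex(markup):
    return TAG.sub("", markup)

clean = strip_tags_regex("Hello <span>World!</span> This string is HTML!")
mangled = strip_tags_regex("if 1 < 2 and 3 > 2")

print(clean)    # Hello World! This string is HTML!
print(mangled)  # if 1  2
```

In the second call the comparison operators are mistaken for a tag and the text between them is deleted, so the pattern is best kept for input that is known to be simple markup.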
remove tag but keep string between tag in php
Try this
$content = "
<script>
[random] string 1
</script>
<script>
[random] string 2
</script>
....
<script>
[random] string n
</script>
";
$content = str_replace(array("<script>", "</script>"), "", $content);
EDIT:
Since you want to get rid of <script></script> while keeping <script type="text/javascript"></script>, and because using regular expressions to solve this kind of problem is a bad idea, try using DOMDocument like this:
$dom = new DOMDocument();
$content = "
<script>
[random] string 1
</script>
<script>
[random] string 2
</script>
....
<script>
[random] string n
</script>
<script type='text/javascript'>
must keeping script
</script>
<script type='text/javascript'>
must keeping script
</script>
";
$dom->loadHTML($content);
$scripts = $dom->getElementsByTagName('script');
foreach ($scripts as $script) {
    if (!$script->hasAttributes()) {
        echo $script->nodeValue . "<br>";
    }
}
This will output:
[random] string 1
[random] string 2
[random] string n
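The same filter, keeping the contents of only those <script> elements that have no attributes, can be sketched in Python with the standard library's html.parser. This is an illustration rather than a port of the PHP code. BareScriptCollector and bare_script_contents are hypothetical names.

```python
from html.parser import HTMLParser

class BareScriptCollector(HTMLParser):
    """Collect the contents of <script> elements that carry no attributes."""

    def __init__(self):
        super().__init__()
        self.in_bare_script = False
        self.found = []

    def handle_starttag(self, tag, attrs):
        # attrs is an empty list when the tag has no attributes.
        if tag == "script" and not attrs:
            self.in_bare_script = True
            self.found.append("")

    def handle_endtag(self, tag):
        if tag == "script":
            self.in_bare_script = False

    def handle_data(self, data):
        if self.in_bare_script:
            self.found[-1] += data  # data may arrive in several chunks

def bare_script_contents(markup):
    parser = BareScriptCollector()
    parser.feed(markup)
    parser.close()
    return [text.strip() for text in parser.found]

sample = """
<script>
[random] string 1
</script>
<script type='text/javascript'>
must keeping script
</script>
<script>
[random] string 2
</script>
"""
contents = bare_script_contents(sample)
print(contents)  # ['[random] string 1', '[random] string 2']
```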
Remove Html tags and formatting but keep anchor tag in string
I used the following approach to remove all tags except anchor tags.
value = value.replace(/<script[^>]*>([\S\s]*?)<\/script>/gmi, '');
let html = '', addToMax = false, extraLimit = 0;
if(value && typeof value === 'string' && value.length) {
    let inDelete = false;
    for(let i = 0; i < value.length; i++) {
        if(value.charAt(i) == '<' && value.charAt(i + 1) && value.charAt(i + 1) != 'a') {
            inDelete = true;
            if(value.charAt(i + 1) == '/' && ((value.charAt(i + 2) && value.charAt(i + 2) == 'a') || (value.charAt(i + 2) == ' ') && value.charAt(i + 3) && value.charAt(i + 3) == 'a')) {
                inDelete = false;
            }
        }
        if(!inDelete) {
            html += value.charAt(i);
        }
        if(inDelete && value.charAt(i) == '>') {
            inDelete = false;
        }
    }
}
value = angular.copy(html);
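A character scan like this treats any tag whose name starts with "a" (for example <abbr>) as an anchor. As an alternative, here is a minimal Python sketch that uses the standard library's html.parser instead of scanning characters, so it is a different technique from the code above. AnchorKeeper and keep_anchors are hypothetical names.

```python
from html import escape
from html.parser import HTMLParser

class AnchorKeeper(HTMLParser):
    """Drop every tag except <a>, and drop <script>/<style> contents."""

    def __init__(self):
        super().__init__()
        self.out = []
        self.skipping = None  # name of the <script>/<style> being skipped

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self.skipping = tag
        elif tag == "a":
            # Re-emit the anchor exactly as it appeared in the input.
            self.out.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if tag == self.skipping:
            self.skipping = None
        elif tag == "a":
            self.out.append("</a>")

    def handle_data(self, data):
        if self.skipping is None:
            # Text arrives unescaped, so escape it again for HTML output.
            self.out.append(escape(data, quote=False))

def keep_anchors(markup):
    parser = AnchorKeeper()
    parser.feed(markup)
    parser.close()
    return "".join(parser.out)

cleaned = keep_anchors(
    '<p>See <a href="/x">this <b>link</b></a> &amp; '
    '<abbr>more</abbr><script>alert(1)</script></p>'
)
print(cleaned)  # See <a href="/x">this link</a> &amp; more
```

The anchor's attributes are passed through unchanged here, so if the input is untrusted they would still need their own filtering.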