Remove a Tag But Keep the Text

Remove an HTML tag but keep the innerHTML

$('b').contents().unwrap();

This selects all <b> elements, then uses .contents() to select all child nodes of each <b> (text nodes and element nodes alike), then .unwrap() to remove their parent <b> element.
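The same unwrap idea can be sketched outside the browser. The Python example below is an illustration, not jQuery's implementation. It uses the standard library's xml.etree.ElementTree, where text is stored in an element's .text and .tail attributes rather than in separate text nodes, so unwrapping has to splice those strings into the neighboring content. The helper names unwrap and unwrap_all are hypothetical, and the input must be well-formed XML.

```python
import xml.etree.ElementTree as ET


def _append_before(parent, index, text):
    """Append text just before position `index` inside parent."""
    if not text:
        return
    if index == 0:
        parent.text = (parent.text or "") + text
    else:
        prev = parent[index - 1]
        prev.tail = (prev.tail or "") + text


def unwrap(parent, child):
    """Replace child with its own text and sub-elements (hypothetical helper)."""
    index = list(parent).index(child)
    kids = list(child)
    _append_before(parent, index, child.text)  # text right after <child>
    if kids:
        # Text after </child> now follows the last moved sub-element
        kids[-1].tail = (kids[-1].tail or "") + (child.tail or "")
    else:
        _append_before(parent, index, child.tail)
    parent.remove(child)
    for offset, kid in enumerate(kids):
        parent.insert(index + offset, kid)


def unwrap_all(root, tag):
    """Unwrap every <tag> below root, roughly $(tag).contents().unwrap()."""
    # Collect first, then mutate deepest-first so recorded parents stay valid
    pairs = [(p, c) for p in root.iter() for c in p if c.tag == tag]
    for parent, child in reversed(pairs):
        unwrap(parent, child)


root = ET.fromstring("<p>Hello <b>big <i>bold</i> world</b>!</p>")
unwrap_all(root, "b")
print(ET.tostring(root, encoding="unicode"))
# <p>Hello big <i>bold</i> world!</p>
```

Processing the collected pairs in reverse document order means a nested <b> inside another <b> is unwrapped before its parent is.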


For the greatest performance, always go native:

var b = document.getElementsByTagName('b');

while (b.length) {
    var parent = b[0].parentNode;
    // move every child of the <b> in front of it
    while (b[0].firstChild) {
        parent.insertBefore(b[0].firstChild, b[0]);
    }
    // the live collection shrinks as each <b> is removed
    parent.removeChild(b[0]);
}

Because it works directly on the DOM without building jQuery objects, this will typically be much faster than the jQuery solutions.

Remove tag if id matches but keep text

If I understand you correctly, you wish to turn this:

<a id="searchword1" class="searchword" style="background-color: yellow; text-decoration: none; color: black;">my text</a>

into this:

my text

If that's the case, then it's very easy.

As it stands, it looks like your script asks for a child of the element you showed, but that element has no children other than its text node. I expect your script fails at line 2, when it tries to get a non-existent child.

//1. Get the element containing the text (IDs are case-sensitive: 'searchword1', not 'searchWord1')
var element = document.getElementById('searchword1');

//2. Get the text it contains
var highlightedText = element.textContent;

//3. Get the highlighted element's parent
var parent = element.parentNode;

//4. Create a text node
var newNode = document.createTextNode(highlightedText);

//5. Insert it into the document before the link
parent.insertBefore(newNode, element);

//6. Remove the link element
parent.removeChild(element);
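For comparison, here is a minimal Python sketch of the same steps using the standard library's xml.etree.ElementTree. ElementTree has no getElementById or parentNode, so this version searches each parent's children for a matching id attribute. The function name replace_with_text and the sample markup are hypothetical.

```python
import xml.etree.ElementTree as ET


def replace_with_text(root, element_id):
    """Replace the element whose id attribute equals element_id by its text.

    Returns False when nothing matches, the case where getElementById
    would return null. The root element itself cannot be replaced.
    """
    for parent in root.iter():
        for index, child in enumerate(parent):
            if child.get("id") != element_id:
                continue
            # All text inside the element (like textContent) plus the text after it
            text = "".join(child.itertext()) + (child.tail or "")
            if index == 0:
                parent.text = (parent.text or "") + text
            else:
                prev = parent[index - 1]
                prev.tail = (prev.tail or "") + text
            parent.remove(child)
            return True  # stop before the mutated tree is iterated further
    return False


root = ET.fromstring('<p>See <a id="searchword1" class="searchword">my text</a> here.</p>')
print(replace_with_text(root, "searchWord1"))  # False: ids are case-sensitive
print(replace_with_text(root, "searchword1"))  # True
print(ET.tostring(root, encoding="unicode"))   # <p>See my text here.</p>
```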

XML: remove tag but keep text

You're almost there. To get your output, try:

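import xml.etree.ElementTree as ET

# root is your already-parsed root element (e.g. from ET.parse(...).getroot())
for d in root.findall(".//dialogue"):
    # d.remove() only works on direct children, so search "sentence", not ".//sentence"
    for s in d.findall("sentence"):
        if s.text:
            new_t = s.text.strip()
            d.remove(s)
            d.text = new_t
print(ET.tostring(root).decode())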

And that should output what you need.
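A self-contained version you can run directly is sketched below. The sample XML is hypothetical, since the question's actual input isn't shown here, and joining several sentences with a single space (instead of keeping only the last one) is an assumption about the desired output.

```python
import xml.etree.ElementTree as ET

# Hypothetical input; the question's actual XML is not shown in this answer
SAMPLE = """<root>
<dialogue><sentence>  Hello there.  </sentence></dialogue>
<dialogue><sentence>First.</sentence><sentence>Second.</sentence></dialogue>
</root>"""


def flatten_sentences(root):
    """Replace each dialogue's direct <sentence> children with their text."""
    for d in root.findall(".//dialogue"):
        sentences = d.findall("sentence")  # a list, so removing from d is safe
        parts = [s.text.strip() for s in sentences if s.text]
        for s in sentences:
            d.remove(s)
        if parts:
            # Assumption: several sentences are joined with a single space
            d.text = " ".join(parts)


root = ET.fromstring(SAMPLE)
flatten_sentences(root)
print(ET.tostring(root, encoding="unicode"))
```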

Remove a tag but keep the text

Here is what I would do:

require 'nokogiri'

doc = Nokogiri::HTML.parse <<-eot
<a href="/www.somethinggggg.com">Something 123</a>
eot

node = doc.at("a")
node.replace(node.text)

puts doc.to_html

output

<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd">
<html>
<body>Something 123</body>
</html>

Update

What if I have an array that holds content with links?

Hint

require 'nokogiri'

doc = Nokogiri::HTML.parse <<-eot
<a href="/www.foo.com">foo</a>
<a href="/www.bar.com">bar</a>
<a href="/www.baz.com">baz</a>
eot

arr = %w(foo bar baz)
nodes = doc.search("a")
nodes.each {|node| node.replace(node.content) if arr.include?(node.content) }

puts doc.to_html

output

<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd">
<html>
<body>foo
bar
baz
</body>
</html>
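The same "only unlink the words in my array" idea can be sketched in Python with xml.etree.ElementTree. This is an analogue of the Nokogiri hint, not a port of it: the helper name unlink_matching and the extra "qux" link are hypothetical, and ElementTree needs well-formed XML rather than arbitrary HTML.

```python
import xml.etree.ElementTree as ET


def unlink_matching(root, words):
    """Replace every <a> whose text is in `words` with that plain text."""
    for parent in list(root.iter()):  # snapshot: we mutate while looping
        # Walk children backwards so removals don't shift pending indexes
        for index in range(len(parent) - 1, -1, -1):
            child = parent[index]
            text = "".join(child.itertext())
            if child.tag != "a" or text not in words:
                continue
            replacement = text + (child.tail or "")
            if index == 0:
                parent.text = (parent.text or "") + replacement
            else:
                prev = parent[index - 1]
                prev.tail = (prev.tail or "") + replacement
            parent.remove(child)


body = ET.fromstring(
    '<body><a href="/www.foo.com">foo</a> <a href="/www.qux.com">qux</a> '
    '<a href="/www.bar.com">bar</a></body>'
)
unlink_matching(body, {"foo", "bar"})
print(ET.tostring(body, encoding="unicode"))
# <body>foo <a href="/www.qux.com">qux</a> bar</body>
```

Links whose text is not in the set (here "qux") are left untouched, mirroring the arr.include? check.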

JS - Remove a tag without deleting content

jQuery has easier ways:

var spans = $('span');
spans.contents().unwrap();

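By changing the selector, you can unwrap deeply nested spans or only the direct child spans of an element: a descendant selector such as $('#container span') matches spans at any depth, while a child selector such as $('#container > span') matches only direct children.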
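The descendant-versus-child distinction is not specific to jQuery. As a rough Python analogue (an illustration only, using xml.etree.ElementTree on a hypothetical snippet), Element.iter() visits matching elements at any depth, while Element.findall() with a bare tag name matches only direct children:

```python
import xml.etree.ElementTree as ET

DOC = ET.fromstring(
    "<div><span>a</span><p><span>b</span></p><span>c</span></div>"
)

# Every <span> anywhere below <div>, like $('div span')
all_spans = [s.text for s in DOC.iter("span")]

# Only <span> elements that are direct children, like $('div > span')
direct_spans = [s.text for s in DOC.findall("span")]

print(all_spans)     # ['a', 'b', 'c']
print(direct_spans)  # ['a', 'c']
```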

How to remove unwanted HTML tags from user input but keep text inside the tags in PHP using DOMDocument

It seems this problem needs to be broken down into two smaller steps in order to generalize the solution.

First, Walking the DOM Tree

In order to get to a working solution, I found I needed a sensible way to traverse every node in the DOM tree and inspect it to determine whether it should be kept as-is or modified.

So I wrote the following method as a simple generator in a class extending DOMDocument.

class HTMLFixer extends DOMDocument {
    public function walk(DOMNode $node, $skipParent = false) {
        if (!$skipParent) {
            yield $node;
        }
        if ($node->hasChildNodes()) {
            foreach ($node->childNodes as $n) {
                yield from $this->walk($n);
            }
        }
    }
}

This way, doing something like foreach($dom->walk($dom) as $node) gives me a simple loop to traverse the entire tree. Of course, this requires PHP 7 or later because of the yield from syntax, but I'm OK with that.
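The same recursive-generator pattern translates almost line for line into Python, which also has yield from. The sketch below walks an xml.etree.ElementTree tree depth first; note that ElementTree has no separate text nodes, so only elements are yielded (the built-in Element.iter() does the same job). The walk function and the sample markup are illustrative.

```python
import xml.etree.ElementTree as ET
from typing import Iterator


def walk(node: ET.Element, skip_parent: bool = False) -> Iterator[ET.Element]:
    """Yield node (unless skip_parent) and then every descendant, depth first."""
    if not skip_parent:
        yield node
    for child in node:
        yield from walk(child)


root = ET.fromstring("<div><p>Hi <em>there</em></p><ul><li>x</li></ul></div>")
print([el.tag for el in walk(root)])
# ['div', 'p', 'em', 'ul', 'li']
print([el.tag for el in walk(root, skip_parent=True)])
# ['p', 'em', 'ul', 'li']
```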

Second, Removing Tags but Keeping their Text

The tricky part was figuring out how to keep the text and not the tag while making modifications inside the loop. After struggling with a few different approaches, I found the simplest way was to build a list of tags to be removed from inside the loop, and then, after the loop, use DOMNode::insertBefore() to move their child nodes (including text nodes) up the tree before removing the tags. That way, removing those nodes later has no side effects.

So I added a more general stripTags method to this DOMDocument subclass.

public function stripTags(DOMNode $node) {
    $change = $remove = [];

    /* Walk the entire tree to build a list of things that need to be removed */
    foreach ($this->walk($node) as $n) {
        if ($n instanceof DOMText || $n instanceof DOMDocument) {
            continue;
        }
        $this->stripAttributes($n); // strips all node attributes not allowed
        $this->forceAttributes($n); // forces any required attributes
        if (!in_array($n->nodeName, $this->allowedTags, true)) {
            // track the disallowed node for removal
            $remove[] = $n;
            // we take all of its child nodes for modification later
            foreach ($n->childNodes as $child) {
                $change[] = [$child, $n];
            }
        }
    }

    /* Go through the list of changes first so we don't break the
       referential integrity of the tree */
    foreach ($change as list($a, $b)) {
        $b->parentNode->insertBefore($a, $b);
    }

    /* Now we can safely remove the old nodes */
    foreach ($remove as $a) {
        if ($a->parentNode) {
            $a->parentNode->removeChild($a);
        }
    }
}

The trick here is that insertBefore, when given a node that is already in the tree (here, a child node, such as a text node, of a disallowed tag), moves that node to its new position rather than copying it. Moving nodes up to the parent while still walking the tree could therefore easily break the traversal. This confused me a lot at first, but looking at the way the method works, it makes sense. Deferring the moves makes sure we don't break a parentNode reference when, for example, the deeper node is allowed but its parent is not in the allowed tags list.

Complete Solution

Here's the complete solution I came up with to solve this problem more generally. I'm including it in my answer since I struggled to find many of the edge cases of doing this with DOMDocument elsewhere. It allows you to specify which tags to allow; all other tags are removed. It also allows you to specify which attributes to allow; all other attributes are removed (you can even force certain attributes on certain tags).

class HTMLFixer extends DOMDocument {
    protected static $defaultAllowedTags = [
        'p', 'h1', 'h2', 'h3', 'h4', 'h5', 'h6',
        'pre', 'code', 'blockquote', 'q',
        'strong', 'em', 'del', 'img', 'a',
        'table', 'thead', 'tbody', 'tfoot', 'tr', 'th', 'td',
        'ul', 'ol', 'li',
    ];
    protected static $defaultAllowedAttributes = [
        'a'   => ['href'],
        'img' => ['src'],
        'pre' => ['class'],
    ];
    protected static $defaultForceAttributes = [
        'a' => ['target' => '_blank'],
    ];

    protected $allowedTags = [];
    protected $allowedAttributes = [];
    protected $forceAttributes = [];

    public function __construct($version = null, $encoding = null, $allowedTags = [],
                                $allowedAttributes = [], $forceAttributes = []) {
        $this->setAllowedTags($allowedTags ?: static::$defaultAllowedTags);
        $this->setAllowedAttributes($allowedAttributes ?: static::$defaultAllowedAttributes);
        $this->setForceAttributes($forceAttributes ?: static::$defaultForceAttributes);
        // DOMDocument expects strings here; passing null is deprecated as of PHP 8.1
        parent::__construct($version ?? '1.0', $encoding ?? '');
    }

    public function setAllowedTags(Array $tags) {
        $this->allowedTags = $tags;
    }

    public function setAllowedAttributes(Array $attributes) {
        $this->allowedAttributes = $attributes;
    }

    public function setForceAttributes(Array $attributes) {
        $this->forceAttributes = $attributes;
    }

    public function getAllowedTags() {
        return $this->allowedTags;
    }

    public function getAllowedAttributes() {
        return $this->allowedAttributes;
    }

    public function getForceAttributes() {
        return $this->forceAttributes;
    }

    // Avoids the PHP 8.1+ return type deprecation notice; PHP 7 reads this line as a comment
    #[\ReturnTypeWillChange]
    public function saveHTML(DOMNode $node = null) {
        if (!$node) {
            $node = $this;
        }
        $this->stripTags($node);
        return parent::saveHTML($node);
    }

    protected function stripTags(DOMNode $node) {
        $change = $remove = [];
        // Pass 1: record disallowed nodes and their children; don't move anything yet
        foreach ($this->walk($node) as $n) {
            if ($n instanceof DOMText || $n instanceof DOMDocument) {
                continue;
            }
            $this->stripAttributes($n);
            $this->forceAttributes($n);
            if (!in_array($n->nodeName, $this->allowedTags, true)) {
                $remove[] = $n;
                foreach ($n->childNodes as $child) {
                    $change[] = [$child, $n];
                }
            }
        }
        // Pass 2: move the children up in front of their disallowed parent
        foreach ($change as list($a, $b)) {
            $b->parentNode->insertBefore($a, $b);
        }
        // Pass 3: the disallowed nodes are now empty and can be removed
        foreach ($remove as $a) {
            if ($a->parentNode) {
                $a->parentNode->removeChild($a);
            }
        }
    }

    protected function stripAttributes(DOMNode $node) {
        $attributes = $node->attributes;
        $len = $attributes->length;
        // iterate backwards because removing an attribute shifts the later ones
        for ($i = $len - 1; $i >= 0; $i--) {
            $attr = $attributes->item($i);
            if (!isset($this->allowedAttributes[$node->nodeName]) ||
                !in_array($attr->name, $this->allowedAttributes[$node->nodeName], true)) {
                $node->removeAttributeNode($attr);
            }
        }
    }

    protected function forceAttributes(DOMNode $node) {
        if (isset($this->forceAttributes[$node->nodeName])) {
            foreach ($this->forceAttributes[$node->nodeName] as $attribute => $value) {
                $node->setAttribute($attribute, $value);
            }
        }
    }

    protected function walk(DOMNode $node, $skipParent = false) {
        if (!$skipParent) {
            yield $node;
        }
        if ($node->hasChildNodes()) {
            foreach ($node->childNodes as $n) {
                yield from $this->walk($n);
            }
        }
    }
}

So if we have the following HTML

<div id="content">
Some text...
<p class="someclass">Hello <span style="color: purple;">P<em>H</em>P</span>!</p>
</div>

And we only want to allow <p> and <em>.

$html = <<<'HTML'
<div id="content">
Some text...
<p class="someclass">Hello <span style="color: purple;">P<em>H</em>P</span>!</p>
</div>
HTML;

$dom = new HTMLFixer(null, null, ['p', 'em']);
$dom->loadHTML($html, LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);

echo $dom->saveHTML($dom);

We'd get something like this...


Some text...
<p>Hello P<em>H</em>P!</p>

Since you can limit this to a specific subtree of the DOM as well, the solution could be generalized even more.
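The same policy ideas (a tag allowlist, a per-tag attribute allowlist, forced attributes, and deferring structural changes until the walk is finished) can be sketched in Python with xml.etree.ElementTree. This is not a port of the PHP class: the policy constants, the helper names, and the rule that the container element itself is kept are assumptions of this sketch, and ElementTree needs well-formed XML rather than arbitrary HTML.

```python
import xml.etree.ElementTree as ET

# Hypothetical policy, loosely mirroring the PHP defaults above
ALLOWED_TAGS = {"p", "em", "a"}
ALLOWED_ATTRS = {"a": {"href"}}
FORCE_ATTRS = {"a": {"target": "_blank"}}


def _append_before(parent, index, text):
    """Append text just before position `index` inside parent."""
    if not text:
        return
    if index == 0:
        parent.text = (parent.text or "") + text
    else:
        prev = parent[index - 1]
        prev.tail = (prev.tail or "") + text


def _unwrap(parent, child):
    """Replace child by its text and sub-elements, keeping document order."""
    index = list(parent).index(child)
    kids = list(child)
    _append_before(parent, index, child.text)
    if kids:
        kids[-1].tail = (kids[-1].tail or "") + (child.tail or "")
    else:
        _append_before(parent, index, child.tail)
    parent.remove(child)
    for offset, kid in enumerate(kids):
        parent.insert(index + offset, kid)


def sanitize(container):
    """Strip disallowed tags and attributes below container (container is kept)."""
    # Phase 1: walk the whole tree; only attributes change, removals are recorded
    doomed = []
    for parent in container.iter():
        for child in parent:
            allowed = ALLOWED_ATTRS.get(child.tag, set())
            for name in list(child.attrib):
                if name not in allowed:
                    del child.attrib[name]
            child.attrib.update(FORCE_ATTRS.get(child.tag, {}))
            if child.tag not in ALLOWED_TAGS:
                doomed.append((parent, child))
    # Phase 2: unwrap deepest elements first so recorded parents stay valid
    for parent, child in reversed(doomed):
        _unwrap(parent, child)


body = ET.fromstring(
    '<body><div id="content">Some text...<p class="someclass">Hello '
    '<span style="color: purple;">P<em>H</em>P</span>!</p>'
    '<a href="/x" onclick="go()">link</a></div></body>'
)
sanitize(body)
print(ET.tostring(body, encoding="unicode"))
```

As in the PHP version, the walk itself never moves nodes; all structural changes happen after the full list of disallowed elements is known.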

removing tag name but keep tag

getElementsByTagName() returns a live HTMLCollection. So when you replace a tag, the indexes of all the following elements shift down, and when you have more than one <strong> tag in the same paragraph, the loop skips some of them.

The solution is to convert the collection to an array (here with Array.from()) so it doesn't change while you're looping.

Another problem in your real page that isn't in the snippet is that the <strong> tags can be nested deeply within the <p>. You should use strongs[j].parentElement to get its direct parent, rather than assuming that the p[i] is the parent.

var p = document.getElementsByTagName("p");
for (var i = 0; i < p.length; i++) {
  var strongs = Array.from(p[i].getElementsByTagName("strong"));
  for (var j = 0; j < strongs.length; j++) {
    strongs[j].parentElement.replaceChild(document.createTextNode(strongs[j].innerText), strongs[j]);
  }
}

<html>
<body>
  <p>aaa <Strong>bbbbb</Strong> - <strong>12345</strong></p>
  <p>acccaa <span><Strong>ddddd</Strong> x</span></p>
  <p>eeee <Strong>ffff</Strong> </p>
</body>
</html>
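The skipping described above is easy to reproduce without a browser. In this Python sketch a plain list stands in for the live collection (an analogy only; a Python list is not a DOM HTMLCollection): deleting an item shifts every later item down one index, so an index-based loop steps over the next one. Looping over a copy, which is the role Array.from() plays in the fix, avoids that. The function names are hypothetical.

```python
def strip_by_index(tags):
    """Buggy pattern: index loop over a list that shrinks as items are removed."""
    i = 0
    while i < len(tags):
        if tags[i] == "strong":
            del tags[i]  # every later item shifts down one slot...
        i += 1           # ...and the index moves on anyway, skipping one
    return tags


def strip_from_snapshot(tags):
    """Fixed pattern: loop over a copy, mutate the original."""
    for tag in list(tags):  # the copy plays the role of Array.from(...)
        if tag == "strong":
            tags.remove(tag)
    return tags


print(strip_by_index(["strong", "strong", "em"]))       # ['strong', 'em']
print(strip_from_snapshot(["strong", "strong", "em"]))  # ['em']
```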

Remove anchor tags from a paragraph but keep the text using Javascript

Try this:

var yourHtml= `This is a <a href="http://www.link1">link</a> and so is <a href="http://www.link1">this</a>. This is also another <a href="http://www.link2">boring link</a>.`;

var parser = new DOMParser();
var htmlDoc = parser.parseFromString(yourHtml, 'text/html');

var text = htmlDoc.body.innerText;

console.log(text); // Returns: "This is a link and so is this. This is also another boring link."

This parses your HTML string into a DOM document and uses .innerText to read only its text, dropping all HTML elements from your string.

Update:

I created this simple function that returns the text and only requires the HTML string:

function textFromHTML(str) {
  var parser = new DOMParser();
  var htmlDoc = parser.parseFromString(str, 'text/html');
  return htmlDoc.body.innerText;
}

/* --- Usage --- */

var yourHtml= `This is a <a href="http://www.link1">link</a> and so is <a href="http://www.link1">this</a>. This is also another <a href="http://www.link2">boring link</a>.`;
var text = textFromHTML(yourHtml);

console.log(text); // Returns text

Update 2 (RegEx):

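Final version, using a regular expression instead of DOMParser(). This is only a rough approach: a pattern like this cannot cope with a > inside an attribute value, so prefer the DOMParser() version for untrusted input.

function textFromHTML(str) {
  // [^>]* (unlike .*?) also matches newlines, so tags split across lines are removed too
  return str.replace(/<[^>]*>/g, "");
}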

/* --- Usage --- */

var text = textFromHTML("Hello <span>World!</span> This string is HTML!");

console.log(text); // Returns: "Hello World! This string is HTML!"
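Both versions above run in the browser. As a point of comparison, here is a minimal Python sketch that uses the standard library's html.parser instead (a swapped-in technique, not the DOMParser API). Because it tokenizes tags properly, it also handles a > inside a quoted attribute value, which trips up the regular expression. The class and function names are hypothetical, and like a plain text extraction it does not skip <script> or <style> contents.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect character data and ignore every tag."""

    def __init__(self):
        super().__init__(convert_charrefs=True)  # decode &amp; etc. in text
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def text_from_html(html):
    parser = TextExtractor()
    parser.feed(html)
    parser.close()  # flush any buffered text
    return "".join(parser.parts)


print(text_from_html(
    'This is a <a href="http://www.link1">link</a> and so is '
    '<a href="http://www.link1">this</a>.'
))
# This is a link and so is this.
```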

remove tag but keep string between tag in php

Try this

$content = "
<script>
[random] string 1
</script>

<script>
[random] string 2
</script>
....
<script>
[random] string n
</script>
";

$content = str_replace(array("<script>", "</script>"), "", $content);

EDIT:
Since you want to get rid of <script></script> while keeping <script type="text/javascript"></script>, and because using regular expressions to solve this kind of problem is a bad idea, try using DOMDocument like this:

$dom = new DOMDocument();

$content = "
<script>
[random] string 1
</script>

<script>
[random] string 2
</script>
....
<script>
[random] string n
</script>

<script type='text/javascript'>
must keeping script
</script>

<script type='text/javascript'>
must keeping script
</script>
";

$dom->loadHTML($content);
$scripts = $dom->getElementsByTagName('script');

foreach ($scripts as $script) {
    if (!$script->hasAttributes()) {
        echo $script->nodeValue . "<br>";
    }
}

This will output:

[random] string 1

[random] string 2

[random] string n
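The same "bare <script> tags only" filter can be sketched in Python with the standard library's html.parser, a swapped-in technique rather than DOMDocument. As in the PHP version, it only reads the text of <script> tags that have no attributes at all and does not rewrite the document. The class name BareScriptCollector is hypothetical.

```python
from html.parser import HTMLParser


class BareScriptCollector(HTMLParser):
    """Collect the text of <script> tags that have no attributes at all."""

    def __init__(self):
        super().__init__()
        self.in_bare_script = False
        self.scripts = []

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            # attrs is a list of (name, value) pairs; empty means a bare <script>
            self.in_bare_script = not attrs
            if self.in_bare_script:
                self.scripts.append("")

    def handle_endtag(self, tag):
        if tag == "script":
            self.in_bare_script = False

    def handle_data(self, data):
        if self.in_bare_script:
            self.scripts[-1] += data


collector = BareScriptCollector()
collector.feed(
    "<script>\n[random] string 1\n</script>\n"
    "<script type='text/javascript'>\nmust keeping script\n</script>\n"
    "<script>\n[random] string 2\n</script>"
)
collector.close()
print([s.strip() for s in collector.scripts])
```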

Remove Html tags and formatting but keep anchor tag in string

I have used the following approach to remove all tags except anchor tags.

let html = '';

if (value && typeof value === 'string' && value.length) {
  // Drop <script> blocks together with their contents first
  value = value.replace(/<script[^>]*>([\S\s]*?)<\/script>/gmi, '');
  let inDelete = false;

  for (let i = 0; i < value.length; i++) {
    // Start skipping at any '<' not followed by 'a'.
    // Note: this also keeps other tags starting with 'a', such as <abbr>.
    if (value.charAt(i) == '<' && value.charAt(i + 1) && value.charAt(i + 1) != 'a') {
      inDelete = true;
      // ...but keep closing anchor tags: "</a" or "</ a"
      if (value.charAt(i + 1) == '/' &&
          (value.charAt(i + 2) == 'a' || (value.charAt(i + 2) == ' ' && value.charAt(i + 3) == 'a'))) {
        inDelete = false;
      }
    }

    if (!inDelete) {
      html += value.charAt(i);
    }

    // The '>' that closes a skipped tag ends the deletion
    if (inDelete && value.charAt(i) == '>') {
      inDelete = false;
    }
  }
}
value = html;
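A character-by-character scan is easy to get wrong (it keeps <abbr>, for instance). A sturdier alternative is to let a real parser find the tags and rebuild the string from text plus anchors only. The Python sketch below uses the standard library's html.parser; it is a swapped-in technique, not the code above, the names AnchorKeeper and keep_anchors are hypothetical, and keeping only the href attribute on anchors is an assumption.

```python
from html import escape
from html.parser import HTMLParser


class AnchorKeeper(HTMLParser):
    """Rebuild markup keeping text and <a href> tags; drop every other tag."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []
        self.skipping = False  # True while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self.skipping = True
        elif tag == "a" and not self.skipping:
            href = dict(attrs).get("href")
            if href is None:
                self.out.append("<a>")
            else:
                self.out.append('<a href="%s">' % escape(href, quote=True))

    def handle_endtag(self, tag):
        if tag in ("script", "style"):
            self.skipping = False
        elif tag == "a" and not self.skipping:
            self.out.append("</a>")

    def handle_data(self, data):
        if not self.skipping:
            # convert_charrefs decoded entities, so re-escape the text
            self.out.append(escape(data, quote=False))


def keep_anchors(html):
    parser = AnchorKeeper()
    parser.feed(html)
    parser.close()
    return "".join(parser.out)


print(keep_anchors(
    '<p>Hi <b>there</b>, see <a href="/x" class="c">this</a>.</p>'
    "<script>alert(1)</script>"
))
# Hi there, see <a href="/x">this</a>.
```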

