Strip HTML Tags and Their Contents

Strip HTML tags and their contents

Try removing the spans directly from the DOM tree.

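$dom = new DOMDocument();
// Must be set before loading the markup to have any effect.
$dom->preserveWhiteSpace = false;
$dom->loadHTML($string);

// getElementsByTagName() returns a live list, so item(0) always points at
// the next remaining <span> after each removal.
$elements = $dom->getElementsByTagName('span');
while ($span = $elements->item(0)) {
    $span->parentNode->removeChild($span);
}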

echo $dom->saveHTML();

How to remove an HTML tag (not a specific tag) with its content from a string in JavaScript

All HTML tags and their inner text can be removed with the following snippet. The regular expression captures the opening tag's name, matches all content between the opening and closing tags, and then uses the captured name to match the closing tag. Because the content pattern excludes < and >, a single pass only removes elements that contain no nested tags.

const regexForStripHTML = /<([^</> ]+)[^<>]*?>[^<>]*?<\/\1> */gi;
const text = "OCEP <sup>®</sup> water product";
const stripContent = text.replaceAll(regexForStripHTML, '');
console.log(text);
console.log(stripContent);
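
For nested markup, one possible workaround is to re-apply the same pattern until the string stops changing, so outer elements are removed once their inner elements are gone. This is only a minimal sketch: the helper name stripElementsRepeatedly and the sample string are made up for illustration, and it still inherits the usual fragility of regex-based HTML handling.

```javascript
// Same pattern as in the answer above; it only matches elements whose
// content contains no further tags.
const pairedTag = /<([^</> ]+)[^<>]*?>[^<>]*?<\/\1> */gi;

// Hypothetical helper: repeat the replacement until nothing changes.
function stripElementsRepeatedly(input) {
  let previous;
  let current = input;
  do {
    previous = current;
    current = current.replace(pairedTag, "");
  } while (current !== previous);
  return current;
}

const nested = "Keep <div>drop <b>this</b> too</div>me";
console.log(nested.replace(pairedTag, ""));   // single pass: "Keep <div>drop too</div>me"
console.log(stripElementsRepeatedly(nested)); // repeated passes: "Keep me"
```

The single pass removes only the innermost <b> element; the loop then removes the <div> that has become tag-free.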

Strip HTML from Text JavaScript

If you're running in a browser, then the easiest way is just to let the browser do it for you...

function stripHtml(html) {
    let tmp = document.createElement("DIV");
    tmp.innerHTML = html;
    return tmp.textContent || tmp.innerText || "";
}

Note: as folks have noted in the comments, this is best avoided if you don't control the source of the HTML (for example, don't run this on anything that could've come from user input). For those scenarios, you can still let the browser do the work for you - see Saba's answer on using the now widely-available DOMParser.
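
The DOMParser approach mentioned in the note might look roughly like the sketch below. It is not the referenced answer's code: the function name stripHtmlWithParser is made up here, and the snippet assumes a browser environment where DOMParser exists. The markup is parsed into a separate document rather than assigned to an element of the live page.

```javascript
// Sketch only: parse the markup into a separate document and read its text.
// Assumes a browser; DOMParser is not available in plain Node.js.
function stripHtmlWithParser(html) {
  const doc = new DOMParser().parseFromString(html, "text/html");
  return doc.body.textContent || "";
}

// Guarded usage so the snippet does nothing outside a browser.
if (typeof DOMParser !== "undefined") {
  console.log(stripHtmlWithParser("<p>Hello <b>world</b></p>")); // "Hello world"
}
```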

How to strip HTML tags from div content using Javascript/jQuery?

Use a regular expression. Note that this removes only the tags themselves and keeps the text between them.

var regex = /(<([^>]+)>)/ig;
var body = "<p>test</p>";
var result = body.replace(regex, "");

alert(result);
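
To see what this pattern does and does not handle, here is a small sketch that applies it to a few inputs. The wrapper name stripTagsOnly and the sample strings are illustrative assumptions, not part of the answer.

```javascript
// Same pattern as above, wrapped in a hypothetical helper.
const tagPattern = /(<([^>]+)>)/gi;

function stripTagsOnly(html) {
  return html.replace(tagPattern, "");
}

// Tags are removed, text is kept.
console.log(stripTagsOnly("<p>test</p>"));                          // "test"
// The contents of <style> survive as plain text.
console.log(stripTagsOnly("<style>p{color:red}</style><p>Hi</p>")); // "p{color:red}Hi"
// Character entities are not decoded.
console.log(stripTagsOnly("1 &lt; 2"));                             // "1 &lt; 2"
// A ">" inside an attribute value ends the match too early.
console.log(stripTagsOnly('<a title="a>b">x</a>'));                 // 'b">x'
```

The last two cases are typical reasons to prefer a real HTML parser when the input is not under your control.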

Hope this helps.

Remove specific HTML tag with its content from javascript string

You should avoid parsing HTML using regex. Here is a way of removing all the <a> tags using DOM:

// your HTML text
var myString = '<table><tr><td>Some text ...<a href="#">label...</a></td></tr></table>';
myString += '<table><tr><td>Some text ...<a href="#">label...</a></td></tr></table>';
myString += '<table><tr><td>Some text ...<a href="#">label...</a></td></tr></table>';

// create a new div container
var div = document.createElement('div');

// assign your HTML to div's innerHTML
div.innerHTML = myString;

// get all <a> elements from div
var elements = div.getElementsByTagName('a');

// remove all <a> elements
while (elements[0]) elements[0].parentNode.removeChild(elements[0]);

// get div's innerHTML into a new variable
var repl = div.innerHTML;

// display it
console.log(repl);
/*
<table><tbody><tr><td>Some text ...</td></tr></tbody></table><table><tbody><tr><td>Some text ...</td></tr></tbody></table><table><tbody><tr><td>Some text ...</td></tr></tbody></table>
*/

Remove HTML tags from a String

Use an HTML parser instead of regex. This is dead simple with Jsoup.

public static String html2text(String html) {
    return Jsoup.parse(html).text();
}

Jsoup also supports removing HTML tags against a customizable whitelist (called a Safelist in recent Jsoup versions), which is very useful if you want to allow only e.g. <b>, <i> and <u>.

See also:

  • RegEx match open tags except XHTML self-contained tags
  • What are the pros and cons of the leading Java HTML parsers?
  • XSS prevention in JSP/Servlet web application

How do I remove HTML tags from a list of strings that contain the same HTML tags?

You can create a for-loop and call .get_text() from it:

import requests
from bs4 import BeautifulSoup

URL = "https://www.ebay.com/sch/i.html?_from=R40&_nkw=oneplus%206t&_sacat=0&rt=nc&_udlo=150&_udhi=450"
headers = {"User-Agent": 'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:68.0) Gecko/20100101 Firefox/68.0'}
page = requests.get(URL, headers=headers)
soup = BeautifulSoup(page.content, 'html.parser')

for price in soup.find_all("span", {"class": "s-item__price"}):
    print(price.get_text(strip=True))

Prints:

$449.99
$449.99
$414.46
$399.00
$399.95
$349.99
$449.00
$585.00
...and so on.

EDIT: To print title and price, you could do for example:

for tag in soup.select('li.s-item:has(.s-item__title):has(.s-item__price)'):
    print('{: <10} {}'.format(tag.select_one('.s-item__price').get_text(strip=True),
                              tag.select_one('.s-item__title').get_text(strip=True, separator=' ')))

Prints:

$449.99    SPONSORED OnePlus 6T 128GB 8GB RAM A6010 - Midnight Black (Unlocked) Global Version
$449.99    OnePlus 6T 128GB 8GB RAM A6010 - Midnight Black (Unlocked) Global Version
$414.46    Oneplus 6t dual sim 256gb midnight black black 6.41" unlocked ram 8gb a6010
$399.00    SPONSORED OnePlus 6T A6013, Clean ESN, Unknown Carrier, Coffee
$399.95    SPONSORED OnePlus 6T 4G LTE 6.41" 128GB ROM 8GB RAM A6013 (T-Mobile) - Mirror Black
$349.99    ONEPLUS 6T - BLACK - 128GB - (T-MOBILE) ~3841
$449.00    OnePlus 6t McLaren Edition Unlocked 256GB 10GB RAM Original Accessories Included
$434.83    OnePlus 6T 8 GB RAM 128 GB UK SIM-Free Smartphone (ML3658)
$265.74    Oneplus 6t
$241.58    New Listing OnePlus 6T 8GB 128GB UNLOCKED
$419.95    NEW IN BOX Oneplus 6T 128GB Mirror Black (T-mobile/Metro PCS/Mint) 8gb RAM
$435.99    OnePlus 6T - 128GB 6GB RAM - Mirror Black (Unlocked) Global Version

... and so on.

