Remove HTML Tags in String

How to strip HTML tags from string in JavaScript?

Using the browser's parser is the probably the best bet in current browsers. The following will work, with the following caveats:

Your HTML is valid within a <div> element. HTML contained within <body> or <html> or <head> tags is not valid within a <div> and may therefore not be parsed correctly.
textContent (the DOM standard property) and innerText (non-standard) properties are not identical. For example, textContent will include text within a <script> element while innerText will not (in most browsers). This only affects IE <=8, which is the only major browser not to support textContent.
The HTML does not contain <script> elements.
The HTML is not null
The HTML comes from a trusted source. Using this with arbitrary HTML allows arbitrary untrusted JavaScript to be executed. This example is from a comment by Mike Samuel on the duplicate question: <img onerror='alert(\"could run arbitrary JS here\")' src=bogus>

Code:

var html = "<p>Some HTML</p>";
var div = document.createElement("div");
div.innerHTML = html;
var text = div.textContent || div.innerText || "";

Remove HTML tags from a String

Use a HTML parser instead of regex. This is dead simple with Jsoup.

public static String html2text(String html) {
    return Jsoup.parse(html).text();
}

Jsoup also supports removing HTML tags against a customizable whitelist, which is very useful if you want to allow only e.g. <b>, <i> and <u>.

How can I remove HTML tags other than div and span from a string in JavaScript?

To remove all tags excluding specific tags, you can use the following regular expression.

const str = "<div><p></p><span></span></div>";
console.log(str.replace(/(<\/?(?:span|div)[^>]*>)|<[^>]+>/ig, '$1'));

How to remove html tags from an Html string using RegEx?

You can use

.replace(/<br>(?=(?:\s*<[^>]*>)*$)|(<br>)|<[^>]*>/gi, (x,y) => y ? ' & ' : '')

See the JavaScript demo:

const text = '<div class="ExternalClassBE95E28C1751447DB985774141C7FE9C"><p>Tina Schmelz<br></p><p>Sascha Balke<br></p></div>';
const regex = /<br>(?=(?:\s*<[^>]*>)*$)|(<br>)|<[^>]*>/gi;
console.log(
  text.replace(regex, (x,y) => y ? ' & ' : '')
);

How to remove HTML tag (not a specific tag ) with content from a string in javascript

Removing all HTML tags and the innerText can be done with the following snippet. The Regexp captures the opening tag's name, then matches all content between the opening and closing tags, then uses the captured tag name to match the closing tag.

const regexForStripHTML = /<([^</> ]+)[^<>]*?>[^<>]*?<\/\1> */gi;
const text = "OCEP <sup>®</sup> water product";
const stripContent = text.replaceAll(regexForStripHTML, '');
console.log(text);
console.log(stripContent);

Strip HTML from Text JavaScript

If you're running in a browser, then the easiest way is just to let the browser do it for you...

function stripHtml(html)
{
   let tmp = document.createElement("DIV");
   tmp.innerHTML = html;
   return tmp.textContent || tmp.innerText || "";
}

Note: as folks have noted in the comments, this is best avoided if you don't control the source of the HTML (for example, don't run this on anything that could've come from user input). For those scenarios, you can still let the browser do the work for you - see Saba's answer on using the now widely-available DOMParser.

How do I remove HTML tags from a list of strings that contain the same HTML tags?

You can create a for-loop and call .get_text() from it:

import requests
from bs4 import BeautifulSoup

URL = "https://www.ebay.com/sch/i.html?_from=R40&_nkw=oneplus%206t&_sacat=0&rt=nc&_udlo=150&_udhi=450"
headers = {"User-Agent": 'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:68.0) Gecko/20100101 Firefox/68.0'}
page = requests.get(URL, headers=headers)
soup = BeautifulSoup(page.content, 'html.parser')

for price in soup.findAll("span", {"class": "s-item__price"}):
    print(price.get_text(strip=True))

Prints:

$449.99
$449.99
$414.46
$399.00
$399.95
$349.99
$449.00
$585.00
...and son on.

EDIT: To print title and price, you could do for example:

for tag in soup.select('li.s-item:has(.s-item__title):has(.s-item__price)'):
    print('{: <10} {}'.format(tag.select_one('.s-item__price').get_text(strip=True),
                              tag.select_one('.s-item__title').get_text(strip=True, separator=' ')))

Prints:

$449.99    SPONSORED OnePlus 6T 128GB 8GB RAM A6010 - Midnight Black (Unlocked) Global Version
$449.99    OnePlus 6T 128GB 8GB RAM A6010 - Midnight Black (Unlocked) Global Version
$414.46    Oneplus 6t dual sim 256gb midnight black black 6.41" unlocked ram 8gb a6010
$399.00    SPONSORED OnePlus 6T A6013, Clean ESN, Unknown Carrier, Coffee
$399.95    SPONSORED OnePlus 6T 4G LTE 6.41" 128GB ROM 8GB RAM A6013 (T-Mobile)  - Mirror Black
$349.99    ONEPLUS 6T - BLACK - 128GB - (T-MOBILE) ~3841
$449.00    OnePlus 6t McLaren Edition Unlocked 256GB 10GB RAM Original Accessories Included
$434.83    OnePlus 6T 8 GB RAM 128 GB UK SIM-Free Smartphone (ML3658)
$265.74    Oneplus 6t
$241.58    New Listing OnePlus 6T 8GB 128GB UNLOCKED
$419.95    NEW IN BOX Oneplus 6T  128GB  Mirror Black (T-mobile/Metro PCS/Mint) 8gb RAM
$435.99    OnePlus 6T - 128GB 6GB RAM - Mirror Black (Unlocked) Global Version

... and so on.

How to remove all html tags including ' ' from string?

The text looks to be double-escaped, kinda - first turn all the &s into &s, so that the HTML entities can be properly recognized. Then .text() will give you the plain text version of the HTML markup.

const input = `<p>Lorem Ipsum&nbsp;is simply dummy text of the printing and typesetting industry.Lorem Ipsum has been the industry&#39;s standard dummy text ever since the 1500s, when an unknown printer took a galley of type and scrambled it to make a type specimen book. It has survived not only five centuries, but also the leap into electronic typesetting,remaining essentially unchanged. It was popularised in the 1960s with the release of Letraset sheets containing Lorem Ipsum passages, and more recently with desktop publishing software like Aldus PageMaker including versions of Lorem Ipsum.</p>\n\n<p> </p>\n\n<p>TItle </p>\n`;
const inputWithProperEntities = input.replaceAll('&', '&');
console.log($(inputWithProperEntities).text());

<script src="https://cdnjs.cloudflare.com/ajax/libs/jquery/3.3.1/jquery.min.js"></script>

How do I remove all HTML tags from a string without knowing which tags are in it?

You can use a simple regex like this:

public static string StripHTML(string input)
{
   return Regex.Replace(input, "<.*?>", String.Empty);
}

Be aware that this solution has its own flaw. See Remove HTML tags in String for more information (especially the comments of 'Mark E. Haase'/@mehaase)

Another solution would be to use the HTML Agility Pack.

You can find an example using the library here: HTML agility pack - removing unwanted tags without removing content?