A regular expression to remove a given (x)HTML tag from a string
Attempting to parse HTML with regular expressions is generally an extremely bad idea. Use a parser instead; there should be one available for your chosen language.
You might be able to get away with something like this:
</?tag[^>]*?>
But it depends on exactly what you're doing. For example, that won't remove the tag's content, and it may leave your HTML in an invalid state, depending on which tag you're trying to remove. It also copes badly with invalid HTML (and there's a lot of that about).
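Here is a minimal Python sketch of that pattern, assuming Python's `re` module. `remove_tag` is a hypothetical helper name. Unlike the pattern above, it adds `\b` after the tag name so that removing `b` does not also strip `<br>` or `<body>`, and it matches case-insensitively. As described, the tag's content is kept.

```python
import re

def remove_tag(html: str, tag: str) -> str:
    """Delete every opening and closing `tag` tag, keeping whatever is between them.

    Hypothetical helper: a regex sketch, not an HTML parser.
    """
    # </?      optional slash, so both <tag ...> and </tag> match
    # \b       word boundary: "b" must not match the start of "br" or "body"
    # [^>]*?   any attributes, up to the first '>'
    pattern = r"</?" + re.escape(tag) + r"\b[^>]*?>"
    return re.sub(pattern, "", html, flags=re.IGNORECASE)

print(remove_tag('<p>Hello <b>bold</b> and <br> <B class="x">more</B></p>', "b"))
# <p>Hello bold and <br> more</p>
```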
Use a parser instead :)
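As a sketch of the parser route, here is the same tag removal done with Python's standard-library `html.parser`, assuming Python 3. `TagRemover` and `remove_tag_with_parser` are hypothetical names. The tag boundaries come from a real tokenizer rather than a pattern; on the other hand, end tags are re-emitted in lowercase and processing instructions are dropped, so the output is not byte-for-byte identical to the input.

```python
from html.parser import HTMLParser

class TagRemover(HTMLParser):
    """Re-emits the input HTML, skipping start/end tags named `tag` (their content is kept)."""

    def __init__(self, tag):
        # Keep entities as written (&amp; stays &amp;) instead of decoding them.
        super().__init__(convert_charrefs=False)
        self.tag = tag.lower()  # HTMLParser reports tag names in lowercase
        self.out = []

    def handle_starttag(self, tag, attrs):
        if tag != self.tag:
            self.out.append(self.get_starttag_text())  # original text, attributes included

    def handle_startendtag(self, tag, attrs):
        # Self-closing tags such as <br/>. Overriding this avoids the default,
        # which calls handle_starttag and then handle_endtag.
        if tag != self.tag:
            self.out.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if tag != self.tag:
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        self.out.append(data)

    def handle_entityref(self, name):
        self.out.append(f"&{name};")

    def handle_charref(self, name):
        self.out.append(f"&#{name};")

    def handle_comment(self, data):
        self.out.append(f"<!--{data}-->")

    def handle_decl(self, decl):
        self.out.append(f"<!{decl}>")

def remove_tag_with_parser(html, tag):
    parser = TagRemover(tag)
    parser.feed(html)
    parser.close()  # flush any text still buffered by the parser
    return "".join(parser.out)

print(remove_tag_with_parser('<p>Hello <b>bold</b> &amp; <br/> <B class="x">more</B></p>', "b"))
# <p>Hello bold &amp; <br/> more</p>
```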
How to remove html tags from an Html string using RegEx?
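You can use the following replacement. A <br> that is followed only by other tags up to the end of the string is dropped, any other <br> is replaced with ' & ', and every remaining tag is removed: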
.replace(/<br>(?=(?:\s*<[^>]*>)*$)|(<br>)|<[^>]*>/gi, (x,y) => y ? ' & ' : '')
See the JavaScript demo:
const text = '<div class="ExternalClassBE95E28C1751447DB985774141C7FE9C"><p>Tina Schmelz<br></p><p>Sascha Balke<br></p></div>';
const regex = /<br>(?=(?:\s*<[^>]*>)*$)|(<br>)|<[^>]*>/gi;
console.log(
text.replace(regex, (x,y) => y ? ' & ' : '')
);
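For comparison, here is a Python port of the same replacement, assuming the `re` module. `join_names` is a hypothetical name. `re.sub` accepts a function as the replacement, and `m.group(1)` is `None` when the `(<br>)` group did not take part in the match, which mirrors the `y ? ' & ' : ''` test in the JavaScript.

```python
import re

# Alternative 1: a <br> followed only by (optional whitespace and) tags up to the end -> removed
# Alternative 2: any other <br>, captured in group 1 -> replaced with " & "
# Alternative 3: any other tag -> removed
BR_OR_TAG = re.compile(r"<br>(?=(?:\s*<[^>]*>)*$)|(<br>)|<[^>]*>", re.IGNORECASE)

def join_names(html: str) -> str:
    return BR_OR_TAG.sub(lambda m: " & " if m.group(1) else "", html)

text = ('<div class="ExternalClassBE95E28C1751447DB985774141C7FE9C">'
        '<p>Tina Schmelz<br></p><p>Sascha Balke<br></p></div>')
print(join_names(text))  # Tina Schmelz & Sascha Balke
```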
Regular expression to remove HTML tags except <br/> from a string
</?([a-z]+)>
should do. If the slash comes after the letters, as in <br/>, it will not match.
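A quick Python check of that pattern, sketched with the `re` module (`strip_simple_tags` is a hypothetical name). The self-closing `<br/>` survives because `[a-z]+` must be followed directly by `>`, while a plain `<br>` is still removed. Like the original, the pattern is case-sensitive and does not match opening tags that carry attributes.

```python
import re

# Opening or closing tag made only of lowercase letters, e.g. <b>, </b>, <br>
SIMPLE_TAG = re.compile(r"</?([a-z]+)>")

def strip_simple_tags(html: str) -> str:
    return SIMPLE_TAG.sub("", html)

print(strip_simple_tags("one<br/>two <b>three</b>"))  # one<br/>two three
print(strip_simple_tags("a<br>b<br/>c"))              # ab<br/>c
```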
Regular expression to remove HTML tags
Using a regular expression to parse HTML is fraught with pitfalls. HTML is not a regular language and hence can't be 100% correctly parsed with a regex. This is just one of many problems you will run into. The best approach is to use an HTML / XML parser to do this for you.
Here is a link to a blog post I wrote a while back that goes into more detail about this problem.
- http://blogs.msdn.com/b/jaredpar/archive/2008/10/15/regular-expression-limitations.aspx
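That being said, here's a solution that should fix this particular problem. It is by no means a perfect solution, though.
// requires: using System.Text.RegularExpressions;
// sSummary is the input HTML; sResult receives the text after an <img ...> or <a ...> tag, up to the next '<'
var pattern = @"<(img|a)[^>]*>(?<content>[^<]*)<";
var regex = new Regex(pattern);
var m = regex.Match(sSummary);
if ( m.Success ) {
    sResult = m.Groups["content"].Value;
}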
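The same extraction in Python, as a sketch assuming the `re` module (`first_link_or_img_text` is a hypothetical name). Python's `re` needs the `(?P<content>...)` spelling for a named group, and `search` plays the role of `Regex.Match`, returning the first match anywhere in the string.

```python
import re
from typing import Optional

# Text between an <img ...> or <a ...> opening tag and the next '<'
CONTENT = re.compile(r"<(img|a)[^>]*>(?P<content>[^<]*)<")

def first_link_or_img_text(summary: str) -> Optional[str]:
    m = CONTENT.search(summary)  # first match anywhere, or None
    return m.group("content") if m else None

print(first_link_or_img_text('See <a href="/docs">the docs</a> for details'))  # the docs
```

Because nothing separates `(img|a)` from `[^>]*`, the pattern also matches tags such as `<abbr>`, one more reason it is far from perfect.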
How can I remove HTML tags from a string with a regex?
Try this:
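// erase html tags from a string
// requires: using System.Text.RegularExpressions;
// Regular expression for html tags: '<', one non-whitespace character, then anything except '<' or '>' up to '>'.
// Built once and reused; constructing a compiled Regex on every call is expensive.
private static readonly Regex StripHTMLExpression = new Regex("<\\S[^><]*>", RegexOptions.IgnoreCase | RegexOptions.Singleline | RegexOptions.Multiline | RegexOptions.CultureInvariant | RegexOptions.Compiled);

public static string StripHtml(string target)
{
    return StripHTMLExpression.Replace(target, string.Empty);
}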
Call it like this:
string htmlString="<div><span>hello world!</span></div>";
string strippedString=StripHtml(htmlString); // "hello world!"
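A Python equivalent, sketched with the `re` module (`strip_html` is a hypothetical name). The `\S` after `<` means a `<` followed by whitespace, as in `a < b`, is left alone. None of the .NET options above change what this particular pattern matches (it has no letters, no `.` and no `^`/`$`), so no flags are needed.

```python
import re

# '<', one non-whitespace character, then anything except '<' or '>' up to '>'
HTML_TAG = re.compile(r"<\S[^><]*>")

def strip_html(target: str) -> str:
    return HTML_TAG.sub("", target)

print(strip_html("<div><span>hello world!</span></div>"))  # hello world!
print(strip_html("a < b and <i>c</i>"))                    # a < b and c
```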
Removing certain HTML tags using Regex
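function stripHTML(html) {
    // removes <!DOCTYPE html>, <html>, </html>, <head>, </head>, <body> and </body> (without attributes)
    return html.replace(/<(\/?|\!?)(DOCTYPE html|html|head|body)>/g, "");
}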
You need the global (g) modifier to replace every match; without it, replace() only replaces the first one.
http://regex101.com/r/aA1vL0
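In Python, `re.sub` replaces every match by default (`count=0`), so there is no separate global flag. Here is a sketch porting the function above, assuming the `re` module (`strip_page_wrappers` is a hypothetical name). Like the JavaScript version, it is case-sensitive and only removes these tags when they carry no attributes.

```python
import re

# <html>, </html>, <head>, </head>, <body>, </body> and <!DOCTYPE html>, without attributes
WRAPPER_TAGS = re.compile(r"<(/?|!?)(DOCTYPE html|html|head|body)>")

def strip_page_wrappers(html: str) -> str:
    return WRAPPER_TAGS.sub("", html)  # every match is replaced

page = "<!DOCTYPE html><html><head></head><body><p>Hi</p></body></html>"
print(strip_page_wrappers(page))  # <p>Hi</p>
```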
How to use regex to remove text inside a certain HTML tag only when the text contains a space
Using regex with HTML is fraught with issues, which is why you should be aware of all the possible consequences. Your <code>.+?</code> regex will only work if the <code> and </code> tags are on the same line and there are no nested <code> tags inside them.
Assuming there are no nested code tags, you can extend your current approach:
import re
inputString = "I want to remove <code>tag with space</code> not sole <code>word</code>"
outputString = re.sub("<code>(.+?)</code>", lambda m: " " if " " in m.group(1) else m.group(), inputString, flags=re.S)
print(outputString)
The re.S flag makes . match line breaks too, and the lambda checks each match: any code element whose text contains a space character is replaced with a single regular space; otherwise it is kept unchanged.
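To see what the re.S flag changes, here is a small sketch, assuming the `re` module (`drop_spaced_code` and its `dotall` switch are hypothetical), that runs the same substitution with and without the flag on an element whose text spans two lines.

```python
import re

def drop_spaced_code(s: str, dotall: bool = True) -> str:
    # Replace a <code> element with a single space if its text contains a space.
    flags = re.S if dotall else 0  # re.S lets '.' match '\n' as well
    return re.sub(r"<code>(.+?)</code>",
                  lambda m: " " if " " in m.group(1) else m.group(),
                  s, flags=flags)

text = "keep <code>one</code> drop <code>two\nwords here</code> end"
print(repr(drop_spaced_code(text)))                  # 'keep <code>one</code> drop   end'
print(drop_spaced_code(text, dotall=False) == text)  # True: '.' stops at the newline
```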
A more common way to parse HTML in Python is to use BeautifulSoup. First parse the HTML, then find all the code tags, and replace each code tag whose text contains a space:
from bs4 import BeautifulSoup

soup = BeautifulSoup('I want to remove <code>tag with space</code> not sole <code>word</code>', "html.parser")
for p in soup.find_all('code'):
    # replace the whole <code> element when its text contains a space
    if p.string and " " in p.string:
        p.replace_with(" ")
print(soup)
# I want to remove   not sole <code>word</code>