How to use regexp to not match HTML tags that have certain tags inside them?
You should do this with XPath:
// Our HTML source
var s = `<a href="tel:something">something1</a>
<a href="tel:[some_numbers]"><span class="hello">Hello1</span>[some_numbers]</a>
<a href="tel:something">something2</a>
<a href="tel:[some_numbers]"><span class="hello">Hello2</span>[some_numbers]</a>
<a href="tel:something">something3</a>
<a href="tel:[some_numbers]"><span class="hello">Hello3</span>[some_numbers]</a>`;
// Parse the string into DOM nodes by assigning it to a container element (never attached to the page)
var div = document.createElement('div');
div.innerHTML = s;
// Find <a> elements with no direct child <span>; the leading "." keeps the search inside div
var iterator = document.evaluate('.//a[not(span)]', div, null, XPathResult.ORDERED_NODE_ITERATOR_TYPE, null);
// Fetch the first matching node (null if there is none)
var thisNode = iterator.iterateNext();
// Walk the iterator in document order and log each node found
while (thisNode) {
  console.log(thisNode);
  thisNode = iterator.iterateNext();
}
https://jsfiddle.net/kad3ouqL/
This should yield:
<a href="tel:something">something1</a>
<a href="tel:something">something2</a>
<a href="tel:something">something3</a>
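The not(span) predicate only looks at direct children of each <a>. The sketch below makes that concrete without a browser: it models the parsed markup as a hypothetical plain-object tree (not the DOM API) and applies the same test by hand, next to the stricter not(.//span) predicate, which also rejects an <a> whose <span> is nested deeper.

```javascript
// Hypothetical plain-object tree standing in for the parsed <div>; this is not the DOM API.
const root = {
  tag: 'div',
  children: [
    { tag: 'a', id: 'plain', children: [] },
    { tag: 'a', id: 'direct-span', children: [{ tag: 'span', children: [] }] },
    { tag: 'a', id: 'nested-span', children: [{ tag: 'b', children: [{ tag: 'span', children: [] }] }] },
  ],
};

// All descendants of a node in document order (the nodes .// walks over).
function descendants(node) {
  const out = [];
  for (const child of node.children) {
    out.push(child, ...descendants(child));
  }
  return out;
}

// Mirrors .//a[not(span)]: <a> elements with no *direct* <span> child.
const noChildSpan = descendants(root)
  .filter(n => n.tag === 'a' && !n.children.some(c => c.tag === 'span'))
  .map(n => n.id);

// Mirrors .//a[not(.//span)]: <a> elements with no <span> at any depth.
const noSpanAtAll = descendants(root)
  .filter(n => n.tag === 'a' && !descendants(n).some(d => d.tag === 'span'))
  .map(n => n.id);

console.log(noChildSpan); // [ 'plain', 'nested-span' ]
console.log(noSpanAtAll); // [ 'plain' ]
```

If the <span> in your real markup might be wrapped in another element, not(.//span) is the predicate to use.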
Avoid an in-between character in a regex pattern
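You need to repeat, zero or more times, a group that matches either an attribute whose value is in curly braces (a chunk of word chars followed by =, then a substring between { and }) or any single char other than >: (?:\w+=\{[^{}]*\}|[^>])*
Because the braced alternative is tried first, a > inside a value such as {() => ...} is consumed as part of the attribute instead of ending the tag.
Also, keep in mind that the Visual Studio Code regex engine requires { and } outside of a character class to be escaped.
The pattern will look like
<Button(?:\w+=\{[^{}]*\}|[^>])*\sclassName=(?:\w+=\{[^{}]*\}|[^>])*>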
Details
<Button - a literal string
(?:\w+=\{[^{}]*\}|[^>])* - zero or more repetitions of
  \w+=\{[^{}]*\} - one or more letters, digits or underscores, ={, zero or more chars other than { and }, and then a }
  | - or
  [^>] - any char other than >
\s - a whitespace char
className= - a literal text
(?:\w+=\{[^{}]*\}|[^>])* - see above
> - a > char.
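A minimal sketch for trying the pattern outside the editor: it runs the same pattern as a JavaScript regex literal over some made-up <Button> markup (the sample lines are invented for illustration). Only the tags containing className= should match, including one whose onClick value contains a >.

```javascript
// The pattern from the answer, used unchanged as a JavaScript regex literal with the g flag.
const buttonWithClassName = /<Button(?:\w+=\{[^{}]*\}|[^>])*\sclassName=(?:\w+=\{[^{}]*\}|[^>])*>/g;

// Hypothetical sample markup: the middle <Button> has no className and must not match.
const source = [
  '<Button onClick={() => save()} className="primary">Save</Button>',
  '<Button onClick={() => cancel()}>Cancel</Button>',
  '<Button className="link" onClick={() => open()}>Open</Button>',
].join('\n');

// match() with the g flag returns every full match (or null if there are none).
const matches = source.match(buttonWithClassName) || [];

for (const m of matches) {
  console.log(m);
}
// <Button onClick={() => save()} className="primary">
// <Button className="link" onClick={() => open()}>
```

JavaScript also accepts the escaped braces, so the pattern text needs no change to run here.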
Find all the HTML tags via regex (except br and li tags)
As the old adage has it, if you try to solve a problem with a regex, you end up with two problems. Regex is admittedly a powerful tool, but in situations like this it should be used only as a last resort; let the browser's HTML parser build the elements instead.
Try the following:
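// Returns every element in htmlString except <br> and <li>
const getAllNodesExceptBrAndLi = htmlString => {
  // A <template> parses the markup into inert content: nothing is rendered and scripts do not run
  const template = document.createElement('template');
  template.innerHTML = htmlString;
  // '*' selects every element in the parsed fragment, at any depth
  const allNodes = template.content.querySelectorAll('*');
  // tagName is upper-case for HTML elements
  return [...allNodes].filter(node => node.tagName !== 'BR' && node.tagName !== 'LI');
};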