Regex select all text between tags
You can use "<pre>(.*?)</pre>" (replacing pre with whatever tag name you want) and extract the first capture group. For more specific instructions, specify a language. Note that this assumes very simple, valid HTML.
As other commenters have suggested, if you're doing something complex, use an HTML parser.
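As a minimal sketch of "extract the first group" in JavaScript: the helper name firstBetween and the sample string are made up for illustration, and the tag name is assumed to be a plain word with no regex metacharacters. The "s" flag is an addition so that "." also matches newlines inside the element.

```javascript
// Hypothetical helper: returns the text inside the first <tag>...</tag> pair, or null.
function firstBetween(tag, html) {
  // "s" lets "." match newlines; the lazy ".*?" stops at the first closing tag.
  const re = new RegExp(`<${tag}>(.*?)</${tag}>`, "s");
  const m = re.exec(html);
  return m ? m[1] : null;
}

// Made-up sample input with two <pre> elements; only the first is returned.
const sample = "<p>intro</p><pre>line 1\nline 2</pre><pre>other</pre>";
console.log(firstBetween("pre", sample));
```

This still breaks on attributes, nested elements of the same tag, or malformed markup, which is why a parser is the safer choice for anything non-trivial.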
Regex that extracts text between tags, but not the tags
You can use the following regex:
>([^<]*)<
or
>[^<]*<
With the first form, the text is in capture group 1. With the second, strip the leading '>' and trailing '<' from each match.
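Both variants might look like this in JavaScript. This is only a sketch on a made-up input string. Adjacent tags such as "</li><li>" produce an empty match, so empty strings are filtered out here (an assumption about what you want).

```javascript
// Sample input is made up for illustration.
const html = "<li>alpha</li><li>beta</li>";

// Variant 1: read capture group 1, so no clean-up of '<' and '>' is needed.
const viaGroup = Array.from(html.matchAll(/>([^<]*)</g), (m) => m[1])
  .filter((text) => text !== "");

// Variant 2: take whole matches, then drop the leading '>' and trailing '<'.
const viaStrip = (html.match(/>[^<]*</g) || [])
  .map((s) => s.slice(1, -1))
  .filter((text) => text !== "");

console.log(viaGroup, viaStrip);
```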
Regex match text between tags
/<b>(.*?)<\/b>/g
Add the g (global) flag after the closing slash:
/<b>(.*?)<\/b>/g.exec(str)
//             ^----- here it is
However, if you want to get all matched elements, then you need something like this:
var str = "<b>Bob</b>, I'm <b>20</b> years old, I like <b>programming</b>.";
var result = str.match(/<b>(.*?)<\/b>/g).map(function(val){
    return val.replace(/<\/?b>/g,'');
});
//result -> ["Bob", "20", "programming"]
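If an element has attributes, the regexp will be:
/<b [^>]+>(.*?)<\/b>/g.exec(str)
Note that this form requires at least one attribute, so it no longer matches a bare <b>. To accept both, use <b\b[^>]*> for the opening tag.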
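If String.prototype.matchAll is available (ES2020+), the capture group can be read directly instead of stripping the tags afterwards. The following is a sketch, not the answer's own code. The sample string is adapted from the one above, with a made-up class attribute added to show that attributes are tolerated.

```javascript
const str = '<b>Bob</b>, I\'m <b class="age">20</b> years old, I like <b>programming</b>.';

// "\b" keeps the pattern from matching other tags that start with "b" (such as <br>),
// and "[^>]*" allows zero or more attributes.
const re = /<b\b[^>]*>(.*?)<\/b>/g;

// matchAll yields one match object per occurrence; index 1 is the captured text.
const texts = Array.from(str.matchAll(re), (m) => m[1]);
console.log(texts); // [ 'Bob', '20', 'programming' ]
```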
Select text between 2 complete span tags using regex
Regex is not a good way to find HTML tags, but this should work for you:
<\s*span[^>]*>(.*?)<\s*\/\s*span>
DEMO: https://regex101.com/r/vbLN9L/6
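Applied in JavaScript, the pattern could be used roughly as follows. The input string is invented for the example and deliberately includes stray whitespace inside the tags, which the \s* parts are there to tolerate.

```javascript
const spanRe = /<\s*span[^>]*>(.*?)<\s*\/\s*span>/g;

// Made-up input: one span with an attribute, one with extra whitespace in its tags.
const page = '<span class="a">first</span> and < span >second</ span>';

// Collect capture group 1 (the text between the tags) for every match.
const contents = Array.from(page.matchAll(spanRe), (m) => m[1]);
console.log(contents); // [ 'first', 'second' ]
```

Because the match is lazy, a span nested inside another span ends the match at the inner closing tag, so this is only suitable for flat markup.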
Regex to extract pure text within specific HTML tag
How real software engineers solve this problem: Use the right tool for the right job, i.e. don't use regexes to parse HTML.
The most straightforward way is to use an HTML parsing library, since parsing even purely conforming XML with regex is extremely non-trivial, and handling all HTML edge cases is an inhumanly difficult task.
If your requirements are "you must use a regex library to pull innerHTML from a <p> element", I'd much prefer to split it into two tasks:
1) Use a regex to pull out the container element with its innerHTML. (I'm showing an example that only works for getting the outermost element of a known tag. To extract an arbitrary nested item you'd have to use some trick like https://blogs.msdn.microsoft.com/bclteam/2005/03/15/net-regular-expressions-regex-and-balanced-matching-ryan-byington/ to match the balanced expression.)
2) Use a simple Regex.Replace to strip out all tag content.
open System.Text.RegularExpressions

let html = @"<p>This is some <strong>strong</strong> text</p>
<p>This is some <b><em>really<strong>strong</strong><em></b> text</p>"
// Step 1: capture the inner HTML of each <p>; step 2: strip every tag from it
for m in Regex.Matches(html, @"<p>(.*?)</p>") do
    printfn "(%O)" (Regex.Replace(m.Groups.[1].Value, "<.*?>", ""))
Output:
(This is some strong text)
(This is some reallystrong text)
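For readers following the JavaScript examples elsewhere on this page, the same two-step idea can be sketched like this. It is a rough port under the same assumptions (outermost <p> only, no nested <p>), not part of the original F# answer.

```javascript
// Same sample paragraphs as the F# snippet, joined with a newline.
const html = [
  "<p>This is some <strong>strong</strong> text</p>",
  "<p>This is some <b><em>really<strong>strong</strong><em></b> text</p>",
].join("\n");

// Step 1: capture each paragraph's inner HTML.
// Step 2: delete anything that looks like a tag from the captured text.
const paragraphs = Array.from(html.matchAll(/<p>(.*?)<\/p>/g), (m) =>
  m[1].replace(/<.*?>/g, "")
);
console.log(paragraphs);
// [ 'This is some strong text', 'This is some reallystrong text' ]
```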
If you are constrained to a single "Regex.Matches" call, and you're okay with ignoring the possibility of nested <p> tags (as luck would have it, in conformant HTML you can't nest <p> elements, but this solution wouldn't work for a containing element like <div>), you should be able to do it with a non-greedy match of a text part and a tag part wrapped up inside a <p>...</p> pattern. (Note 1: this is F#, but it should be trivial to convert to C#.) (Note 2: this relies on .NET-flavored regex-isms like stackable group names and multiple captures per group.)
let rx = @"
<p>
(?<p_text>
  (?:
    (?<text>[^<>]+)
    (?:<.*?>)+
  )*?
  (?<text>[^<>]+)?
)</p>
"
let regex = new Regex(rx, RegexOptions.IgnorePatternWhitespace)
for m in regex.Matches(@"
<p>This is some <strong>strong</strong> text</p>
<p>This is some <b><em>really<strong>strong</strong><em></b> text</p>
") do
    printfn "p content: %O" m
    for capture in m.Groups.["text"].Captures do
        printfn "text: %O" capture
Output:
p content: <p>This is some <strong>strong</strong> text</p>
text: This is some
text: strong
text: text
p content: <p>This is some <b><em>really<strong>strong</strong><em></b> text</p>
text: This is some
text: really
text: strong
text: text
Remember that neither of the above examples works well on malformed HTML or in cases where the same tag is nested in itself.
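JavaScript regexes keep only the last capture of a repeated group, so the multiple-captures trick does not carry over directly. One possible approximation, sketched here with a hypothetical helper named textPieces, is to match each <p> and then split its inner HTML on tags. This swaps in a different technique from the single-regex .NET approach above.

```javascript
// Hypothetical helper: for each <p>...</p>, return the text pieces between tags.
function textPieces(html) {
  return Array.from(html.matchAll(/<p>(.*?)<\/p>/g), (m) =>
    // Split on anything that looks like a tag and drop the empty pieces
    // left between adjacent tags.
    m[1].split(/<[^<>]*>/).filter((piece) => piece !== "")
  );
}

const pieces = textPieces(
  "<p>This is some <b><em>really<strong>strong</strong><em></b> text</p>"
);
console.log(pieces); // [ [ 'This is some ', 'really', 'strong', ' text' ] ]
```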
Regex - How can I select the text between some HTML tags right after a specific tag?
Use the g and m regex flags with the following pattern:
^<dt\s+class="prd_name">\s*<strong>\K.*?(?=<\/strong>)
https://regex101.com/r/fakRAE/1
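The \K reset is a PCRE feature and is not available in JavaScript regexes. A rough JavaScript equivalent, sketched below, replaces \K and the lookahead with a plain capture group. The listing markup is invented for the example.

```javascript
// Made-up markup: two product names and one unrelated <dt>.
const listing = [
  '<dt class="prd_name">',
  "<strong>Widget A</strong>",
  "</dt>",
  '<dt class="prd_price"><strong>9.99</strong></dt>',
  '<dt class="prd_name"><strong>Widget B</strong></dt>',
].join("\n");

// "m" makes "^" match at the start of every line; "g" finds all occurrences.
const nameRe = /^<dt\s+class="prd_name">\s*<strong>(.*?)<\/strong>/gm;

// Capture group 1 holds what \K ... (?=</strong>) would have returned as the match.
const names = Array.from(listing.matchAll(nameRe), (m) => m[1]);
console.log(names); // [ 'Widget A', 'Widget B' ]
```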
Regex just keep the content between tags but select everything
One idea is to match what you don't want, but capture what you need in group 1 (\1):
<script>[\s\S]*?<\/script>|((?:<(?!script)|[^<])[\s\S]*?)(?=<script|$)
See this demo at regex101
To avoid skipping over an opening <script, the second alternative starts by matching either a character that is not < or a < that is not followed by script (checked with a negative lookahead). It then continues lazily until <script occurs or the end ($) is reached.
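One way to use this pattern, sketched in JavaScript on a made-up input: replace every match with "$1". Matches from the first alternative have no capture, so the <script> blocks disappear, while everything else is put back unchanged.

```javascript
const re = /<script>[\s\S]*?<\/script>|((?:<(?!script)|[^<])[\s\S]*?)(?=<script|$)/g;

// Made-up input with two script blocks between ordinary content.
const page = "x<b>y</b><script>1</script>z<script>2</script>";

// "$1" is empty for script matches and holds the captured content otherwise.
const kept = page.replace(re, "$1");
console.log(kept); // x<b>y</b>z
```

As written, the first alternative only recognizes a bare <script> opening tag, so script tags with attributes would need the pattern adjusted.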
RegEx for matching between any two HTML tags
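A regex that matches a string (TEST-TEXT here) between any two HTML tags. The negative lookahead rejects occurrences inside a tag, i.e. ones followed by a > with no other angle bracket in between: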
(?![^<>]*>)(TEST\-TEXT)
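A small JavaScript check of that behaviour, on an invented input where the string appears once inside an attribute and once as element text; only the second occurrence should match.

```javascript
const re = /(?![^<>]*>)(TEST\-TEXT)/g;

// Made-up input: TEST-TEXT occurs inside the tag (attribute value) and between the tags.
const html = '<a title="TEST-TEXT">TEST-TEXT</a>';

// Record the start index of every match.
const hits = Array.from(html.matchAll(re), (m) => m.index);
console.log(hits); // [ 21 ], the occurrence after the opening tag
```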