Get Text Between HTML Tags

Get text between tags using JavaScript

While a regular expression could work for this, it might be easier to simply select the <span class='amount'> elements and map their innerHTML content to an array via the map() function:

// This yields an array containing your values
var amounts = Array.prototype.slice
    .call(document.querySelectorAll('span.amount'))
    .map(function (a) { return a.innerHTML; });


Regex select all text between tags

You can use "<pre>(.*?)</pre>" (replacing pre with whichever tag you want) and extract the first capture group; the exact call depends on the language you are using. This only works on very simple, valid HTML: it does not handle nested tags or tags with attributes, and in most regex engines the . does not match newlines unless dot-all mode is enabled.
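As a rough Python sketch of that pattern (the sample string and the helper name text_between are made up for illustration), re.findall with a single non-greedy group returns the group's content for every match. The re.DOTALL flag is an added assumption so that content spanning several lines is captured too.

```python
import re

def text_between(html, tag):
    """Return the text between each <tag>...</tag> pair.

    Hypothetical helper: only suitable for simple, valid, non-nested markup
    whose opening tags carry no attributes.
    """
    # Non-greedy (.*?) stops at the first closing tag it meets.
    pattern = r"<{0}>(.*?)</{0}>".format(re.escape(tag))
    # re.DOTALL lets "." match newlines as well.
    return re.findall(pattern, html, flags=re.DOTALL)

sample = "<pre>first</pre> noise <pre>second\nline</pre>"
print(text_between(sample, "pre"))
```

An opening tag written as <pre class="x"> would not match here, which is one reason a parser is usually the safer choice.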

As other commenters have suggested, if you're doing something complex, use an HTML parser.
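One possible parser-based version, sketched with Python's standard html.parser module, is shown here. The class name TagTextCollector and the sample markup are assumptions for illustration, not part of the original answer. Unlike the regex, it does not care whether the opening tag has attributes.

```python
from html.parser import HTMLParser

class TagTextCollector(HTMLParser):
    """Collect the text inside every occurrence of one tag name (hypothetical helper)."""

    def __init__(self, tag):
        super().__init__()
        self.tag = tag
        self.depth = 0    # greater than 0 while inside the target tag
        self.texts = []   # one entry per outermost target element

    def handle_starttag(self, tag, attrs):
        if tag == self.tag:
            if self.depth == 0:
                self.texts.append("")
            self.depth += 1

    def handle_endtag(self, tag):
        if tag == self.tag and self.depth > 0:
            self.depth -= 1

    def handle_data(self, data):
        # Text outside the target tag is ignored.
        if self.depth > 0:
            self.texts[-1] += data

collector = TagTextCollector("span")
collector.feed('<p><span class="amount">10</span> and <span class="amount">25</span></p>')
collector.close()
print(collector.texts)
```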

Extract text between html elements

Using jQuery's map() function, you can get the text of every p element, as follows.

var cities = $('.cities p').map(function () {
    return $(this).text();
}).get().join();
$('.show').html(cities);

<script src="https://ajax.googleapis.com/ajax/libs/jquery/2.1.1/jquery.min.js"></script>
<div class="cities">
    <div><p>Los Angeles</p><h5>Description</h5></div>
    <div><p>San Francisco</p><h5>Description</h5></div>
    <div><p>San Diego</p><h5>Description</h5></div>
    <div><p>Santa Barbara</p><h5>Description</h5></div>
    <div><p>Davis</p><h5>Description</h5></div>
    <div><p>San Jose</p><h5>Description</h5></div>
</div>
<h3>All cities</h3>
<div class="show"></div>

How to get text between two sets of tags in Python

You could use .next_sibling on each of those elements.

Code:

html = '''
<b>Doc Type: </b>AABB
<br />
<b>Doc No: </b>BBBBF
<br />
<b>System No: </b>aaa bbb
<br />
<b>VCode: </b>040000033
<br />
<b>G Code: </b>000045
<br />'''

from bs4 import BeautifulSoup

soup = BeautifulSoup(html, "html.parser")
bs = soup.find_all('b')

for each in bs:
    eachFollowingText = each.next_sibling.strip()
    print(f'{each.text} {eachFollowingText}')

Output:

Doc Type:  AABB
Doc No:  BBBBF
System No:  aaa bbb
VCode:  040000033
G Code:  000045
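If Beautiful Soup is not available, the same "label, then the text that follows it" idea can be approximated with the standard library. The sketch below is only an illustration: the class name LabelValueParser is hypothetical, and it assumes each b label is followed directly by its value as plain text.

```python
from html.parser import HTMLParser

class LabelValueParser(HTMLParser):
    """Pair the text of each <b> label with the text that follows it (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.in_label = False
        self.label = None
        self.pairs = {}

    def handle_starttag(self, tag, attrs):
        if tag == "b":
            self.in_label = True
            self.label = ""

    def handle_endtag(self, tag):
        if tag == "b":
            self.in_label = False

    def handle_data(self, data):
        if self.in_label:
            self.label += data
        elif self.label is not None and data.strip():
            # The first non-blank text after a label is taken as its value.
            self.pairs[self.label.strip().rstrip(":")] = data.strip()
            self.label = None

parser = LabelValueParser()
parser.feed("<b>Doc Type: </b>AABB\n<br />\n<b>Doc No: </b>BBBBF\n<br />")
parser.close()
print(parser.pairs)
```

Storing the pairs in a dictionary is a design choice of this sketch; it drops the trailing colon from each label so the keys are easier to look up.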

Get text between several tags

You need to iterate over the span elements and read the text of each one.

// Get the span elements
var spans = document.querySelectorAll("#roles span");
var roles = [];

// Iterate over the elements
for (var i = 0; i < spans.length; i++) {
    // Fetch textContent and push it to the array
    roles.push(spans[i].textContent);
}
console.log(roles);

<div style="display:none" id="roles">
  <span>Manager</span>
  <span>Seller</span>
</div>

Efficient way to extract text from between tags

You can use Beautiful Soup, which is very good for this kind of task. It is straightforward, easy to install, and well documented.

Your example has some li tags that are not closed. I have corrected them below; this is how to get the text of every a tag inside the li elements:

from bs4 import BeautifulSoup

var = '''<li> <a href="/...html">Energy</a></li>
<ul>
<li><a href="/...html">Coal</a></li>
<li><a href="/...html">Oil </a></li>
<li><a href="/...html">Carbon</a></li>
<li><a href="/...html">Oxygen</a></li>'''

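# Name the parser explicitly so the result does not depend on which parsers are installed
soup = BeautifulSoup(var, "html.parser")

for a in soup.find_all('a'):
    print(a.string)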

It will print:

Energy
Coal
Oil
Carbon
Oxygen

For documentation and more examples, see the Beautiful Soup documentation.

XPath : How to get text between 2 html tags with same level?

This is what worked for me. Keep in mind that I'm using Scrapy with Python 2.7:

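# Select every element that has the same tag name as the element with the given id
name_query = u"//*[name()=name(//*[@id='"+id+"'])]"
all = response.xpath(name_query)
for selector in all.getall():
    if self.id in selector:
        position = all.getall().index(selector)
        # "balise" is the heading tag name (h1 to h6), read from the serialized element
        balise = "h" + all.getall()[position].split("<h")[1][0]
        # Text of the heading itself
        title = all.getall()[position].split(">")[1].split("<")[0]
        # Elements whose nearest preceding heading of this level is the title
        # and that still have a later heading of this level after them
        query = u"//*[preceding-sibling::"+balise+"[1] ='"+title+"' and following-sibling::"+balise+"]"
        self.log('query = '+query)
        results = response.xpath(query)
        results.pop(len(results)-1)
        with open(filename,'wb') as f:
            for text in results.css("::text").getall():
                f.write(text.encode('utf-8')+"\n")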

This should work in general. I tested it against multiple headers with different levels, and it works fine for me.
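For readers without Scrapy, the same idea (collect what sits between a heading and the next heading of the same level) can be sketched in Python 3 with the standard library. This swaps the XPath query for a plain loop over sibling elements; the function name text_between_headings and the sample markup are assumptions for illustration, and the input must be well-formed XML. Unlike the query above, it also returns the content after the last heading.

```python
import xml.etree.ElementTree as ET

HEADING_TAGS = {"h1", "h2", "h3", "h4", "h5", "h6"}

def text_between_headings(parent, title):
    """Return the text of the siblings between the heading `title`
    and the next heading with the same tag (hypothetical helper)."""
    collected = []
    heading_tag = None
    for child in parent:
        if heading_tag is None:
            # Still looking for the starting heading.
            if child.tag in HEADING_TAGS and (child.text or "").strip() == title:
                heading_tag = child.tag
        elif child.tag == heading_tag:
            break  # reached the next heading of the same level
        else:
            collected.append("".join(child.itertext()).strip())
    return collected

doc = ET.fromstring(
    "<div>"
    "<h2>Intro</h2><p>Hello</p>"
    "<h2>Usage</h2><p>Step <b>one</b></p><p>Step two</p>"
    "<h2>Notes</h2><p>Bye</p>"
    "</div>"
)
print(text_between_headings(doc, "Usage"))
```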

Get text between two different html tags python beautifulsoup

See below (using XML parsing)

import xml.etree.ElementTree as ET

xml = '''
<dtposted>
2020
<trnamt>
10
<fitid>
202010
<name>RESTAURANT</name>
</fitid>
</trnamt>
</dtposted>'''

root = ET.fromstring(xml)
print(root.text.strip())
print(root.find('.//trnamt').text.strip())
print(root.find('.//fitid').text.strip())
print(root.find('.//name').text.strip())

Output:

2020
10
202010
RESTAURANT
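When every element's own text is wanted, a loop over Element.iter() avoids writing one find() call per tag. This is a small sketch under the same assumption as above, namely that the input is well-formed XML; the dictionary name values is chosen for illustration.

```python
import xml.etree.ElementTree as ET

xml = "<dtposted>2020<trnamt>10<fitid>202010<name>RESTAURANT</name></fitid></trnamt></dtposted>"
root = ET.fromstring(xml)

# Element.iter() walks the root and all of its descendants in document order.
# el.text holds only the text before the element's first child, if any.
values = {el.tag: (el.text or "").strip() for el in root.iter()}
print(values)
```

If a tag name occurs more than once, later occurrences overwrite earlier ones in this dictionary, so a list of (tag, text) tuples would be the safer shape for repeated elements.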

