Get text between tags using javascript
While a regular expression could work for this, it might be easier to simply select the <span class='amount'> elements and map their innerHTML content to an array via the map() function:
// This would yield an array containing your values
var amounts = Array.prototype.slice
    .call(document.querySelectorAll('span.amount'))
    .map(function(a){ return a.innerHTML; });
Regex select all text between tags
You can use "<pre>(.*?)</pre>" (replacing pre with whatever tag name you want) and extract the first capture group (for more specific instructions, specify a language), but this assumes that you have very simple and valid HTML.
As other commenters have suggested, if you're doing something complex, use an HTML parser.
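As a rough illustration of the pattern above, here is a minimal Python sketch. The sample string and the helper name text_between are assumptions made for this example, not part of the original answer, and the same caveat applies: it only suits very simple, valid markup without attributes or nesting.

```python
import re

# Hypothetical sample input; the tag name "pre" follows the pattern in the answer.
html = "<pre>first</pre><p>skip</p><pre>second\nline</pre>"

def text_between(tag, source):
    """Return the text of every <tag>...</tag> pair, without the tags."""
    # re.escape guards against regex metacharacters in the tag name;
    # re.DOTALL lets "." match newlines so multi-line content is captured.
    pattern = r"<{0}>(.*?)</{0}>".format(re.escape(tag))
    # With a single capture group, findall returns just the group contents.
    return re.findall(pattern, source, flags=re.DOTALL)

matches = text_between("pre", html)
print(matches)
```

The non-greedy (.*?) is what keeps each match from running on to the last closing tag in the input.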
Extract text between html elements
Using the map() function, you can get the text of every p element as follows.
var cities = $('.cities p').map(function () {
    return $(this).text();
}).get().join();
$('.show').html(cities);
<script src="https://ajax.googleapis.com/ajax/libs/jquery/2.1.1/jquery.min.js"></script>
<div class="cities">
  <div><p>Los Angeles</p><h5>Description</h5></div>
  <div><p>San Francisco</p><h5>Description</h5></div>
  <div><p>San Diego</p><h5>Description</h5></div>
  <div><p>Santa Barbara</p><h5>Description</h5></div>
  <div><p>Davis</p><h5>Description</h5></div>
  <div><p>San Jose</p><h5>Description</h5></div>
</div>
<h3>All cities</h3>
<div class="show"></div>
how to get text between two SETS of tags in python
You could use the .next_sibling of each of those elements.
Code:
html = '''
<b>Doc Type: </b>AABB
<br />
<b>Doc No: </b>BBBBF
<br />
<b>System No: </b>aaa bbb
<br />
<b>VCode: </b>040000033
<br />
<b>G Code: </b>000045
<br />'''
from bs4 import BeautifulSoup
soup = BeautifulSoup(html, "html.parser")
bs = soup.find_all('b')
for each in bs:
    eachFollowingText = each.next_sibling.strip()
    print(f'{each.text} {eachFollowingText}')
Output:
Doc Type: AABB
Doc No: BBBBF
System No: aaa bbb
VCode: 040000033
G Code: 000045
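If Beautiful Soup is not available, a similar idea can be sketched with the standard library alone. This is only a sketch and rests on an assumption: the fragment must be well-formed XML once wrapped in a single root element (the root wrapper and the fields dictionary are introduced here for illustration). In xml.etree.ElementTree, the text that follows an element's closing tag is exposed as that element's tail attribute, which plays the role of next_sibling.

```python
import xml.etree.ElementTree as ET

# Assumption: the fragment parses as XML once wrapped in a <root> element.
fragment = '''<root>
<b>Doc Type: </b>AABB
<br />
<b>Doc No: </b>BBBBF
<br />
</root>'''

root = ET.fromstring(fragment)

# Map each label (the text inside <b>) to the text after the closing </b>.
fields = {}
for b in root.iter('b'):
    label = (b.text or '').strip().rstrip(':')
    value = (b.tail or '').strip()
    fields[label] = value

print(fields)
```

Real-world HTML is often not valid XML, so this approach is likely to fail on anything less tidy than the fragment shown.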
Get text between several tags
You need to iterate over the span elements and get the text of each one.
// Get the span elements
var spans = document.querySelectorAll("#roles span");
var roles = [];
// Iterate over the elements
for (var i = 0; i < spans.length; i++) {
    // Fetch textContent and push it to the array
    roles.push(spans[i].textContent);
}
console.log(roles);
<div style="display:none" id="roles">
  <span>Manager</span>
  <span>Seller</span>
</div>
Efficient way to extract text from between tags
You can use Beautiful Soup, which is well suited to this kind of task. It is straightforward, easy to install, and extensively documented.
Your example has some li tags that were not closed. I have corrected them, and this is how you would get the text of each link inside the li tags:
from bs4 import BeautifulSoup
var = '''<li> <a href="/...html">Energy</a></li>
<ul>
<li><a href="/...html">Coal</a></li>
<li><a href="/...html">Oil </a></li>
<li><a href="/...html">Carbon</a></li>
<li><a href="/...html">Oxygen</a></li>'''
# Name the parser explicitly so the result does not depend on what is installed
soup = BeautifulSoup(var, "html.parser")
for a in soup.find_all('a'):
    print(a.string)
It will print:
Energy
Coal
Oil
Carbon
Oxygen
For documentation and more examples see the BeautifulSoup doc
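For cases where installing a third-party package is not an option, the same link text can probably be collected with the standard library's html.parser module. The class name LinkTextCollector and the shortened markup are assumptions made for this sketch. It is not how Beautiful Soup works internally.

```python
from html.parser import HTMLParser

class LinkTextCollector(HTMLParser):
    """Collect the text found inside each <a> element."""

    def __init__(self):
        super().__init__()
        self.in_link = False
        self.texts = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            # Start a new, empty entry for this link.
            self.in_link = True
            self.texts.append("")

    def handle_endtag(self, tag):
        if tag == "a":
            self.in_link = False

    def handle_data(self, data):
        # Only keep text that appears between <a> and </a>.
        if self.in_link:
            self.texts[-1] += data

# Shortened version of the markup above; the <ul> is left unclosed on purpose.
markup = '''<li> <a href="/...html">Energy</a></li>
<ul>
<li><a href="/...html">Coal</a></li>
<li><a href="/...html">Oil </a></li>'''

collector = LinkTextCollector()
collector.feed(markup)
collector.close()
link_texts = [t.strip() for t in collector.texts]
print(link_texts)
```

Because HTMLParser is event-based and does not build a tree, the unclosed ul does not get in the way here, but anything that needs to navigate the structure is easier with Beautiful Soup.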
XPath : How to get text between 2 html tags with same level?
This is what worked for me. Keep in mind that I'm using Scrapy with Python 2.7:
name_query = u"//*[name()=name(//*[@id='"+id+"'])]"
all = response.xpath(name_query)
for selector in all.getall():
    if self.id in selector:
        position = all.getall().index(selector)
        balise = "h" + all.getall()[position].split("<h")[1][0]
        title = all.getall()[position].split(">")[1].split("<")[0]
        query = u"//*[preceding-sibling::"+balise+"[1] ='"+title+"' and following-sibling::"+balise+"]"
        self.log('query = '+query)
        results = response.xpath(query)
        results.pop(len(results)-1)
        with open(filename,'wb') as f:
            for text in results.css("::text").getall():
                f.write(text.encode('utf-8')+"\n")
This should work in general. I tested it against multiple headers with different levels and it works fine for me.
Get text between two different html tags python beautifulsoup
See below (using XML parsing)
import xml.etree.ElementTree as ET
xml = '''
<dtposted>
2020
<trnamt>
10
<fitid>
202010
<name>RESTAURANT</name>
</fitid>
</trnamt>
</dtposted>'''
root = ET.fromstring(xml)
print(root.text.strip())
print(root.find('.//trnamt').text.strip())
print(root.find('.//fitid').text.strip())
print(root.find('.//name').text.strip())
Output:
2020
10
202010
RESTAURANT
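When the tag names are not known in advance, one possible variation is to walk the whole tree and record the leading text of every element. This is a sketch under the same assumption as the answer, namely that the input parses as XML. The single-line sample string and the values dictionary are introduced here for illustration.

```python
import xml.etree.ElementTree as ET

# Same nesting as the example above, written on one line for brevity.
xml = ('<dtposted>2020<trnamt>10<fitid>202010'
       '<name>RESTAURANT</name></fitid></trnamt></dtposted>')

root = ET.fromstring(xml)

# root.iter() yields the root itself and then every descendant in document order;
# el.text is the text between an element's opening tag and its first child.
values = {el.tag: (el.text or '').strip() for el in root.iter()}

print(values)
```

Repeated tag names would overwrite each other in the dictionary, so a list of (tag, text) pairs is the safer choice for documents with several transactions.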