Python: BeautifulSoup extract string between div tag by its class
for row in soup.find_all('div', attrs={"class": "reviewText"}):
    print(row.text)
or, as a list comprehension:
rows = soup.find_all('div', attrs={"class": "reviewText"})
[row.text for row in rows]
Python - Beautiful Soup - extract text between <div> and <sup>
I have resolved it. The problem was that the data is generated by JavaScript, so static parsing methods don't work with it. I tried several solutions (including Selenium and capturing the results of an XHR script).
Finally, inside my parsed data I found a static URL that links to a separate web page, where this JavaScript code is executed and the result can be parsed by static methods.
The video "Python Web Scraping Tutorial: scraping dynamic JavaScript/Ajax websites with Beautiful Soup" explains a similar solution.
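A minimal sketch of why static parsing comes up empty in this situation. The HTML below is a made-up page (the `price` id and the script are assumptions for illustration): the div is empty in the served markup and would only be filled in by a browser running the script.

```python
from bs4 import BeautifulSoup

# Hypothetical page: the div is empty in the served HTML and is only
# populated when a browser executes the script.
html = """
<div id="price"></div>
<script>document.getElementById("price").textContent = "47,864.58";</script>
"""

soup = BeautifulSoup(html, "html.parser")
price_div = soup.find("div", id="price")

# BeautifulSoup parses markup but never runs JavaScript,
# so the div still has no text here.
static_text = price_div.get_text()
print(repr(static_text))
```

If the text you expect is missing like this, the data is probably loaded at runtime, and you would need a browser-driven tool such as Selenium or the underlying data URL.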
How to extract text from an HTML div tag file with BeautifulSoup?
According to the BeautifulSoup documentation, find_all returns a list of elements. As such, you're calling price.get_text() on a list, which causes the error, since only individual element instances have that method.
AttributeError: ResultSet object has no attribute 'get_text'. You're
probably treating a list of elements like a single element. Did you
call find_all() when you meant to call find()?
The error message hints that you want to be calling this method on a single element rather than a collection.
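As a small illustration of the difference, here is a sketch using made-up markup (the `price` class name is an assumption, not the class from the question). `find()` returns one element, `find_all()` returns a collection, and calling `get_text()` on the collection raises the `AttributeError` quoted above.

```python
from bs4 import BeautifulSoup

# Hypothetical markup for illustration only.
html = '<div class="price">₹15,499</div><div class="price">₹13,499</div>'
soup = BeautifulSoup(html, "html.parser")

# find() returns a single Tag (or None), so get_text() works directly.
first = soup.find("div", class_="price")
first_text = first.get_text()

# find_all() returns a ResultSet; call get_text() on each element instead.
all_matches = soup.find_all("div", class_="price")
all_texts = [tag.get_text() for tag in all_matches]

# Calling get_text() on the ResultSet itself fails.
try:
    all_matches.get_text()
    raised = False
except AttributeError:
    raised = True

print(first_text, all_texts, raised)
```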
If we print out your price variable, we get the following:
<div class="_30jeq3 _1_WHN1">₹15,499</div>
<div class="_30jeq3 _1_WHN1">₹13,499</div>
<div class="_30jeq3 _1_WHN1">₹15,499</div>
...
Assuming you want a list of the text inside each div, simply perform list comprehension on your results:
price_elements = soup.find_all("div",class_="_30jeq3 _1_WHN1")
prices_text = [p.get_text() for p in price_elements]
This will give you the following list:
['₹15,499', '₹13,499', '₹15,499', '₹13,499', '₹19,999', '₹29,999', '₹29,999', '₹7,499', '₹7,499', '₹9,999', '₹8,999', '₹7,999', '₹7,999', '₹9,999', '₹8,999', '₹16,999', '₹16,999', '₹14,999', '₹14,999', '₹11,999', '₹8,999', '₹8,999', '₹12,999', '₹11,999']
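One thing to keep in mind with a two-class value like this: passing the whole string to `class_` matches the exact value of the class attribute, so an element listing the same classes in a different order would not match. The sketch below uses made-up markup with the order swapped on the second div, and shows a CSS selector via `select()` as one possible alternative (this assumes a BeautifulSoup version with CSS selector support installed).

```python
from bs4 import BeautifulSoup

# Hypothetical markup: same two classes, but in a different order
# on the second div.
html = """
<div class="_30jeq3 _1_WHN1">₹15,499</div>
<div class="_1_WHN1 _30jeq3">₹13,499</div>
"""
soup = BeautifulSoup(html, "html.parser")

# Matches only the div whose class attribute is exactly this string.
exact = [d.get_text() for d in soup.find_all("div", class_="_30jeq3 _1_WHN1")]

# A CSS selector matches any div carrying both classes, in any order.
either_order = [d.get_text() for d in soup.select("div._30jeq3._1_WHN1")]

print(exact)
print(either_order)
```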
How to extract text from inside div tag using BeautifulSoup
In the edited question, the data is loaded by JavaScript, so you can't get it with BeautifulSoup alone; you need a library like Selenium.
This answer is for the original question:
If you have multiple class="subPrice" elements, you can use find_all() and get each price with .text, like below:
from bs4 import BeautifulSoup
html="""
<div class="nowPrice">
<div class="showPrice" style="color: rgb(14, 203, 129);">47,864.58</div>
<div class="subPrice">$47,864.58</div>
<div class="subPrice">$57,864.58</div>
<div class="subPrice">$67,864.58</div>
<div class="subPrice">$77,864.58</div>
</div>
"""
soup=BeautifulSoup(html,"html.parser")
for sp in soup.find_all("div",class_="subPrice"):
    print(sp.text)
output:
$47,864.58
$57,864.58
$67,864.58
$77,864.58
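If the same class also appears elsewhere on the page, one option is to find the container first and search only inside it. The sketch below is an illustration with made-up markup (the `other` block is an assumption added to show the effect); `get_text(strip=True)` also trims surrounding whitespace.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: a second "subPrice" outside the container we want.
html = """
<div class="nowPrice">
  <div class="showPrice">47,864.58</div>
  <div class="subPrice"> $47,864.58 </div>
</div>
<div class="other"><div class="subPrice">$1.00</div></div>
"""
soup = BeautifulSoup(html, "html.parser")

# Look up the container, then search only within it.
container = soup.find("div", class_="nowPrice")
sub_prices = []
if container is not None:
    sub_prices = [
        d.get_text(strip=True)
        for d in container.find_all("div", class_="subPrice")
    ]

print(sub_prices)
```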
Extract text from within div tag using BeautifulSoup 4 in Python
Expanding the answer from @so1989 as you are also wondering how to print with the format you have specified, I would suggest this approach:
from bs4 import BeautifulSoup

# Read the HTML file and close it when done
with open("C:\\example.html") as html_file:
    soup = BeautifulSoup(html_file.read(), "lxml")

# Split the div's "alt" attribute into whitespace-separated tokens
alt = soup.find("div", {"class": "VWP1058422499"}).get("alt").split()
for i, char in enumerate(alt):
    if char == '-':
        alt[i-2] = alt[i-2] + '\n'
    if char[0] in ['-', 'C', 'L', 'o']:
        alt[i] = ' ' + alt[i]
alt = ''.join(alt)
print(alt)
Extracting text from multiple DIVS + DIV Styles with Python/BeautifulSoup
Use findAll (the older name for find_all) to extract a list of Tag objects that match the given criteria, then zip to iterate over the two lists in parallel.
from bs4 import BeautifulSoup
input_ = """<section id="content4" class="tab-content">
<p>
<div class="Text_Title">Product 1</div>
<div style="display: inline-block;">Red Ball</div></p>
<p>
<div class="Text_Title">Product 2</div>
<div style="display: inline-block;">Green Ball</div></p>
<p>
<div class="Text_Title">Product 3</div>
<div style="display: inline-block;">Yellow Ball</div></p>"""
soup = BeautifulSoup(input_, "html.parser")
for x, y in zip(soup.findAll("div", attrs={"class": "Text_Title"}),
                soup.findAll("div", attrs={"style": "display: inline-block;"})):
    print(x.text, "-", y.text)
output:
Product 1 - Red Ball
Product 2 - Green Ball
Product 3 - Yellow Ball
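Note that zip pairs items purely by position, so if one title has no matching description, the later pairs shift. A possible alternative, sketched below under the assumption that each product sits in its own wrapper (the `item` class is hypothetical and not in the original markup), is to look up each title's sibling with find_next_sibling.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: each product in its own wrapper, and
# "Product 2" has no description div.
html = """
<div class="item"><div class="Text_Title">Product 1</div>
<div style="display: inline-block;">Red Ball</div></div>
<div class="item"><div class="Text_Title">Product 2</div></div>
<div class="item"><div class="Text_Title">Product 3</div>
<div style="display: inline-block;">Yellow Ball</div></div>
"""
soup = BeautifulSoup(html, "html.parser")

pairs = []
for title in soup.find_all("div", class_="Text_Title"):
    # Only siblings inside the same wrapper are considered.
    desc = title.find_next_sibling(
        "div", attrs={"style": "display: inline-block;"}
    )
    pairs.append((title.get_text(), desc.get_text() if desc else None))

for name, description in pairs:
    print(name, "-", description)
```

With this input, "Product 2" is paired with None instead of taking the description that belongs to "Product 3".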