Using BeautifulSoup to Extract Text from a Div

Python: BeautifulSoup extract string between div tag by its class

rows = soup.find_all('div', attrs={"class": "reviewText"})
for row in rows:
    print(row.text)

or, as a list:

[row.text for row in rows]
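Note that .text returns the text of every descendant, including surrounding whitespace. When the div contains nested tags or line breaks, get_text() with its separator and strip arguments usually gives a cleaner result. A minimal sketch, assuming made-up sample markup (only the class name reviewText comes from the question):

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup; only the class name "reviewText" is from the question.
html = """
<div class="reviewText">
  Great product. <b>Would buy again.</b>
</div>
"""

soup = BeautifulSoup(html, "html.parser")
div = soup.find("div", attrs={"class": "reviewText"})

# .text keeps the newlines and indentation found in the markup
raw = div.text

# get_text() can strip each text fragment and join the fragments with a separator
clean = div.get_text(separator=" ", strip=True)

print(repr(raw))
print(repr(clean))
```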

Python - Beautiful Soup - extract text between <div> and <sup>

I have resolved it. The problem was that the data is generated by JavaScript, so static parsing methods don't work on it. I tried several solutions, including Selenium and capturing the results of the XHR requests.

Finally, inside my parsed data I found a static URL pointing to a separate web page where this JavaScript code is executed, and that page can be parsed by static methods.

The video "Python Web Scraping Tutorial: scraping dynamic JavaScript/Ajax websites with Beautiful Soup" explains a similar solution.
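A quick way to check whether a page has this problem is to look at what the static HTML actually contains before reaching for Selenium. The sketch below is only an illustration: the markup and the id "prices" are hypothetical, standing in for a response whose container is filled in by JavaScript in a browser.

```python
from bs4 import BeautifulSoup

# Hypothetical static response: the container exists, but a browser would
# fill it in by running the script. BeautifulSoup never runs scripts.
static_html = """
<div id="prices"></div>
<script>document.getElementById("prices").textContent = "42";</script>
"""

soup = BeautifulSoup(static_html, "html.parser")
container = soup.find("div", id="prices")

# An empty (or missing) container in the static HTML suggests the data
# is loaded by JavaScript and cannot be parsed statically.
text = container.get_text(strip=True) if container is not None else ""
needs_js = text == ""

print(needs_js)
```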

How to extract text from an HTML div tag file with BeautifulSoup?

According to the BeautifulSoup documentation, find_all returns a list of elements (a ResultSet). You're calling price.get_text() on that list, which raises the error because get_text() exists only on individual elements.

AttributeError: ResultSet object has no attribute 'get_text'. You're
probably treating a list of elements like a single element. Did you
call find_all() when you meant to call find()?

The error message hints that you want to be calling this method on a single element rather than a collection.
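The difference can be reproduced in a few lines. This is a minimal sketch that reuses the class names and two of the prices from the question as sample markup:

```python
from bs4 import BeautifulSoup

# Sample markup built from the class names and prices shown in the question.
html = """
<div class="_30jeq3 _1_WHN1">₹15,499</div>
<div class="_30jeq3 _1_WHN1">₹13,499</div>
"""
soup = BeautifulSoup(html, "html.parser")

# find() returns a single Tag (or None), so get_text() works on it
first_price = soup.find("div", class_="_30jeq3 _1_WHN1").get_text()

# find_all() returns a ResultSet; calling get_text() on it raises AttributeError
try:
    soup.find_all("div", class_="_30jeq3 _1_WHN1").get_text()
    raised = False
except AttributeError:
    raised = True

print(first_price, raised)
```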

If we print out your price variable, we get the following:

<div class="_30jeq3 _1_WHN1">₹15,499</div>
<div class="_30jeq3 _1_WHN1">₹13,499</div>
<div class="_30jeq3 _1_WHN1">₹15,499</div>
...

Assuming you want a list of the text inside each div, use a list comprehension over your results:

price_elements = soup.find_all("div",class_="_30jeq3 _1_WHN1")
prices_text = [p.get_text() for p in price_elements]

This will give you the following list:

['₹15,499', '₹13,499', '₹15,499', '₹13,499', '₹19,999', '₹29,999', '₹29,999', '₹7,499', '₹7,499', '₹9,999', '₹8,999', '₹7,999', '₹7,999', '₹9,999', '₹8,999', '₹16,999', '₹16,999', '₹14,999', '₹14,999', '₹11,999', '₹8,999', '₹8,999', '₹12,999', '₹11,999']
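One caveat: passing a string with a space to class_ matches the exact value of the class attribute, so it stops matching if the page lists the same classes in a different order. A CSS selector via select() matches elements that carry both classes regardless of order. A minimal sketch; the second and third divs are hypothetical variations added for illustration:

```python
from bs4 import BeautifulSoup

# The first div is from the question; the other two are hypothetical variations.
html = """
<div class="_30jeq3 _1_WHN1">₹15,499</div>
<div class="_1_WHN1 _30jeq3">₹13,499</div>
<div class="_30jeq3">₹9,999</div>
"""
soup = BeautifulSoup(html, "html.parser")

# Exact string match on the class attribute: order matters
exact = [d.get_text() for d in soup.find_all("div", class_="_30jeq3 _1_WHN1")]

# CSS selector: any div that has both classes, in any order
both = [d.get_text() for d in soup.select("div._30jeq3._1_WHN1")]

print(exact)
print(both)
```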

How to extract text from inside div tag using BeautifulSoup

In the edited question, the data is loaded by JavaScript, so BeautifulSoup alone can't get it; you need a library such as Selenium.

This answer applies to the original question:

If you have multiple elements with class="subPrice", you can use find_all() and get each price with .text, like below:

from bs4 import BeautifulSoup

html="""
<div class="nowPrice">
<div class="showPrice" style="color: rgb(14, 203, 129);">47,864.58</div>
<div class="subPrice">$47,864.58</div>
<div class="subPrice">$57,864.58</div>
<div class="subPrice">$67,864.58</div>
<div class="subPrice">$77,864.58</div>
</div>
"""
soup = BeautifulSoup(html, "html.parser")
for sp in soup.find_all("div", class_="subPrice"):
    print(sp.text)

output:

$47,864.58
$57,864.58
$67,864.58
$77,864.58

Extract text from within div tag using BeautifulSoup 4 in Python

Expanding on the answer from @so1989: since you also asked how to print in the format you specified, I would suggest this approach:

from bs4 import BeautifulSoup

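# Read the whole HTML file; the with block closes it afterwards
with open("C:\\example.html") as html_file:
    contents = html_file.read()

soup = BeautifulSoup(contents, "lxml")
# Split the div's alt attribute into whitespace-separated tokens
alt = soup.find("div", {"class": "VWP1058422499"}).get("alt").split()

for i, char in enumerate(alt):
    # End the line two tokens before each standalone '-'
    if char == '-':
        alt[i-2] = alt[i-2] + '\n'
    # Put a space before tokens starting with '-', 'C', 'L' or 'o'
    if char[0] in ['-', 'C', 'L', 'o']:
        alt[i] = ' ' + alt[i]

alt = ''.join(alt)
print(alt)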
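The chained call find(...).get("alt").split() fails with an AttributeError if the div is not found or has no alt attribute. A slightly more defensive way to read the attribute might look like this. It is a sketch only: the alt value and the class name "no-such-class" are hypothetical.

```python
from bs4 import BeautifulSoup

# Hypothetical markup; the alt value is invented for illustration.
html = '<div class="VWP1058422499" alt="Room A - Level 2"></div>'
soup = BeautifulSoup(html, "html.parser")

# find() returns None when nothing matches, and get() accepts a default
div = soup.find("div", {"class": "VWP1058422499"})
alt_words = div.get("alt", "").split() if div is not None else []

# The same pattern on a class that does not exist yields an empty list
missing = soup.find("div", {"class": "no-such-class"})
missing_words = missing.get("alt", "").split() if missing is not None else []

print(alt_words)
print(missing_words)
```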

Extracting text from multiple DIVS + DIV Styles with Python/BeautifulSoup

Use findAll (the older spelling of find_all) to extract a list of Tag objects that match the given criteria, then use zip to iterate over the two lists in parallel.

from bs4 import BeautifulSoup

input_ = """<section id="content4" class="tab-content">
<p>
<div class="Text_Title">Product 1</div>
<div style="display: inline-block;">Red Ball</div></p>
<p>
<div class="Text_Title">Product 2</div>
<div style="display: inline-block;">Green Ball</div></p>
<p>
<div class="Text_Title">Product 3</div>
<div style="display: inline-block;">Yellow Ball</div></p>"""

soup = BeautifulSoup(input_, "html.parser")

for x, y in zip(soup.findAll("div", attrs={"class": "Text_Title"}),
                soup.findAll("div", attrs={"style": "display: inline-block;"})):
    print(x.text, "-", y.text)

Product 1 - Red Ball
Product 2 - Green Ball
Product 3 - Yellow Ball
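Pairing by position with zip assumes that every title has a matching value div; if one is missing, the later pairs shift silently. An alternative is to start from each title and look for its next sibling div. A minimal sketch, assuming markup like the example above but with the value for Product 2 deliberately left out:

```python
from bs4 import BeautifulSoup

# Same shape as the example above, but Product 2 has no value div (hypothetical).
html = """
<p>
<div class="Text_Title">Product 1</div>
<div style="display: inline-block;">Red Ball</div></p>
<p>
<div class="Text_Title">Product 2</div></p>
<p>
<div class="Text_Title">Product 3</div>
<div style="display: inline-block;">Yellow Ball</div></p>
"""
soup = BeautifulSoup(html, "html.parser")

pairs = []
for title in soup.find_all("div", class_="Text_Title"):
    # Look only among the siblings that follow this title
    value = title.find_next_sibling("div")
    pairs.append((title.get_text(), value.get_text() if value is not None else None))

for name, value in pairs:
    print(name, "-", value)
```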

