BeautifulSoup Web Scraping find_all(): Finding Exact Match

BeautifulSoup: exact match when using find_all()

You should have a look at this question: BeautifulSoup webscraping find_all(): finding exact match

The answer seems to be:

descContainer = descContainers[0].find_all(
    lambda tag: tag.name == 'div' and tag.get('class') == ['userHtml']
)
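As a quick sanity check (using made-up HTML, not the asker's page), the lambda matches only tags whose class list is exactly ['userHtml'], whereas the usual class_ filter matches per class token:

```python
from bs4 import BeautifulSoup

# Hypothetical markup: the second div also carries "userHtml",
# but with an extra class, so it is not an exact match.
html = """
<div class="userHtml">exact</div>
<div class="userHtml extra">partial</div>
"""
soup = BeautifulSoup(html, "html.parser")

# Per-token matching: finds BOTH divs
by_class = soup.find_all("div", class_="userHtml")

# Exact matching: the class attribute must be exactly ["userHtml"]
exact = soup.find_all(lambda tag: tag.name == "div"
                      and tag.get("class") == ["userHtml"])

print(len(by_class))              # 2
print([d.text for d in exact])    # ['exact']
```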

BeautifulSoup find_all for a 'kind of' match

I think John Clements answered this here beautifully (pun intended).
https://stackoverflow.com/a/14257743/16068811

So in your case (note that re needs to be imported):

import re

items = soup.find_all("div", {"id": re.compile('uni-item.*')})

or

items = soup.find_all("div", {"id": lambda L: L and L.startswith('uni-item')})

I haven't tried it, but it should work.
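Here is a runnable check of both variants against made-up markup with uni-item IDs (the real page's HTML is not shown in the question):

```python
import re
from bs4 import BeautifulSoup

# Hypothetical markup: two divs whose ids start with "uni-item",
# one unrelated div, and one div with no id at all.
html = """
<div id="uni-item-1">first</div>
<div id="uni-item-2">second</div>
<div id="other">skip</div>
<div>no id</div>
"""
soup = BeautifulSoup(html, "html.parser")

# Regex variant: the compiled pattern is searched against the id value.
by_regex = soup.find_all("div", {"id": re.compile("uni-item.*")})

# Lambda variant: the "L and ..." guard safely skips divs without an id,
# where L would be None.
by_lambda = soup.find_all("div", {"id": lambda L: L and L.startswith("uni-item")})

print([d["id"] for d in by_regex])   # ['uni-item-1', 'uni-item-2']
print(by_lambda == by_regex)         # True
```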

Beautifulsoup: find_all returns empty list and needs an exact matching string

from bs4 import BeautifulSoup
import requests
import re

# Product page URL to read the prices from
URL = "https://do-something.de/products/just-do-something-t-shirt"

# Get the HTML document from the URL
Web_Data = requests.get(URL)

# Parse the document
HTML_File = BeautifulSoup(Web_Data.text, "html.parser")

### PROBLEM: I want to find the substring, not the exact string,
# which in this case is "25,00€". I only want it to check for the "€" sign.
# prices = BeautifulSoup.find_all(HTML_File, text='€')  ## replaced with:
prices = HTML_File.find_all(string=re.compile("€"))
for price in prices:
    print(price.text.strip())
    print('___________')

This returns:

___________

___________
100,00€
___________

___________
25,00€
___________
Kostenloser Versand ab 100€ Bestellwert
___________
XXS - 25,00€
___________
XS - 25,00€
___________
S - 25,00€
___________
M - 25,00€
___________
L - 25,00€
___________
XL - 25,00€
___________
XXL - 25,00€
___________
XXXL - 25,00€
___________
Standardversand: 4,90€
___________
Kostenloser Versand ab einem Bestellwert von 100€
___________
13,00€
___________
16,50€
___________
12,00€
___________
16,00€
___________
14,00€
___________
18,00€
___________
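If you only want the actual numeric prices, and not sentences that merely mention a € amount (like the shipping notes above), a stricter pattern helps. A sketch with stand-in HTML mirroring the output above (the real page is fetched with requests):

```python
import re
from bs4 import BeautifulSoup

# Stand-in markup modelled on the strings in the output above
html = """
<span>100,00€</span>
<p>Kostenloser Versand ab 100€ Bestellwert</p>
<span>XL - 25,00€</span>
<span>Standardversand: 4,90€</span>
"""
soup = BeautifulSoup(html, "html.parser")

# Only keep strings that END in a "digits,digits€" price, which drops
# sentences where the € amount appears mid-text.
price_re = re.compile(r"\d+,\d{2}€\s*$")
prices = soup.find_all(string=price_re)
print([p.strip() for p in prices])
# ['100,00€', 'XL - 25,00€', 'Standardversand: 4,90€']
```

Tighten the pattern further (e.g. anchor the start as well) if labels such as "Standardversand:" should also be excluded.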

Beautifulsoup find_all() captures too much text

Try:

soup.select('div[class="x"]')

Output:

[<div class="x">Address</div>, <div class="x">Phone</div>]
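The attribute selector [class="x"] requires the class attribute to be exactly "x", unlike the class selector div.x, which matches any element carrying x as one of its classes. A small demonstration with hypothetical markup:

```python
from bs4 import BeautifulSoup

# Hypothetical markup: the third div would be picked up by "div.x"
# (per-token class matching) but not by the exact attribute selector.
html = """
<div class="x">Address</div>
<div class="x">Phone</div>
<div class="x wide">Extra text that was captured too</div>
"""
soup = BeautifulSoup(html, "html.parser")

print([d.text for d in soup.select('div[class="x"]')])  # ['Address', 'Phone']
print(len(soup.select("div.x")))                        # 3
```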

