Difference Between "Findall" and "Find_All" in Beautifulsoup

Difference between findAll and find_all in BeautifulSoup

In BeautifulSoup version 4, the methods are exactly the same; the mixed-case versions (findAll, findAllNext, nextSibling, etc.) have all been renamed to conform to the Python style guide, but the old names are still available to make porting easier. See Method Names for a full list.

In new code, you should use the lowercase versions, so find_all, etc.

In your example, however, you are using BeautifulSoup version 3 (discontinued since March 2012; don't use it if you can help it), where only findAll() is available. Unknown attribute names (such as .find_all, which is only available in BeautifulSoup 4) are treated as if you were searching for a tag by that name. There is no <find_all> tag in your document, so None is returned for that.
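As a quick check under Beautiful Soup 4, the sketch below compares the two method names and shows the attribute-as-tag lookup described above. The sample markup and the attribute name no_such_tag are made up for illustration, and the warning filter is only there because recent releases may warn when a legacy name is used.

```python
import warnings
from bs4 import BeautifulSoup

# Hypothetical sample markup, not taken from the original question.
html = '<p><a href="/one">one</a> <a href="/two">two</a></p>'
soup = BeautifulSoup(html, "html.parser")

# Preferred lowercase name in Beautiful Soup 4.
new_style = soup.find_all("a")

with warnings.catch_warnings():
    # Assumption: some releases emit a DeprecationWarning for legacy names.
    warnings.simplefilter("ignore")
    old_style = soup.findAll("a")

# Both calls return the same tags.
same_results = [str(t) for t in new_style] == [str(t) for t in old_style]

# An attribute that is not a method is treated as a tag lookup,
# so a name with no matching tag gives None.
missing = soup.no_such_tag

print(same_results)
print(missing)
```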

BeautifulSoup difference between findAll and findChildren

findChildren returns a ResultSet just as find_all does. There is no difference between the two methods, because findChildren is simply another name for find_all, as you can see in the source:

 findChildren = find_all  # BS2

It's there for backwards compatibility, as is findAll = find_all  # BS3
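A minimal sketch of that equivalence, assuming Beautiful Soup 4 with the built-in html.parser and a made-up list as input. It compares results rather than object identity, since how the alias is defined may differ between releases.

```python
import warnings
from bs4 import BeautifulSoup

# Hypothetical markup used only for this comparison.
html = "<ul><li>a</li><li>b</li><li>c</li></ul>"
ul = BeautifulSoup(html, "html.parser").ul

with warnings.catch_warnings():
    # Assumption: some releases warn about the legacy name.
    warnings.simplefilter("ignore")
    via_children = ul.findChildren("li")

via_find_all = ul.find_all("li")

children_text = [li.get_text() for li in via_children]
find_all_text = [li.get_text() for li in via_find_all]
print(children_text == find_all_text)
```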

Beautifulsoup : Difference between .find() and .select()

To summarise the comments:

  • select finds every matching element and returns a list; find returns only the first match, so they don't do the same thing. select_one is the equivalent of find.
  • I almost always use CSS selectors when chaining tags or using tag.classname; if I'm looking for a single element without a class, I use find. Essentially it comes down to the use case and personal preference.
  • As far as flexibility goes, I think you know the answer: soup.select("div[id=foo] > div > div > div[class=fee] > span > span > a") would look pretty ugly written as multiple chained find/find_all calls.
  • The only issue with the CSS selectors in bs4, at the time of writing, is the very limited support: nth-of-type is the only pseudo-class implemented, and chaining attributes like a[href][src] is not supported, nor are many other parts of CSS selectors. (Beautiful Soup 4.7.0 and later hand select() over to the Soup Sieve package, which supports far more of CSS, including chained attribute selectors.) Still, things like a[href*=..], a[href^=..], a[href$=..] are, I think, much nicer than find("a", href=re.compile(....)), but again that is personal preference.
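The points above can be sketched with a small made-up document. The markup, class name and URLs below are assumptions chosen for illustration; the calls themselves (select, select_one, find, find_all) are the ones discussed in the list.

```python
import re
from bs4 import BeautifulSoup

# Hypothetical sample document.
html = """
<div id="foo">
  <a class="ext" href="https://example.com/a.pdf">pdf</a>
  <a class="ext" href="https://example.com/b.html">page</a>
  <a href="/local">local</a>
</div>
"""
soup = BeautifulSoup(html, "html.parser")

all_ext = soup.select("a.ext")             # list of every match
first_ext = soup.select_one("a.ext")       # first match, or None
first_find = soup.find("a", class_="ext")  # first match, or None

# Suffix match on an attribute: CSS selector versus a compiled regex.
pdf_css = soup.select('a[href$=".pdf"]')
pdf_re = soup.find_all("a", href=re.compile(r"\.pdf$"))

print(len(all_ext), first_ext is first_find, pdf_css[0]["href"])
```

select_one and find hand back the same tag object from the tree here, which is why the identity check is a reasonable way to compare them.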

For performance, we can run some tests. I modified the code from an answer here and ran it on 800+ HTML files taken from here. It is not exhaustive, but it should give a clue to the readability of some of the options and to the performance:

The modified functions are:

from bs4 import BeautifulSoup
from glob import iglob


def parse_find(soup):
    # Look up each field with find()/find_all() and keyword filters.
    author = soup.find("h4", class_="h12 talk-link__speaker").text
    title = soup.find("h4", class_="h9 m5").text
    date = soup.find("span", class_="meta__val").text.strip()
    soup.find("footer", class_="footer").find_previous("data", {
        "class": "talk-transcript__para__time"}).text.split(":")
    soup.find_all("span", class_="talk-transcript__fragment")


def parse_select(soup):
    # Look up the same fields with CSS selectors.
    author = soup.select_one("h4.h12.talk-link__speaker").text
    title = soup.select_one("h4.h9.m5").text
    date = soup.select_one("span.meta__val").text.strip()
    soup.select_one("footer.footer").find_previous("data", {
        "class": "talk-transcript__para__time"}).text
    soup.select("span.talk-transcript__fragment")


def test(patt, func):
    # Parse every file matching the glob pattern and run func on the result.
    for html in iglob(patt):
        with open(html) as f:
            func(BeautifulSoup(f, "lxml"))

Now for the timings:

In [7]: from testing import test, parse_find, parse_select

In [8]: timeit test("./talks/*.html",parse_find)
1 loops, best of 3: 51.9 s per loop

In [9]: timeit test("./talks/*.html",parse_select)
1 loops, best of 3: 32.7 s per loop

As I said, this is not exhaustive, but on this test the CSS selectors were clearly faster (32.7 s versus 51.9 s).
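To try a comparison like this without the talk files, a rough sketch using timeit on an in-memory document might look as follows. The generated markup, the function names and the repeat count are all assumptions. Unlike the test above, it times only the search step and not the parsing, so the numbers are not comparable to the timings shown and will vary by machine and library version.

```python
import timeit
from bs4 import BeautifulSoup

# Hypothetical stand-in document: 200 spans sharing one class.
html = "<div>" + "".join(
    f'<span class="frag" id="s{i}">text {i}</span>' for i in range(200)
) + "</div>"
soup = BeautifulSoup(html, "html.parser")


def with_find_all():
    return soup.find_all("span", class_="frag")


def with_select():
    return soup.select("span.frag")


# Make sure both approaches find the same number of tags before timing.
find_count = len(with_find_all())
select_count = len(with_select())

# Time 20 repeated searches with each approach.
find_time = timeit.timeit(with_find_all, number=20)
select_time = timeit.timeit(with_select, number=20)
print(f"find_all: {find_time:.4f}s  select: {select_time:.4f}s")
```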

How to use find() and find_all() in BeautifulSoup?

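Instead of find_all(), just use find().

find_all() returns a list of elements, so you cannot index its result with ['content'] directly; find() returns the first matching tag.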

v2 = soup.find("meta", {"property": "og:price:amount", "content": True})['content']
print("v2 is", v2)

Or you can use a CSS selector:

v2 = soup.select_one('meta[property="og:price:amount"][content]')['content']
print("v2 is",v2)
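Both lookups can be tried on a small stand-in document. The markup and the price value below are invented for illustration, and the CSS version assumes a bs4 release whose select_one() accepts chained attribute selectors. Since find() and select_one() return None when nothing matches, the sketch also guards a lookup that is expected to fail.

```python
from bs4 import BeautifulSoup

# Hypothetical page fragment with a made-up price.
html = '<head><meta property="og:price:amount" content="19.99"></head>'
soup = BeautifulSoup(html, "html.parser")

# find(): first <meta> with the property and any content attribute.
price_find = soup.find(
    "meta", {"property": "og:price:amount", "content": True}
)["content"]

# select_one(): the same lookup written as a CSS selector.
price_css = soup.select_one(
    'meta[property="og:price:amount"][content]'
)["content"]

# A lookup with no match returns None, so check before indexing.
currency_tag = soup.find("meta", {"property": "og:price:currency"})
currency = currency_tag["content"] if currency_tag is not None else None

print(price_find, price_css, currency)
```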

BeautifulSoup, difference between soup() and soup.findAll()?

No, there is no difference between the two.

From the documentation: "If you treat the BeautifulSoup object or a Tag object as though it were a function, then it’s the same as calling find_all() on that object."
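A short sketch of that shortcut, using made-up markup: calling the soup (or a tag) directly gives the same tags as calling find_all() on it.

```python
from bs4 import BeautifulSoup

# Hypothetical markup for the comparison.
html = "<div><p>one</p><p>two</p></div>"
soup = BeautifulSoup(html, "html.parser")

called = soup("p")             # treat the soup object as a function
explicit = soup.find_all("p")  # the explicit method call
tag_called = soup.div("p")     # the same shortcut on a Tag

called_text = [p.get_text() for p in called]
explicit_text = [p.get_text() for p in explicit]
tag_text = [p.get_text() for p in tag_called]
print(called_text == explicit_text == tag_text)
```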

Python Beautiful Soup find_all

Access the attributes of a bs4 element the same way you would look up keys in a dictionary.

If you are using find():

soup.find('div', {"class":"stars"})['title']

This works since find() returns a single tag.

But if you are using find_all(), it returns a list, and indexing a list with a string (list[string]) is invalid.

Therefore, you can create a list of those:

res = []
for i in soup.find_all('div', {"class":"stars"}):
    res.append(i['title'])

Or, as a one-liner:

res = [i['title'] for i in soup.find_all('div', {"class":"stars"})]

Since you want all titles of the reviews, you need to specify the review container, that is, scrape from:

<div class="review__container">

So the code will be:

review = soup.find_all('div',class_="review__container")
res = [i['title'] for j in review for i in j.find_all('div',class_='stars')]

gives:

['1.0 star rating', '1.0 star rating', '3.0 star rating', '1.0 star rating', '1.0 star rating', '1.0 star rating', '1.0 star rating', '1.0 star rating', '1.0 star rating', '1.0 star rating', '5.0 star rating', '1.0 star rating', '1.0 star rating', '1.0 star rating', '1.0 star rating', '1.0 star rating', '1.0 star rating', '5.0 star rating', '2.0 star rating', '5.0 star rating', '1.0 star rating', '2.0 star rating', '1.0 star rating', '5.0 star rating', '1.0 star rating', '5.0 star rating']
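The whole answer can be reproduced on a tiny stand-in page. The markup below is invented (the real page is not shown here); it only mirrors the class names used above, with one stars element placed outside any review container to show why scoping the search matters.

```python
from bs4 import BeautifulSoup

# Hypothetical markup mirroring the class names from the answer.
html = """
<div class="stars" title="4.0 star rating"></div>
<div class="review__container">
  <div class="stars" title="1.0 star rating"></div>
</div>
<div class="review__container">
  <div class="stars" title="5.0 star rating"></div>
</div>
"""
soup = BeautifulSoup(html, "html.parser")

# Indexing the find_all() result with a string fails, because it is a list.
try:
    soup.find_all("div", {"class": "stars"})["title"]
    outcome = "no error"
except TypeError:
    outcome = "TypeError"

# Restrict the search to the review containers, then collect the titles.
containers = soup.find_all("div", class_="review__container")
titles = [
    star["title"]
    for box in containers
    for star in box.find_all("div", class_="stars")
]
print(outcome, titles)
```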

