How to Find Children of Nodes Using BeautifulSoup

How to find children of nodes using BeautifulSoup

Try this, which returns only the <a> tags that are direct children of the matched li:

li = soup.find('li', {'class': 'text'})
children = li.findChildren("a" , recursive=False)
for child in children:
    print(child)
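For a version that runs on its own, here is a minimal sketch. The markup is made up for illustration, and it uses find_all, the current name for the older findChildren alias. It compares a direct-children search with the default recursive one.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: one <a> is a direct child of the li, one is nested deeper.
html = """
<ul>
  <li class="text"><a href="/one">one</a><span><a href="/two">two</a></span></li>
</ul>
"""
soup = BeautifulSoup(html, "html.parser")

li = soup.find("li", {"class": "text"})
direct_links = li.find_all("a", recursive=False)  # direct children only
all_links = li.find_all("a")                      # descendants at any depth

print([a["href"] for a in direct_links])  # ['/one']
print([a["href"] for a in all_links])     # ['/one', '/two']
```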

How to get all direct children of a BeautifulSoup Tag?

You can set the recursive argument to False if you want to select only direct children rather than all descendants.

An example with the html you provided:

from bs4 import BeautifulSoup

html = "<div class='body'><span>A</span><span><span>B</span></span><span>C</span></div>"
soup = BeautifulSoup(html, "lxml") 
for j in soup.div.find_all(recursive=False):
    print(j)

<span>A</span>
<span><span>B</span></span>
<span>C</span>
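A related point that may be worth knowing: the .children iterator also yields direct children, but it includes text nodes, whereas find_all(recursive=False) returns only tags. The sketch below assumes a variant of the markup above with a leading text node added for illustration.

```python
from bs4 import BeautifulSoup, Tag

# Same structure as above, plus a hypothetical leading text node ("intro ").
html = "<div class='body'>intro <span>A</span><span><span>B</span></span><span>C</span></div>"
soup = BeautifulSoup(html, "html.parser")
div = soup.div

tag_children = div.find_all(recursive=False)  # direct child tags only
node_children = list(div.children)            # direct children, text nodes included
tags_only = [c for c in div.children if isinstance(c, Tag)]

print(len(tag_children))   # 3
print(len(node_children))  # 4
```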

BeautifulSoup: finding all children after a certain child

You can use .find_all(recursive=False) with a list slice:

from bs4 import BeautifulSoup

html_doc = """
<ul>
    <li class = "list_item_1">item 1</li>
    <li class = "list_item_2">item 2</li>
    <li class = "list_item_3">item 3</li>
    <li class = "list_item_4">item 4</li>

</ul>
"""

soup = BeautifulSoup(html_doc, "html.parser")

print(soup.ul.find_all(recursive=False)[2:])

Prints:

[<li class="list_item_3">item 3</li>, <li class="list_item_4">item 4</li>]

Or, if you're open to using .select, you can use a CSS selector with ~ (the general sibling combinator), which matches every later sibling of the element on its left:

print(soup.select(".list_item_2 ~ *"))

Prints:

[<li class="list_item_3">item 3</li>, <li class="list_item_4">item 4</li>]
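A third option, sketched here as an alternative rather than as part of the original answer, is find_next_siblings, which avoids hard-coding a slice index. The markup is the same list as above.

```python
from bs4 import BeautifulSoup

html_doc = """
<ul>
    <li class="list_item_1">item 1</li>
    <li class="list_item_2">item 2</li>
    <li class="list_item_3">item 3</li>
    <li class="list_item_4">item 4</li>
</ul>
"""
soup = BeautifulSoup(html_doc, "html.parser")

# Locate the reference item, then collect every <li> sibling that follows it.
anchor = soup.find("li", class_="list_item_2")
later_items = anchor.find_next_siblings("li")

print([li.get_text() for li in later_items])  # ['item 3', 'item 4']
```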

Beautiful Soup find children for particular div

It is useful to know that the elements BeautifulSoup finds inside an element are Tag objects, just like that parent element, so the same search methods (find, find_all and so on) can be called on them.

So this should work for your example:

from bs4 import BeautifulSoup

soup = BeautifulSoup(html, "html.parser")  # html holds the page source
div_tags = soup.find_all("div", {"class": "tablebox"})

for div in div_tags:
    # search only inside the current div
    for td in div.find_all("td", {"class": "align-right"}):
        print(td.text)

This will print the text of every td tag with the class "align-right" that is nested, at any depth, inside a div with the class "tablebox".
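To try this without the original page, here is a self-contained sketch. The table markup is invented for illustration. It also shows that a CSS descendant selector gives the same result as the nested find_all calls.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: only the first div has the "tablebox" class.
html = """
<div class="tablebox"><table><tr>
  <td class="align-right">10</td><td>skip</td><td class="align-right">20</td>
</tr></table></div>
<div class="other"><table><tr><td class="align-right">99</td></tr></table></div>
"""
soup = BeautifulSoup(html, "html.parser")

# Nested search: first the divs, then the matching cells inside each div.
nested = [
    td.get_text()
    for div in soup.find_all("div", {"class": "tablebox"})
    for td in div.find_all("td", {"class": "align-right"})
]

# Equivalent CSS descendant selector.
selected = [td.get_text() for td in soup.select("div.tablebox td.align-right")]

print(nested)    # ['10', '20']
print(selected)  # ['10', '20']
```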

BeautifulSoup: Extracting Value from Children nodes

You could try something like this. It basically does what you did above: it first iterates through all td elements with the class "section" and then iterates through all span elements within each one. It also prints each span's class, in case you need to be more restrictive:

In [1]: from bs4 import BeautifulSoup

In [2]: html = # Your html here

In [3]: soup = BeautifulSoup(html, "html.parser")

In [4]: for td in soup.find_all('td', {'class': 'section'}):
   ...:     for span in td.find_all('span'):
   ...:         print(span.attrs['class'], span.text)
   ...:         
['username'] xxUsername
['comment'] 
A test comment

Or with a more-convoluted-than-necessary one-liner that will store everything back in your list:

In [5]: results = [span.text for td in soup.find_all('td', {'class': 'section'}) for span in td.find_all('span')]

In [6]: results
Out[6]: ['xxUsername', '\nA test comment\n']

Or on that same theme, a dictionary with the keys being a tuple of the classes and the values being the text itself:

In [8]: results = dict((tuple(span.attrs['class']), span.text) for td in soup.find_all('td', {'class': 'section'}) for span in td.find_all('span'))

In [9]: results
Out[9]: {('username',): 'xxUsername', ('comment',): '\nA test comment\n'}

Assuming this one is a bit closer to what you want, I would suggest rewriting it as:

In [10]: results = {}

In [11]: for td in soup.find_all('td', {'class': 'section'}):
   ....:     for span in td.find_all('span'):
   ....:         results[tuple(span.attrs['class'])] = span.text
   ....:         

In [12]: results
Out[12]: {('username',): 'xxUsername', ('comment',): '\nA test comment\n'}
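Since the session above leaves the HTML as a placeholder, here is a runnable sketch of the same idea. The markup is an assumption reconstructed from the printed output, not the asker's real page. It also keys the dictionary by the first class name and strips whitespace, which is a simplification rather than what the answer does.

```python
from bs4 import BeautifulSoup

# Hypothetical markup, guessed from the output shown above.
html = """
<table><tr><td class="section">
  <span class="username">xxUsername</span>
  <span class="comment">
A test comment
</span>
</td></tr></table>
"""
soup = BeautifulSoup(html, "html.parser")

# span["class"] is a list because class is a multi-valued attribute.
results = {
    span["class"][0]: span.get_text(strip=True)
    for td in soup.find_all("td", {"class": "section"})
    for span in td.find_all("span")
}

print(results)  # {'username': 'xxUsername', 'comment': 'A test comment'}
```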

BeautifulSoup children of div

Use response.content instead of response.text.

You're also not requesting the correct URL in your code. https://www.sailogy.com/en/search/?search_where=ibiza&trip_date=2020-06-06&weeks_count=1&skipper=False&search_src=home only displays a single boat, hence your code is only returning one row.

Use https://www.sailogy.com/en/search/?search_where=ibiza&trip_date=2020-06-06&weeks_count=1&guests_count=&order_by=-rank&is_roundtrip=&coupon_code=&skipper=None instead in this case.

You'll probably find it useful to adjust the URL parameters to filter boats at some point!

Find elements which have a specific child with BeautifulSoup

There are multiple ways to approach the problem.

One option is to locate the Email div by text and get the next sibling:

soup.find("div", string="Email").next_sibling.strip()  # returns "info@blah.com"
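A self-contained sketch of that approach follows. The markup is invented for illustration. It also shows one way to get back to the elements that contain such a child, using .parent and a filter over direct children.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: each "contact" div holds a label div followed by a text node.
html = """
<div class="contact"><div>Phone</div> 555-0100</div>
<div class="contact"><div>Email</div> info@blah.com</div>
"""
soup = BeautifulSoup(html, "html.parser")

# Find the label by its exact text, then read the text node right after it.
label = soup.find("div", string="Email")
email = label.next_sibling.strip()

# The element that has this specific child is simply the label's parent.
owner = label.parent

# Or filter all candidates by whether they have a matching direct child.
with_email = [
    div for div in soup.find_all("div", class_="contact")
    if div.find("div", string="Email", recursive=False) is not None
]

print(email)            # info@blah.com
print(len(with_email))  # 1
```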

BeautifulSoup finding children with only 'dot', without 'find()' function

What you ask is well documented here: BS: navigating using tag names

The simplest way to navigate the parse tree is to say the name of the tag you want. If you want the <head> tag, just say soup.head.
You can do use this trick again and again to zoom in on a certain part of the parse tree. soup.body.b gets the first <b> tag beneath the <body> tag.
Using a tag name as an attribute will give you only the first tag by that name.
If you need to get all the <a> tags, or anything more complicated than the first tag with a certain name, you’ll need to use one of the methods described in Searching the tree, such as find_all()
(emphasis and omissions mine)

So page_soup.div returns the first div in the document, and page_soup.div.div returns the first div nested inside that first div (or None if the first div contains no div).

<html>

<head>
  <title>The Dormouse's story</title>
</head>

<body>
  <div>first div</div>
  <p>unrelated
  </p>
  <div>second div
    <div>with another div inside</div>
  </div>

  <div>can't get this one by soup.div.div
    <div>with another div inside</div>
  </div>
</body>

</html>
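The behaviour of chained tag names can be checked with a short sketch. The markup here is a trimmed-down, hypothetical version of the page above. Because the first div has no nested div, soup.div.div is None, and a search with find_all is needed to reach the nested ones.

```python
from bs4 import BeautifulSoup

# Trimmed-down, hypothetical version of the markup above.
html = """
<body>
  <div>first div</div>
  <div>second div <div>nested</div></div>
</body>
"""
soup = BeautifulSoup(html, "html.parser")

first = soup.div          # first <div> in document order
inner = soup.div.div      # first <div> inside that first <div>; none exists here

# To reach nested divs anywhere, search explicitly instead of chaining names.
nested = [outer.div for outer in soup.find_all("div") if outer.div is not None]

print(first.get_text())                # first div
print(inner)                           # None
print([d.get_text() for d in nested])  # ['nested']
```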

BeautifulSoup: Classify parent and children element

Why use soup.find when you can use soup.select, get help from all the CSS wiz kids and test your criteria in a browser first?

There's a performance benchmark on SO and select is faster, or at least not significantly slower, so that's not it. Habit, I guess.

(works just as well without the <p> tag qualifier, i.e. just "[itemprop=name]")

found = soup.select("p[itemprop=name]")

results = dict()

for node in found:

    itemtype = node.parent.attrs.get("itemtype", "?")
    itemtype = itemtype.split("/")[-1]
    results[itemtype] = node.text

print(results)

output:

{'PostalAddress': '33 San Francisco', 'FoodEstablishment': "The Dormouse's story"}

It is what you asked for, but if many nodes existed with FoodEstablishment, the last one would win, because you are using a dictionary. A defaultdict with a list might work better; that is for you to judge.

step 1, before Python: rock that CSS!


If you need to check ancestors higher up for `itemtype`, it helps to have HTML in which that actually happens:

    <div class="address" itemtype="http://schema.org/PostalAddress">
      <div>
        <p itemprop="name">33 San Francisco</p>  
      </div>

    </div>

found = soup.select("[itemprop=name]")

results = dict()

for node in found:

    itemtype = None
    parent = node.parent
    while itemtype is None and parent is not None:
      itemtype = parent.attrs.get("itemtype")
      if itemtype is None:
        parent = parent.parent

    itemtype = itemtype or "?"
    itemtype = itemtype.split("/")[-1]
    results[itemtype] = node.text

print(results)

Same output.

Using a defaultdict

Everything stays the same except for declaring results and putting data into it.

from collections import defaultdict
...
results = defaultdict(list)
...

results[itemtype].append(node.text)

output (after I added a sibling to 33 San Francisco):

defaultdict(<class 'list'>, {'PostalAddress': ['33 San Francisco', '34 LA'], 'FoodEstablishment': ["The Dormouse's story"]})
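Putting the pieces together, a complete sketch might look like this. The markup is an assumption modelled on the snippets above. It swaps the manual while loop for find_parent with an attribute filter, which is a different technique from the one in the answer but finds the same nearest ancestor carrying itemtype.

```python
from collections import defaultdict

from bs4 import BeautifulSoup

# Hypothetical markup modelled on the fragments shown above.
html = """
<div itemtype="http://schema.org/FoodEstablishment">
  <p itemprop="name">The Dormouse's story</p>
  <div class="address" itemtype="http://schema.org/PostalAddress">
    <div><p itemprop="name">33 San Francisco</p></div>
    <p itemprop="name">34 LA</p>
  </div>
</div>
"""
soup = BeautifulSoup(html, "html.parser")

results = defaultdict(list)
for node in soup.select("[itemprop=name]"):
    # Nearest ancestor that has an itemtype attribute, whatever its value.
    owner = node.find_parent(attrs={"itemtype": True})
    itemtype = owner["itemtype"].split("/")[-1] if owner is not None else "?"
    results[itemtype].append(node.get_text())

print(dict(results))
```

Converting to a plain dict before printing keeps the output free of the defaultdict wrapper.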
