How to Find Tags with Only Certain Attributes - BeautifulSoup

How to find tags with only certain attributes - BeautifulSoup

As explained in the BeautifulSoup documentation, you can use:

from bs4 import BeautifulSoup

soup = BeautifulSoup(html, "html.parser")
results = soup.find_all("td", {"valign": "top"})

EDIT:

To return only the tags whose sole attribute is valign="top", you can check the length of the tag's attrs property:

from bs4 import BeautifulSoup

html = ('<td valign="top">.....</td>'
        '<td width="580" valign="top">.......</td>'
        '<td>.....</td>')

soup = BeautifulSoup(html, "html.parser")
results = soup.find_all("td", {"valign": "top"})

for result in results:
    # keep only the tags whose single attribute is valign
    if len(result.attrs) == 1:
        print(result)

That prints:

<td valign="top">.....</td>

BeautifulSoup: find all tags with a certain attribute, not value

Your question already has an answer on SO. I just wanted to add an answer for the case where the attribute value should be either empty or match a pattern:

from bs4 import BeautifulSoup
import re
html="""
<div>
<p data="123"></p>
<p data="567"></p>
<p data=""></p>
</div>
"""
soup = BeautifulSoup(html,'lxml')
# get all tags with that attribute
p_list=soup.findAll("p", data=True)
print(p_list)
# get all tags with attribute value either empty or a particular pattern
p_list=soup.findAll("p", {"data":re.compile("^$|123")})
print(p_list)

Output

[<p data="123"></p>, <p data="567"></p>, <p data=""></p>]
[<p data="123"></p>, <p data=""></p>]

Beautiful Soup. How to find tags with specific attribute but different attribute values in one search?

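This depends on which version of BeautifulSoup you are using. Looking at the docs for bs3, what you want is something like the following (class is a reserved word in Python, so it has to go in the attrs dictionary rather than be passed as a keyword argument):

soup.findAll(attrs={"class": ["post_wrap", "post_wrap__staff"]})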

As you tagged it with Python 3, I assume you are using bs4. The docs state that you can do something like this:

soup.find_all("div", attrs={"class": ["post_wrap", "post_wrap__staff"]})

But as noted in a similar question about multiple attributes, it is more concise to use the class_ keyword argument (note the trailing underscore, since class is a Python keyword):

result = soup.find_all("div", class_=["post_wrap", "post_wrap__staff"])

Beautiful Soup find all values for a given attribute, without specifying the tag

Use an attribute selector.

titles = [item['title'] for item in soup.select('[title]')]
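A self-contained version, assuming bs4 and some illustrative markup. The [title] selector matches any element that carries a title attribute, whatever its tag name; find_all(title=True) is an equivalent spelling without CSS.

```python
from bs4 import BeautifulSoup

html = """
<a href="/home" title="Home page">Home</a>
<img src="logo.png" title="Logo">
<p>No title here</p>
<span title="Tooltip">hover me</span>
"""

soup = BeautifulSoup(html, "html.parser")

# [title] selects every element with a title attribute, regardless of tag.
titles = [item["title"] for item in soup.select("[title]")]
print(titles)

# True means "attribute present, any value".
same_titles = [item["title"] for item in soup.find_all(title=True)]
```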

BeautifulSoup: find all tags with a given attribute

You can use a filter function:

parser.find_all(lambda tag: tag is not None and tag.has_attr("data-path"))
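Because data-path contains a hyphen, it cannot be written as a keyword argument, which is why a filter function (or an attrs dictionary) is needed. A small sketch assuming bs4 and made-up markup; here the soup object is named soup rather than parser, and the attrs={"data-path": True} form is shown next to the filter function.

```python
from bs4 import BeautifulSoup

html = """
<div data-path="/a">one</div>
<span data-path="/b">two</span>
<div>three</div>
"""

soup = BeautifulSoup(html, "html.parser")

# Filter function: called with every tag, keeps those carrying data-path.
with_function = soup.find_all(lambda tag: tag.has_attr("data-path"))

# Equivalent attrs form: True means "attribute present, any value".
with_attrs = soup.find_all(attrs={"data-path": True})

print([tag["data-path"] for tag in with_function])
print([tag["data-path"] for tag in with_attrs])
```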

How to select tags by attribute value with Beautiful Soup

html = """
<div class="headercolumn">
<h2>
<a class="results" data-name="result-name" href="/xxy> my text</a>
</h2>
"""

from bs4 import BeautifulSoup
soup = BeautifulSoup(html)
for d in soup.findAll("div",{"class":"headercolumn"}):
print d.a.get("data-name")
print d.select("a.results")

result-name
[<a class="results" data-name="result-name" href="/xxy> my text</a></h2>"></a>]

BeautifulSoup: find the only tag in the HTML that has no attributes

You can pass a lambda function to the find_all method that checks the tag name and confirms that the element has no attributes:

soup.find_all(lambda tag: tag.name == 'div' and not tag.attrs)
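A runnable sketch with made-up markup, assuming bs4. An element without attributes has an empty attrs dictionary, so not tag.attrs is True for it; since the question expects exactly one such tag, find() with the same lambda returns it directly.

```python
from bs4 import BeautifulSoup

html = """
<div class="header">header</div>
<div>the one without attributes</div>
<div id="footer">footer</div>
"""

soup = BeautifulSoup(html, "html.parser")

# tag.attrs is an empty dict for elements with no attributes.
bare_divs = soup.find_all(lambda tag: tag.name == "div" and not tag.attrs)

# find() returns the first match instead of a list.
bare_div = soup.find(lambda tag: tag.name == "div" and not tag.attrs)
print(bare_div.get_text())
```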

How to find all elements with a custom html attribute regardless of html tag using Beautiful Soup?

# First case:
soup.find_all(attrs={"limit":True})

# Second case:
soup.find_all("div", attrs={"limit":True})

Reference:

  • http://www.crummy.com/software/BeautifulSoup/bs4/doc/#kwargs
  • http://www.crummy.com/software/BeautifulSoup/bs4/doc/#find-all

If your attribute name is a valid Python identifier and doesn't collide with a Python keyword or one of find_all's named arguments (such as name, attrs, recursive, string or limit), the syntax is simpler:

soup.find_all(id=True)
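In the first case the attrs dictionary is required because limit is also the name of find_all's own limit argument, which caps the number of results; passing limit=... as a keyword would set that cap instead of filtering on the attribute. A small sketch with made-up markup, assuming bs4:

```python
from bs4 import BeautifulSoup

html = """
<div limit="10">a</div>
<span limit="5">b</span>
<div>c</div>
<p id="intro">d</p>
"""

soup = BeautifulSoup(html, "html.parser")

# Any tag with a "limit" attribute, whatever its value.
any_tag = soup.find_all(attrs={"limit": True})

# Only <div> tags with a "limit" attribute.
divs_only = soup.find_all("div", attrs={"limit": True})

# "id" does not clash with a find_all parameter, so it works as a keyword.
with_id = soup.find_all(id=True)

print([t.get_text() for t in any_tag])
print([t.get_text() for t in divs_only])
print([t.get_text() for t in with_id])
```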

Python Beautifulsoup : how to find a tag by attribute value without knowing corresponding attribute name?

One solution is to pass a lambda to the find_all function.

Example:

data = '''<a href="xyz">a</a>
<div class="somethingelse">b</div>
<div class="xyz">c</div>'''

from bs4 import BeautifulSoup
soup = BeautifulSoup(data, 'html.parser')

for tag in soup.find_all(lambda t: any('xyz' in t[a] for a in t.attrs)):
    print(tag)

Prints:

<a href="xyz">a</a>
<div class="xyz">c</div>

