How to find tags with only certain attributes - BeautifulSoup
As explained in the BeautifulSoup documentation, you can use:
soup = BeautifulSoup(html)
results = soup.findAll("td", {"valign" : "top"})
EDIT:
To return tags that have only the valign="top" attribute, check the length of the tag's attrs property:
from bs4 import BeautifulSoup  # the original used the BeautifulSoup 3 import
html = '<td valign="top">.....</td>\
<td width="580" valign="top">.......</td>\
<td>.....</td>'
soup = BeautifulSoup(html, "html.parser")
results = soup.findAll("td", {"valign": "top"})
for result in results:
    # keep only tags whose single attribute is valign="top"
    if len(result.attrs) == 1:
        print(result)
That returns:
<td valign="top">.....</td>
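The same check can also be folded into a single call by passing a function to find_all. This is a minimal sketch, assuming bs4 (where tag.attrs is a dict) and the built-in html.parser. The name only_valign is made up for illustration.

```python
from bs4 import BeautifulSoup

html = (
    '<td valign="top">.....</td>'
    '<td width="580" valign="top">.......</td>'
    '<td>.....</td>'
)
soup = BeautifulSoup(html, "html.parser")

# Keep only <td> tags whose attribute dict is exactly {"valign": "top"}.
only_valign = soup.find_all(
    lambda tag: tag.name == "td" and tag.attrs == {"valign": "top"}
)
print(only_valign)
```

Comparing the whole attrs dict also checks the value, so a tag with a single but different attribute is not matched.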
BeautifulSoup: find all tags with a certain attribute, not value
This question already has an answer on Stack Overflow. This answer covers the case where the attribute value should either be empty or match a pattern:
from bs4 import BeautifulSoup
import re
html="""
<div>
<p data="123"></p>
<p data="567"></p>
<p data=""></p>
</div>
"""
soup = BeautifulSoup(html,'lxml')
# get all tags with that attribute
p_list=soup.findAll("p", data=True)
print(p_list)
# get all tags with attribute value either empty or a particular pattern
p_list=soup.findAll("p", {"data":re.compile("^$|123")})
print(p_list)
Output
[<p data="123"></p>, <p data="567"></p>, <p data=""></p>]
[<p data="123"></p>, <p data=""></p>]
Beautiful Soup. How to find tags with specific attribute but different attribute values in one search?
This depends on which version of BeautifulSoup you are using. Judging by the docs for bs3, what you are looking for is something like the following:
soup.findAll(attrs={'class': ['post_wrap', 'post_wrap__staff']})
As you tagged it with Python 3 I assume you are using bs4. The docs state that you can do something like the below:
soup.find_all("div", attrs={"class": ["post_wrap", "post_wrap__staff"]})
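But as noted in a similar question about multiple attributes, it might be better to use the class_ keyword argument, as below (a CSS selector such as soup.select("div.post_wrap, div.post_wrap__staff") is another option):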
result = soup.find_all("div", class_=["post_wrap", "post_wrap__staff"])
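For comparison, here is a small sketch that runs the class_ list next to an actual CSS selector passed to select(). The sample markup is invented for illustration, and select() assumes a bs4 version that ships with CSS selector support (soupsieve).

```python
from bs4 import BeautifulSoup

# Hypothetical markup; only the class names come from the question.
html = """
<div class="post_wrap">regular</div>
<div class="post_wrap__staff">staff</div>
<div class="sidebar">other</div>
"""
soup = BeautifulSoup(html, "html.parser")

# A list passed to class_ matches tags carrying any of the listed classes.
by_keyword = soup.find_all("div", class_=["post_wrap", "post_wrap__staff"])

# The comma in a CSS selector means "either of these".
by_selector = soup.select("div.post_wrap, div.post_wrap__staff")

print([d.get_text() for d in by_keyword])
print([d.get_text() for d in by_selector])
```

Both calls should pick up the same two div elements here.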
Beautiful Soup find all values for a given attribute, without specifying the tag
Use an attribute selector.
titles = [item['title'] for item in soup.select('[title]')]
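A self-contained sketch of the same selector, using made-up markup and the built-in html.parser:

```python
from bs4 import BeautifulSoup

# Hypothetical markup with title attributes on different tag types.
html = (
    '<a title="Home" href="/">home</a>'
    '<img title="Logo" src="logo.png"/>'
    '<p>no title here</p>'
)
soup = BeautifulSoup(html, "html.parser")

# [title] matches any tag that has a title attribute, whatever the tag name.
titles = [item["title"] for item in soup.select("[title]")]
print(titles)
```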
BeautifulSoup: find all tags with a given attribute
You can use a filter function:
parser.find_all(lambda tag: tag is not None and tag.has_attr("data-path"))
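A runnable sketch of the filter function, assuming parser in the line above is a BeautifulSoup object (called soup here) and using invented markup. The attrs={"data-path": True} form is shown alongside as an alternative that should give the same result.

```python
from bs4 import BeautifulSoup

# Hypothetical markup; only the attribute name data-path comes from the answer.
html = (
    '<div data-path="/a">one</div>'
    '<span data-path="/b">two</span>'
    '<p>three</p>'
)
soup = BeautifulSoup(html, "html.parser")

# Filter function: called once per tag, keeps tags that carry the attribute.
with_function = soup.find_all(lambda tag: tag.has_attr("data-path"))

# Alternative: True as the value means "attribute present, any value".
with_attrs = soup.find_all(attrs={"data-path": True})

print([t.name for t in with_function])
print([t.name for t in with_attrs])
```

The attrs dict is needed for the second form because data-path is not a valid Python keyword argument name.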
How to select tags by attribute value with Beautiful Soup
from bs4 import BeautifulSoup
html = """
<div class="headercolumn">
<h2>
<a class="results" data-name="result-name" href="/xxy"> my text</a>
</h2>
</div>
"""
soup = BeautifulSoup(html, "html.parser")
for d in soup.find_all("div", {"class": "headercolumn"}):
    # read one attribute value from the first <a> inside the div
    print(d.a.get("data-name"))
    # or select the <a> by its class with a CSS selector
    print(d.select("a.results"))
Output:
result-name
[<a class="results" data-name="result-name" href="/xxy"> my text</a>]
Beautifulsoup, find the only tag in the htm that has no attribute
You can pass a lambda function to find_all that checks the tag name and that the element has no attributes:
soup.find_all(lambda tag: tag.name == 'div' and not tag.attrs)
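A short sketch of that call on invented markup, assuming bs4 with the built-in html.parser:

```python
from bs4 import BeautifulSoup

# Hypothetical markup: only the middle div has no attributes.
html = '<div id="a">x</div><div>bare</div><div class="c">y</div>'
soup = BeautifulSoup(html, "html.parser")

# tag.attrs is an empty dict for a tag without attributes, so "not tag.attrs" is True.
bare = soup.find_all(lambda tag: tag.name == "div" and not tag.attrs)
print(bare)
```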
How to find all elements with a custom html attribute regardless of html tag using Beautiful Soup?
# First case:
soup.find_all(attrs={"limit":True})
# Second case:
soup.find_all("div", attrs={"limit":True})
Reference:
- http://www.crummy.com/software/BeautifulSoup/bs4/doc/#kwargs
- http://www.crummy.com/software/BeautifulSoup/bs4/doc/#find-all
If your attribute name doesn't collide with either Python keywords or soup.find_all named arguments, the syntax is simpler:
soup.find_all(id=True)
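The three forms can be tried side by side on made-up markup. This is a sketch assuming bs4 with html.parser. The attribute is called limit, which is also the name of a find_all argument, so it has to go through attrs.

```python
from bs4 import BeautifulSoup

# Hypothetical markup using a custom "limit" attribute.
html = (
    '<div limit="5">a</div>'
    '<span limit="10">b</span>'
    '<div id="x">c</div>'
)
soup = BeautifulSoup(html, "html.parser")

# Any tag with a limit attribute, regardless of tag name.
any_tag = soup.find_all(attrs={"limit": True})

# Only <div> tags with a limit attribute.
divs_only = soup.find_all("div", attrs={"limit": True})

# "id" does not collide with anything, so the keyword form works directly.
with_id = soup.find_all(id=True)

print([t.name for t in any_tag], [t.name for t in divs_only], [t["id"] for t in with_id])
```

Writing soup.find_all(limit=True) instead would be read as the result-count argument rather than as an attribute filter.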
Python Beautifulsoup : how to find a tag by attribute value without knowing corresponding attribute name?
One solution is to pass a lambda to the find_all function.
Example:
data = '''<a href="xyz">a</a>
<div class="somethingelse">b</div>
<div class="xyz">c</div>'''
from bs4 import BeautifulSoup
soup = BeautifulSoup(data, 'html.parser')
for tag in soup.find_all(lambda tag: any('xyz' in tag[a] for a in tag.attrs)):
    print(tag)
Prints:
<a href="xyz">a</a>
<div class="xyz">c</div>
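Note that 'xyz' in tag[a] is a substring test for plain string values (such as href) and a membership test for list values (such as class). If an exact match is wanted in both cases, one possible variant is sketched below. The helper name has_exact_value and the extra sample tags are assumptions for illustration.

```python
from bs4 import BeautifulSoup

data = '''<a href="xyz">a</a>
<div class="somethingelse">b</div>
<div class="xyz other">c</div>
<a href="/path/xyz.html">d</a>'''
soup = BeautifulSoup(data, "html.parser")

def has_exact_value(tag, wanted="xyz"):
    """Return True if any attribute of the tag has a value exactly equal to wanted."""
    for value in tag.attrs.values():
        # Multi-valued attributes such as class come back as lists; wrap the rest.
        values = value if isinstance(value, list) else [value]
        if wanted in values:
            return True
    return False

matches = soup.find_all(has_exact_value)
print([tag.get_text() for tag in matches])
```

With this variant the last link is skipped, because its href only contains xyz as part of a longer string.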