BeautifulSoup - search by text inside a tag
The problem is that your <a> tag, with the <i> tag inside it, doesn't have the string attribute you expect it to have. First, let's look at what the text="" argument of find() does.
NOTE: text is the old name of this argument; since BeautifulSoup 4.4.0 it is called string.
From the docs:
Although string is for finding strings, you can combine it with
arguments that find tags: Beautiful Soup will find all tags whose
.string matches your value for string. This code finds the tags
whose .string is “Elsie”:
soup.find_all("a", string="Elsie")
# [<a href="http://example.com/elsie" class="sister" id="link1">Elsie</a>]
Now let's look at what a Tag's string attribute is (from the docs again):
If a tag has only one child, and that child is a NavigableString, the
child is made available as .string:
title_tag.string
# u'The Dormouse's story'
(...)
If a tag contains more than one thing, then it’s not clear what
.string should refer to, so .string is defined to be None:
print(soup.html.string)
# None
This is exactly your case. Your <a> tag contains both text and an <i> tag, so its .string is None. find() therefore has nothing to compare your search string against, and the tag can't match.
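A quick way to confirm this behavior is sketched below, using the markup from this answer. The variable names (string_value, direct_match, visible_text) are only illustrative.

```python
from bs4 import BeautifulSoup

html = """
<a href="/customer-menu/1/accounts/1/update">
<i class="fa fa-edit"></i> Edit
</a>
"""
soup = BeautifulSoup(html, "html.parser")
link = soup.find("a")

# The <a> tag has several children (whitespace, <i>, text), so .string is None.
string_value = link.string
print(string_value)

# Because .string is None, combining the tag name with string= finds nothing.
direct_match = soup.find("a", string="Edit")
print(direct_match)

# The text is still reachable through get_text().
visible_text = link.get_text(strip=True)
print(visible_text)
```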
How to solve this?
Maybe there is a better solution but I would probably go with something like this:
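import re
from bs4 import BeautifulSoup as BS

soup = BS("""
<a href="/customer-menu/1/accounts/1/update">
<i class="fa fa-edit"></i> Edit
</a>
""", "html.parser")

# Narrow the search by href first, then look for the text inside each candidate.
links = soup.find_all('a', href="/customer-menu/1/accounts/1/update")
thelink = None  # stays None if no link contains "Edit"
for link in links:
    # string= is the current name of the text= argument (bs4 >= 4.4.0)
    if link.find(string=re.compile("Edit")):
        thelink = link
        break
print(thelink)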
There are probably not many links pointing to /customer-menu/1/accounts/1/update, so this should be fast enough.
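Another option, sketched here rather than taken from the original answer, is to pass a function to find() and compare against get_text(), which joins the text of all descendants. The helper name is_edit_link and the second "Delete" link are made up for the example.

```python
from bs4 import BeautifulSoup

# The second link is hypothetical, added so the filter has something to reject.
html = """
<a href="/customer-menu/1/accounts/1/update">
<i class="fa fa-edit"></i> Edit
</a>
<a href="/customer-menu/1/accounts/1/delete">
<i class="fa fa-trash"></i> Delete
</a>
"""
soup = BeautifulSoup(html, "html.parser")


def is_edit_link(tag):
    """Hypothetical helper: True for <a> tags whose combined text is 'Edit'."""
    return tag.name == "a" and tag.get_text(strip=True) == "Edit"


edit_link = soup.find(is_edit_link)
print(edit_link["href"])
```

This avoids depending on the exact href, at the cost of calling the function for every tag in the document.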
How to find tag with particular text with Beautiful Soup?
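You can pass a regular expression to the text parameter of findAll (named string and find_all in current versions of bs4), like so:
import re
from bs4 import BeautifulSoup

# `soup` is assumed to be an already parsed BeautifulSoup object
columns = soup.find_all('td', string=re.compile('your regex here'), attrs={'class': 'pos'})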
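For a self-contained illustration, the sketch below runs the same kind of call against a small table. The table contents and the regex are invented for the example.

```python
import re
from bs4 import BeautifulSoup

# Hypothetical table, made up for this example.
html = """
<table>
<tr><td class="pos">1st place</td><td class="name">Alice</td></tr>
<tr><td class="pos">2nd place</td><td class="name">Bob</td></tr>
<tr><td class="pos">n/a</td><td class="name">Carol</td></tr>
</table>
"""
soup = BeautifulSoup(html, "html.parser")

# Match only <td class="pos"> cells whose text starts with a digit.
columns = soup.find_all("td", string=re.compile(r"^\d"), attrs={"class": "pos"})
texts = [td.string for td in columns]
print(texts)
```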
BeautifulSoup find text in specific tag
As you know the exact positions of the tags you want to find, you can use find_all(), which returns a list, and then get the tag at the required index.
In this case (19th <tr> and 2nd <td>), use this:
result = soup.find_all('tr')[18].find_all('td')[1].text
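Chained indexing raises IndexError when the page has fewer rows or cells than expected. A defensive variant might look like the sketch below; the helper cell_text and the three-row table are assumptions made for the example, since the original page is not shown.

```python
from bs4 import BeautifulSoup

# Hypothetical three-row table; the original page is not shown.
html = """
<table>
<tr><td>a1</td><td>a2</td></tr>
<tr><td>b1</td><td>b2</td></tr>
<tr><td>c1</td><td>c2</td></tr>
</table>
"""
soup = BeautifulSoup(html, "html.parser")


def cell_text(soup, row_index, col_index):
    """Hypothetical helper: return the cell text, or None if it does not exist."""
    rows = soup.find_all("tr")
    if row_index >= len(rows):
        return None
    cells = rows[row_index].find_all("td")
    if col_index >= len(cells):
        return None
    return cells[col_index].text


print(cell_text(soup, 1, 1))   # second row, second cell
print(cell_text(soup, 18, 1))  # there is no 19th row in this sample
```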
How to find tag name given a text in BeautifulSoup
This script prints all tags that share their tag name and attributes with the tag that contains the string "456":
from bs4 import BeautifulSoup

txt = '''
<div class='mydiv'>
<p style='xyz'>123</p>
<p>456</p>
<p style='xyz'>789</p>
<p>abc</p>
</div>'''
text_to_find = '456'
soup = BeautifulSoup(txt, 'html.parser')
# First tag whose first child is exactly the text we are looking for
tmp = soup.find(lambda t: t.contents and t.contents[0] == text_to_find)
if tmp:
    # Every tag with the same name and the same attributes as that tag
    for tag in soup.find_all(lambda t: t.name == tmp.name and t.attrs == tmp.attrs):
        print(tag)
Prints:
<p>456</p>
<p>abc</p>
For input "123":
<p style="xyz">123</p>
<p style="xyz">789</p>
Searching for a text that contains a particular text using BeautifulSoup
Try this:
from bs4 import BeautifulSoup
html = '''
<td>the keyword is present in the <a href='text' title='text'>text</a> </td>
<td>word key is not present</td>
<td>no keyword here</td>'''
soup = BeautifulSoup(html , 'html.parser')
print(*[td for td in soup.find_all("td") if 'keyword' in td.text], sep='\n')
Output:
<td>the keyword is present in the <a href="text" title="text">text</a> </td>
<td>no keyword here</td>
You can use td.text to get only the text of each <td>, like below:
print(*[td.text for td in soup.find_all("td") if 'keyword' in td.text], sep='\n')
Output:
the keyword is present in the text
no keyword here
Using BeautifulSoup to find a HTML tag that contains certain text
import re
from bs4 import BeautifulSoup

html_text = """
<h2>this is cool #12345678901</h2>
<h2>this is nothing</h2>
<h1>foo #126666678901</h1>
<h2>this is interesting #126666678901</h2>
<h2>this is blah #124445678901</h2>
"""
soup = BeautifulSoup(html_text, "html.parser")
# Calling soup(...) is shorthand for soup.find_all(...); string= matches text nodes
for elem in soup(string=re.compile(r' #\S{11}')):
    print(elem.parent)
Prints:
<h2>this is cool #12345678901</h2>
<h1>foo #126666678901</h1>
<h2>this is interesting #126666678901</h2>
<h2>this is blah #124445678901</h2>
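If only the <h2> elements are wanted, one possible variation (a sketch, not part of the original answer) is to pass the tag name together with the regex. find_all() then returns the tags themselves, so .parent is not needed. The name h2_matches is illustrative.

```python
import re
from bs4 import BeautifulSoup

html_text = """
<h2>this is cool #12345678901</h2>
<h2>this is nothing</h2>
<h1>foo #126666678901</h1>
<h2>this is interesting #126666678901</h2>
<h2>this is blah #124445678901</h2>
"""
soup = BeautifulSoup(html_text, "html.parser")

# Restricting the search to "h2" skips the <h1>, even though its text
# also matches the regular expression.
h2_matches = soup.find_all("h2", string=re.compile(r" #\S{11}"))
for elem in h2_matches:
    print(elem)
```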
Extract all links after a particular tag using beautifulsoup
Try:
from bs4 import BeautifulSoup
html1 = """<html>
<head></head>
<body>
<p>Hello World!</p>
<a href='whatevs.com'>whatevs</a>
<p>Howdy!</p>
<a href='well.com'>well</a>
<div><span>haha</span><a href='haha.com'>haha</a></div>
<a href='goodbye.com'>Goodbye!</a>
</body>
</html>"""
soup = BeautifulSoup(html1, "html.parser")
out, tag = [], soup.find("p", string="Howdy!")
# Walk forward through the document, collecting each following <a> tag
while True:
    tag = tag.find_next("a")
    if not tag:
        break
    out.append(tag.text)
print(out)
Prints:
['well', 'haha', 'Goodbye!']
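The same result can probably be reached without an explicit loop by using find_all_next(), which returns every matching tag that appears after the starting tag. The sketch below collects the href values instead of the link text; the names start and hrefs are illustrative.

```python
from bs4 import BeautifulSoup

html1 = """<html>
<body>
<p>Hello World!</p>
<a href='whatevs.com'>whatevs</a>
<p>Howdy!</p>
<a href='well.com'>well</a>
<div><span>haha</span><a href='haha.com'>haha</a></div>
<a href='goodbye.com'>Goodbye!</a>
</body>
</html>"""
soup = BeautifulSoup(html1, "html.parser")

start = soup.find("p", string="Howdy!")
# Guard against the marker paragraph being absent from the page.
hrefs = [a["href"] for a in start.find_all_next("a")] if start else []
print(hrefs)
```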
BeautifulSoup Find tag with text containing a non-breaking space
The non-breaking space is parsed as \xa0, so you can either run:
text = soup.find('strong', text='Hello\xa0')
Or you could use regex:
import re
text = soup.find('strong', text=re.compile("Hello"))
Alternatively, you could use a lambda function that checks for Hello at the start of the string (the extra check guards against tags whose .string is None):
text = soup.find("strong", text=lambda value: value and value.startswith("Hello"))
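The three approaches can be compared side by side in a small sketch. The markup is invented for the example, and string= is used as the current name of the text= argument.

```python
import re
from bs4 import BeautifulSoup

# Hypothetical markup; &nbsp; becomes the character \xa0 after parsing.
html = "<p><strong>Hello&nbsp;</strong>world</p><p><strong>Bye</strong></p>"
soup = BeautifulSoup(html, "html.parser")

# Exact match, including the non-breaking space.
exact = soup.find("strong", string="Hello\xa0")

# Regex match: succeeds as long as "Hello" appears somewhere in the string.
by_regex = soup.find("strong", string=re.compile("Hello"))

# Function match: guard against None before calling str methods.
by_func = soup.find(
    "strong", string=lambda s: s is not None and s.startswith("Hello")
)

print(repr(exact.string))
```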