How to Find Tag with Particular Text with Beautiful Soup

BeautifulSoup - search by text inside a tag

The problem is that your <a> tag, which has an <i> tag inside it, doesn't have the string attribute you expect it to have. First, let's look at what the text="" argument of find() does.

NOTE: text is the old name of this argument; since Beautiful Soup 4.4.0 it is called string.

From the docs:

Although string is for finding strings, you can combine it with
arguments that find tags: Beautiful Soup will find all tags whose
.string matches your value for string. This code finds the tags
whose .string is “Elsie”:

soup.find_all("a", string="Elsie")
# [<a href="http://example.com/elsie" class="sister" id="link1">Elsie</a>]

Now let's take a look at what a Tag's string attribute is (from the docs again):

If a tag has only one child, and that child is a NavigableString, the
child is made available as .string:

title_tag.string
# u'The Dormouse's story'

(...)

If a tag contains more than one thing, then it’s not clear what
.string should refer to, so .string is defined to be None:

print(soup.html.string)
# None

This is exactly your case. Your <a> tag contains both a text node and an <i> tag, so its .string is None. find() therefore has nothing to compare your search string against, and the match fails.
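To see this behaviour directly, here is a minimal sketch that uses the same markup as the question. The variable names are illustrative only, and the built-in html.parser is assumed as the parser.

```python
from bs4 import BeautifulSoup

html = '<a href="/customer-menu/1/accounts/1/update"><i class="fa fa-edit"></i> Edit</a>'
soup = BeautifulSoup(html, "html.parser")
link = soup.find("a")

# The <a> tag has two children (an <i> tag and a text node), so .string is None.
print(link.string)

# Because .string is None, an exact string match on the tag finds nothing.
print(soup.find("a", string=" Edit"))

# get_text() still concatenates all nested text.
print(repr(link.get_text()))
```

The first two calls print None, while get_text() returns the text regardless of how many children the tag has.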

How to solve this?

Maybe there is a better solution, but I would probably go with something like this:

import re
from bs4 import BeautifulSoup as BS

soup = BS("""
<a href="/customer-menu/1/accounts/1/update">
<i class="fa fa-edit"></i> Edit
</a>
""", "html.parser")

links = soup.find_all('a', href="/customer-menu/1/accounts/1/update")

# Keep the first link that contains a text node matching "Edit".
thelink = None
for link in links:
    if link.find(string=re.compile("Edit")):
        thelink = link
        break

print(thelink)

There are probably not many links pointing to /customer-menu/1/accounts/1/update, so this should be fast enough.
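Another option, offered only as a sketch, is to pass a function to find() and compare the tag's combined text. The helper name is_edit_link and the second "Delete" link are assumptions added for illustration; they are not part of the original question.

```python
from bs4 import BeautifulSoup

html = """
<a href="/customer-menu/1/accounts/1/update">
<i class="fa fa-edit"></i> Edit
</a>
<a href="/customer-menu/1/accounts/1/delete">Delete</a>
"""
soup = BeautifulSoup(html, "html.parser")


def is_edit_link(tag):
    # Hypothetical helper: an <a> tag whose combined, stripped text is "Edit".
    return tag.name == "a" and tag.get_text(strip=True) == "Edit"


edit_link = soup.find(is_edit_link)
print(edit_link["href"])
```

This avoids depending on the href value, at the cost of calling the function on every tag in the document.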

How to find tag with particular text with Beautiful Soup?

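You can pass a regular expression to the string parameter of find_all (called text before Beautiful Soup 4.4.0), like so:

import re
from bs4 import BeautifulSoup

# soup is an already parsed BeautifulSoup object
columns = soup.find_all('td', string=re.compile('your regex here'), attrs={'class': 'pos'})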
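As a runnable sketch of the same search, the example below uses a made-up table; the markup, the class names other than pos, and the regex are assumptions chosen for illustration.

```python
import re
from bs4 import BeautifulSoup

html = """
<table>
  <tr><td class="pos">1st place</td><td class="name">Alice</td></tr>
  <tr><td class="pos">2nd place</td><td class="name">Bob</td></tr>
  <tr><td class="other">3rd place</td><td class="name">Carol</td></tr>
</table>
"""
soup = BeautifulSoup(html, "html.parser")

# Match <td class="pos"> cells whose single text child contains "place".
columns = soup.find_all("td", string=re.compile(r"place"), attrs={"class": "pos"})
print([td.string for td in columns])
```

Only the two cells with class pos are returned; the third row is skipped because its class differs, even though its text matches the regex.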

BeautifulSoup find text in specific tag

Since you know the exact positions of the tags you want, you can use find_all(), which returns a list, and then pick the tag at the required index.

In this case (the 19th <tr> and the 2nd <td>), use this:

result = soup.find_all('tr')[18].find_all('td')[1].text
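The same indexing idea on a small, made-up table (the markup is an assumption for illustration) might look like this:

```python
from bs4 import BeautifulSoup

html = """
<table>
  <tr><td>a1</td><td>a2</td></tr>
  <tr><td>b1</td><td>b2</td></tr>
  <tr><td>c1</td><td>c2</td></tr>
</table>
"""
soup = BeautifulSoup(html, "html.parser")

rows = soup.find_all("tr")
# Indices are zero-based: rows[1] is the 2nd <tr>, cells[1] is the 2nd <td>.
cells = rows[1].find_all("td")
result = cells[1].text
print(result)
```

Keep in mind that indexing raises IndexError if the page has fewer rows or cells than expected, so this approach only suits pages with a stable layout.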

How to find tag name given a text in BeautifulSoup

This script prints all tags that share the tag name and attributes of the tag containing the string "456":

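from bs4 import BeautifulSoup

txt = '''
<div class='mydiv'>
<p style='xyz'>123</p>
<p>456</p>
<p style='xyz'>789</p>
<p>abc</p>
</div>'''

text_to_find = '456'
soup = BeautifulSoup(txt, 'html.parser')

# Find the first tag whose first child is exactly the text we are looking for.
tmp = soup.find(lambda t: t.contents and t.contents[0] == text_to_find)
if tmp:
    # Print every tag with the same name and the same attributes.
    for tag in soup.find_all(lambda t: t.name == tmp.name and t.attrs == tmp.attrs):
        print(tag)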

Prints:

<p>456</p>
<p>abc</p>

For input "123":

<p style="xyz">123</p>
<p style="xyz">789</p>

Searching for a text that contains a particular text using BeautifulSoup

Try this:

from bs4 import BeautifulSoup

html = '''
<td>the keyword is present in the <a href='text' title='text'>text</a> </td>
<td>word key is not present</td>
<td>no keyword here</td>'''

soup = BeautifulSoup(html , 'html.parser')
print(*[td for td in soup.find_all("td") if 'keyword' in td.text], sep='\n')

Output:

<td>the keyword is present in the <a href="text" title="text">text</a> </td>
<td>no keyword here</td>

You can use td.text to get only the text of each <td>, like below:

print(*[td.text for td in soup.find_all("td") if 'keyword' in td.text], sep='\n')

Output:

the keyword is present in the text 
no keyword here

Using BeautifulSoup to find a HTML tag that contains certain text

import re
from bs4 import BeautifulSoup

html_text = """
<h2>this is cool #12345678901</h2>
<h2>this is nothing</h2>
<h1>foo #126666678901</h1>
<h2>this is interesting #126666678901</h2>
<h2>this is blah #124445678901</h2>
"""

soup = BeautifulSoup(html_text, "html.parser")

# soup(...) is shorthand for soup.find_all(...); with only string= it returns text nodes.
for elem in soup(string=re.compile(r' #\S{11}')):
    print(elem.parent)

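Prints (the <h1> matches as well, because the search is not restricted to <h2>):

<h2>this is cool #12345678901</h2>
<h1>foo #126666678901</h1>
<h2>this is interesting #126666678901</h2>
<h2>this is blah #124445678901</h2>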

Extract all links after a particular tag using beautifulsoup

Try:

from bs4 import BeautifulSoup

html1 = """<html>
<head></head>
<body>
<p>Hello World!</p>
<a href='whatevs.com'>whatevs</a>
<p>Howdy!</p>
<a href='well.com'>well</a>
<div><span>haha</span><a href='haha.com'>haha</a></div>
<a href='goodbye.com'>Goodbye!</a>
</body>
</html>"""

soup = BeautifulSoup(html1, "html.parser")

out, tag = [], soup.find("p", string="Howdy!")
while True:
    # Step to the next <a> that appears after the current tag in the document.
    tag = tag.find_next("a")
    if not tag:
        break
    out.append(tag.text)

print(out)

Prints:

['well', 'haha', 'Goodbye!']
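The loop can probably be shortened with find_all_next(), which returns every matching element that comes after the starting tag. The sketch below uses a trimmed copy of the markup above; the variable name start and the empty-list fallback are additions for illustration.

```python
from bs4 import BeautifulSoup

html1 = """<p>Hello World!</p>
<a href='whatevs.com'>whatevs</a>
<p>Howdy!</p>
<a href='well.com'>well</a>
<div><span>haha</span><a href='haha.com'>haha</a></div>
<a href='goodbye.com'>Goodbye!</a>"""

soup = BeautifulSoup(html1, "html.parser")

start = soup.find("p", string="Howdy!")
# Collect the text of every <a> after the marker paragraph; empty if it is missing.
out = [a.text for a in start.find_all_next("a")] if start else []
print(out)
```

This should produce the same list as the loop, and it also handles the case where the marker paragraph is not found.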

BeautifulSoup Find tag with text containing &nbsp;

The non-breaking space is parsed as \xa0, so you can either run:

text = soup.find('strong', string='Hello\xa0')

Or you could use a regex:

import re
text = soup.find('strong', string=re.compile("Hello"))

Alternatively, you could use a lambda function that looks for Hello at the start of the string. The value may be None for tags without a single string child, so guard against that:

text = soup.find("strong", string=lambda value: value and value.startswith("Hello"))
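A small sketch to check the \xa0 behaviour yourself; the markup is made up for illustration and html.parser is assumed as the parser.

```python
import re
from bs4 import BeautifulSoup

soup = BeautifulSoup("<p><strong>Hello&nbsp;</strong>world</p>", "html.parser")

# The &nbsp; entity becomes the single character \xa0 in the parsed string.
exact = soup.find("strong", string="Hello\xa0")
by_regex = soup.find("strong", string=re.compile("Hello"))
# Without the trailing \xa0 the exact comparison does not match.
no_nbsp = soup.find("strong", string="Hello")

print(repr(exact.string))
print(by_regex is exact)
print(no_nbsp)
```

The exact match and the regex match return the same tag, while the search without \xa0 returns None.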

