BeautifulSoup - Search by Text Inside a Tag

BeautifulSoup - search by text inside a tag

The problem is that your <a> tag, with the <i> tag inside it, doesn't have the string attribute you expect it to have. First, let's take a look at what the text="" argument of find() does.

NOTE: The text argument is the old name; since BeautifulSoup 4.4.0 it's called string.

From the docs:

Although string is for finding strings, you can combine it with
arguments that find tags: Beautiful Soup will find all tags whose
.string matches your value for string. This code finds the tags
whose .string is “Elsie”:

soup.find_all("a", string="Elsie")
# [<a href="http://example.com/elsie" class="sister" id="link1">Elsie</a>]

Now let's take a look at what a Tag's string attribute is (from the docs again):

If a tag has only one child, and that child is a NavigableString, the
child is made available as .string:

title_tag.string
# u'The Dormouse's story'

(...)

If a tag contains more than one thing, then it’s not clear what
.string should refer to, so .string is defined to be None:

print(soup.html.string)
# None

This is exactly your case. Your <a> tag contains both text and an <i> tag. Its .string is therefore None, so find() has nothing to compare against your search string and can't match.
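To make the failure concrete, here is a minimal sketch that parses the link from the question (written on one line here) and inspects it. The variable names are only illustrative, and html.parser is assumed as the parser.

```python
from bs4 import BeautifulSoup

html = '<a href="/customer-menu/1/accounts/1/update"><i class="fa fa-edit"></i> Edit</a>'
soup = BeautifulSoup(html, "html.parser")
link = soup.a

# The <a> tag has two children: the <i> tag and the text node " Edit".
children = list(link.children)

# With more than one child, .string is defined to be None.
link_string = link.string

# Matching a tag by string compares against .string, so nothing is found.
matches = soup.find_all("a", string="Edit")

print(len(children), link_string, matches)
```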

How to solve this?

Maybe there is a better solution, but I would probably go with something like this:

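import re
from bs4 import BeautifulSoup as BS

soup = BS("""
<a href="/customer-menu/1/accounts/1/update">
<i class="fa fa-edit"></i> Edit
</a>
""", "html.parser")

# Narrow the search down to links with the expected href first.
links = soup.find_all('a', href="/customer-menu/1/accounts/1/update")

thelink = None
for link in links:
    # With no tag name given, find() searches the text nodes inside the link.
    # string= is the current name of the text= argument.
    if link.find(string=re.compile("Edit")):
        thelink = link
        break

print(thelink)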

There are probably not many links pointing to /customer-menu/1/accounts/1/update, so this should be fast enough.
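Another option, offered only as a sketch, is to pass a function to find_all() and compare against get_text(), which collects the text of all descendants and so is not affected by the nested <i> tag. The helper name is_edit_link and the second "Delete" link are made up for illustration.

```python
from bs4 import BeautifulSoup

html = """
<a href="/customer-menu/1/accounts/1/update">
<i class="fa fa-edit"></i> Edit
</a>
<a href="/customer-menu/1/accounts/1/delete">
<i class="fa fa-trash"></i> Delete
</a>
"""
soup = BeautifulSoup(html, "html.parser")

def is_edit_link(tag):
    # get_text(strip=True) joins the stripped text of all descendants,
    # so the nested <i> tag does not get in the way.
    return tag.name == "a" and tag.get_text(strip=True) == "Edit"

edit_links = soup.find_all(is_edit_link)
print(edit_links[0]["href"])
```

This matches on the visible text only, so it does not depend on knowing the href in advance.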

How to find tag with particular text with Beautiful Soup?

You can pass a regular expression to the text parameter (called string since BeautifulSoup 4.4.0) of find_all, like so:

from bs4 import BeautifulSoup
import re

# soup is an already-parsed BeautifulSoup object
columns = soup.find_all('td', string=re.compile('your regex here'), attrs={'class': 'pos'})
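As a runnable illustration of combining a regular expression with a class filter, here is a small sketch. The table markup and the pattern are invented for the example; only the find_all() call mirrors the answer above.

```python
import re
from bs4 import BeautifulSoup

html = """
<table>
<tr><td class="pos">1st place</td><td class="name">Alice</td></tr>
<tr><td class="pos">2nd place</td><td class="name">Bob</td></tr>
<tr><td class="other">3rd place</td></tr>
</table>
"""
soup = BeautifulSoup(html, "html.parser")

# Both conditions must hold: class "pos" and a .string matching the regex.
columns = soup.find_all(
    "td",
    string=re.compile(r"^\d+(st|nd|rd|th) place$"),
    attrs={"class": "pos"},
)
texts = [td.string for td in columns]
print(texts)
```

The third row is skipped because its class is not "pos", even though its text matches the pattern.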

BeautifulSoup Find tag with text containing &nbsp;

The non-breaking space is parsed as \xa0, so you can either run:

text = soup.find('strong', text='Hello\xa0')

Or you could use regex:

import re
text = soup.find('strong', text=re.compile("Hello"))

Alternatively you could use a lambda function that looks for Hello at the start of the string:

text = soup.find("strong", text=lambda value: value and value.startswith("Hello"))
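The following sketch shows the behaviour described above on a one-line document. The markup is assumed for illustration, since the original HTML is not shown here.

```python
from bs4 import BeautifulSoup

soup = BeautifulSoup("<strong>Hello&nbsp;</strong>", "html.parser")

# The &nbsp; entity becomes the character \xa0 in the parsed string.
with_nbsp = soup.find("strong", string="Hello\xa0")

# An exact match without the non-breaking space finds nothing.
without_nbsp = soup.find("strong", string="Hello")

# If the \xa0 is unwanted afterwards, it can be replaced or stripped.
cleaned = with_nbsp.string.replace("\xa0", " ").strip()

print(repr(with_nbsp.string), without_nbsp, repr(cleaned))
```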

Using BeautifulSoup to find a HTML tag that contains certain text

from bs4 import BeautifulSoup
import re

html_text = """
<h2>this is cool #12345678901</h2>
<h2>this is nothing</h2>
<h1>foo #126666678901</h1>
<h2>this is interesting #126666678901</h2>
<h2>this is blah #124445678901</h2>
"""

soup = BeautifulSoup(html_text, 'html.parser')

# Calling soup(...) is shorthand for soup.find_all(...). With only a string
# filter it returns the matching strings, so .parent gives the enclosing tag.
for elem in soup(string=re.compile(r' #\S{11}')):
    print(elem.parent)

Prints:

<h2>this is cool #12345678901</h2>
<h1>foo #126666678901</h1>
<h2>this is interesting #126666678901</h2>
<h2>this is blah #124445678901</h2>
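If only the <h2> elements are wanted, one possible variant is to pass the tag name together with the string filter. In that case find_all() returns the tags themselves rather than the strings, so .parent is no longer needed. This is a sketch using the same sample markup, with html.parser assumed.

```python
import re
from bs4 import BeautifulSoup

html_text = """
<h2>this is cool #12345678901</h2>
<h2>this is nothing</h2>
<h1>foo #126666678901</h1>
<h2>this is interesting #126666678901</h2>
<h2>this is blah #124445678901</h2>
"""
soup = BeautifulSoup(html_text, "html.parser")

# Tag name plus string filter: only <h2> tags whose .string matches.
h2_matches = soup.find_all("h2", string=re.compile(r" #\S{11}"))
names = [tag.name for tag in h2_matches]

for tag in h2_matches:
    print(tag)
```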

Show text inside the tags BeautifulSoup

To get the text within the tags, there are a couple of approaches:

a) Use the .text attribute of the tag.

cars = soup.find_all('span', attrs={'class': 'listing-row__price'})
for tag in cars:
    print(tag.text.strip())

Output

$71,996
$75,831
$71,412
$75,476
....

b) Use get_text()

for tag in cars:
    print(tag.get_text().strip())

c) If there is only that string inside the tag, you can also use these options:

  • .string
  • .contents[0]
  • next(tag.children)
  • next(tag.strings)
  • next(tag.stripped_strings)

i.e.

for tag in cars:
    print(tag.string.strip())  # or uncomment any of the lines below
    # print(tag.contents[0].strip())
    # print(next(tag.children).strip())
    # print(next(tag.strings).strip())
    # print(next(tag.stripped_strings))

Outputs:

$71,996
$75,831
$71,412
$75,476
$77,001
...

Note:

.text and .string are not the same. If there are other elements in the tag, .string returns None, while .text returns the text inside the tag.

from bs4 import BeautifulSoup
html="""
<p>hello <b>there</b></p>
"""
soup = BeautifulSoup(html, 'html.parser')
p = soup.find('p')
print(p.string)
print(p.text)

Outputs

None
hello there
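For tags with nested elements, a related sketch uses stripped_strings and get_text() with a separator to control how the pieces are joined. It reuses the same <p> example; the variable names are illustrative.

```python
from bs4 import BeautifulSoup

soup = BeautifulSoup("<p>hello <b>there</b></p>", "html.parser")
p = soup.find("p")

# Each text node inside the tag, with surrounding whitespace removed.
pieces = list(p.stripped_strings)

# The same pieces joined with an explicit separator.
joined = p.get_text(" ", strip=True)

print(pieces)
print(joined)
```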

Search for text inside a tag using beautifulsoup and returning the text in the tag after it

You can define a function to return the value for the key you enter:

def get_txt(soup, key):
    key_tag = soup.find('span', text=key).parent
    return key_tag.find_all('span')[1].text

color = get_txt(soup, 'Color')
print('Color: ' + color)
features = get_txt(soup, 'Features')
print('Features: ' + features)

Output:

Color: Slate, mykonos
Features: Camera lens cutout, hard shell, rubberized, port cut-outs, raised edges

I hope this is what you are looking for.

Explanation:

soup.find('span', text=key) returns the <span> tag whose text equals key.

.parent returns the parent tag of the current <span> tag.

Example:

When key='Color', soup.find('span', text=key).parent will return

<div class="_JDu">
<span class="_IDu">Color</span>
<span class="_KDu">Slate, mykonos</span>
</div>

Now we've stored this in key_tag. The only thing left is getting the text of the second <span>, which is what the line key_tag.find_all('span')[1].text does.
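A slightly more defensive variant, sketched here under the assumption that the value is always the next <span> sibling of the label, returns None instead of raising when a key is missing. The name get_txt_or_none is hypothetical, and the markup is the <div> shown above.

```python
from bs4 import BeautifulSoup

html = """
<div class="_JDu">
<span class="_IDu">Color</span>
<span class="_KDu">Slate, mykonos</span>
</div>
"""
soup = BeautifulSoup(html, "html.parser")

def get_txt_or_none(soup, key):
    # Find the label <span> whose .string equals the key.
    label = soup.find("span", string=key)
    if label is None:
        return None
    # The value is assumed to be the next <span> on the same level.
    value = label.find_next_sibling("span")
    return value.get_text(strip=True) if value is not None else None

color = get_txt_or_none(soup, "Color")
missing = get_txt_or_none(soup, "Weight")
print(color, missing)
```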

BeautifulSoup find text in specific tag

Since you know the exact positions of the tags you want to find, you can use find_all(), which returns a list, and then take the tag at the required index.

In this case (19th <tr> and 2nd <td>), use this:

result = soup.find_all('tr')[18].find_all('td')[1].text
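Positional indexing raises IndexError if the page has fewer rows or cells than expected. The sketch below wraps the same lookup in a hypothetical helper, cell_text, that returns None in that case. The 20-row table is generated only to make the example runnable.

```python
from bs4 import BeautifulSoup

# Hypothetical table with 20 rows of two cells each.
rows_html = "".join(
    f"<tr><td>row {i}</td><td>value {i}</td></tr>" for i in range(1, 21)
)
soup = BeautifulSoup(f"<table>{rows_html}</table>", "html.parser")

def cell_text(soup, row_index, col_index):
    # Return the text of the cell at the given zero-based position,
    # or None if the row or cell does not exist.
    rows = soup.find_all("tr")
    if row_index >= len(rows):
        return None
    cells = rows[row_index].find_all("td")
    if col_index >= len(cells):
        return None
    return cells[col_index].get_text(strip=True)

result = cell_text(soup, 18, 1)      # 19th <tr>, 2nd <td>
out_of_range = cell_text(soup, 50, 1)
print(result, out_of_range)
```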

