Only Extracting Text from This Element, Not Its Children

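What about .find(text=True)? Note that it returns the first text node in document order, even when that text is nested inside a child tag: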

>>> BeautifulSoup.BeautifulSOAP('<html>yes<b>no</b></html>').find(text=True)
u'yes'
>>> BeautifulSoup.BeautifulSOAP('<html><b>no</b>yes</html>').find(text=True)
u'no'

EDIT:

I think I understand what you want now. Passing recursive=False restricts the search to the element's direct children. Try this:

>>> BeautifulSoup.BeautifulSOAP('<html><b>no</b>yes</html>').html.find(text=True, recursive=False)
u'yes'
>>> BeautifulSoup.BeautifulSOAP('<html>yes<b>no</b></html>').html.find(text=True, recursive=False)
u'yes'
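
The session above uses the old BeautifulSoup 3 module. A rough Beautiful Soup 4 equivalent might look like the sketch below. It assumes bs4 4.4 or later, where the text argument is also accepted under the name string; first_direct_text is a hypothetical helper name, not part of the library.

```python
from bs4 import BeautifulSoup


def first_direct_text(markup, tag_name):
    """Return the first text node that is a direct child of tag_name, or None."""
    soup = BeautifulSoup(markup, "html.parser")
    tag = soup.find(tag_name)
    if tag is None:
        return None
    # recursive=False limits the search to direct children of the tag,
    # so text inside nested tags such as <b> is skipped.
    return tag.find(string=True, recursive=False)


print(first_direct_text("<html><b>no</b>yes</html>", "html"))  # yes
print(first_direct_text("<html>yes<b>no</b></html>", "html"))  # yes
```

Only the first direct text node is returned; if the element has text both before and after a child tag, find_all with the same arguments collects all of the pieces.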

Using .text() to retrieve only text not nested in child tags

I liked this reusable implementation based on the clone() method. It gets only the text directly inside the parent element, without modifying the original element.

Code provided for easy reference:

$("#foo")
    .clone()    // clone the element
    .children() // select all the children of the clone
    .remove()   // remove all the children
    .end()      // go back to the cloned element
    .text();
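
For readers working in Python, the same clone-then-strip idea can be sketched with Beautiful Soup 4. This is only an analogue of the jQuery chain, not a port of it: it assumes bs4 4.4 or later (where copy.copy() on a tag returns a detached copy), and just_text is a hypothetical helper name.

```python
import copy

from bs4 import BeautifulSoup


def just_text(tag):
    """Return the text of tag with its child elements removed; tag itself is untouched."""
    clone = copy.copy(tag)  # detached copy, so the original tree is not modified
    for child in clone.find_all(True, recursive=False):
        child.decompose()  # drop each direct child element and everything inside it
    return clone.get_text()


soup = BeautifulSoup('<div id="foo">Hello <b>bold</b> world</div>', "html.parser")
print(repr(just_text(soup.find(id="foo"))))  # 'Hello  world'
```

The whitespace on either side of the removed child is kept, so the result may need to be normalized before display.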

Beautiful Soup 4: extracting text only from a tag containing children tags

You may be looking for the .next_element attribute, which points to whatever was parsed immediately after the element you grabbed. In your case, it would look something like this:

result = soup.find('div', class_='title').next_element.strip()
# -> $ 430000
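
One caveat: .next_element follows parse order, so it is only a text node when the element actually starts with text. The sketch below guards against that. The markup is made up for illustration (only the "$ 430000" value comes from the answer above), and leading_text is a hypothetical helper name.

```python
from bs4 import BeautifulSoup, NavigableString


def leading_text(tag):
    """Return the text that opens tag, or None if tag does not start with text."""
    first = tag.next_element
    # next_element may be a child Tag, or even a node outside tag when tag is
    # empty, so accept it only if it is a text node owned directly by tag.
    if isinstance(first, NavigableString) and first.parent is tag:
        return first.strip()
    return None


soup = BeautifulSoup(
    '<div class="title">$ 430000 <span>per month</span></div>'
    '<div class="note"><span>per month</span> approx.</div>',
    "html.parser",
)
print(leading_text(soup.find("div", class_="title")))  # $ 430000
print(leading_text(soup.find("div", class_="note")))   # None
```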

Extract text from inline children of an anchor tag

You can grab the shallow text using tag.find_all(text=True, recursive=False), then use a normal selector to pull out the deeper text from the nested <code> element.

This way, all of the data is separate from the start and you're not dealing with parsing the individual pieces from the smushed text in the parent's view after the fact.

from bs4 import BeautifulSoup

html = """
<a class="reference internal" href="optparse.html">
<code class="xref py py-mod docutils literal notranslate">
<span class="pre">
optparse
</span>
</code>
— Parser for command line options
</a>
"""
soup = BeautifulSoup(html, "html.parser")

for tag in soup.select("a.reference.internal"):
    shallow = "".join(tag.find_all(text=True, recursive=False)).strip()
    deep = tag.find("code").text.strip()
    print(repr(shallow))  # => '— Parser for command line options'
    print(repr(deep))     # => 'optparse'

Extract Parent Text without Children Text; Parsing HTML

You can use .contents and access the 0th element:

for tag in soup.find_all("p"):
    print(tag.contents[0].strip())

Output:

Environment:
Basic Rules

Or, building on your attempt, you can remove the <span> tags with .extract():

for tag in soup.select("p span"):
    tag.extract()

print(soup.prettify())

Output:

<footer>
<p class="tags environment-tags">
Environment:
</p>
<p class="source monster-source">
Basic Rules
</p>
</footer>

Text from BeautifulSoup4 missing

If you have a look at the .contents of a tag, you'll see that the text you want belongs to a class called NavigableString.

from bs4 import BeautifulSoup, NavigableString

html = """<a aria-expanded="false" aria-owns="faqGen5" href="#">aaa <span class="nobreak">bbb</span> ccc?</a>"""
soup = BeautifulSoup(html, 'lxml')

for content in soup.find('a').contents:
    print(content, type(content))

# aaa <class 'bs4.element.NavigableString'>
# <span class="nobreak">bbb</span> <class 'bs4.element.Tag'>
# ccc? <class 'bs4.element.NavigableString'>

Now, you simply need to get the elements belonging to the NavigableString class and join them together.

text = ''.join([x for x in soup.find('a').contents if isinstance(x, NavigableString)])
print(text)
# aaa ccc?
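
One thing to watch for with the isinstance check: HTML comments are parsed as Comment objects, and Comment is a subclass of NavigableString, so they would be joined into the result too. A minimal sketch that filters them out is shown below; own_text is a hypothetical helper name and the comment in the markup is added purely for illustration.

```python
from bs4 import BeautifulSoup, Comment, NavigableString


def own_text(tag):
    """Join the text nodes that are direct children of tag, skipping HTML comments."""
    return "".join(
        child
        for child in tag.children
        if isinstance(child, NavigableString) and not isinstance(child, Comment)
    )


html = '<a href="#">aaa <!-- hidden note --><span class="nobreak">bbb</span> ccc?</a>'
soup = BeautifulSoup(html, "html.parser")

raw = own_text(soup.find("a"))
print(" ".join(raw.split()))  # aaa ccc?
```

The final split/join collapses the doubled space left behind where the span used to sit.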

PyQuery: Get only text of element, not text of child elements

I don't think there is a clean way to do that. The best solution I've found is this:

>>> print doc('h1').html(doc('h1')('span').outerHtml())
<h1 class="price"><span class="strike">$325.00</span></h1>

You can use .text() instead of .outerHtml() if you don't want to keep the span tag.

Removing the first one is much easier:

>>> print doc('h1').remove('span')
<h1 class="price">
$295.00
</h1>

jquery - get text for element without children text

Hey, try this please: http://jsfiddle.net/MtVxx/2/

A good link for your specific case: http://viralpatel.net/blogs/jquery-get-text-element-without-child-element/ (this gets only the text of the element itself, not of its children)

Hope this helps, :)

Code:

jQuery.fn.justtext = function() {
    return $(this).clone()
        .children()
        .remove()
        .end()
        .text();
};

alert($('#mydiv').justtext());

