bs4.FeatureNotFound: Couldn't find a tree builder with the features you requested: lxml. Do you need to install a parser library?

bs4.FeatureNotFound: Couldn't find a tree builder with the features you requested: html-parser. Do you need to install a parser library?

You can also mention the tag, so instead of soup.find(id="wob_wc") use soup.find("div", id="wob_wc").

Also, the parser name is html.parser, not html-parser; the difference is the dot.
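A quick way to see what the dot changes, assuming bs4 is installed, is to hand both names to BeautifulSoup and catch FeatureNotFound. try_parser is a hypothetical helper written only for this check:

```python
from bs4 import BeautifulSoup, FeatureNotFound

html = "<div id='wob_wc'>Sunny</div>"


def try_parser(name):
    """Return True if BeautifulSoup accepts `name` as a parser feature."""
    try:
        BeautifulSoup(html, name)
        return True
    except FeatureNotFound:
        return False


print(try_parser("html.parser"))  # True: Python's built-in parser (with a dot)
print(try_parser("html-parser"))  # False: no tree builder registers this name
```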

Also, Google will usually return a 200 response even when it has blocked you, so the status code alone won't tell you whether you were blocked. You usually have to check r.content.

I've included a User-Agent header, and now it works.

import requests
from bs4 import BeautifulSoup

headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:74.0) Gecko/20100101 Firefox/74.0'}
r = requests.get(
    "https://www.google.com/search?q=phagwara+weather", headers=headers)
soup = BeautifulSoup(r.content, 'html.parser')

print(soup.find("div", id="wob_wc"))
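To test the parsing step offline, without sending a request to Google, you can run the same find call on a small hand-written HTML string. The markup below is a hypothetical stand-in for r.content (the real page is much larger and changes often), and the inner span id is invented for the example. If find returns None on a real response, that is a hint to inspect r.content, since Google may still have answered with status 200.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for r.content (bytes, like requests returns).
sample = b"""
<html><body>
  <div id="wob_wc"><span id="wob_tm">31</span></div>
</body></html>
"""

soup = BeautifulSoup(sample, "html.parser")
widget = soup.find("div", id="wob_wc")

if widget is None:
    # On a live response this usually means a different page was served.
    print("wob_wc not found: inspect r.content")
else:
    print(widget.get_text(strip=True))  # 31
```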

Couldn't find a tree builder with the features you requested: html5lib. Do you need to install a parser library?

If you are using html5lib as an underlying parser:

soup = BeautifulSoup(html, "html5lib")
# ^HERE^

Then you need to have the html5lib module installed in your Python environment:

pip install html5lib

Documentation reference: Installing a parser.
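If you are not sure which parsers are present, a standard-library-only check like the sketch below can tell you before BeautifulSoup raises. It assumes each feature name corresponds to an importable module (html.parser, lxml, html5lib), which holds for the three parsers discussed on this page; parser_installed is a hypothetical helper name.

```python
import importlib.util


def parser_installed(module_name):
    """Return True if the parser's module can be found in this environment."""
    return importlib.util.find_spec(module_name) is not None


# html.parser is part of the standard library, so this should always be True.
print(parser_installed("html.parser"))
# These depend on what has been pip-installed.
print(parser_installed("html5lib"))
print(parser_installed("lxml"))
```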

bs4.FeatureNotFound: Couldn't find a tree builder with the features you requested: lxml

Open a Python shell and try the following:

from bs4 import BeautifulSoup
myHTML = "<html><head></head><body><strong>Hi</strong></body></html>"
soup = BeautifulSoup(myHTML, "lxml")

Does that work, or do you get the same error? If it's the same error, you're missing lxml. Install it:

pip install lxml

I'm going through these steps because you say the script works for a good while before crashing, and in that case you can't be missing the parser, can you?

Added by OP:

If you are using Python 2.7 on Ubuntu/Debian, this worked for me:

$ sudo apt-get build-dep python-lxml
$ sudo pip install lxml

Test it like:

mona@pascal:~/computer_vision/image_retrieval$ python
Python 2.7.6 (default, Jun 22 2015, 17:58:13)
[GCC 4.8.2] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import lxml

Already got html-parser for BS4: Couldn't find a tree builder ... html-parser

You cannot do it. When importing between its own modules/packages, BeautifulSoup4 uses absolute imports. For example:

https://bazaar.launchpad.net/~leonardr/beautifulsoup/bs4/view/604/bs4/builder/__init__.py#L7

from bs4.element import …

I.e., bs4 has to be a top-level package; it cannot be a subpackage. To make it a subpackage, you would have to rewrite the whole source code and make all imports relative, similar to this pull request sent to my project Cheetah3 for the same reason: "This makes it possible to embed Cheetah in other packages…"
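The effect of absolute vs. relative imports can be reproduced with a throwaway package. This is a sketch, not bs4 itself: innerlib is a hypothetical stand-in for bs4 and outerpkg is a hypothetical package you would embed it in. With an absolute import like bs4's, innerlib only works when it is importable as a top-level package; with a relative import it works when nested.

```python
import importlib
import pathlib
import sys
import tempfile


def build_embedded(root, style):
    """Create root/outerpkg/innerlib/, where innerlib imports its own submodule.

    innerlib is a hypothetical stand-in for bs4; outerpkg is the package
    you would like to embed it in.
    """
    outer = root / "outerpkg"
    inner = outer / "innerlib"
    inner.mkdir(parents=True)
    (outer / "__init__.py").write_text("")
    (inner / "element.py").write_text("class Tag:\n    pass\n")
    if style == "absolute":
        # Same pattern as bs4: assumes innerlib is a top-level package.
        (inner / "__init__.py").write_text("from innerlib.element import Tag\n")
    else:
        # Relative import: resolves against whatever package contains it.
        (inner / "__init__.py").write_text("from .element import Tag\n")


def try_import(style):
    """Return "ok" or the error message from importing outerpkg.innerlib."""
    with tempfile.TemporaryDirectory() as tmp:
        build_embedded(pathlib.Path(tmp), style)
        sys.path.insert(0, tmp)
        importlib.invalidate_caches()
        try:
            importlib.import_module("outerpkg.innerlib")
            return "ok"
        except ModuleNotFoundError as exc:
            return "failed: %s" % exc
        finally:
            # Undo the sys.path change and forget the temporary modules.
            sys.path.remove(tmp)
            for name in list(sys.modules):
                if name == "outerpkg" or name.startswith("outerpkg."):
                    del sys.modules[name]


print(try_import("absolute"))  # failed: No module named 'innerlib'
print(try_import("relative"))  # ok
```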

FeatureNotFound: Couldn't find a tree builder with the features you requested – Webscraping with Pandas

If lxml is not installed, you can install it with:

pip install lxml

You could also use a different parser to the same effect. html.parser ships with Python's standard library and is always available; html5lib also works but has to be installed first (pip install html5lib).

soup = BeautifulSoup(res.content,'html.parser')

This should solve the issue of scraping the webpage. Once you've scraped it, I think you'll need to load table[3] for the table of player stats.
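If the same script has to run on machines where lxml may or may not be installed, one option is to try parsers in order and fall back to html.parser. make_soup and its preferred order are hypothetical choices for this sketch; it relies only on BeautifulSoup raising FeatureNotFound for a parser it cannot find.

```python
from bs4 import BeautifulSoup, FeatureNotFound


def make_soup(markup, preferred=("lxml", "html5lib", "html.parser")):
    """Try each parser in order and return the first soup that builds.

    html.parser is last because it ships with Python and should always work.
    """
    for name in preferred:
        try:
            return BeautifulSoup(markup, name)
        except FeatureNotFound:
            continue
    raise FeatureNotFound("none of the parsers %r is available" % (preferred,))


soup = make_soup("<html><head></head><body><strong>Hi</strong></body></html>")
print(soup.strong.text)  # Hi
```

The trade-off is that different parsers can build slightly different trees from malformed HTML, so results may vary between machines.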


