bs4.FeatureNotFound: Couldn't find a tree builder with the features you requested: html-parser. Do you need to install a parser library?
It helps to name the tag: soup.find("div", id="wob_wc") is more specific than soup.find(id="wob_wc"), though both forms are valid.
The parser name is html.parser, not html-parser; the difference is the dot.
Also, Google will usually return a status of 200 even when it has blocked you, so the status code won't tell you whether you were blocked. You usually have to check r.content.
I've included the headers and now it works.
import requests
from bs4 import BeautifulSoup
headers = {
'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:74.0) Gecko/20100101 Firefox/74.0'}
r = requests.get(
"https://www.google.com/search?q=phagwara+weather", headers=headers)
soup = BeautifulSoup(r.content, 'html.parser')
print(soup.find("div", id="wob_wc"))
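Since the status code alone doesn't tell you whether you were blocked, one option is to check the parsed content for the element you expect. The following is a minimal sketch under that assumption; the helper name find_weather_box and both sample HTML strings are made up for illustration and are not real Google responses.

```python
from bs4 import BeautifulSoup


def find_weather_box(html):
    """Return the <div id="wob_wc"> element, or None if the page lacks it."""
    soup = BeautifulSoup(html, "html.parser")
    return soup.find("div", id="wob_wc")


# Hypothetical page bodies, standing in for r.content.
ok_page = '<html><body><div id="wob_wc">22 C</div></body></html>'
blocked_page = "<html><body><p>Unusual traffic detected</p></body></html>"

for label, page in (("ok", ok_page), ("blocked", blocked_page)):
    box = find_weather_box(page)
    if box is None:
        # A 200 response without the expected element is a hint that
        # the request was blocked or the page layout changed.
        print(label, "-> element missing")
    else:
        print(label, "->", box.get_text())
```

In a real script you would pass r.content to the helper and treat a None result as a reason to inspect the response by hand.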
Couldn't find a tree builder with the features you requested: html5lib. Do you need to install a parser library?
If you are using html5lib as the underlying parser:
soup = BeautifulSoup(html, "html5lib")
#                           ^HERE^
then you need to have the html5lib module installed in your Python environment:
pip install html5lib
Documentation reference: Installing a parser.
bs4.FeatureNotFound: Couldn't find a tree builder with the features you requested: lxml
Open a Python shell and try the following:
from bs4 import BeautifulSoup
myHTML = "<html><head></head><body><strong>Hi</strong></body></html>"
soup = BeautifulSoup(myHTML, "lxml")
Does that work, or do you get the same error? If you get the same error, you're missing lxml. Install it:
pip install lxml
I'm going through these steps because you say the script runs for a good while before crashing, in which case it seems unlikely that the parser is missing.
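If you'd rather check all three parsers at once, here is a rough sketch that uses only the standard library. The names PARSER_MODULES, is_importable and installed_parsers are my own, not part of bs4, and being importable is only a strong hint: Beautiful Soup still has to accept the parser when you construct the soup.

```python
import importlib.util

# Maps the name you pass to BeautifulSoup to the module that must be importable.
PARSER_MODULES = {
    "html.parser": "html.parser",  # part of the standard library
    "lxml": "lxml",                # pip install lxml
    "html5lib": "html5lib",        # pip install html5lib
}


def is_importable(module_name):
    """Return True if Python can locate module_name on the import path."""
    try:
        return importlib.util.find_spec(module_name) is not None
    except ModuleNotFoundError:
        # Raised for dotted names whose parent package is missing.
        return False


def installed_parsers():
    """List the parser names whose backing module can be found."""
    return [name for name, module in PARSER_MODULES.items() if is_importable(module)]


if __name__ == "__main__":
    print(installed_parsers())
```

Run it with the same interpreter that runs your script; a parser installed for a different Python won't show up.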
Added by OP:
If you are using Python 2.7 on Ubuntu/Debian, this worked for me:
$ sudo apt-get build-dep python-lxml
$ sudo pip install lxml
Test it like this:
mona@pascal:~/computer_vision/image_retrieval$ python
Python 2.7.6 (default, Jun 22 2015, 17:58:13)
[GCC 4.8.2] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import lxml
Already got html-parser for BS4 : Couldn't find a tree builder ... html-parser
You cannot do it. Beautiful Soup 4 uses absolute imports between its own modules and packages. For example:
https://bazaar.launchpad.net/~leonardr/beautifulsoup/bs4/view/604/bs4/builder/__init__.py#L7
from bs4.element import …
I.e., bs4 is required to be a top-level package; it cannot be a subpackage. To make it a subpackage you would have to rewrite the whole source code and make all imports relative. This is similar to a pull request sent to my project Cheetah3 for a similar reason: "This makes it possible to embed Cheetah in other packages…"
FeatureNotFound: Couldn't find a tree builder with the features you requested – Webscraping with Pandas
If lxml is not installed, you can install it using
pip install lxml
You could also use a different parser to the same effect. html.parser ships with Python, so it is always available; html5lib is a separate package that has to be installed first (pip install html5lib).
soup = BeautifulSoup(res.content,'html.parser')
This should solve the issue of scraping the webpage. Once you've scraped it, I think you'll need to load table[3] for the table of player stats.
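If you don't want the script to depend on one particular parser, a fallback along these lines may help. This is only a sketch: make_soup and its preferred argument are names I made up, and the only bs4 behaviour it relies on is that BeautifulSoup raises FeatureNotFound when the requested parser is unavailable.

```python
from bs4 import BeautifulSoup, FeatureNotFound


def make_soup(markup, preferred=("lxml", "html5lib", "html.parser")):
    """Parse markup with the first parser in preferred that bs4 can find.

    Returns a (soup, parser_name) pair so the caller can see which one was used.
    """
    for name in preferred:
        try:
            return BeautifulSoup(markup, name), name
        except FeatureNotFound:
            # This parser isn't installed; try the next one.
            continue
    raise RuntimeError("none of the requested parsers is available: %r" % (preferred,))


if __name__ == "__main__":
    soup, used = make_soup("<table><tr><td>1</td></tr></table>")
    print(used, soup.find("td").get_text())
```

Keep in mind that different parsers can build slightly different trees from broken HTML, so an index like table[3] is worth re-checking if the parser changes.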