How to get rid of BeautifulSoup user warning?
The fix is spelled out in the warning message itself. A call like the one below does not specify an HTML/XML parser:
BeautifulSoup( ... )
To silence the warning, name the parser you want BeautifulSoup to use, like so:
BeautifulSoup( ..., "html.parser" )
You can also install a third-party parser such as lxml or html5lib and name that instead, if you prefer.
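A minimal sketch, assuming bs4 is installed: the hypothetical helper `pick_parser` prefers lxml when it is importable and otherwise falls back to the built-in "html.parser". The warnings are recorded during parsing so you can confirm that the "no parser specified" warning is not raised once a parser is named explicitly.

```python
import importlib.util
import warnings

from bs4 import BeautifulSoup


def pick_parser() -> str:
    """Hypothetical helper: return "lxml" if it is installed, else "html.parser"."""
    if importlib.util.find_spec("lxml") is not None:
        return "lxml"
    return "html.parser"


html = "<html><body><p>Hello</p></body></html>"

# Record every warning raised while parsing so we can inspect them afterwards.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    # The parser is named explicitly, so BeautifulSoup does not have to guess.
    soup = BeautifulSoup(html, pick_parser())

guess_warnings = [
    w for w in caught if "No parser was explicitly specified" in str(w.message)
]
print(soup.p.text)          # Hello
print(len(guess_warnings))  # 0
```

Naming the parser also makes the output reproducible: different parsers can build slightly different trees from the same malformed HTML, so relying on whichever one happens to be installed is fragile.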
Beautiful Soup module error (html parser)
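You'll have to import BeautifulSoup from the bs4 package. The urllib2 import is unused (and only exists on Python 2), and naming the parser avoids the "no parser specified" warning:
import requests
from bs4 import BeautifulSoup  # the missing import

# Send a browser-like User-Agent header with the request.
headers = {'User-Agent': 'Mozilla/5.0'}
response = requests.get("https://www.sikayetvar.com/onedio", headers=headers)
# Name the parser explicitly instead of letting BeautifulSoup guess.
soup = BeautifulSoup(response.text, "html.parser")
# Read the second-to-last pagination link and convert its text to an int.
pages = soup.select('div.pagination a')
a = int(pages[-2].text)
print(a)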
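The request above needs network access and depends on the live page. Below is an offline sketch of the same selection logic, assuming bs4 is installed. The pagination markup is hypothetical and only assumed to resemble the real page: when the last link is a "Next" link, `pages[-2]` holds the highest page number.

```python
from bs4 import BeautifulSoup

# Hypothetical pagination markup standing in for the real page.
html = """
<div class="pagination">
  <a href="?page=1">1</a>
  <a href="?page=2">2</a>
  <a href="?page=3">3</a>
  <a href="?page=2">Next</a>
</div>
"""

soup = BeautifulSoup(html, "html.parser")
# CSS selector: every <a> inside a <div class="pagination">.
pages = soup.select("div.pagination a")
# Skip the trailing "Next" link and take the last numbered one.
last_page = int(pages[-2].text)
print(last_page)  # 3
```

If the real page has no such trailing link, `pages[-2]` would pick the wrong element, so check the actual markup before relying on the index.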
lxml / BeautifulSoup parser warning
I had to read lxml's and BeautifulSoup's source code to figure this out.
I'm posting my own answer here, in case someone else may need it in the future.
The fromstring function in question is defined as follows:
def fromstring(data, beautifulsoup=None, makeelement=None, **bsargs):
The **bsargs arguments end up being forwarded to the BeautifulSoup constructor, which is called like so (in another function, _parse):
tree = beautifulsoup(source, **bsargs)
The BeautifulSoup constructor is defined as follows:
def __init__(self, markup="", features=None, builder=None,
             parse_only=None, from_encoding=None, exclude_encodings=None,
             **kwargs):
Now, back to the warning in the question, which recommends passing the argument "html.parser" to BeautifulSoup's constructor. According to the signature above, that second positional argument is the one named features.
Since the fromstring function passes any extra keyword arguments on to BeautifulSoup's constructor, we can specify the parser by passing features as a keyword argument to fromstring, like so:
root = fromstring(clean, features='html.parser')
Poof. The warning disappears.
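To see why passing features through fromstring works, here is a minimal pure-Python mimic of the forwarding chain described above. `FakeSoup`, `_parse` and `fromstring` below are hypothetical stand-ins written for illustration, not lxml's or BeautifulSoup's actual code (in the real lxml package the function is `lxml.html.soupparser.fromstring`). The mimic only records which `features` value reaches the constructor.

```python
class FakeSoup:
    """Hypothetical stand-in for BeautifulSoup that only records `features`."""

    def __init__(self, markup="", features=None, builder=None, **kwargs):
        self.markup = markup
        self.features = features
        if features is None:
            # The real constructor emits its "no parser specified" warning here.
            print("warning: no parser was explicitly specified")


def _parse(source, beautifulsoup, **bsargs):
    # Extra keyword arguments are handed to the soup constructor unchanged.
    return beautifulsoup(source, **bsargs)


def fromstring(data, beautifulsoup=None, makeelement=None, **bsargs):
    # Same signature as the function quoted above; makeelement is ignored here.
    if beautifulsoup is None:
        beautifulsoup = FakeSoup
    return _parse(data, beautifulsoup, **bsargs)


noisy = fromstring("<p>hi</p>")                          # prints the warning line
quiet = fromstring("<p>hi</p>", features="html.parser")  # prints nothing
print(noisy.features, quiet.features)  # None html.parser
```

Because features is not one of fromstring's own parameters, it lands in **bsargs and reaches the constructor untouched, which is exactly what silences the real warning.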