SSL: certificate_verify_failed error when scraping https://www.thenewboston.com/
The problem is not in your code but in the website you are trying to access. Looking at the analysis by SSL Labs, you will note:
This server's certificate chain is incomplete. Grade capped to B.
This means that the server configuration is wrong: the server does not send the full certificate chain, so not only Python but several other clients will have problems with this site. Some desktop browsers work around this misconfiguration by fetching the missing intermediate certificates from the internet or by filling in cached certificates, but other browsers and applications will fail just like Python does.
To work around the broken server configuration you can explicitly extract the missing intermediate certificates and add them to your trust store, or supply them in a bundle passed via the verify argument. From the requests documentation:
You can pass verify the path to a CA_BUNDLE file or directory with certificates of trusted CAs:

>>> requests.get('https://github.com', verify='/path/to/certfile')

This list of trusted CAs can also be specified through the REQUESTS_CA_BUNDLE environment variable.
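As a sketch (the bundle path here is hypothetical), you would download the missing intermediate certificate, append it in PEM form to a copy of your CA bundle, and point requests at the result, either per call or process-wide:

```python
import os

# Hypothetical path to a PEM bundle created by appending the site's
# missing intermediate certificate to your usual CA bundle.
ca_bundle = "/path/to/combined-ca-bundle.pem"

# Per-request:
#     requests.get("https://www.thenewboston.com/", verify=ca_bundle)

# Process-wide: every requests call in this process picks this up.
os.environ["REQUESTS_CA_BUNDLE"] = ca_bundle
```

Setting the environment variable is convenient when the failing request happens deep inside a library you do not control.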
Scraping: SSL: CERTIFICATE_VERIFY_FAILED error for http://en.wikipedia.org
I stumbled on this issue once. If you're using macOS, go to Macintosh HD > Applications > Python3.6 (or whatever version of Python you're using) and double-click the "Install Certificates.command" file. :D
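That script installs a CA bundle for the python.org build of Python. If you are unsure which CA file your interpreter actually consults, the ssl module can tell you:

```python
import ssl

# Show the default CA file and directory this Python build uses for
# certificate verification; after running "Install Certificates.command"
# on macOS these should point at a populated bundle.
paths = ssl.get_default_verify_paths()
print(paths.cafile)
print(paths.capath)
```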
SSL: CERTIFICATE_VERIFY_FAILED request.get
If you're not worried about safety (which you should be), your best bet is to pass verify=False to the request function:

page = requests.get(url, verify=False)

You can also set verify to a CA_BUNDLE file or directory of certificates from trusted CAs:

page = requests.get(url, verify='/path/to/certfile')

You can refer to the documentation here for all the ways to get around it.
- [SSL: CERTIFICATE_VERIFY_FAILED] while working on BeautifulSoup4 on Linux
To resolve this error you can pass verify=False to requests.get(), i.e. requests.get(url, verify=False).
For example:
from bs4 import BeautifulSoup
import requests
response = requests.get("https://www.docenti.unina.it/#!/professor/47494f434f4e44414d4f5343415249454c4c4f4d5343474e4435344c36354634383143/avvisi", verify=False)
soup = BeautifulSoup(response.content, "html.parser")  # parse the page source with bs4
print(response)
Result:
<Response [200]>
Python Urllib2 SSL error
To summarize the comments about the cause of the problem and explain the real problem in more detail:
If you check the trust chain for the OpenSSL client you get the following:
[0] 54:7D:B3:AC:BF:... /CN=*.s3.amazonaws.com
[1] 5D:EB:8F:33:9E:... /CN=VeriSign Class 3 Secure Server CA - G3
[2] F4:A8:0A:0C:D1:... /CN=VeriSign Class 3 Public Primary Certification Authority - G5
[OT] A1:DB:63:93:91:... /C=US/O=VeriSign, Inc./OU=Class 3 Public Primary Certification Authority
The first certificate [0] is the leaf certificate sent by the server. The following certificates [1] and [2] are chain (intermediate) certificates sent by the server. The last certificate [OT] is the trusted root certificate, which is not sent by the server but is in the local store of trusted CAs. Each certificate in the chain is signed by the next one, and the last certificate [OT] is trusted, so the trust chain is complete.
If you check the trust chain instead by a browser (e.g. Google Chrome using the NSS library) you get the following chain:
[0] 54:7D:B3:AC:BF:... /CN=*.s3.amazonaws.com
[1] 5D:EB:8F:33:9E:... /CN=VeriSign Class 3 Secure Server CA - G3
[NT] 4E:B6:D5:78:49:... /CN=VeriSign Class 3 Public Primary Certification Authority - G5
Here [0] and [1] are again sent by the server, but [NT] is the trusted root certificate. While its subject looks exactly like that of the chain certificate [2], the fingerprint shows that the certificates are different. A closer look at certificates [2] and [NT] reveals that the public key inside both certificates is the same, so both [2] and [NT] can be used to verify the signature on [1] and thus to build the trust chain.
This means that while the server sends the same certificate chain in all cases, there are multiple ways to verify the chain up to a trusted root certificate. How this is done depends on the SSL library and on the known trusted root certificates:
[0] (*.s3.amazonaws.com)
|
[1] (Verisign G3) --------------------------\
| |
/------------------ [2] (Verisign G5 F4:A8:0A:0C:D1...) |
| |
| certificates sent by server |
.....|...............................................................|................
| locally trusted root certificates |
| |
[OT] Public Primary Certification Authority [NT] Verisign G5 4E:B6:D5:78:49
OpenSSL library Google Chrome (NSS library)
But the question remains why your verification was unsuccessful.
What you did was to take the trusted root certificate used by the browser (Verisign G5 4E:B6:D5:78:49) and use it together with OpenSSL. But verification in the browser (NSS) and in OpenSSL works slightly differently:
- NSS: build the trust chain from the certificates sent by the server, and stop building the chain as soon as we reach a certificate signed by any of the locally trusted root certificates.
- OpenSSL: build the trust chain from the certificates sent by the server, and only after this is done check whether a trusted root certificate signs the last certificate in the chain.
Because of this subtle difference OpenSSL is not able to verify the chain [0],[1],[2] against the root certificate [NT]: that certificate does not sign the last element of the chain, [2], but rather [1]. If the server sent only the chain [0],[1], the verification would succeed.
This is a long-known bug. Patches exist, and the issue is finally addressed in OpenSSL 1.0.2 with the introduction of the X509_V_FLAG_TRUSTED_FIRST option.
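Python exposes that OpenSSL option via the ssl module: since Python 3.4, ssl.VERIFY_X509_TRUSTED_FIRST can be checked or set on a context's verify_flags (modern builds enable it by default):

```python
import ssl

ctx = ssl.create_default_context()
# With "trusted first" enabled, OpenSSL prefers a locally trusted root
# over an extra chain certificate sent by the server, so a chain like
# [0],[1] + [NT] above verifies even when the server also sends [2].
trusted_first = bool(ctx.verify_flags & ssl.VERIFY_X509_TRUSTED_FIRST)
print(trusted_first)
```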