"Ssl: Certificate_Verify_Failed" Error When Scraping Https://Www.Thenewboston.Com/

The problem is not in your code but in the website you are trying to access. If you look at the analysis by SSLLabs you will note:

This server's certificate chain is incomplete. Grade capped to B.

This means that the server configuration is wrong and that not only Python but several other clients will have problems with this site. Some desktop browsers work around this configuration problem by trying to load the missing certificates from the internet or by filling in with cached certificates. But other browsers or applications will fail too, just like Python.

To work around the broken server configuration you might explicitly extract the missing certificates and add them to your trust store, or you might pass the certificate as trusted via the verify argument. From the documentation:

You can pass verify the path to a CA_BUNDLE file or directory with
certificates of trusted CAs:

>>> requests.get('https://github.com', verify='/path/to/certfile') 

This list of trusted CAs can also be specified through the
REQUESTS_CA_BUNDLE environment variable.
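
A rough sketch of the first workaround (the file names are placeholders; the missing intermediate certificate has to be exported by hand, e.g. from the SSLLabs report or from a browser) is to append that certificate to a copy of a known-good CA bundle and point requests at it:

import shutil

import certifi
import requests

# Start from the Mozilla CA bundle shipped with certifi and append the
# intermediate certificate the server fails to send.
shutil.copy(certifi.where(), "custom-ca-bundle.pem")
with open("missing-intermediate.pem") as intermediate, \
        open("custom-ca-bundle.pem", "a") as bundle:
    bundle.write(intermediate.read())

# Verification now succeeds without being disabled altogether.
response = requests.get("https://www.thenewboston.com/", verify="custom-ca-bundle.pem")
print(response.status_code)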

Scraping: SSL: CERTIFICATE_VERIFY_FAILED error for http://en.wikipedia.org

I once stumbled over this issue. If you're using macOS, go to Macintosh HD > Applications > Python3.6 folder (or whatever version of Python you're using) and double-click the "Install Certificates.command" file. :D
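
If you can't run that command, a rough equivalent is to point requests at the CA bundle shipped with the certifi package, which is essentially what the command installs for the system Python (a sketch; any https URL will do):

import certifi
import requests

# certifi ships the Mozilla CA bundle; pass it explicitly instead of
# relying on the possibly missing system certificates.
response = requests.get("https://en.wikipedia.org", verify=certifi.where())
print(response.status_code)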

SSL: CERTIFICATE_VERIFY_FAILED with requests.get

If you're not worried about safety (which you should be), your best bet is to use verify=False in the request function.

page = requests.get(url, verify=False)
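
Note that requests will then emit an InsecureRequestWarning on every call. If you deliberately accept the risk, you can silence it via urllib3 (a sketch; the URL stands in for whatever url holds):

import requests
import urllib3

# verify=False skips certificate validation entirely, so silence the
# InsecureRequestWarning that requests raises for every such call.
urllib3.disable_warnings(urllib3.exceptions.InsecureRequestWarning)

page = requests.get("https://www.thenewboston.com/", verify=False)
print(page.status_code)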

You can also set verify to the path of a CA_BUNDLE file or a directory with certificates of trusted CAs, like so:

page = requests.get(url, verify='/path/to/certfile')

You can refer to the requests documentation for all the ways to get around it.

[SSL: CERTIFICATE_VERIFY_FAILED] while working on BeautifulSoup4 on Linux

To get around this error you can pass verify=False, as in requests.get(url, verify=False).

For example:

from bs4 import BeautifulSoup
import requests

response = requests.get("https://www.docenti.unina.it/#!/professor/47494f434f4e44414d4f5343415249454c4c4f4d5343474e4435344c36354634383143/avvisi", verify=False)
soup = BeautifulSoup(response.content, "html.parser")  # parse the fetched page with bs4

print(response)

Result:

<Response [200]>

Python Urllib2 SSL error

To summarize the comments about the cause of the problem and explain the real problem in more detail:

If you check the trust chain for the OpenSSL client you get the following:

[0] 54:7D:B3:AC:BF:... /CN=*.s3.amazonaws.com
[1] 5D:EB:8F:33:9E:... /CN=VeriSign Class 3 Secure Server CA - G3
[2] F4:A8:0A:0C:D1:... /CN=VeriSign Class 3 Public Primary Certification Authority - G5
[OT] A1:DB:63:93:91:... /C=US/O=VeriSign, Inc./OU=Class 3 Public Primary Certification Authority

The first certificate [0] is the leaf certificate sent by the server. The following certificates [1] and [2] are chain certificates sent by the server. The last certificate [OT] is the trusted root certificate, which is not sent by the server but is in the local store of trusted CAs. Each certificate in the chain is signed by the next one, and the last certificate [OT] is trusted, so the trust chain is complete.
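
You can inspect the first links of such a chain yourself. A small sketch with Python's ssl module (the hostname is just the example from above):

import socket
import ssl

hostname = "s3.amazonaws.com"
context = ssl.create_default_context()
with socket.create_connection((hostname, 443)) as sock:
    with context.wrap_socket(sock, server_hostname=hostname) as tls:
        cert = tls.getpeercert()          # the leaf certificate [0]
        print("subject:", cert["subject"])
        print("issuer:", cert["issuer"])  # the CA that signed it, i.e. [1]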

If you check the trust chain instead by a browser (e.g. Google Chrome using the NSS library) you get the following chain:

[0] 54:7D:B3:AC:BF:... /CN=*.s3.amazonaws.com
[1] 5D:EB:8F:33:9E:... /CN=VeriSign Class 3 Secure Server CA - G3
[NT] 4E:B6:D5:78:49:... /CN=VeriSign Class 3 Public Primary Certification Authority - G5

Here [0] and [1] are again sent by the server, but [NT] is the trusted root certificate. While its subject looks exactly like that of the chain certificate [2], the fingerprint shows that the certificates are different. A closer look at the certificates [2] and [NT] would show that the public key inside the certificate is the same, and thus both [2] and [NT] can be used to verify the signature of [1] and hence to build the trust chain.

This means that, while the server sends the same certificate chain in all cases, there are multiple ways to verify the chain up to a trusted root certificate. How this is done depends on the SSL library and on the known trusted root certificates:

                 [0] (*.s3.amazonaws.com)
                  |
                 [1] (Verisign G3) ---------------------------------\
                  |                                                  |
  /---- [2] (Verisign G5 F4:A8:0A:0C:D1...)                          |
  |                                                                  |
  |         certificates sent by server                              |
..|..................................................................|.........
  |         locally trusted root certificates                        |
  |                                                                  |
 [OT] Public Primary Certification Authority      [NT] Verisign G5 4E:B6:D5:78:49
      OpenSSL library                                  Google Chrome (NSS library)

But the question remains why your verification was unsuccessful.
What you did was to take the trusted root certificate used by the browser (Verisign G5 4E:B6:D5:78:49) and use it together with OpenSSL. But verification in the browser (NSS) and in OpenSSL works slightly differently:

  • NSS: build the trust chain from the certificates sent by the server. Stop building the chain once we get a certificate signed by any of the locally trusted root certificates.
  • OpenSSL: build the trust chain from the certificates sent by the server. Only after this is done, check if we have a trusted root certificate signing the last certificate in the chain.

Because of this subtle difference OpenSSL is not able to verify the chain [0],[1],[2] against the root certificate [NT], because [NT] does not sign the last element of the chain [2] but instead [1]. If the server instead sent only the chain [0],[1], the verification would succeed.

This is a long-known bug, and patches exist; hopefully the issue is finally addressed in OpenSSL 1.0.2 with the introduction of the X509_V_FLAG_TRUSTED_FIRST option.
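
For completeness: Python 3.4+ exposes that OpenSSL flag through the ssl module, so a sketch of the client-side fix (the URL is a placeholder) could look like this:

import ssl
import urllib.error
import urllib.request

# VERIFY_X509_TRUSTED_FIRST makes OpenSSL prefer a locally trusted root
# over a chain certificate sent by the server, mirroring the NSS behaviour.
context = ssl.create_default_context()
context.verify_flags |= ssl.VERIFY_X509_TRUSTED_FIRST

try:
    urllib.request.urlopen("https://s3.amazonaws.com/", context=context)
    print("certificate verified")
except urllib.error.HTTPError:
    # An HTTP-level error still means the TLS handshake, and with it the
    # certificate verification, succeeded.
    print("certificate verified")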


