JSONDecodeError: Expecting value: line 1 column 1 (char 0)

JSONDecodeError: Expecting value: line 1 column 1 (char 0) when scraping SEC EDGAR

Apparently the SEC has added rate-limiting to their website, according to this GitHub issue from May 2021. You receive the error message because the response body contains HTML rather than JSON, so requests raises an error when you call .json().
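The message itself comes from Python's json parser, which finds no JSON value at the very first character of the body. A minimal sketch reproduces it without any network access; the two bodies below are made-up stand-ins, not real SEC responses, and decode_error is a hypothetical helper name:

```python
import json

# Hypothetical bodies standing in for what a server might send back.
html_body = "<!DOCTYPE html><html><body>Request rejected</body></html>"
empty_body = ""


def decode_error(text):
    """Return the JSONDecodeError message for text, or None if it parses."""
    try:
        json.loads(text)
    except json.JSONDecodeError as exc:
        return str(exc)
    return None


print(decode_error(html_body))       # Expecting value: line 1 column 1 (char 0)
print(decode_error(empty_body))      # Expecting value: line 1 column 1 (char 0)
print(decode_error('{"ok": true}'))  # None
```

An HTML page and an empty body fail identically, so the message alone does not tell you which one you received; printing the status code and the start of the body usually does.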

To resolve this, you need to add a User-Agent header to your request. I can access the JSON with the following:

import requests

# Without a User-Agent header the server answers with HTML instead of JSON.
year_url = "https://www.sec.gov/Archives/edgar/daily-index/2020/index.json"
year_content = requests.get(year_url, headers={"User-Agent": "[specify user agent here]"})
decoded_year_url = year_content.json()
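If requests is not installed, the same header can be attached with the standard library. This is only a sketch: build_request and fetch_json are hypothetical helper names, the user agent string is the same placeholder as above, and only the request construction runs here, so nothing is sent to the server.

```python
import json
import urllib.request

YEAR_URL = "https://www.sec.gov/Archives/edgar/daily-index/2020/index.json"
# Placeholder value: replace it with a string that identifies you.
USER_AGENT = "[specify user agent here]"


def build_request(url, user_agent):
    """Create a Request carrying a User-Agent header, without sending it."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def fetch_json(url, user_agent):
    """Send the request and parse the body as JSON (performs network I/O)."""
    with urllib.request.urlopen(build_request(url, user_agent)) as response:
        return json.loads(response.read())


request = build_request(YEAR_URL, USER_AGENT)
# urllib stores header names capitalized, so the key reads "User-agent".
print(request.get_header("User-agent"))
```

Calling fetch_json(YEAR_URL, USER_AGENT) would perform the actual download; it is left uncalled so the sketch stays offline.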

JSONDecodeError: Expecting value: line 1 column 1 (char 0)

Your code produced an empty response body; check for that, or catch the exception raised. The server may have responded with 204 No Content, or it may have returned a non-200-range status code (404 Not Found, etc.). Check for both.

Note:

  • There is no need to use the simplejson library; the same library is included with Python as the json module.

  • There is no need to decode a response from UTF-8 to Unicode; the simplejson / json .loads() method handles UTF-8 encoded data natively.

  • pycurl has a very archaic API. Unless you have a specific requirement for using it, there are better choices.

Both requests and httpx offer much friendlier APIs, including JSON support. If you can, replace your call with:

import requests

response = requests.get(url)
response.raise_for_status()  # raises exception when not a 2xx response
if response.status_code != 204:
    return response.json()

Of course, this won't protect you from a URL that doesn't comply with HTTP standards. When using arbitrary URLs where this is a possibility, check whether the server intended to give you JSON by inspecting the Content-Type header, and for good measure catch the exception:

if (
    response.status_code != 204 and
    response.headers["content-type"].strip().startswith("application/json")
):
    try:
        return response.json()
    except ValueError:
        # decide how to handle a server that's misbehaving to this extent
        raise  # placeholder: re-raise until you have picked a strategy
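The checks above can be collected into one function that works on plain values, which makes them easy to try without a live server. This is a sketch under assumptions: parse_json_body is a hypothetical name, and returning None for every failure case is one possible policy, not the only reasonable one.

```python
import json


def parse_json_body(status_code, content_type, body):
    """Return parsed JSON, or None when the response cannot hold JSON."""
    if status_code == 204 or not body:
        return None  # nothing to parse
    if not content_type.strip().lower().startswith("application/json"):
        return None  # the server did not claim to send JSON
    try:
        return json.loads(body)
    except ValueError:  # json.JSONDecodeError is a subclass of ValueError
        return None  # malformed JSON despite the header


print(parse_json_body(200, "application/json; charset=utf-8", '{"a": 1}'))
print(parse_json_body(200, "text/html", "<html></html>"))
print(parse_json_body(204, "application/json", ""))
```

With requests, it could be called as parse_json_body(response.status_code, response.headers.get("content-type", ""), response.text); using .get() with a default avoids a KeyError when the header is missing.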

json.decoder.JSONDecodeError: Expecting value: line 1 column 2 (char 1)

The short answer is that this is how the JSON spec works: all strings must be double-quoted, as you can see from the diagram in the middle of Introducing JSON.

However, you can parse that string in Python using ast.literal_eval() rather than json.loads().

import ast
print(ast.literal_eval("['product', 'font', 'graphics', 'photo caption', 'brand', 'advertising', 'technology', 'text', 'graphic design', 'competition']"))
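To see both behaviours side by side, the sketch below feeds a shortened version of that list to json.loads() and to ast.literal_eval(), then re-serializes the result with json.dumps() so that it becomes valid JSON. The shortened list and the variable names are only for illustration.

```python
import ast
import json

text = "['product', 'font', 'graphics']"

message = None
try:
    json.loads(text)
except json.JSONDecodeError as exc:
    message = str(exc)
print(message)  # Expecting value: line 1 column 2 (char 1)

# literal_eval parses Python literals, so single quotes are accepted.
items = ast.literal_eval(text)

# json.dumps writes the same list with double quotes.
as_json = json.dumps(items)
print(as_json)  # ["product", "font", "graphics"]
print(json.loads(as_json) == items)  # True
```

The error position (column 2, char 1) points at the first single quote, right after the opening bracket, which is where the parser expected a JSON value.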

