JSONDecodeError: Expecting value: line 1 column 1 (char 0) when scraping SEC EDGAR
Apparently the SEC has added rate limiting to its website, according to this GitHub issue from May 2021. You're receiving the error message because the response contains HTML rather than JSON, which causes requests to raise an error when you call .json().
To resolve this, add a User-Agent header to your request. I can access the JSON with the following:
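import requests
import urllib
from bs4 import BeautifulSoup
year_url = r"https://www.sec.gov/Archives/edgar/daily-index/2020/index.json"
# Send a User-Agent header; without it the response is HTML rather than JSON
year_content = requests.get(year_url, headers={'User-Agent': '[specify user agent here]'})
# .json() decodes the response body into Python objects
decoded_year_url = year_content.json()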
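To see why an HTML body produces exactly this message, here is a minimal sketch that uses only the standard-library json module. The helper name decode_error and both sample bodies are made up for illustration; they are not what the SEC server actually returns.

```python
import json
from typing import Optional


def decode_error(body: str) -> Optional[str]:
    """Return the decoder's error message for body, or None if it parses."""
    try:
        json.loads(body)
    except json.JSONDecodeError as exc:
        return str(exc)
    return None


# Hypothetical bodies: an HTML page (what a blocked request might receive)
# and a small JSON document.
html_body = "<!DOCTYPE html><html><body>Request blocked</body></html>"
json_body = '{"directory": {"item": []}}'

print(decode_error(html_body))  # Expecting value: line 1 column 1 (char 0)
print(decode_error(json_body))  # None
```

The decoder gives up at the very first character because "<" cannot start a JSON value, which is why the position is always line 1 column 1 (char 0). An empty body fails at the same position.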
JSONDecodeError: Expecting value: line 1 column 1 (char 0)
Your code produced an empty response body; check for that, or catch the exception raised. The server may have responded with 204 No Content, or returned a status code outside the 200 range (404 Not Found, etc.). Check for this too.
Note:
There is no need to use the simplejson library; the same library is included with Python as the json module.
There is no need to decode a response from UTF8 to unicode; the simplejson / json .loads() method can handle UTF8 encoded data natively.
pycurl has a very archaic API. Unless you have a specific requirement for using it, there are better choices.
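On the decoding point, a minimal sketch, assuming Python 3.6 or later (where json.loads() accepts bytes directly). The payload is invented for illustration.

```python
import json

# UTF-8 encoded bytes, as they would arrive off the wire (made-up payload).
raw = '{"city": "Zürich"}'.encode("utf-8")

# No manual .decode("utf-8") step: json.loads() accepts bytes as-is.
data = json.loads(raw)
print(data["city"])
```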
Either requests or httpx offers a much friendlier API, including JSON support. If you can, replace your call with:
import requests

response = requests.get(url)
response.raise_for_status()  # raises exception when not a 2xx response
if response.status_code != 204:
    return response.json()
Of course, this won't protect you from a URL that doesn't comply with HTTP standards. When using arbitrary URLs where this is a possibility, check whether the server intended to give you JSON by inspecting the Content-Type header, and for good measure catch the exception:
if (
    response.status_code != 204 and
    response.headers["content-type"].strip().startswith("application/json")
):
    try:
        return response.json()
    except ValueError:
        # decide how to handle a server that's misbehaving to this extent
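The checks above can be gathered into one helper. This is only a sketch: the name json_or_none, the choice to return None in every "no usable JSON" case, and the FakeResponse stand-in are assumptions for illustration. It expects an object shaped like a requests response (status_code, headers, json()) and assumes raise_for_status() has already been called.

```python
import json
from typing import Any, Optional


def json_or_none(response: Any) -> Optional[Any]:
    """Return the decoded JSON body, or None when there is no usable JSON."""
    if response.status_code == 204:
        return None  # No Content: nothing to decode
    content_type = response.headers.get("content-type", "")
    if not content_type.strip().startswith("application/json"):
        return None  # the server did not claim to send JSON
    try:
        return response.json()
    except ValueError:
        return None  # labelled as JSON, but the body did not parse


class FakeResponse:
    """Hypothetical stand-in exposing only the attributes used above."""

    def __init__(self, status_code: int, content_type: str, body: str):
        self.status_code = status_code
        # Plain dict with a lowercase key; real requests headers are
        # case-insensitive.
        self.headers = {"content-type": content_type}
        self._body = body

    def json(self) -> Any:
        return json.loads(self._body)


good = FakeResponse(200, "application/json; charset=utf-8", '{"ok": true}')
html = FakeResponse(200, "text/html", "<html></html>")
print(json_or_none(good))  # {'ok': True}
print(json_or_none(html))  # None
```

Returning None is just one policy. Raising a custom exception or logging the raw body are equally reasonable, depending on how much you trust the server.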
json.decoder.JSONDecodeError: Expecting value: line 1 column 2 (char 1)
The short answer is that this is just how the JSON spec works. All strings must be double-quoted, as you can see from the diagram in the middle of Introducing JSON.
However, you can parse that string in Python using ast.literal_eval() rather than json.loads().
import ast
print(ast.literal_eval("['product', 'font', 'graphics', 'photo caption', 'brand', 'advertising', 'technology', 'text', 'graphic design', 'competition']"))
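For comparison, a short sketch showing both parsers on the same single-quoted input, followed by a conversion to valid JSON. The list is a shortened version of the one above, and the variable names are arbitrary.

```python
import ast
import json

text = "['product', 'font', 'graphics']"  # single-quoted, so not valid JSON

# json.loads() accepts the "[" and then fails on the first single quote.
error_message = None
try:
    json.loads(text)
except json.JSONDecodeError as exc:
    error_message = str(exc)
print(error_message)  # Expecting value: line 1 column 2 (char 1)

# ast.literal_eval() reads the text as a Python literal instead.
items = ast.literal_eval(text)

# json.dumps() re-serialises the list with double quotes, giving valid JSON.
as_json = json.dumps(items)
print(as_json)  # ["product", "font", "graphics"]
```

If you control the producer of the string, emitting it with json.dumps() in the first place avoids the need for ast.literal_eval() altogether.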