Load a JSON File from a URL request
import requests

url = "example.url"  # placeholder; substitute the real endpoint
headers = {}  # add any headers the endpoint requires
response = requests.get(url, headers=headers)
data = response.json()
Python 3: Read a JSON file from a URL
You were close:
import requests
import json
data = json.loads(requests.get("your_url").text)
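As a side note, requests can do the parsing itself: response.json() is equivalent to json.loads(response.text). The parsing step alone looks like this, with a literal string standing in for the response body so it runs without a network call:

```python
import json

# a literal JSON string standing in for response.text
body = '{"sample": "this is only a sample"}'
data = json.loads(body)
print(data["sample"])
```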
How can I read JSON data from an online file? (Python)
With the requests library:
import requests
f = "https://api.npoint.io/7872500d7eef44a03194"
data = requests.get(f).json()
data
Output:
{'sample': 'this is only a sample'}
Python 3 Get and parse JSON API
Version 1 (run pip install requests before running the script):
import requests
r = requests.get(url='https://hacker-news.firebaseio.com/v0/topstories.json?print=pretty')
print(r.json())
Version 2 (run pip install wget before running the script):
import wget
fs = wget.download(url='https://hacker-news.firebaseio.com/v0/topstories.json?print=pretty')
with open(fs, 'r') as f:
    content = f.read()
print(content)
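Both versions end the same way: json.loads (or r.json()) turns the downloaded text into Python objects. The top-stories endpoint returns a JSON array of story IDs, so the parsed result is a plain list; a sketch with made-up IDs in place of live data:

```python
import json

# made-up IDs shaped like the topstories.json response (a JSON array)
content = '[38501234, 38501211, 38500990]'
story_ids = json.loads(content)
print(len(story_ids), story_ids[0])
```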
Parse JSON from within HTML webpage Using Python
An example of how to parse the JSON data contained within this page:
import json
import requests
from bs4 import BeautifulSoup
url = "https://www.bizbuysell.com/connecticut-businesses-for-sale/?q=bHQ9MzAsNDAsODA%3D"
headers = {
    "User-Agent": "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:88.0) Gecko/20100101 Firefox/88.0"
}
soup = BeautifulSoup(requests.get(url, headers=headers).content, "html.parser")
data = soup.select_one('[data-stype="searchResultsPage"]').contents[0]
data = json.loads(data)
# pretty print the data
print(json.dumps(data, indent=4))
Prints:
{
    "@context": "http://schema.org",
    "@type": "SearchResultsPage",
    "speakable": {
        "@type": "SpeakableSpecification",
        "xpath": [
            "/html/head/title",
            "/html/head/meta[@name='description']/@content"
        ]
    },
    "about": [
        {
            "item": {
                "@type": "Product",
                "name": "Moving Company",
                "alternateName": null,
                "logo": "https://images.bizbuysell.com/shared/listings/179/1791243/ade90fd4-5537-4545-9011-58eb2f257a99-W496.jpg",
                "image": "https://images.bizbuysell.com/shared/listings/179/1791243/ade90fd4-5537-4545-9011-58eb2f257a99-W496.jpg",
                "description": "The company is made up of three department. A licensed Household Goods Relocation and Eviction, an Insurance Agency, and also a Thrift Store. The reason why the company is established this way is the three departments work very well together. Most times someone calls us for services and require special Insurance Coverages. We represent several Insurance Companies and Wholesalers which us a great advantage to obtain the required Insurance Coverage without delays. Most time we relocate clients who are downsizing, children are grown, and moved out, and therefore do not have need for lots of furniture which we either purchase at minimal cost or given to us for free. It's a win win situation for the company. The items are sold very fast because the selling price is extremely low and the profit margin is very high.",
                "url": "/Business-Opportunity/moving-company/1791243",
                "productId": "1791243",
                "offers": {
                    "@type": "Offer",
                    "price": 450000,
                    "priceCurrency": "USD",
                    "availability": "http://schema.org/InStock",
                    "url": "/Business-Opportunity/moving-company/1791243",
                    "image": "https://images.bizbuysell.com/shared/listings/179/1791243/ade90fd4-5537-4545-9011-58eb2f257a99-W496.jpg",
                    "availableAtOrFrom": {
                        "@type": "Place",
                        "address": {
                            "@type": "PostalAddress",
                            "addressLocality": "Hartford County",
                            "addressRegion": " CT"
                        }
                    }
                }
            },
            "@type": "ListItem",
            "position": 0
        },
        ...
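Once parsed, the structure above is ordinary Python dicts and lists, so individual listings can be pulled out with plain indexing. A sketch using a trimmed stand-in for the data, so it runs without fetching the page:

```python
import json

# trimmed stand-in for the SearchResultsPage JSON shown above
raw = '''{
    "@type": "SearchResultsPage",
    "about": [
        {"item": {"name": "Moving Company",
                  "offers": {"price": 450000, "priceCurrency": "USD"}},
         "position": 0}
    ]
}'''
data = json.loads(raw)
listings = [(e["item"]["name"], e["item"]["offers"]["price"]) for e in data["about"]]
print(listings)
```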
How can I get the JSON out of a webpage?
First, you want to access the raw file, not the UI. As Kache mentioned, you can get the JSON using:
import base64
import json
import requests

resp = requests.get('https://chromium.googlesource.com/chromium/src/+/main/components/certificate_transparency/data/log_list.json?format=TEXT')
obj = json.loads(base64.decodebytes(resp.text.encode()))
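The ?format=TEXT endpoint on googlesource.com serves the file base64-encoded, which is why the decodebytes step is needed before json.loads. The round trip can be checked offline with a small sample payload:

```python
import base64
import json

# simulate what ?format=TEXT returns: the JSON file, base64-encoded
payload = base64.encodebytes(b'{"operators": [{"name": "Example Op", "logs": []}]}').decode()
obj = json.loads(base64.decodebytes(payload.encode()))
print(obj["operators"][0]["name"])
```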
Then, you can use the following script to extract only the data you want:
import requests
import json
import base64
def extract_log(log):
    keys = ['description', 'log_id']
    return {key: log[key] for key in keys}

def extract_logs(logs):
    return [extract_log(log) for log in logs]

def extract_operator(operator):
    return {
        'name': operator['name'],
        'logs': extract_logs(operator['logs'])
    }

def extract_certificates(obj):
    return [extract_operator(operator) for operator in obj['operators']]

def scrape_certificates(url):
    resp = requests.get(url)
    obj = json.loads(base64.decodebytes(resp.text.encode()))
    return extract_certificates(obj)

def main():
    out = scrape_certificates('https://chromium.googlesource.com/chromium/src/+/main/components/certificate_transparency/data/log_list.json?format=TEXT')
    print(json.dumps(out, indent=4))

if __name__ == '__main__':
    main()
Trying to read JSON from URL and parse into CSV format
You are close; here's what you need to change:
- You can use pandas DataFrames to read JSON with df = pd.read_json(text, lines=True)
- Make sure to specify lines=True, because some of your data contains \n characters
- You can use the same DataFrame to output a CSV with df.to_csv(file)
All in all, there are some things in your code that could be removed, e.g. you're calling requests.get twice for no real reason, which slows your code down substantially.
import requests
import pandas as pd
all_links = ['https://www.baptisthealthsystem.com/docs/global/standard-charges/474131755_abrazomaranahospital_standardcharges.json?sfvrsn=9a27928_2',
'https://www.baptisthealthsystem.com/docs/global/standard-charges/621861138_abrazocavecreekhospital_standardcharges.json?sfvrsn=674fd6f_2',
'https://www.baptisthealthsystem.com/docs/global/standard-charges/621809851_abrazomesahospital_standardcharges.json?sfvrsn=13953222_2',
'https://www.baptisthealthsystem.com/docs/global/standard-charges/621811285_abrazosurprisehospital_standardcharges.json?sfvrsn=c8113dcf_2']
for item in all_links:
    try:
        first_under = item.find('_') + 1
        last_under = item.rfind('?') - 21
        file_name = item[first_under:last_under]
        r = requests.get(item)
        df = pd.read_json(r.text, lines=True)
        # pass the path directly; to_csv opens the file in text mode itself
        DOWNLOAD_PATH = 'C:\\Users\\ryans\\Desktop\\hospital_data\\' + file_name + '.csv'
        df.to_csv(DOWNLOAD_PATH)
    except Exception as e:
        print(e)
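The lines=True behaviour can be checked without any network access: it tells pandas to parse newline-delimited JSON, one object per line, which is why it is needed when the payload contains \n separators. A minimal sketch with made-up records:

```python
import pandas as pd
from io import StringIO

# two made-up newline-delimited JSON records
ndjson = '{"code": "A1", "price": 10}\n{"code": "B2", "price": 20}\n'
df = pd.read_json(StringIO(ndjson), lines=True)
print(df.shape)
print(df.to_csv(index=False))
```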