How to reversibly store and load a Pandas dataframe to/from disk
The easiest way is to pickle it using to_pickle:
df.to_pickle(file_name) # where to save it, usually as a .pkl
Then you can load it back using:
df = pd.read_pickle(file_name)
Note: before 0.11.1, save and load were the only way to do this (they are now deprecated in favor of to_pickle and read_pickle respectively).
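As a quick check that the pickle round trip really is reversible, here is a minimal sketch. The frame contents and the file name frame.pkl are made up for illustration, and a temporary directory is used so nothing is left on disk.

```python
import os
import tempfile

import pandas as pd

# Small illustrative frame; the column names are made up for this example.
df = pd.DataFrame({"a": [1, 2, 3], "b": ["x", "y", "z"]})

# Write into a temporary directory so the example cleans up after itself.
with tempfile.TemporaryDirectory() as tmp_dir:
    file_name = os.path.join(tmp_dir, "frame.pkl")
    df.to_pickle(file_name)               # save
    restored = pd.read_pickle(file_name)  # load

# True when values, dtypes, and index all survived the round trip.
round_trip_ok = restored.equals(df)
```

Keep in mind that pickle files should only be loaded from sources you trust, since unpickling can execute arbitrary code.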
Another popular choice is to use HDF5 (pytables) which offers very fast access times for large datasets:
import pandas as pd
store = pd.HDFStore('store.h5')
store['df'] = df  # save it
df = store['df']  # load it
store.close()  # release the file handle when done
More advanced strategies are discussed in the cookbook.
Since 0.13 there's also msgpack, which may be better for interoperability, as a faster alternative to JSON, or if you have Python object/text-heavy data (see this question). Note that to_msgpack and read_msgpack were deprecated in pandas 0.25 and removed in 1.0.
Is there a way to store output dataframes and append them to the last output in the same dataframe
The issue is resolved by first creating an empty dataframe and then appending each output to it inside the loop.
The updated code is as follows:
import pandas as pd
import requests

column_names = ["parcel_foreign_id_x", "s1product_end_time", "s1product_ron", "cohvh_avg", "cohvv_avg", "vhvv_avg", "parcel_foreign_id_y", "s2product_start_time", "s2product_ron", "ndvi_avg"]
df = pd.DataFrame(columns=column_names)  # empty frame that accumulates the results
foreign = 1
while foreign <= 50:
    s1_time_series_url_p6 = 'https://demodev2.kappazeta.ee/ard_api_demo/v1/time_series/s1?limit_to_rasters=true&parcel_foreign_id=0&properties=parcel_foreign_id%2Cs1product_end_time%2Cs1product_ron%2Ccohvh_avg%2Ccohvv_avg%2Cvhvv_avg'
    s2_time_series_url_p6 = 'https://demodev2.kappazeta.ee/ard_api_demo/v1/time_series/s2?limit_to_rasters=true&parcel_foreign_id=0&properties=parcel_foreign_id%2Cs2product_start_time%2Cs2product_ron%2Cndvi_avg'
    position = 101  # index of the placeholder '0' after 'parcel_foreign_id='
    foreign_n = str(foreign)
    # Replace the placeholder id with the current parcel id
    s1_time_series_url_p6 = s1_time_series_url_p6[:position] + foreign_n + s1_time_series_url_p6[position+1:]
    s2_time_series_url_p6 = s2_time_series_url_p6[:position] + foreign_n + s2_time_series_url_p6[position+1:]
    r_s1_time_series_p6 = requests.get(s1_time_series_url_p6)
    r_s2_time_series_p6 = requests.get(s2_time_series_url_p6)
    json_s1_time_series_p6 = r_s1_time_series_p6.json()
    json_s2_time_series_p6 = r_s2_time_series_p6.json()
    df_s1_time_series_p6 = pd.DataFrame(json_s1_time_series_p6['s1_time_series'])
    df_s2_time_series_p6 = pd.DataFrame(json_s2_time_series_p6['s2_time_series'])
    # Keep only the first 11 characters of each timestamp so both series can be matched
    df_s2_time_series_p6.s2product_start_time = df_s2_time_series_p6.s2product_start_time.str[0:11]
    df_s1_time_series_p6.s1product_end_time = df_s1_time_series_p6.s1product_end_time.str[0:11]
    dfinal_p6 = df_s1_time_series_p6.merge(df_s2_time_series_p6, how='inner', left_on='s1product_end_time', right_on='s2product_start_time')
    cols_p6 = ['parcel_foreign_id_x', 's1product_ron', 'parcel_foreign_id_y', 's2product_ron']
    dfinal_p6[cols_p6] = dfinal_p6[cols_p6].apply(pd.to_numeric, errors='coerce', axis=1)
    # Append this parcel's rows to the accumulated frame
    df = pd.concat([dfinal_p6, df], ignore_index=True)
    foreign = foreign + 1
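Calling pd.concat inside the loop copies the accumulated frame on every iteration. A common alternative, sketched below, is to collect the per-iteration frames in a list and concatenate once at the end. The function fetch_parcel_frame is a hypothetical stand-in for the request-and-merge step above; it returns made-up data so the sketch runs without network access.

```python
import pandas as pd


def fetch_parcel_frame(parcel_id):
    """Hypothetical stand-in for the per-parcel request-and-merge step."""
    return pd.DataFrame(
        {
            "parcel_foreign_id": [parcel_id, parcel_id],
            "ndvi_avg": [0.1 * parcel_id, 0.2 * parcel_id],  # made-up values
        }
    )


frames = []
for parcel_id in range(1, 4):
    frames.append(fetch_parcel_frame(parcel_id))

# One concatenation at the end instead of one per iteration.
combined = pd.concat(frames, ignore_index=True)
```

With this pattern the empty starting frame is no longer needed, because the column names come from the collected frames.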
How to store `pandas.DataFrame` in a PANDAS-LOADABLE binary format other than `pickle`
I would guess that your dataframe is too big. Pickle has some limits. You are much better off either saving to a database or using to_hdf (or one of the many other IO routines; to_msgpack might work as well in pandas versions before 1.0).
https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.to_hdf.html
Storing the results from a function into a retrievable DataFrame in Python
I'm not familiar enough with computer science terminology to thoroughly explain this to you, but basically, when you call a function that has a return value, that value must be assigned to a variable if you want to use it afterwards.
df only exists inside your function (I think that's called scope). When you leave the function, df is gone.
You're doing:
get_intraday_data(symbol, 301,10)
So, after that function is run, the returned value is gone.
Instead, do the following:
df = get_intraday_data(symbol, 301,10)
Then you can do stuff with it.
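To make the difference concrete, here is a small sketch. The body of get_intraday_data and its parameter names are hypothetical, since the original function is not shown; the stub just builds a tiny frame instead of downloading anything.

```python
import pandas as pd


def get_intraday_data(symbol, interval_seconds, num_days):
    """Hypothetical stub: builds a tiny frame instead of downloading data."""
    df = pd.DataFrame({"symbol": [symbol] * 2, "close": [10.0, 10.5]})
    return df  # df is a local name; only the returned object leaves the function


# The return value is thrown away because nothing refers to it.
get_intraday_data("ABC", 301, 10)

# The return value is kept because it is bound to a name.
result = get_intraday_data("ABC", 301, 10)
num_rows = len(result)
```

The name used outside the function does not have to match the one used inside it; result here refers to the same object that was called df inside the function.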
Alternatively, instead of returning the df, you can pickle it. In your get_intraday_data function:
fname = 'file1.P'
df.to_pickle(fname)
return fname
Then, subsequent code has to read the pickled dataframe:
fname = get_intraday_data(symbol, 301,10)
df = pd.read_pickle(fname)
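Put together, the pickle-and-return-the-file-name variant might look like the sketch below. The function save_intraday_data, its parameters, and the data it produces are all hypothetical; a temporary directory stands in for wherever you would actually keep the file.

```python
import os
import tempfile

import pandas as pd


def save_intraday_data(symbol, out_dir):
    """Hypothetical variant that pickles its result and returns the path."""
    df = pd.DataFrame({"symbol": [symbol] * 2, "close": [10.0, 10.5]})
    fname = os.path.join(out_dir, f"{symbol}.pkl")
    df.to_pickle(fname)
    return fname


with tempfile.TemporaryDirectory() as out_dir:
    fname = save_intraday_data("ABC", out_dir)
    loaded = pd.read_pickle(fname)  # read it back while the file still exists
```

The returned file name still has to be assigned to a variable, for the same reason as before: otherwise the caller has no way to find the file.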