Constructing pandas DataFrame from values in variables gives ValueError: If using all scalar values, you must pass an index
The error message says that if you're passing scalar values, you have to pass an index. So you can either avoid scalar values for the columns, e.g. by using a list:
>>> df = pd.DataFrame({'A': [a], 'B': [b]})
>>> df
A B
0 2 3
or use scalar values and pass an index:
>>> df = pd.DataFrame({'A': a, 'B': b}, index=[0])
>>> df
A B
0 2 3
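A self-contained version of both fixes, assuming `a = 2` and `b = 3` (the question's variables are not shown; these values are inferred from the output above):

```python
import pandas as pd

# Assumed values, inferred from the printed output (A == 2, B == 3).
a = 2
b = 3

# Option 1: wrap each scalar in a list so every column has length 1.
df_lists = pd.DataFrame({'A': [a], 'B': [b]})

# Option 2: keep the scalars and tell pandas which row label(s) to use.
df_index = pd.DataFrame({'A': a, 'B': b}, index=[0])

print(df_lists)
print(df_index)
```

Both give the same single-row frame; `index=` is mainly useful when you want a row label other than the default 0.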
Getting Error: ValueError: If using all scalar values, you must pass an index when converting ndarray to pandas Dataframe
All of your sub_iti, tw and col_iti are 2D NumPy arrays. However, when you do:
df = pd.DataFrame({'sub_iti': sub_iti,
                   'tw': tw,
                   'col_iti': col_iti})
pandas expects them to be 1D NumPy arrays or lists, since that's what the columns of a DataFrame are. You can try:
df = pd.DataFrame({'sub_iti': sub_iti.tolist(),
                   'tw': tw.tolist(),
                   'col_iti': col_iti.tolist()})
Output:
sub_iti tw col_iti
0 [s1] [xx] [TA]
1 [s1] [xx] [BAT]
2 [s1] [xx] [T]
3 [s1] [cc] [TA]
4 [s1] [cc] [BAT]
5 [s1] [cc] [T]
But I do think that you should remove the lists inside each cell by using ravel() instead of tolist():
df = pd.DataFrame({'sub_iti': sub_iti.ravel(),
                   'tw': tw.ravel(),
                   'col_iti': col_iti.ravel()})
Output:
sub_iti tw col_iti
0 s1 xx TA
1 s1 xx BAT
2 s1 xx T
3 s1 cc TA
4 s1 cc BAT
5 s1 cc T
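A small reproduction with made-up data shaped like the question's. The assumption is that each variable is a (6, 1) array of strings; the real arrays may differ. It shows why tolist() leaves a list in every cell while ravel() does not:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-ins for the question's 2D arrays, each of shape (6, 1).
sub_iti = np.array([['s1']] * 6)
tw = np.array([['xx'], ['xx'], ['xx'], ['cc'], ['cc'], ['cc']])
col_iti = np.array([['TA'], ['BAT'], ['T'], ['TA'], ['BAT'], ['T']])

# tolist() keeps the inner dimension, so every cell holds a one-element list.
df_lists = pd.DataFrame({'sub_iti': sub_iti.tolist(),
                         'tw': tw.tolist(),
                         'col_iti': col_iti.tolist()})

# ravel() flattens each array to shape (6,), so every cell holds a plain string.
df_flat = pd.DataFrame({'sub_iti': sub_iti.ravel(),
                        'tw': tw.ravel(),
                        'col_iti': col_iti.ravel()})

print(df_lists.loc[0, 'tw'])  # a one-element list
print(df_flat.loc[0, 'tw'])   # a plain string
```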
Dictionary to Dataframe Error: If using all scalar values, you must pass an index
This error occurs because pandas needs an index. At first this seems confusing, because you might think of list indexing. What it is essentially asking for is a label for each row of the resulting DataFrame. You can set this like so:
import pandas as pd
letters = ['a', 'b', 'c', 'd']
df = pd.DataFrame(letters, index=[0, 1, 2, 3])
The data frame then yields:
   0
0  a
1  b
2  c
3  d
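To see that the index is just a set of row labels, here is a minimal sketch with a hypothetical dict of scalars (`record` and the label `'row1'` are made up for illustration). Without `index=` the constructor raises the error from the question; with it, each label becomes a row:

```python
import pandas as pd

# Hypothetical dict in which every value is a scalar.
record = {'name': 'a', 'count': 1}

error_message = ''
try:
    pd.DataFrame(record)
except ValueError as exc:
    # No value has a length, so pandas cannot tell how many rows to create.
    error_message = str(exc)
    print(error_message)

# Index labels do not have to be integers; they simply name the rows.
df = pd.DataFrame(record, index=['row1'])
print(df)
```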
For you specifically, this might look something like this using numpy (not tested):
import numpy as np
import pandas as pd

list_of_dfs = {}
for i, path in enumerate(regionLoadArray):
    list_of_dfs[i] = pd.read_csv(path)
ind = np.arange(len(list_of_dfs))
dataframe = pd.DataFrame(list_of_dfs, index=ind)
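If the intent is to stack the CSV files into one long DataFrame, rather than build a frame whose values are whole DataFrames, `pd.concat` is a commonly used alternative. Note this is a different technique from the answer above. The sketch fakes the files with `io.StringIO`; the CSV contents and the level names `region`/`row` are hypothetical:

```python
import io

import pandas as pd

# Hypothetical stand-ins for the files listed in regionLoadArray.
regionLoadArray = [
    io.StringIO("hour,load\n0,10\n1,12\n"),
    io.StringIO("hour,load\n0,20\n1,22\n"),
]

# Read each file into its own DataFrame, keyed by its position in the list.
frames = {i: pd.read_csv(src) for i, src in enumerate(regionLoadArray)}

# concat stacks the frames vertically; the dict keys become the outer index level.
combined = pd.concat(frames, names=['region', 'row'])
print(combined)
```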
pandas read_json: If using all scalar values, you must pass an index
Try
ser = pd.read_json('people_wiki_map_index_to_word.json', typ='series')
That file only contains key-value pairs where the values are scalars. You can convert it to a DataFrame with ser.to_frame('count').
You can also do something like this:
import json
with open('people_wiki_map_index_to_word.json', 'r') as f:
    data = json.load(f)
Now data is a dictionary. You can pass it to a dataframe constructor like this:
df = pd.DataFrame({'count': data})
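Both approaches can be tried without the real file. The sketch below uses a hypothetical three-entry JSON string in place of people_wiki_map_index_to_word.json, read through `io.StringIO`:

```python
import io
import json

import pandas as pd

# Hypothetical miniature of the word -> index mapping file.
json_text = '{"the": 0, "and": 1, "of": 2}'

# Approach 1: read the top-level object as a Series, then turn it into a frame.
ser = pd.read_json(io.StringIO(json_text), typ='series')
df_from_series = ser.to_frame('count')

# Approach 2: load a plain dict, then nest it under a column name.
data = json.loads(json_text)
df_from_dict = pd.DataFrame({'count': data})

print(df_from_series)
print(df_from_dict)
```

In both cases the JSON keys end up as the index and the scalar values fill the single `count` column, which is why no explicit `index=` is needed.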
how to solve If using all scalar values, you must pass an index problem pandas
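Try converting the values of the dictionary to lists if they are scalars:
import pandas as pd
from ast import literal_eval

# d comes from the question; d[1] is assumed to be a string containing a dict literal
vals = literal_eval(d[1].strip())
# wrap scalar values in a one-element list; leave lists and tuples unchanged
df = pd.DataFrame(
    {k: v if isinstance(v, (list, tuple)) else [v] for k, v in vals.items()}
)
print(df)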
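A runnable version of the same idea, with a hypothetical string `raw` standing in for `d[1]` (the question's actual data is not shown):

```python
from ast import literal_eval

import pandas as pd

# Hypothetical stand-in for d[1]: a string holding a dict literal whose values
# mix scalars and a list.
raw = " {'city': 'Paris', 'visits': 3, 'tags': ['museum']} "

vals = literal_eval(raw.strip())

# Wrap scalars in a one-element list; leave lists/tuples as they are.
df = pd.DataFrame(
    {k: v if isinstance(v, (list, tuple)) else [v] for k, v in vals.items()}
)
print(df)
```

This only works when every list-valued entry has the same length (here 1, matching the wrapped scalars); otherwise pandas will complain that the arrays are not all the same length.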
ValueError: If using all scalar values, you must pass an index
The problem is that when you use the DataFrame constructor:
pd.DataFrame({m: eurusd.interpolate(method=m) for m in methods})
the value for each m is a DataFrame, which will be interpreted as a scalar value, which is admittedly confusing. This constructor expects some sort of sequence or Series. The following should solve the problem:
pd.DataFrame({m: eurusd['BID-CLOSE'].interpolate(method=m) for m in methods})
since subsetting on a column returns a Series. So, for example, instead of:
In [34]: pd.DataFrame({'linear':df.interpolate('linear')})
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-34-4b6c095c6da3> in <module>()
----> 1 pd.DataFrame({'linear':df.interpolate('linear')})
/home/juan/anaconda3/lib/python3.5/site-packages/pandas/core/frame.py in __init__(self, data, index, columns, dtype, copy)
222 dtype=dtype, copy=copy)
223 elif isinstance(data, dict):
--> 224 mgr = self._init_dict(data, index, columns, dtype=dtype)
225 elif isinstance(data, ma.MaskedArray):
226 import numpy.ma.mrecords as mrecords
/home/juan/anaconda3/lib/python3.5/site-packages/pandas/core/frame.py in _init_dict(self, data, index, columns, dtype)
358 arrays = [data[k] for k in keys]
359
--> 360 return _arrays_to_mgr(arrays, data_names, index, columns, dtype=dtype)
361
362 def _init_ndarray(self, values, index, columns, dtype=None, copy=False):
/home/juan/anaconda3/lib/python3.5/site-packages/pandas/core/frame.py in _arrays_to_mgr(arrays, arr_names, index, columns, dtype)
5229 # figure out the index, if necessary
5230 if index is None:
-> 5231 index = extract_index(arrays)
5232 else:
5233 index = _ensure_index(index)
/home/juan/anaconda3/lib/python3.5/site-packages/pandas/core/frame.py in extract_index(data)
5268
5269 if not indexes and not raw_lengths:
-> 5270 raise ValueError('If using all scalar values, you must pass'
5271 ' an index')
5272
ValueError: If using all scalar values, you must pass an index
Use this instead:
In [35]: pd.DataFrame({'linear':df['BID-CLOSE'].interpolate('linear')})
Out[35]:
linear
timestamp
2016-10-10 22:00:00 1.309710
2016-10-10 22:00:00 1.319710
2016-10-10 22:00:00 1.317210
2016-10-10 22:00:00 1.317710
2016-10-10 22:00:00 1.314110
2016-10-10 22:00:00 1.313010
2016-10-10 22:00:00 1.311910
2016-10-10 22:00:00 1.310810
2016-10-10 22:00:00 1.309710
2016-10-10 22:00:00 1.311310
2016-10-10 22:00:00 1.314910
2016-10-10 22:00:00 1.320210
2016-10-10 22:00:00 1.322577
2016-10-10 22:00:00 1.324943
2016-10-10 22:00:00 1.327310
2016-10-10 22:00:00 1.327310
2016-10-10 22:00:00 1.317010
2016-10-10 22:00:00 1.308310
Fair warning, though: I am getting a LinAlgError: singular matrix error when I try 'quadratic' and 'cubic' interpolation on your data. Not sure why, though.
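A self-contained sketch of the fix, using a made-up `eurusd` frame with gaps in its `BID-CLOSE` column. The real data and the question's `methods` list are not shown, so the two methods below are assumptions chosen because they need no SciPy:

```python
import numpy as np
import pandas as pd

# Hypothetical price series with missing values.
eurusd = pd.DataFrame({'BID-CLOSE': [1.0, np.nan, 3.0, np.nan, 5.0]})

# Assumed method list; 'linear' and 'index' both work without SciPy.
methods = ['linear', 'index']

# Selecting one column makes each dict value a Series, so the constructor
# builds one column per interpolation method.
result = pd.DataFrame(
    {m: eurusd['BID-CLOSE'].interpolate(method=m) for m in methods}
)
print(result)
```

With the default integer index, 'linear' and 'index' fill the gaps identically; they differ only when the index values are unevenly spaced.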