Creating a Pandas DataFrame from a Numpy array: How do I specify the index column and column headers?
You need to specify data, index and columns to the DataFrame constructor, as in:
>>> pd.DataFrame(data=data[1:,1:], # values
... index=data[1:,0], # 1st column as index
... columns=data[0,1:]) # 1st row as the column names
Edit: as @joris pointed out in a comment, you may need to change the values argument above to np.int_(data[1:,1:]) to get the correct data type.
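A minimal runnable sketch of the same call, assuming a small hypothetical string array laid out with a header row and a label column (top-left cell unused). Here `.astype(int)` is used in place of `np.int_(...)` to turn the string values into a numeric dtype.

```python
import numpy as np
import pandas as pd

# Hypothetical input: first row = column names, first column = row labels.
data = np.array([
    ["",     "Col1", "Col2"],
    ["Row1", "1",    "2"],
    ["Row2", "3",    "4"],
])

df = pd.DataFrame(
    data=data[1:, 1:].astype(int),  # values, converted from str to int
    index=data[1:, 0],              # 1st column as index
    columns=data[0, 1:],            # 1st row as the column names
)
print(df)
```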
How to make an index column in NumPy array?
So if I understood your question right, you have to add a column to your (presumably) 1D array.
import numpy as np
import pandas as pd
array = np.random.randint(0, 100, size=100) # random numpy array (1D)
index = np.arange(array.shape[0]) # create index array for indexing
array_with_indices = np.c_[array, index] # 2D: column 0 = values, column 1 = indices
array_with_indices[:, 1] // 10 + 1 # take the second column, as it contains the indices
# or we can convert it to a dataframe if you prefer
df = pd.DataFrame(array, index=index)
# then the same arithmetic works directly on the index
df.index // 10 + 1
Then you can insert it into df1.
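A minimal sketch of inserting the result as a column, assuming df1 has the default RangeIndex (0, 1, 2, ...) and using a hypothetical 25-row stand-in for it; the column name "group" is also hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for df1: 25 rows of random values.
rng = np.random.default_rng(0)
df1 = pd.DataFrame({"value": rng.integers(0, 100, size=25)})

# Rows 0-9 -> 1, rows 10-19 -> 2, rows 20-24 -> 3
df1["group"] = df1.index // 10 + 1

print(df1["group"].value_counts().sort_index())
```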
Creating dataframe with multi level column index from four 2d numpy arrays
One option is to reshape the data in Fortran order, before creating the dataframe:
# reusing your code
level_1_label = ['location1','location2','location3']
level_2_label = ['x1','x2','x3','x4']
header = pd.MultiIndex.from_product([level_1_label, level_2_label], names=['Location','Variable'])
# np.vstack is just a convenience wrapper around np.concatenate with axis=0
outcome = np.reshape(np.vstack([x1,x2,x3,x4]), (len(x1), -1), order = 'F')
df = pd.DataFrame(outcome, columns = header)
df.index.name = 'Time'
df
Location location1 location2 location3
Variable x1 x2 x3 x4 x1 x2 x3 x4 x1 x2 x3 x4
Time
0 2 1 4 3 4 2 3 1 1 2 2 1
1 2 4 4 3 2 1 3 4 1 4 2 3
2 1 1 4 2 3 4 3 2 3 4 3 1
3 2 3 1 2 2 3 2 1 1 2 2 1
4 3 2 1 1 3 2 4 2 2 4 3 4
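To see why order='F' produces the right column layout, here is a sketch with hypothetical tiny inputs (2 time steps x 3 locations per variable) where each value encodes its variable and location as 10 * variable + location, so the column order can be read straight off the first row.

```python
import numpy as np
import pandas as pd

# Hypothetical inputs: x_v[t, j] = 10 * v + (j + 1) for v = 1..4
n_time, n_loc = 2, 3
x1, x2, x3, x4 = [
    np.full((n_time, n_loc), 10 * v) + np.arange(1, n_loc + 1)
    for v in range(1, 5)
]

level_1_label = ['location1', 'location2', 'location3']
level_2_label = ['x1', 'x2', 'x3', 'x4']
header = pd.MultiIndex.from_product([level_1_label, level_2_label],
                                    names=['Location', 'Variable'])

# vstack gives shape (4 * n_time, n_loc). Reshaping in Fortran (column-major)
# order reads each location column of the stack top to bottom, so every
# output column is one variable's time series for one location, with the
# four variables grouped under each location.
outcome = np.reshape(np.vstack([x1, x2, x3, x4]), (n_time, -1), order='F')
df = pd.DataFrame(outcome, columns=header)
df.index.name = 'Time'

print(outcome[0])  # [11 21 31 41 12 22 32 42 13 23 33 43]
```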
Building a DataFrame with column names in Python
You're wrapping the input to DataFrame in a list, so pandas receives a one-element list instead of the 2D array itself. You should be passing the actual array. Therefore, remove the brackets surrounding dat:
In [9]: dat = pd.DataFrame(dat, columns = ["Var %d" % (i + 1) for i in range(10)])
In [10]: dat
Out[10]:
Var 1 Var 2 Var 3 Var 4 \
0 0.388888888889 0.388888888889 0.388888888889 0.436943311457
1 0.388888888889 0.388888888889 0.222222222222 0.445720017848
2 0.277777777778 0.277777777778 0.0555555555556 0.442623129181
3 0.111111111111 0.111111111111 0.166666666667 0.465180784545
4 0.5 0.5 0.333333333333 0.445720017848
5 0.388888888889 0.388888888889 0.222222222222 0.449433221856
6 0.388888888889 0.388888888889 0.333333333333 0.442491458743
7 0.333333333333 0.0555555555556 0.777777777778 0.438941511384
8 0.444444444444 0.444444444444 0.444444444444 0.427707051887
9 0.222222222222 0.277777777778 0.5 0.431823227653
Var 5 Var 6 Var 7 Var 8 \
0 0.790590003119 0.502046809222 0.838971773428 0.76049230908
1 0.811477946525 0.506899600792 0.836856648557 0.760617288779
2 0.788341322621 0.503717213312 0.837036254923 0.759975270403
3 0.798337900365 0.525060453789 0.846387521536 0.753358230843
4 0.787804059391 0.506899600792 0.836856648557 0.760501605832
5 0.784362288852 0.505575764415 0.83512539411 0.760417126777
6 0.787743031271 0.502995011027 0.836692391333 0.760611529526
7 0.787804059391 0.506899600792 0.836856648557 0.760501605832
8 0.79760395106 0.505723065708 0.836856648557 0.760501605832
9 0.797173287335 0.507239045809 0.845413649425 0.761341659888
Var 9 Var 10
0 0.820605442278 0
1 0.819548947891 1
2 0.81842187229 2
3 0.824154832595 3
4 0.819548947891 4
5 0.818544294533 5
6 0.819815007518 6
7 0.819548947891 7
8 0.819548947891 8
9 0.823903785101 9
Don't mind the list comprehension for the columns field. I just didn't want to type out all of those Vars :).
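A short sketch of why the brackets matter, using a hypothetical random 10x10 array in place of dat: wrapping the array in a list adds an extra leading dimension, while passing it directly gives the expected 10x10 table.

```python
import numpy as np
import pandas as pd

# Hypothetical 10x10 array standing in for `dat`.
dat = np.random.default_rng(0).random((10, 10))

# Wrapping the array in a list adds an extra leading dimension...
print(np.asarray([dat]).shape)  # (1, 10, 10)

# ...whereas passing the 2D array directly gives one column per array column.
df = pd.DataFrame(dat, columns=["Var %d" % (i + 1) for i in range(10)])
print(df.shape)  # (10, 10)
```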
Pandas DataFrame from Numpy Array - column order
If your data is already in a dataframe, it's much easier to just pass the values of the Pitch column to savgol_filter:
data_arr_smooth = signal.savgol_filter(data.Pitch.values, window_length, polyorder)
data_fr = pd.DataFrame({'time': data.time.values,'angle': data_arr_smooth})
There's no need to explicitly convert your data to float as long as it is numeric; savgol_filter will do this for you:
If x is not a single or double precision floating point array, it
will be converted to type numpy.float64 before filtering.
If you want both the original and the smoothed data in your original dataframe, just assign a new column to it:
data['angle'] = signal.savgol_filter(data.Pitch.values, window_length, polyorder)
convert numpy array into dataframe
My favorite way to transform numpy arrays to pandas DataFrames is to pass the columns in a dictionary:
df = pd.DataFrame({'col1':nparray[0], 'col2':nparray[1]})
However, if you have many columns, you can try:
# Create list of column names with the format "colN" (from 1 to N)
col_names = ['col' + str(i) for i in np.arange(nparray.shape[0]) + 1]
# Declare pandas.DataFrame object
df = pd.DataFrame(data=nparray.T, columns=col_names)
In the second solution, you have to restructure your array before passing it as data. nparray holds one column per row, so you have to rearrange it so that each of its rows becomes a column. NumPy has a method for that: you simply add .T to your array to transpose it: nparray.T.
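A small sketch comparing both approaches, assuming a hypothetical 2x4 nparray that holds one desired column per row; the dict version and the transposed version give the same table.

```python
import numpy as np
import pandas as pd

# Hypothetical array with one *row* per desired column (2 x 4).
nparray = np.array([[1, 2, 3, 4],
                    [5, 6, 7, 8]])

# Option 1: one dict entry per column.
df_dict = pd.DataFrame({'col1': nparray[0], 'col2': nparray[1]})

# Option 2: transpose so rows become columns, then name them col1..colN.
col_names = ['col' + str(i) for i in np.arange(nparray.shape[0]) + 1]
df_t = pd.DataFrame(data=nparray.T, columns=col_names)

print(nparray.T.shape)  # (4, 2)
print(df_t)
```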
Create Pandas dataframe from numpy array and use first column of the array as index
You passed the complete array as the data param; you need to slice your array as well if you want just 4 columns from the array as the data:
In [158]:
df = pd.DataFrame(a[:,1:], index=a[:,0], columns=['A', 'B','C','D'])
df
Out[158]:
A B C D
1 5.1 3.5 1.4 0.2
1 4.9 3.0 1.4 0.2
2 4.7 3.2 1.3 0.2
2 4.6 3.1 1.5 0.2
Also, having duplicate values in the index will make filtering and indexing problematic.
So with a[:,1:] I take all the rows, but only the columns from index 1 onwards, as desired; see the docs.
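A sketch of the duplicate-index problem, rebuilding a from the values shown in the output above: because the label 1.0 appears twice, a label lookup returns a two-row DataFrame rather than a single row.

```python
import numpy as np
import pandas as pd

# Same values as the output above: label first, then 4 measurements.
a = np.array([[1, 5.1, 3.5, 1.4, 0.2],
              [1, 4.9, 3.0, 1.4, 0.2],
              [2, 4.7, 3.2, 1.3, 0.2],
              [2, 4.6, 3.1, 1.5, 0.2]])

df = pd.DataFrame(a[:, 1:], index=a[:, 0], columns=['A', 'B', 'C', 'D'])

print(df.index.is_unique)  # False
# With duplicate labels, .loc returns every matching row as a DataFrame.
print(type(df.loc[1.0]).__name__)  # DataFrame
print(len(df.loc[1.0]))  # 2
```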
Creating a pandas dataframe from a 2d numpy array (to be a column of 1d numpy arrays) and a 1d np array of labels
You should always try to normalize your data such that each column only contains scalar values, not arrays or other data with a dimension.
In this case, I would do something like this:
>>> df = pd.DataFrame({'x': points[:,0], 'y': points[:, 1], 'label': labels},
...                   columns=['x', 'y', 'label'])
>>> df
x y label
0 1 2 0
1 2 1 1
2 100 100 1
3 -2 -1 1
4 0 0 0
5 -1 -2 0
If you truly insist on keeping the points as they are, convert them to a list of lists or a list of tuples before passing them to pandas to avoid this error.
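A sketch of both options, rebuilding points and labels from the table above: the normalized x/y/label layout, and the alternative of storing each point as a Python list via points.tolist(), which gives an object column of lists.

```python
import numpy as np
import pandas as pd

# Same data as the table above: 6 points and their labels.
points = np.array([[1, 2], [2, 1], [100, 100], [-2, -1], [0, 0], [-1, -2]])
labels = np.array([0, 1, 1, 1, 0, 0])

# Preferred: one scalar per cell.
df = pd.DataFrame({'x': points[:, 0], 'y': points[:, 1], 'label': labels},
                  columns=['x', 'y', 'label'])

# Alternative: keep each point together as a list in a single column.
df_pts = pd.DataFrame({'point': points.tolist(), 'label': labels})

print(df)
print(df_pts)
```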
How to convert a pandas dataframe into a numpy array with the column names
- To do a quick search for a val by its "item" and "color", use one of the following options:
  - Use pandas Boolean indexing
  - Convert the dataframe into a numpy.recarray using pandas.DataFrame.to_records, and also use Boolean indexing
- .item is a method for both pandas and numpy, so don't use 'item' as a column name. It has been changed to '_item'.
- As an FYI, numpy is a pandas dependency, and much of pandas' vectorized functionality directly corresponds to numpy.
import pandas as pd
import numpy as np
# test data
df = pd.DataFrame({'_item': ['book', 'book' , 'car', 'car', 'bike', 'bike'], 'color': ['green', 'blue' , 'red', 'green' , 'blue', 'red'], 'val' : [-22.7, -109.6, -57.19, -11.2, -25.6, -33.61]})
# Use a pandas Boolean index to select the matching rows
selected = df[(df._item == 'book') & (df.color == 'blue')]
# print(selected)
_item color val
book blue -109.6
# Alternatively, create a recarray
v = df.to_records(index=False)
# display(v)
rec.array([('book', 'green', -22.7 ), ('book', 'blue', -109.6 ),
('car', 'red', -57.19), ('car', 'green', -11.2 ),
('bike', 'blue', -25.6 ), ('bike', 'red', -33.61)],
dtype=[('_item', 'O'), ('color', 'O'), ('val', '<f8')])
# search the recarray
selected = v[(v._item == 'book') & (v.color == 'blue')]
# print(selected)
[('book', 'blue', -109.6)]
Update in response to OP edit
- You must first reshape the dataframe using pandas.DataFrame.pivot, and then use the previously mentioned methods.
dfp = df.pivot(index='_item', columns='color', values='val')
# display(dfp)
color blue green red
_item
bike -25.6 NaN -33.61
book -109.6 -22.7 NaN
car NaN -11.2 -57.19
# create a numpy recarray
v = dfp.to_records(index=True)
# display(v)
rec.array([('bike', -25.6, nan, -33.61),
('book', -109.6, -22.7, nan),
('car', nan, -11.2, -57.19)],
dtype=[('_item', 'O'), ('blue', '<f8'), ('green', '<f8'), ('red', '<f8')])
# select data
selected = v.blue[(v._item == 'book')]
# print(selected)
array([-109.6])