Reshaping an Array to Data.Frame

Reshape while converting a pandas DataFrame to a numpy array

Try the straightforward approach first. There may be a better solution than the one below:

import numpy as np

# X is assumed to be an iterable of equal-length rows
reshaped = []
for row in X:
    reshaped.append(row)

X_new = np.array(reshaped)
print(X_new.shape)
# (4, 16)
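If X happens to be a pandas Series whose elements are equal-length arrays (an assumption, since the original data is not shown), np.stack may be able to replace the explicit loop. A minimal sketch with made-up data:

```python
import numpy as np
import pandas as pd

# Hypothetical input: 4 entries, each holding a length-16 array
X = pd.Series([np.arange(16) + 16 * i for i in range(4)])

# Stack the per-entry arrays along a new first axis
X_new = np.stack(X.tolist())
print(X_new.shape)
```

np.stack raises an error if the inner arrays differ in length, which is usually preferable to silently getting an object array.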

How to reshape a pandas.Series

You can call reshape on the values array of the Series:

In [4]: a.values.reshape(2,2)
Out[4]:
array([[1, 2],
       [3, 4]], dtype=int64)

I actually think it won't always make sense to apply reshape to a Series (do you ignore the index?), and that you're correct in thinking it's just numpy's reshape:

a.reshape?
Docstring: See numpy.ndarray.reshape

That said, I agree that the fact that it lets you try to do this looks like a bug.
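To make the index question concrete, here is a small sketch that reshapes the underlying array and then, optionally, wraps the result in a DataFrame. The Series contents and the row and column labels are made up for illustration:

```python
import pandas as pd

# Hypothetical Series with a non-default index
a = pd.Series([1, 2, 3, 4], index=list("wxyz"))

# Reshape the underlying numpy array; the Series index is dropped here
arr = a.to_numpy().reshape(2, 2)

# If labels matter, attach new ones explicitly (hypothetical names)
df = pd.DataFrame(arr, index=["r0", "r1"], columns=["c0", "c1"])
print(df)
```

Working on the array makes it explicit that the original index does not survive the reshape.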

Reshape and transform a dataframe/array from 32 * 32 columns by 16 rows to (32 * 16) by 32

Here are three different functions. The first uses pandas methods (stacking), the second uses regular Python lists to build the result row by row, and the third uses numpy reshaping.

The numpy reshaping method is about twice as fast as the others (9.56 ms versus roughly 19 ms per loop), and almost all of its computation time is spent converting the DataFrame to a numpy array and then back to pandas.

Here's a link to the notebook I used for this if you want to play around with the code.

import numpy as np
import pandas as pd

def stack_image_df(image_df):
    """
    Performance: 100 loops, best of 5: 19 ms per loop
    """
    # create a MultiIndex indicating Row and Column information for each image
    row_col_index = pd.MultiIndex.from_tuples(
        [(i // 32, i % 32) for i in range(0, 1024)], names=["row", "col"]
    )
    # note: this relabels the columns and index of the input frame in place
    image_df.columns = row_col_index

    image_df.index = range(1, 17)
    image_df.index.name = "Image"

    # Use MultiIndex to reshape data
    return image_df.stack(level=1).T

def build_image_df(image_df):
    """
    Performance: 10 loops, best of 5: 19.2 ms per loop
    """
    image_data = image_df.values.tolist()
    reshaped = []
    for r_num in range(0, 32):
        row = []
        for image_num in range(0, 16):
            # for each image
            for c_num in range(0, 32):
                # get the corresponding index in the raw data
                # and add the pixel data to the row we're building
                raw_index = r_num * 32 + c_num
                pixel = image_data[image_num][raw_index]
                row.append(pixel)
        reshaped.append(row)
    reshaped_df = pd.DataFrame(reshaped)
    return reshaped_df

def reshape_image_df(image_df):
    """
    Performance: 100 loops, best of 5: 9.56 ms per loop
    Note: numpy methods only account for 0.82 ms of this
    """
    # use the function argument (the original referenced an undefined raw_df)
    return pd.DataFrame(
        np.rot90(np.fliplr(image_df.to_numpy().reshape(512, 32)))
    )
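Two properties of the numpy approach are easy to check on a small stand-in. The sketch below assumes 3 images of 4 x 4 pixels instead of 16 images of 32 x 32, with made-up pixel values. It checks that rot90 applied after fliplr is the same as a plain transpose, and it shows one possible way to get the side-by-side layout of the nested-loop version using reshape and transpose only:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in: 3 images, 4 x 4 pixels, one flattened image per row
n_images, side = 3, 4
image_df = pd.DataFrame(
    np.arange(n_images * side * side).reshape(n_images, side * side)
)
raw = image_df.to_numpy()

# 1) rot90 after fliplr gives the same result as a transpose
stacked = raw.reshape(n_images * side, side)
same_as_transpose = np.array_equal(np.rot90(np.fliplr(stacked)), stacked.T)

# 2) Side-by-side layout: pixel row r of every image ends up in output row r
side_by_side = (
    raw.reshape(n_images, side, side)   # (image, row, col)
    .transpose(1, 0, 2)                 # (row, image, col)
    .reshape(side, n_images * side)
)

# Reference result built with explicit loops, mirroring build_image_df
expected = np.array([
    [raw[img][r * side + c] for img in range(n_images) for c in range(side)]
    for r in range(side)
])

print(same_as_transpose, np.array_equal(side_by_side, expected))
```

Note that the transposed layout in the first check and the side-by-side layout in the second are different arrangements of the same pixels, so it is worth confirming which one the downstream code expects.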

Reshaping an array to data.frame

Yes, use adply() from the plyr package:

adply(x, c(1,2,3))
   Subject Cond Item Measure1 Measure2 Measure3
1       s1    A    1    -0.93   -0.360   -0.005
2       s2    A    1     0.39    1.043    1.090
3       s3    A    1     0.88    0.330    0.360
4       s4    A    1     0.63   -0.120    0.040
5       s5    A    1     0.86   -0.055    0.090
6       s1    B    1    -0.69    0.070    0.170
7       s2    B    1     1.02    0.670    0.680
8       s3    B    1     0.29    0.480    0.510
9       s4    B    1     0.94    0.002    0.090
10      s5    B    1     0.93    0.008    0.120
11      s1    A    2    -0.01   -0.190   -0.050
12      s2    A    2     0.79   -1.390    0.110
13      s3    A    2     0.32    0.980    0.990
14      s4    A    2     0.14    0.430    0.620
15      s5    A    2     0.13   -0.020    0.130
16      s1    B    2    -0.07   -0.150    0.060
17      s2    B    2    -0.63   -0.080    0.270
18      s3    B    2     0.26    0.740    0.740
19      s4    B    2     0.07    0.960    0.960
20      s5    B    2     0.87    0.440    0.450
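For readers working in pandas rather than R, a rough analogue of the adply call can be sketched with a MultiIndex built from the dimension labels. The array values below are made up, the dimension names are taken from the output above, and the row order differs from R because numpy flattens in row-major order:

```python
import numpy as np
import pandas as pd

subjects = ["s1", "s2", "s3", "s4", "s5"]
conds = ["A", "B"]
items = [1, 2]
measures = ["Measure1", "Measure2", "Measure3"]

# Hypothetical 4D array: (Subject, Cond, Item, Measure)
x = np.arange(5 * 2 * 2 * 3, dtype=float).reshape(5, 2, 2, 3)

# One index entry per (Subject, Cond, Item) combination, in row-major order
index = pd.MultiIndex.from_product(
    [subjects, conds, items], names=["Subject", "Cond", "Item"]
)

# Collapse the first three axes into rows and keep the measures as columns
long_df = pd.DataFrame(
    x.reshape(-1, len(measures)), index=index, columns=measures
).reset_index()
print(long_df.head())
```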

Reshaping a 3D array to a 2D array to produce a DataFrame: keep track of indices to produce column names

Try this and see if it fits your use case:

Generate columns via a combination of np.indices, np.dstack and np.vstack:

columns = np.vstack(np.dstack(np.indices((nrow, ncol))))

array([[0, 0],
       [0, 1],
       [0, 2],
       [1, 0],
       [1, 1],
       [1, 2],
       [2, 0],
       [2, 1],
       [2, 2],
       [3, 0],
       [3, 1],
       [3, 2]])

Now convert to string via a combination of map, join and list comprehension:

columns = ["-".join(map(str, entry)) for entry in columns]
['0-0',
'0-1',
'0-2',
'1-0',
'1-1',
'1-2',
'2-0',
'2-1',
'2-2',
'3-0',
'3-1',
'3-2']
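Putting the two steps together, a minimal end-to-end sketch could look like the following. The 3D array, its shape, and the variable name nsamples are assumptions made for illustration:

```python
import numpy as np
import pandas as pd

# Hypothetical 3D array: 2 samples, each a 4 x 3 grid
nsamples, nrow, ncol = 2, 4, 3
arr = np.arange(nsamples * nrow * ncol).reshape(nsamples, nrow, ncol)

# Build "row-col" labels in the same row-major order that reshape uses
columns = np.vstack(np.dstack(np.indices((nrow, ncol))))
columns = ["-".join(map(str, entry)) for entry in columns]

# Flatten each sample's grid into one DataFrame row
df = pd.DataFrame(arr.reshape(nsamples, nrow * ncol), columns=columns)
print(df)
```

Because both the labels and the reshape follow row-major order, column "1-2" holds arr[:, 1, 2].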

Let us know how it goes.

Reshape DataFrame to np.array

Make a 1 column frame:

In [590]: df = pd.DataFrame(np.arange(5), columns=['x'])
In [591]: df
Out[591]:
   x
0  0
1  1
2  2
3  3
4  4

The array from that is (5,1) shaped:

In [592]: df.values
Out[592]:
array([[0],
       [1],
       [2],
       [3],
       [4]])

One column is a Series, which is 1d:

In [594]: df['x']
Out[594]:
0    0
1    1
2    2
3    3
4    4
Name: x, dtype: int64
In [595]: df['x'].values
Out[595]: array([0, 1, 2, 3, 4])

But if you have the (5,1) shape array, there are lots of ways of reshaping it:

In [596]: df.values.ravel()
Out[596]: array([0, 1, 2, 3, 4])

Options include ravel, flatten, reshape, squeeze, and even indexing. All of these are covered in the basic numpy documentation.
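As a quick illustration of the options just listed, the sketch below applies each one to the same (5, 1) array and collects the results. The dictionary name flat_variants is only for this example:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(5), columns=["x"])
col = df.to_numpy()  # shape (5, 1)

# Each entry turns the (5, 1) array into a 1d array of length 5
flat_variants = {
    "ravel": col.ravel(),
    "flatten": col.flatten(),
    "reshape": col.reshape(-1),
    "squeeze": col.squeeze(),
    "indexing": col[:, 0],
}

for name, arr in flat_variants.items():
    print(name, arr.shape)
```

One difference worth knowing: flatten always returns a copy, while ravel, reshape, squeeze, and basic indexing return views where possible.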

How to solve error when reshaping DataFrame to LSTM

Your code is returning an error message because, when you write train_data[0], you are getting the first row of the 2-dimensional train_data numpy array:

>>> df = pd.DataFrame([[.4,.6,.3], [.7,.8,.9]])
>>> df
     0    1    2
0  0.4  0.6  0.3
1  0.7  0.8  0.9
>>> df = df.to_numpy()
>>> df[0]
array([0.4, 0.6, 0.3])

What you actually want is to use the dataframe's shape. Try this:

>>> arr = pd.DataFrame([[.4,.6,.3], [.7,.8,.9]]).to_numpy()
>>> arr = arr.reshape(1, arr.shape[0], arr.shape[1])
>>> arr
array([[[0.4, 0.6, 0.3],
        [0.7, 0.8, 0.9]]])
>>> arr.shape
(1, 2, 3)
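The same leading axis can be added in a few equivalent ways. The sketch below assumes the whole frame is a single sample whose rows are timesteps and whose columns are features, which may or may not match your data:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([[.4, .6, .3], [.7, .8, .9]])
values = df.to_numpy()  # shape (2, 3)

# Assumption: one sample, rows are timesteps, columns are features
by_reshape = values.reshape(1, *values.shape)
by_newaxis = values[np.newaxis, ...]
by_expand = np.expand_dims(values, axis=0)

print(by_reshape.shape, by_newaxis.shape, by_expand.shape)
```

All three produce an array of shape (samples, timesteps, features) with one sample.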

How to convert and reshape MultiIndex to 3D Numpy array?

It seems like np.swapaxes does the trick you need: arr.reshape(2,3,4,4).swapaxes(2,3).reshape(2,3,16)

The main idea is to swap the axes in the innermost data:

[1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4] ->
[[1, 2, 3, 4], [1, 2, 3, 4], [1, 2, 3, 4], [1, 2, 3, 4]] ->
[[1, 1, 1, 1], [2, 2, 2, 2], [3, 3, 3, 3], [4, 4, 4, 4]] ->
[1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3, 4, 4, 4, 4]
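The transformation can be checked on a small array. The sketch below builds a hypothetical (2, 3, 16) array in which every innermost vector is 1, 2, 3, 4 repeated four times, then applies the reshape, swapaxes, reshape chain:

```python
import numpy as np

# Hypothetical input: every innermost vector is [1, 2, 3, 4] repeated 4 times
inner = np.tile(np.arange(1, 5), 4)
arr = np.tile(inner, (2, 3, 1))  # shape (2, 3, 16)

# Split the last axis into 4 x 4, swap those two axes, then flatten them again
out = arr.reshape(2, 3, 4, 4).swapaxes(2, 3).reshape(2, 3, 16)

print(arr[0, 0])
print(out[0, 0])
```

The final reshape has to copy the data, because the swapped array is no longer contiguous in memory.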

