Add Numpy Array as Column to Pandas Data Frame

Add numpy array as column to Pandas data frame

import numpy as np
import pandas as pd
import scipy.sparse as sparse

df = pd.DataFrame(np.arange(1,10).reshape(3,3))
arr = sparse.coo_matrix(([1,1,1], ([0,1,2], [1,2,0])), shape=(3,3))
df['newcol'] = arr.toarray().tolist()
print(df)

yields

   0  1  2     newcol
0  1  2  3  [0, 1, 0]
1  4  5  6  [0, 0, 1]
2  7  8  9  [1, 0, 0]
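
For the simpler case of a one-dimensional array with one value per row, no conversion to a list is needed. A minimal sketch, assuming made-up values (the `values` array is not from the original answer):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(1, 10).reshape(3, 3))

# Hypothetical 1-D array with exactly one value per row of df
values = np.array([10.0, 20.0, 30.0])

# A 1-D array whose length equals len(df) becomes an ordinary numeric column
df['newcol'] = values
print(df)
```

The array length has to match the number of rows in the dataframe.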

How to add numpy matrix as new columns for pandas dataframe?

You can turn the matrix into a dataframe and use concat with axis=1:

For example, given a dataframe df and a numpy array mat:

>>> df
   a  b
0  5  5
1  0  7
2  1  0
3  0  4
4  6  4

>>> mat
array([[0.44926098, 0.29567859, 0.60728561],
       [0.32180566, 0.32499134, 0.94950085],
       [0.64958125, 0.00566706, 0.56473627],
       [0.17357589, 0.71053224, 0.17854188],
       [0.38348102, 0.12440952, 0.90359566]])

You can do:

>>> pd.concat([df, pd.DataFrame(mat)], axis=1)
   a  b         0         1         2
0  5  5  0.449261  0.295679  0.607286
1  0  7  0.321806  0.324991  0.949501
2  1  0  0.649581  0.005667  0.564736
3  0  4  0.173576  0.710532  0.178542
4  6  4  0.383481  0.124410  0.903596
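
Note that concat with axis=1 lines rows up by index label, not by position. If df does not have the default 0..n-1 index, the matrix rows can end up misaligned. A small sketch of one way to handle that, assuming a hypothetical string index and made-up column names `m0` and `m1`:

```python
import numpy as np
import pandas as pd

# Hypothetical frame with a non-default index
df = pd.DataFrame({'a': [5, 0, 1], 'b': [5, 7, 0]}, index=['x', 'y', 'z'])
mat = np.array([[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]])

# Reuse df's index so the rows line up, and name the new columns explicitly
mat_df = pd.DataFrame(mat, index=df.index, columns=['m0', 'm1'])
result = pd.concat([df, mat_df], axis=1)
print(result)
```

Without index=df.index, the second frame would get a 0..2 index and the result would contain extra rows filled with NaN.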

How to make a new column of numpy arrays in a pandas data frame?

Specify the datatype as "object" while creating the new column and then insert the elements as needed:

>>> df["new"] = pd.Series(dtype="object")
>>> df.at[1, 'new'] = [2, 'l']
>>> df
   id    a    b     new
0   1   on   on     NaN
1   2   on  off  [2, l]
2   3  off   on     NaN
3   4  off  off     NaN
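
If every row should hold its own numpy array, one option is to pass a list of 1-D arrays when creating the column. This is only a sketch with made-up data (`mat` and the `id` column are illustrative):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'id': [1, 2, 3]})

# Hypothetical 2-D array: one row of values per dataframe row
mat = np.arange(6).reshape(3, 2)

# list(mat) is a list of three 1-D arrays, so each cell receives one array
df['new'] = list(mat)

print(df)
print(type(df.at[1, 'new']))
```

The resulting column has dtype object, so vectorized numeric operations do not apply to it directly.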

How to add numpy array elements row-wise to a pandas dataframe?

Create the array for the problem and convert it to a list.

a = np.array([[ 0.00021284, -0.04443965,  0.03926146,  0.04830161,
-0.11913304, 0.03370821],
[ 0.01778569, -0.05192029, -0.00792321, -0.01799901,
-0.09819183, 0.06020728],
[-0.00748426, -0.02401578, 0.01762747, 0.09334017,
-0.11837556, 0.00603597],
[-0.03505319, -0.01932572, -0.03248611, 0.00356432,
-0.082398 , 0.03887841],
[-0.05111802, -0.0309066 , 0.03542011, -0.01343899,
-0.10434885, -0.0315006 ]]).tolist()

Results in:

print(a)

[[0.00021284, -0.04443965, 0.03926146, 0.04830161, -0.11913304, 0.03370821], [0.01778569, -0.05192029, -0.00792321, -0.01799901, -0.09819183, 0.06020728], [-0.00748426, -0.02401578, 0.01762747, 0.09334017, -0.11837556, 0.00603597], [-0.03505319, -0.01932572, -0.03248611, 0.00356432, -0.082398, 0.03887841], [-0.05111802, -0.0309066, 0.03542011, -0.01343899, -0.10434885, -0.0315006]]

Then add the list to the dataframe.

df = pd.DataFrame({"Message": [
"How are you?",
"What is your name?",
"What do you do?",
"What is your address?",
"Let's hang out?"]})
df['Array'] = a
print(df)

Output:

                 Message                                              Array
0           How are you?  [0.00021284, -0.04443965, 0.03926146, 0.048301...
1     What is your name?  [0.01778569, -0.05192029, -0.00792321, -0.0179...
2        What do you do?  [-0.00748426, -0.02401578, 0.01762747, 0.09334...
3  What is your address?  [-0.03505319, -0.01932572, -0.03248611, 0.0035...
4        Let's hang out?  [-0.05111802, -0.0309066, 0.03542011, -0.01343...
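
If the numbers later need to be worked with individually, a column of lists like this can be expanded into one column per element. A minimal sketch with shortened, made-up data; the `dim_` prefix is an arbitrary choice:

```python
import numpy as np
import pandas as pd

a = np.array([[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]]).tolist()
df = pd.DataFrame({'Message': ['How are you?', 'What is your name?'], 'Array': a})

# Build a frame with one column per list element, keeping df's index
expanded = pd.DataFrame(df['Array'].tolist(), index=df.index).add_prefix('dim_')

# Replace the list column with the expanded numeric columns
wide = df.drop(columns='Array').join(expanded)
print(wide)
```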

To create everything at the beginning, you can use a dictionary:

df = pd.DataFrame({"Message": [
"How are you?",
"What is your name?",
"What do you do?",
"What is your address?",
"Let's hang out?"], "Array": a})

Adding a new column to an existing dataframe and filling it with a numpy array

Code from https://www.geeksforgeeks.org/adding-new-column-to-existing-dataframe-in-pandas/
# Import pandas package
import pandas as pd

# Define a dictionary containing data
data = {'Name': ['Jai', 'Princi', 'Gaurav', 'Anuj'],
        'Height': [5.1, 6.2, 5.1, 5.2]}

# Convert the dictionary into DataFrame
original_df = pd.DataFrame(data)

# Using 'Qualification' as the column name and equating it to the list
altered_df = original_df.assign(Qualification=['Msc', 'MA', 'Msc', 'Msc'])

# Observe the result
altered_df
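
The snippet above passes a plain list, but assign accepts a numpy array in the same way. A short sketch, assuming a hypothetical `Weight` column with made-up values:

```python
import numpy as np
import pandas as pd

original_df = pd.DataFrame({'Name': ['Jai', 'Princi', 'Gaurav', 'Anuj'],
                            'Height': [5.1, 6.2, 5.1, 5.2]})

# Hypothetical array with one value per row
weights = np.array([60.5, 72.0, 58.3, 65.1])

# assign returns a new dataframe; original_df is left unchanged
altered_df = original_df.assign(Weight=weights)
print(altered_df)
```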

How to add numpy array values to dataframe at a certain index?

Another problem is that the assignment also erases all the other values in the column. Assign just the array as the new value rather than a DataFrame.

To set values in a column at specific index positions, use df.loc[df.index[#], 'NAME'], where # is the position (or slice of positions) and NAME is the column label:

import numpy as np
import pandas as pd

df = pd.DataFrame([[1, 2] for _ in range(100)], columns=['column_1', 'column_2'])
my_array = np.array([41892.79355875, 40239.97933262, 39466.32169404, 38416.39545664,
40012.3803004, 41135.45946026, 43084.18917943, 44825.08405799,
44066.70603561, 46636.34415037, 45855.25783352, 45863.87118957,
44697.45547342, 48065.5708295, 47931.83508874])

df.loc[df.index[-15:], 'column_2'] = my_array

print(df)
    column_1      column_2
0          1      2.000000
1          1      2.000000
2          1      2.000000
3          1      2.000000
4          1      2.000000
..       ...           ...
95         1  45855.257834
96         1  45863.871190
97         1  44697.455473
98         1  48065.570830
99         1  47931.835089
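
In the example above, column_2 starts out holding integers and receives float values, so it depends on pandas converting the column to float. Newer pandas releases may warn about this kind of implicit conversion. A hedged sketch that converts the column explicitly first, using a shortened, made-up array:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([[1, 2] for _ in range(10)], columns=['column_1', 'column_2'])
my_array = np.array([41892.79, 40239.98, 39466.32])

# Make the column float before writing float values into part of it
df['column_2'] = df['column_2'].astype(float)

# Write the array into the last len(my_array) rows; other rows keep their values
df.loc[df.index[-len(my_array):], 'column_2'] = my_array
print(df)
```

Deriving the slice from len(my_array) keeps the number of target rows and the array length in sync.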

python - how to append numpy array to a pandas dataframe

Assign the predictions to a variable, then extract its columns and assign each one to a column of the pandas dataframe. If x is the 2D numpy array with predictions,

x = sentiment_model.predict_proba(test_matrix)

then you can do,

test_data['prediction0'] = x[:,0]
test_data['prediction1'] = x[:,1]
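
When the number of prediction columns is not fixed at two, the same idea can be written as a loop. This is a sketch only: `x` below is a hand-written stand-in for the predict_proba output, and the `review` column is hypothetical.

```python
import numpy as np
import pandas as pd

# Stand-in for sentiment_model.predict_proba(test_matrix): one row per sample
x = np.array([[0.9, 0.1], [0.2, 0.8], [0.5, 0.5]])
test_data = pd.DataFrame({'review': ['bad', 'good', 'ok']})

# Add one dataframe column per column of the 2D array
for i in range(x.shape[1]):
    test_data[f'prediction{i}'] = x[:, i]

print(test_data)
```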

Dynamically store data in the columns of pandas dataframe from numpy arrays being generated from “for loop”

Use NumPy's stack over the dictionary values (this gives a NumPy array with shape (10, 241, 241)), then use reshape to change the shape to (10, 58081), followed by a transpose to place the days as columns. Next, convert the result to a pandas dataframe and set the column names from the dictionary keys.

import pandas as pd
import numpy as np

#setup
np.random.seed(12345)
df_dictionary = {}
days = {f'day_{d}': np.random.rand(241,241).round(2) for d in range(1,11)}
df_dictionary['arrays_to_iterate'] = days
print(df_dictionary)

#code
all_days = np.stack(list(df_dictionary['arrays_to_iterate'].values())).reshape(10, -1).T
df = pd.DataFrame(all_days)
df.columns = df_dictionary['arrays_to_iterate'].keys()

print(df)

Output from df_dictionary

{'arrays_to_iterate':
{'day_1':
array(
[[0.93, 0.32, 0.18, ..., 0.62, 0.89, 0.78],
[0.72, 0.31, 0.36, ..., 0.5 , 0.89, 0.38],
...,
[0.36, 0.62, 0.77, ..., 0.03, 0.57, 0.04],
[0.02, 0.07, 0.66, ..., 0.62, 0.5 , 0.04]]),
'day_2': array(
[[0.14, 0.13, 0.91, ..., 0.06, 0.72, 0.93],
[0.13, 0.02, 0.09, ..., 0.39, 0.72, 0.13],
...

Output from df

       day_1  day_2  day_3  day_4  day_5  day_6  day_7  day_8  day_9  day_10
0       0.93   0.14   0.06   0.10   0.01   0.66   0.67   0.18   0.93    0.40
1       0.32   0.13   0.81   0.57   0.23   0.60   0.48   0.07   0.08    0.32
2       0.18   0.91   0.95   0.27   0.36   0.11   0.25   0.71   0.24    0.44
3       0.20   0.51   0.52   0.62   0.09   0.31   0.19   0.78   0.83    0.58
4       0.57   0.14   0.89   0.51   0.67   0.29   0.48   0.95   0.36    0.97
...      ...    ...    ...    ...    ...    ...    ...    ...    ...     ...
58076   0.98   0.20   0.54   0.96   0.89   0.24   0.05   0.81   0.35    0.57
58077   0.53   0.96   0.04   0.60   0.16   0.38   0.83   0.49   0.28    0.02
58078   0.62   0.50   0.74   0.67   0.43   0.30   0.91   0.68   0.15    0.43
58079   0.50   0.11   0.57   0.42   0.85   0.97   0.86   0.60   0.75    0.33
58080   0.04   0.74   0.74   0.94   0.98   0.35   0.52   0.12   0.47    0.53

[58081 rows x 10 columns]
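
An alternative to the stack, reshape and transpose sequence is to flatten each array and build the dataframe from a dictionary of columns. The sketch below uses small 4x4 arrays and three days (both sizes are made up to keep it short) and checks that the two approaches agree:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(12345)
days = {f'day_{d}': rng.random((4, 4)).round(2) for d in range(1, 4)}

# Flatten each 2D array row by row and use it as one column
df_ravel = pd.DataFrame({name: arr.ravel() for name, arr in days.items()})

# The stack / reshape / transpose approach, for comparison
stacked = np.stack(list(days.values())).reshape(len(days), -1).T
df_stack = pd.DataFrame(stacked, columns=list(days.keys()))

print(df_ravel.shape)
print(np.array_equal(df_ravel.to_numpy(), df_stack.to_numpy()))
```

Both versions flatten in row-major order, so row i of the dataframe corresponds to the same array position in every day.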

