How to add multiple columns to pandas dataframe in one assignment?
I would have expected your syntax to work too. The problem arises because when you create new columns with the column-list syntax (df[[new1, new2]] = ...
), pandas requires that the right hand side be a DataFrame (note that it doesn't actually matter if the columns of the DataFrame have the same names as the columns you are creating).
Your syntax works fine for assigning scalar values to existing columns, and pandas is also happy to assign scalar values to a new column using the single-column syntax (df[new1] = ...
). So the solution is either to convert this into several single-column assignments, or create a suitable DataFrame for the right-hand side.
Here are several approaches that will work:
import pandas as pd
import numpy as np
df = pd.DataFrame({
'col_1': [0, 1, 2, 3],
'col_2': [4, 5, 6, 7]
})
Then one of the following:
1) Three assignments in one, using list unpacking:
df['column_new_1'], df['column_new_2'], df['column_new_3'] = [np.nan, 'dogs', 3]
2) DataFrame
conveniently expands a single row to match the index, so you can do this:
df[['column_new_1', 'column_new_2', 'column_new_3']] = pd.DataFrame([[np.nan, 'dogs', 3]], index=df.index)
3) Make a temporary data frame with new columns, then combine with the original data frame later:
df = pd.concat(
[
df,
pd.DataFrame(
[[np.nan, 'dogs', 3]],
index=df.index,
columns=['column_new_1', 'column_new_2', 'column_new_3']
)
], axis=1
)
4) Similar to the previous, but using join
instead of concat
(may be less efficient):
df = df.join(pd.DataFrame(
[[np.nan, 'dogs', 3]],
index=df.index,
columns=['column_new_1', 'column_new_2', 'column_new_3']
))
5) Using a dict is a more "natural" way to create the new data frame than the previous two, but the new columns will be sorted alphabetically (at least before Python 3.6 or 3.7):
df = df.join(pd.DataFrame(
{
'column_new_1': np.nan,
'column_new_2': 'dogs',
'column_new_3': 3
}, index=df.index
))
6) Use .assign()
with multiple column arguments.
I like this variant on @zero's answer a lot, but like the previous one, the new columns will always be sorted alphabetically, at least with early versions of Python:
df = df.assign(column_new_1=np.nan, column_new_2='dogs', column_new_3=3)
7) This is interesting (based on https://stackoverflow.com/a/44951376/3830997), but I don't know when it would be worth the trouble:
new_cols = ['column_new_1', 'column_new_2', 'column_new_3']
new_vals = [np.nan, 'dogs', 3]
df = df.reindex(columns=df.columns.tolist() + new_cols) # add empty cols
df[new_cols] = new_vals # multi-column assignment works for existing cols
8) In the end it's hard to beat three separate assignments:
df['column_new_1'] = np.nan
df['column_new_2'] = 'dogs'
df['column_new_3'] = 3
Note: many of these options have already been covered in other answers: Add multiple columns to DataFrame and set them equal to an existing column, Is it possible to add several columns at once to a pandas DataFrame?, Add multiple empty columns to pandas DataFrame
Is it possible to add several columns at once to a pandas DataFrame?
Pandas has assign
method since 0.16.0
. You could use it on dataframes like
In [1506]: df1.assign(**df2)
Out[1506]:
col_1 col_2 col_3 col_4
0 0 4 8 12
1 1 5 9 13
2 2 6 10 14
3 3 7 11 15
or, you could directly use the dictionary like
In [1507]: df1.assign(**additional_data)
Out[1507]:
col_1 col_2 col_3 col_4
0 0 4 8 12
1 1 5 9 13
2 2 6 10 14
3 3 7 11 15
Add multiple empty columns to pandas DataFrame
I'd concat
using a DataFrame:
In [23]:
df = pd.DataFrame(columns=['A'])
df
Out[23]:
Empty DataFrame
Columns: [A]
Index: []
In [24]:
pd.concat([df,pd.DataFrame(columns=list('BCD'))])
Out[24]:
Empty DataFrame
Columns: [A, B, C, D]
Index: []
So by passing a list containing your original df, and a new one with the columns you wish to add, this will return a new df with the additional columns.
Caveat: See the discussion of performance in the other answers and/or the comment discussions. reindex
may be preferable where performance is critical.
How to add multiple columns to a dataframe based on calculations
import pandas as pd
data = pd.Series(dataframe.apply(lambda x: [function1(x[column_name]), function2(x[column_name)], function3(x[column_name])], axis = 1))
pd.DataFrame(data.tolist(),data.index)
if i understood your mean correctly, it's your answer. but before everything please use Swifter pip :)
first create a series by lists and convert it to columns...
swifter is a simple library (at least i think it is simple) that only has only one useful method: apply
import swifter
data.swifter.apply(lambda x: x+1)
it use parallel manner to improve speed in large datasets... in small ones, it isn't good and even is worse
https://pypi.org/project/swifter/
Add multiple columns to DataFrame and set them equal to an existing column
you can use .assign() method:
In [31]: df.assign(b=df['a'], c=df['a'])
Out[31]:
a b c
0 1 1 1
1 2 2 2
2 3 3 3
3 4 4 4
4 5 5 5
or a little bit more creative approach:
In [41]: cols = list('bcdefg')
In [42]: df.assign(**{col:df['a'] for col in cols})
Out[42]:
a b c d e f g
0 1 1 1 1 1 1 1
1 2 2 2 2 2 2 2
2 3 3 3 3 3 3 3
3 4 4 4 4 4 4 4
4 5 5 5 5 5 5 5
another solution:
In [60]: pd.DataFrame(np.repeat(df.values, len(cols)+1, axis=1), columns=['a']+cols)
Out[60]:
a b c d e f g
0 1 1 1 1 1 1 1
1 2 2 2 2 2 2 2
2 3 3 3 3 3 3 3
3 4 4 4 4 4 4 4
4 5 5 5 5 5 5 5
NOTE: as @Cpt_Jauchefuerst mentioned in the comment DataFrame.assign(z=1, a=1)
will add columns in alphabetical order - i.e. first a
will be added to existing columns and then z
.
add multiple columns programmatically from individual criteria/rules
The solution you provided doesn't work because you are using the rule
element as the new dataframe column for the rule. I would solve it like this:
rules = [rule_1, rule_2, rule_3]
for i, rule in enumerate(rules):
df1[f'rule_{i+1}'] = np.where(rule, 1, 0)
How to add multiple columns to dataframe by function
Create a dictionary:
dictionary= { 'c':'', 'd':-1 }
def new_columns(df, dictionary):
return df.assign(**dictionary)
then call it with your df:
df = new_columns(df, dictionary)
or just ( if you don't need a function call, not sure what your use case is) :
df.assign(**dictionary)
(Pandas) Set values for multiple columns at once using apply
You could make a second DataFrame of your apply and concat the two column wise. This will create the new columns to your original dataframe
def calculate(row):
return 'column1_value', 'column2_value', 'column3_value'
df2 = pd.DataFrame(list(df.apply(calculate, axis=1)), columns=['column1', 'column2', 'column3'])
df = pd.concat([df, df2], axis=1)
Related Topics
How to Make Ball Bounce Off Wall with Pygame
How to Pass a List as a Command-Line Argument with Argparse
How to Do Parallel Programming in Python
How to Make an Immutable Object in Python
How to "Test" Nonetype in Python
Error: Command 'Gcc' Failed with Exit Status 1 While Installing Eventlet
Combine Several Images Horizontally with Python
Python Date String to Date Object
How to Find Script's Directory
How to Check Type of Files Without Extensions
Pandas Groupby Mean - into a Dataframe
Add Text to Existing PDF Using Python
How to Run an External Command Asynchronously from Python
How to Include Third Party Python Libraries in Google App Engine