How to Add Multiple Columns to a Data.Frame in One Go

How to add multiple columns to pandas dataframe in one assignment?

I would have expected your syntax to work too. The problem arises because, at least in the pandas versions this answer was written for, creating new columns with the column-list syntax (df[[new1, new2]] = ...) requires the right-hand side to be a DataFrame. (Note that it doesn't actually matter whether the columns of that DataFrame have the same names as the columns you are creating.)

Your syntax works fine for assigning scalar values to existing columns, and pandas is also happy to assign a scalar value to a new column using the single-column syntax (df[new1] = ...). So the solution is either to convert this into several single-column assignments or to create a suitable DataFrame for the right-hand side.

Here are several approaches that will work:

import pandas as pd
import numpy as np

df = pd.DataFrame({
    'col_1': [0, 1, 2, 3],
    'col_2': [4, 5, 6, 7]
})

Then one of the following:

1) Three assignments in one, using list unpacking:

df['column_new_1'], df['column_new_2'], df['column_new_3'] = [np.nan, 'dogs', 3]

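2) Build a DataFrame for the right-hand side, repeating the row once per index entry. Older pandas versions would also expand a single row to match the index, but recent versions may raise a shape error instead:

df[['column_new_1', 'column_new_2', 'column_new_3']] = pd.DataFrame([[np.nan, 'dogs', 3]] * len(df.index), index=df.index)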

3) Make a temporary data frame with new columns, then combine with the original data frame later:

df = pd.concat(
    [
        df,
        pd.DataFrame(
            [[np.nan, 'dogs', 3]] * len(df.index),  # one copy of the row per index entry
            index=df.index,
            columns=['column_new_1', 'column_new_2', 'column_new_3']
        )
    ], axis=1
)

4) Similar to the previous, but using join instead of concat (may be less efficient):

df = df.join(pd.DataFrame(
    [[np.nan, 'dogs', 3]] * len(df.index),  # one copy of the row per index entry
    index=df.index,
    columns=['column_new_1', 'column_new_2', 'column_new_3']
))

5) Using a dict is a more "natural" way to create the new data frame than the previous two, but in older versions (before Python 3.6 and pandas 0.23) the new columns will be sorted alphabetically:

df = df.join(pd.DataFrame(
    {
        'column_new_1': np.nan,
        'column_new_2': 'dogs',
        'column_new_3': 3
    }, index=df.index
))

6) Use .assign() with multiple column arguments.

I like this variant on @zero's answer a lot, but, as with the previous one, in older versions (before Python 3.6 and pandas 0.23) the new columns are sorted alphabetically:

df = df.assign(column_new_1=np.nan, column_new_2='dogs', column_new_3=3)

7) This is interesting (based on https://stackoverflow.com/a/44951376/3830997), but I don't know when it would be worth the trouble:

new_cols = ['column_new_1', 'column_new_2', 'column_new_3']
new_vals = [np.nan, 'dogs', 3]
df = df.reindex(columns=df.columns.tolist() + new_cols) # add empty cols
df[new_cols] = new_vals # multi-column assignment works for existing cols

8) In the end it's hard to beat three separate assignments:

df['column_new_1'] = np.nan
df['column_new_2'] = 'dogs'
df['column_new_3'] = 3

Note: many of these options have already been covered in answers to other questions: "Add multiple columns to DataFrame and set them equal to an existing column", "Is it possible to add several columns at once to a pandas DataFrame?", and "Add multiple empty columns to pandas DataFrame".
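One quick way to check that several of the options above agree is to run them side by side on fresh copies of the example frame. The sketch below builds the right-hand DataFrame with the row explicitly repeated once per index entry rather than relying on single-row expansion. It then compares the results with pandas.testing.assert_frame_equal. The helper make_base is a hypothetical name introduced here. Dtypes are deliberately not compared, since they may differ between pandas versions.

```python
import numpy as np
import pandas as pd

new_cols = ['column_new_1', 'column_new_2', 'column_new_3']
new_vals = [np.nan, 'dogs', 3]


def make_base():
    """Fresh copy of the example frame used in this answer (hypothetical helper)."""
    return pd.DataFrame({'col_1': [0, 1, 2, 3], 'col_2': [4, 5, 6, 7]})


# Option 8: separate single-column assignments (each scalar fills the whole column)
df8 = make_base()
for name, val in zip(new_cols, new_vals):
    df8[name] = val

# Option 2, with the row repeated once per index entry
df2 = make_base()
df2[new_cols] = pd.DataFrame([new_vals] * len(df2.index), index=df2.index, columns=new_cols)

# Option 3: concatenate a temporary frame along the column axis
df3 = make_base()
df3 = pd.concat(
    [df3, pd.DataFrame([new_vals] * len(df3.index), index=df3.index, columns=new_cols)],
    axis=1,
)

# Option 6: assign() with keyword arguments built from the two lists
df6 = make_base().assign(**dict(zip(new_cols, new_vals)))

# All variants should hold the same values in the same column order
for other in (df2, df3, df6):
    pd.testing.assert_frame_equal(df8, other, check_dtype=False)
```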

How to add multiple columns to a data.frame in one go?

This will get you there:

ddf[xx] <- NA

# a b c d e f
#1 1 2 NA NA NA NA
#2 1 2 NA NA NA NA
#3 1 2 NA NA NA NA
#...

You can't directly use something like ddf$xx, because this will try to assign to a column called xx rather than interpreting xx. You need to use the [ and [<- functions (square brackets) whenever you are dealing with a character string or vector, like ddf["columnname"] or ddf[c("col1","col2")], or with a stored vector like your ddf[xx].

It selects columns because data.frames are essentially lists:

is.list(ddf)
#[1] TRUE

as.list(ddf)
#$a
# [1] 1 1 1 1 1 1 1 1 1 1
#
#$b
# [1] 2 2 2 2 2 2 2 2 2 2

...with each column corresponding to a list entry. So if you don't use a comma to specify a row, as in ddf["name",], or a column, as in ddf[,"name"], you get the column by default.


If you are working with a 0-row dataset, you cannot use a value like NA as the replacement. Instead, replace with list(character(0)). You can use numeric(0), integer(0), logical(0), etc. in place of character(0), depending on the class you want for your new columns.

ddf <- data.frame(a=character())
xx <- c("c", "d", "e", "f")
ddf[xx] <- list(character(0))
ddf
#[1] a c d e f
#<0 rows> (or 0-length row.names)

Is it possible to add several columns at once to a pandas DataFrame?

Pandas has had an assign method since 0.16.0. You can use it with another DataFrame like this:

In [1506]: df1.assign(**df2)
Out[1506]:
col_1 col_2 col_3 col_4
0 0 4 8 12
1 1 5 9 13
2 2 6 10 14
3 3 7 11 15

or you can use the dictionary directly:

In [1507]: df1.assign(**additional_data)
Out[1507]:
col_1 col_2 col_3 col_4
0 0 4 8 12
1 1 5 9 13
2 2 6 10 14
3 3 7 11 15
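The frames df1 and df2 and the dict additional_data come from the question and are not shown here. The sketch below reconstructs plausible versions of them from the printed output (an assumption) so that both calls can be run. Unpacking a DataFrame with ** works because it provides keys() (its column labels) and item lookup by column name.

```python
import pandas as pd

# Hypothetical inputs, reconstructed from the printed output above
df1 = pd.DataFrame({'col_1': [0, 1, 2, 3], 'col_2': [4, 5, 6, 7]})
df2 = pd.DataFrame({'col_3': [8, 9, 10, 11], 'col_4': [12, 13, 14, 15]})
additional_data = {'col_3': [8, 9, 10, 11], 'col_4': [12, 13, 14, 15]}

# ** turns each column of df2 (or each dict entry) into a keyword argument
from_frame = df1.assign(**df2)
from_dict = df1.assign(**additional_data)

# assign() returns a new frame; df1 itself keeps only col_1 and col_2
```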

how to add multiple columns from one data frame to another based on values in another column?

We can use left_join to match 'x2' from 'df1' with 'fips' from 'df2':

library(dplyr)
df2 <- left_join(df2, df1 %>%
                   select(x2:last_col()), by = c("fips" = "x2"))

Output:

df2
fips county_name x21_40 x41_60 x61_80 x81_100
1 5000 a 0 1 0 0
2 5001 b 0 0 1 0
3 5002 c 1 0 0 0
4 5003 d 0 0 0 1

If 'df1' has duplicates, first get the max value of those columns grouped by 'fips'/'x2', and then do the join:

df1 %>%
  group_by(fips = x2) %>%
  summarise(across(x21_40:x81_100, ~ max(.x, na.rm = TRUE)),
            .groups = "drop") %>%
  left_join(df2, .)

Add multiple empty columns to pandas DataFrame

I'd concat using a DataFrame:

In [23]:
df = pd.DataFrame(columns=['A'])
df

Out[23]:
Empty DataFrame
Columns: [A]
Index: []

In [24]:
pd.concat([df,pd.DataFrame(columns=list('BCD'))])

Out[24]:
Empty DataFrame
Columns: [A, B, C, D]
Index: []

So passing a list containing your original df and a new one with the columns you wish to add returns a new df with the additional columns.


Caveat: See the discussion of performance in the other answers and/or the comment discussions. reindex may be preferable where performance is critical.
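For the reindex route mentioned in the caveat, a minimal sketch on a small hypothetical frame might look like the following. It also shows assign() with dict.fromkeys as another one-liner. Both add the new columns filled with NaN.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3]})  # hypothetical non-empty frame

# reindex keeps existing columns and adds missing labels, filled with NaN
with_reindex = df.reindex(columns=[*df.columns, 'B', 'C', 'D'])

# assign with one NaN keyword per new column name
with_assign = df.assign(**dict.fromkeys(['B', 'C', 'D'], np.nan))
```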

How to add multiple columns to existing data frame at once?

Well, it's hard to give an answer that's not simply "that's because that's how Python syntax is defined". The unpacking you're doing allows you to perform operations like the following:

In [63]: a, b = 3, 5

In [64]: a
Out[64]: 3

In [65]: b
Out[65]: 5

In [66]: l = [8, 10]

In [67]: c, d = l

In [69]: c
Out[69]: 8

In [70]: d
Out[70]: 10

That is, the sequence on the right-hand side is unpacked into the matching number of variables on the left-hand side. Knowing this, it's clear that in your case you need three elements on the right-hand side.

Now what you can do is the following, which perhaps maps more closely to your mental model:

 df['Hour'] = df['Month'] = df['Day'] = ''
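To make the unpacking rule concrete for DataFrame columns, here is a small sketch. A hypothetical 'Date' column stands in for the question's data, and make_df is an illustrative helper. The sketch contrasts tuple unpacking, chained assignment, and a mismatched count.

```python
import pandas as pd


def make_df():
    # Hypothetical stand-in for the question's frame
    return pd.DataFrame({'Date': ['2020-01-01', '2020-02-15']})


# Tuple unpacking: the right-hand side supplies one value per target
df = make_df()
df['Hour'], df['Month'], df['Day'] = '', '', ''

# Chained assignment: the single value '' is assigned to every target, left to right
chained = make_df()
chained['Hour'] = chained['Month'] = chained['Day'] = ''

# A count mismatch fails during unpacking, before any column is stored
short = make_df()
try:
    short['Hour'], short['Month'], short['Day'] = '', ''
except ValueError as exc:
    print(exc)  # not enough values to unpack (expected 3, got 2)
```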

Pandas: adding several columns to a dataframe in a single line

There is assign:

df.assign(b=range(11,21), c=range(21,31), d=range(31,41))

Things are even easier when you have a dictionary:

# assume you get this from somewhere else
val_dict = {'b': range(11,21), 'c':range(21,31)}

df.assign(**val_dict)

Note that the second approach is required when a column name is not a valid keyword argument, for example a name containing a space, such as 'a b'.
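Here is a short sketch of that case on a hypothetical 10-row frame. A column name such as 'a b' cannot be written as a keyword argument, but it passes through ** unpacking of a dict without trouble.

```python
import pandas as pd

df = pd.DataFrame({'a': range(1, 11)})  # hypothetical 10-row frame

# df.assign(a b=...) would be a SyntaxError; ** unpacking accepts any string key
val_dict = {'b': range(11, 21), 'c': range(21, 31), 'a b': range(31, 41)}
out = df.assign(**val_dict)
```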

Adding multiple columns in between columns in a data frame using a For Loop

You do not need to loop to do this:

as.data.frame(cbind(df, matrix(0, nrow = nrow(df), ncol = 53)))

Store.No Task Third Fourth 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25
1 1 70 4 7 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
2 2 50 5 8 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
3 3 20 6 9 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53
1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
3 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
  1. matrix will create a matrix with 53 columns and 3 rows filled with 0

  2. cbind will add this matrix to the end of your data

  3. as.data.frame will convert it to a dataframe

Update

To insert these zero columns at a specific position, you can split your df into two parts: df[, 1:2] is the first and second columns, while df[, 3:ncol(df)] is the third through last columns.

as.data.frame(cbind(df[, 1:2], matrix(0, nrow = nrow(df), ncol = 53), df[, 3:ncol(df)]))

Store.No Task 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29
1 1 70 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
2 2 50 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
3 3 20 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 Third Fourth
1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 4 7
2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 5 8
3 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 6 9

add_column

Alternatively, you can use the add_column function from the tibble package, as you were trying to in your post, with the .after argument to insert the new columns after the second column:

library(tibble)

tibble::add_column(df, as.data.frame(matrix(0, nrow = nrow(df), ncol = 53)), .after = 2)

Note: here the new columns are named V1 to V53 rather than 1 to 53, because as.data.frame() names the unnamed columns of a matrix V1, V2, and so on.


Data

df <- data.frame(Store.No = 1:3,
                 Task = c(70, 50, 20),
                 Third = 4:6,
                 Fourth = 7:9)

How to add multiple columns to a dataframe based on calculations


import pandas as pd

# dataframe, column_name and function1..function3 come from the question
data = pd.Series(dataframe.apply(
    lambda x: [function1(x[column_name]), function2(x[column_name]), function3(x[column_name])],
    axis=1))
pd.DataFrame(data.tolist(), index=data.index)

If I understood your question correctly, this is your answer: first create a Series of lists, then convert it to columns. Before anything else, though, consider using swifter (pip install swifter).

swifter is a simple library with essentially one useful method, apply:

import swifter
data.swifter.apply(lambda x: x+1)

It applies the function in parallel to improve speed on large datasets. On small datasets it gives no benefit and can even be slower.

https://pypi.org/project/swifter/
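Here is a runnable sketch of the Series-of-lists approach. The names function1, function2, function3, dataframe and column_name are hypothetical stand-ins for the question's objects. It also shows apply's result_type='expand' option (available since pandas 0.23), which turns list results into columns directly without the intermediate Series. swifter is left out so the sketch needs only pandas.

```python
import pandas as pd


# Hypothetical stand-ins for the question's functions and data
def function1(v):
    return v * 2


def function2(v):
    return v ** 2


def function3(v):
    return v + 100


dataframe = pd.DataFrame({'x': [1, 2, 3]})
column_name = 'x'


def three_values(row):
    # One list per row: the three derived values for that row
    return [function1(row[column_name]), function2(row[column_name]), function3(row[column_name])]


# With list results and the default result_type, apply returns a Series of lists
data = dataframe.apply(three_values, axis=1)
new_cols = pd.DataFrame(data.tolist(), index=data.index, columns=['f1', 'f2', 'f3'])
result = dataframe.join(new_cols)

# Alternative: expand list results straight into columns 0, 1, 2
expanded = dataframe.apply(three_values, axis=1, result_type='expand')
```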

Add multiple columns to DataFrame and set them equal to an existing column

You can use the .assign() method:

In [31]: df.assign(b=df['a'], c=df['a'])
Out[31]:
a b c
0 1 1 1
1 2 2 2
2 3 3 3
3 4 4 4
4 5 5 5

or a slightly more creative approach:

In [41]: cols = list('bcdefg')

In [42]: df.assign(**{col:df['a'] for col in cols})
Out[42]:
a b c d e f g
0 1 1 1 1 1 1 1
1 2 2 2 2 2 2 2
2 3 3 3 3 3 3 3
3 4 4 4 4 4 4 4
4 5 5 5 5 5 5 5

another solution:

In [60]: pd.DataFrame(np.repeat(df.values, len(cols)+1, axis=1), columns=['a']+cols)
Out[60]:
a b c d e f g
0 1 1 1 1 1 1 1
1 2 2 2 2 2 2 2
2 3 3 3 3 3 3 3
3 4 4 4 4 4 4 4
4 5 5 5 5 5 5 5

NOTE: as @Cpt_Jauchefuerst mentioned in the comments, in older versions (before Python 3.6 and pandas 0.23) DataFrame.assign(z=1, a=1) adds the new columns in alphabetical order, i.e. first a is added to the existing columns and then z. Newer versions keep the keyword order.
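Since the column order produced by assign() depends on the Python and pandas versions, one way to pin it down on any version is to select the columns explicitly afterwards. Here is a sketch using a hypothetical single-column frame and deliberately non-alphabetical new names:

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3, 4, 5]})  # hypothetical frame
cols = ['z', 'b', 'y']  # deliberately not in alphabetical order

# Copy column 'a' into each new column via a dict comprehension
out = df.assign(**{col: df['a'] for col in cols})

# Python 3.6+ with pandas 0.23+ already keeps keyword order (a, z, b, y);
# selecting an explicit column list fixes the order on older versions too
out = out[['a'] + cols]
```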


