How to add multiple columns to pandas dataframe in one assignment?
I would have expected your syntax to work too. The problem arises because when you create new columns with the column-list syntax (df[[new1, new2]] = ...), pandas requires that the right hand side be a DataFrame (note that it doesn't actually matter if the columns of the DataFrame have the same names as the columns you are creating).
Your syntax works fine for assigning scalar values to existing columns, and pandas is also happy to assign scalar values to a new column using the single-column syntax (df[new1] = ...). So the solution is either to convert this into several single-column assignments, or create a suitable DataFrame for the right-hand side.
Here are several approaches that will work:
import pandas as pd
import numpy as np
df = pd.DataFrame({
'col_1': [0, 1, 2, 3],
'col_2': [4, 5, 6, 7]
})
Then one of the following:
1) Three assignments in one, using list unpacking:
df['column_new_1'], df['column_new_2'], df['column_new_3'] = [np.nan, 'dogs', 3]
2) DataFrame conveniently expands a single row to match the index, so you can do this:
df[['column_new_1', 'column_new_2', 'column_new_3']] = pd.DataFrame([[np.nan, 'dogs', 3]], index=df.index)
3) Make a temporary data frame with new columns, then combine with the original data frame later:
df = pd.concat(
[
df,
pd.DataFrame(
[[np.nan, 'dogs', 3]],
index=df.index,
columns=['column_new_1', 'column_new_2', 'column_new_3']
)
], axis=1
)
4) Similar to the previous, but using join instead of concat (may be less efficient):
df = df.join(pd.DataFrame(
[[np.nan, 'dogs', 3]],
index=df.index,
columns=['column_new_1', 'column_new_2', 'column_new_3']
))
5) Using a dict is a more "natural" way to create the new data frame than the previous two, but the new columns will be sorted alphabetically (at least before Python 3.6 or 3.7):
df = df.join(pd.DataFrame(
{
'column_new_1': np.nan,
'column_new_2': 'dogs',
'column_new_3': 3
}, index=df.index
))
6) Use .assign() with multiple column arguments.
I like this variant on @zero's answer a lot, but like the previous one, the new columns will always be sorted alphabetically, at least with early versions of Python:
df = df.assign(column_new_1=np.nan, column_new_2='dogs', column_new_3=3)
7) This is interesting (based on https://stackoverflow.com/a/44951376/3830997), but I don't know when it would be worth the trouble:
new_cols = ['column_new_1', 'column_new_2', 'column_new_3']
new_vals = [np.nan, 'dogs', 3]
df = df.reindex(columns=df.columns.tolist() + new_cols) # add empty cols
df[new_cols] = new_vals # multi-column assignment works for existing cols
8) In the end it's hard to beat three separate assignments:
df['column_new_1'] = np.nan
df['column_new_2'] = 'dogs'
df['column_new_3'] = 3
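If the list of new columns grows, the separate assignments in option 8 can be driven from a dict. This is only a sketch: the name new_values is made up for illustration, and it relies on nothing more than plain single-column assignment, which broadcasts a scalar down the index.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'col_1': [0, 1, 2, 3],
    'col_2': [4, 5, 6, 7]
})

# Hypothetical mapping of new column name -> scalar value
new_values = {
    'column_new_1': np.nan,
    'column_new_2': 'dogs',
    'column_new_3': 3,
}

# One single-column assignment per entry; each scalar is broadcast to every row
for name, value in new_values.items():
    df[name] = value

print(df)
```

The columns are added in the dict's insertion order, so no reordering is needed afterwards on Python 3.7+.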
Note: many of these options have already been covered in other answers: Add multiple columns to DataFrame and set them equal to an existing column, Is it possible to add several columns at once to a pandas DataFrame?, Add multiple empty columns to pandas DataFrame
How to add multiple columns to a data.frame in one go?
This will get you there:
ddf[xx] <- NA
# a b c d e f
#1 1 2 NA NA NA NA
#2 1 2 NA NA NA NA
#3 1 2 NA NA NA NA
#...
You can't directly use something like ddf$xx because this will try to assign to a column called xx rather than interpreting xx. You need to use the [ and [<- functions, using the square brackets when you are dealing with a character string/vector, like ddf["columnname"] or ddf[c("col1","col2")], or a stored vector like your ddf[xx].
The reason why it selects columns is that data.frames are essentially lists:
is.list(ddf)
#[1] TRUE
as.list(ddf)
#$a
# [1] 1 1 1 1 1 1 1 1 1 1
#
#$b
# [1] 2 2 2 2 2 2 2 2 2 2
...with each column corresponding to a list entry. So if you don't use a comma to specify a row, like ddf["name",], or a column, like ddf[,"name"], you get the column by default.
In the case that you are working with a 0-row dataset, you cannot use a value like NA as the replacement. Instead, replace with list(character(0)), where character(0) can be swapped for numeric(0), integer(0), logical(0), etc., depending on the class you want for your new columns.
ddf <- data.frame(a=character())
xx <- c("c", "d", "e", "f")
ddf[xx] <- list(character(0))
ddf
#[1] a c d e f
#<0 rows> (or 0-length row.names)
Is it possible to add several columns at once to a pandas DataFrame?
pandas has had an assign method since 0.16.0. You could use it on dataframes like
In [1506]: df1.assign(**df2)
Out[1506]:
col_1 col_2 col_3 col_4
0 0 4 8 12
1 1 5 9 13
2 2 6 10 14
3 3 7 11 15
or, you could directly use the dictionary like
In [1507]: df1.assign(**additional_data)
Out[1507]:
col_1 col_2 col_3 col_4
0 0 4 8 12
1 1 5 9 13
2 2 6 10 14
3 3 7 11 15
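The snippets above don't show how df1, df2 and additional_data were built. Here is a self-contained sketch, with the data assumed so that it reproduces the printed output:

```python
import pandas as pd

# Assumed inputs, chosen to match the output printed above
df1 = pd.DataFrame({'col_1': [0, 1, 2, 3], 'col_2': [4, 5, 6, 7]})
df2 = pd.DataFrame({'col_3': [8, 9, 10, 11], 'col_4': [12, 13, 14, 15]})
additional_data = {'col_3': [8, 9, 10, 11], 'col_4': [12, 13, 14, 15]}

# ** unpacks the columns of df2 (or the dict entries) into keyword arguments
from_frame = df1.assign(**df2)
from_dict = df1.assign(**additional_data)

# assign returns a new DataFrame; df1 itself is left unchanged
print(from_frame)
print(list(df1.columns))
```

Since assign does not modify df1 in place, reassign the result (df1 = df1.assign(...)) if you want to keep the new columns.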
how to add multiple columns from one data frame to another based on values in another column?
We can use left_join if we want to match the 'x2' from 'df1' and 'fips' from 'df2'
library(dplyr)
df2 <- left_join(df2, df1 %>%
select(x2:last_col()), by = c("fips" = "x2"))
Output:
df2
fips county_name x21_40 x41_60 x61_80 x81_100
1 5000 a 0 1 0 0
2 5001 b 0 0 1 0
3 5002 c 1 0 0 0
4 5003 d 0 0 0 1
In case of duplicates in 'df1', get the max value for those columns grouped by 'fips/x2' and then do the join
df1 %>%
group_by(fips = x2) %>%
summarise(across(x21_40:x81_100, max, na.rm = TRUE),
.groups = "drop") %>%
left_join(df2, .)
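For readers doing the same thing in pandas, a rough analogue of the group-then-join idea might look like the sketch below. The data is assumed (the question's frames are not shown here), with 5000 deliberately duplicated in df1; groupby/max stands in for summarise and merge for left_join.

```python
import pandas as pd

# Assumed data: df1 holds the indicator columns keyed by 'x2',
# df2 holds the counties keyed by 'fips'. 5000 appears twice in df1.
df1 = pd.DataFrame({
    'x2': [5000, 5000, 5001, 5002, 5003],
    'x21_40': [0, 0, 0, 1, 0],
    'x41_60': [0, 1, 0, 0, 0],
    'x61_80': [0, 0, 1, 0, 0],
    'x81_100': [0, 0, 0, 0, 1],
})
df2 = pd.DataFrame({
    'fips': [5000, 5001, 5002, 5003],
    'county_name': ['a', 'b', 'c', 'd'],
})

# Collapse duplicate keys by taking the per-column maximum
per_key = df1.groupby('x2', as_index=False).max().rename(columns={'x2': 'fips'})

# Left join keeps every row of df2 and adds the matching columns
result = df2.merge(per_key, on='fips', how='left')
print(result)
```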
Add multiple empty columns to pandas DataFrame
I'd concat using a DataFrame:
In [23]:
df = pd.DataFrame(columns=['A'])
df
Out[23]:
Empty DataFrame
Columns: [A]
Index: []
In [24]:
pd.concat([df,pd.DataFrame(columns=list('BCD'))])
Out[24]:
Empty DataFrame
Columns: [A, B, C, D]
Index: []
So by passing a list containing your original df, and a new one with the columns you wish to add, this will return a new df with the additional columns.
Caveat: See the discussion of performance in the other answers and/or the comment discussions. reindex may be preferable where performance is critical.
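For comparison, a minimal sketch of the reindex route mentioned in the caveat. The sample frame is assumed; I have not benchmarked it against concat here.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2]})  # assumed sample data

# reindex keeps the existing columns and creates the missing ones filled with NaN
wider = df.reindex(columns=[*df.columns, 'B', 'C', 'D'])
print(wider)
```

Like concat, this returns a new DataFrame and leaves df untouched.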
How to add multiple columns to existing data frame at once?
Well, it's hard to give an answer that's not simply "that's because that's how Python syntax is defined". The unpacking you're doing allows you to perform operations like the following:
In [63]: a, b = 3, 5
In [64]: a
Out[64]: 3
In [65]: b
Out[65]: 5
In [66]: l = [8, 10]
In [67]: c, d = l
In [69]: c
Out[69]: 8
In [70]: d
Out[70]: 10
That is, the element on the right hand side is unpacked into the appropriate number of variables on the left hand side. Knowing this, it's clear that you need three elements on the right hand side in your case.
Now what you can do is the following, which perhaps maps more closely to your mental model:
df['Hour'] = df['Month'] = df['Day'] = ''
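To see both forms side by side, here is a small sketch. The 'Date' column is assumed sample data; only the three new column names come from the question.

```python
import pandas as pd

df = pd.DataFrame({'Date': ['2020-01-01', '2020-01-02']})  # assumed sample data

# Unpacking needs exactly as many values on the right as targets on the left
df['Hour'], df['Month'], df['Day'] = '', '', ''

# A single empty string cannot be unpacked into three targets
try:
    h, m, d = ''
except ValueError as exc:
    error = str(exc)
    print(error)

# Chained assignment binds the same value to every target
df['Hour'] = df['Month'] = df['Day'] = ''
print(df.columns.tolist())
```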
Pandas: adding several columns to a dataframe in a single line
There is assign:
df.assign(b=range(11,21), c=range(21,31), d=range(31,41))
Things are even easier when you have a dictionary:
# assume you get this from somewhere else
val_dict = {'b': range(11,21), 'c':range(21,31)}
df.assign(**val_dict)
Note that the second approach is required when a column name is not a valid keyword argument, for example a name containing a space such as 'a b'.
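A quick sketch of the dict form with a column name that could not be written as a keyword. The starting frame is assumed to have ten rows, as the ranges above imply.

```python
import pandas as pd

df = pd.DataFrame({'a': range(1, 11)})  # assumed: ten rows to match the ranges

# 'a b' is not a valid Python identifier, so it cannot be written as a keyword;
# unpacking a dict passes the name through as a string instead
val_dict = {'a b': range(11, 21), 'c': range(21, 31)}
result = df.assign(**val_dict)
print(result.columns.tolist())
```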
Adding multiple columns in between columns in a data frame using a For Loop
You do not need to loop to do this:
as.data.frame(cbind(df, matrix(0, nrow = nrow(df), ncol = 53)))
Store.No Task Third Fourth 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25
1 1 70 4 7 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
2 2 50 5 8 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
3 3 20 6 9 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53
1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
3 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
matrix will create a matrix with 53 columns and 3 rows filled with 0
cbind will add this matrix to the end of your data
as.data.frame will convert it to a dataframe
Update
To insert these zero columns positionally you can subset your df into two parts: df[, 1:2] are the first and second columns, while df[,3:ncol(df)] are the third to end of your dataframe.
as.data.frame(cbind(df[,1:2], matrix(0, nrow = nrow(df), ncol = 53), df[,3:ncol(df)]))
Store.No Task 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29
1 1 70 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
2 2 50 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
3 3 20 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 Third Fourth
1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 4 7
2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 5 8
3 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 6 9
add_column
Alternatively you can use the add_column function from the tibble package, as you were in your post, using the .after argument to insert after the second column:
library(tibble)
tibble::add_column(df, as.data.frame(matrix(0, nrow = nrow(df), ncol = 53)), .after = 2)
Note: this function will fix the column names to add a "V" before any column name that starts with a number. So 1 will become V1.
Data
df <- data.frame(Store.No = 1:3,
Task = c(70, 50, 20),
Third = 4:6,
Fourth = 7:9)
How to add multiple columns to a dataframe based on calculations
import pandas as pd
data = pd.Series(dataframe.apply(lambda x: [function1(x[column_name]), function2(x[column_name]), function3(x[column_name])], axis = 1))
pd.DataFrame(data.tolist(),data.index)
If I understood your question correctly, this should be your answer. You may also want to try the swifter package, installable with pip :)
First create a Series of lists, then convert it to columns...
swifter is a simple library (at least I think it is simple) with really only one useful method: apply
import swifter
data.swifter.apply(lambda x: x+1)
It runs the work in parallel to improve speed on large datasets. On small ones it doesn't help and can even be slower.
https://pypi.org/project/swifter/
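A runnable version of the Series-of-lists idea, as a sketch only. function1, function2 and function3 below are hypothetical stand-ins, and the sample data and new column names are made up for illustration. It uses plain pandas apply, without swifter.

```python
import pandas as pd

# Hypothetical stand-ins for function1, function2 and function3
def function1(value):
    return value + 1

def function2(value):
    return value * 2

def function3(value):
    return value ** 2

dataframe = pd.DataFrame({'value': [1, 2, 3]})  # assumed sample data
column_name = 'value'

# Each row yields a list of three results, giving a Series of lists
lists = dataframe.apply(
    lambda row: [function1(row[column_name]),
                 function2(row[column_name]),
                 function3(row[column_name])],
    axis=1,
)

# Expand the lists into named columns and attach them to the original frame
new_cols = pd.DataFrame(lists.tolist(), index=dataframe.index,
                        columns=['plus_one', 'doubled', 'squared'])
result = dataframe.join(new_cols)
print(result)
```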
Add multiple columns to DataFrame and set them equal to an existing column
You can use the .assign() method:
In [31]: df.assign(b=df['a'], c=df['a'])
Out[31]:
a b c
0 1 1 1
1 2 2 2
2 3 3 3
3 4 4 4
4 5 5 5
or a little bit more creative approach:
In [41]: cols = list('bcdefg')
In [42]: df.assign(**{col:df['a'] for col in cols})
Out[42]:
a b c d e f g
0 1 1 1 1 1 1 1
1 2 2 2 2 2 2 2
2 3 3 3 3 3 3 3
3 4 4 4 4 4 4 4
4 5 5 5 5 5 5 5
another solution:
In [60]: pd.DataFrame(np.repeat(df.values, len(cols)+1, axis=1), columns=['a']+cols)
Out[60]:
a b c d e f g
0 1 1 1 1 1 1 1
1 2 2 2 2 2 2 2
2 3 3 3 3 3 3 3
3 4 4 4 4 4 4 4
4 5 5 5 5 5 5 5
NOTE: as @Cpt_Jauchefuerst mentioned in the comment, DataFrame.assign(z=1, a=1) will add columns in alphabetical order, i.e. first a will be added to the existing columns and then z.
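As far as I know the alphabetical ordering only applies to older setups: with Python 3.6+ and pandas 0.23 or later, assign keeps the keyword order. A quick check, assuming a reasonably recent install and a made-up sample frame:

```python
import pandas as pd

df = pd.DataFrame({'x': [1, 2, 3]})  # assumed sample data

# On Python 3.6+ with pandas >= 0.23 the keyword order is preserved
result = df.assign(z=1, a=1)
print(result.columns.tolist())
```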