Concatenate Row-Wise Across Specific Columns of Dataframe

Concatenate row-wise across specific columns of dataframe

Try

 data$id <- paste(data$F, data$E, data$D, data$C, sep="_")

instead. The beauty of vectorized code is that you do not need row-by-row loops, or loop-equivalent *apply functions.

Edit: Even better is

 data <- within(data, id <- paste(F, E, D, C, sep="_"))

Row-wise sort then concatenate across specific columns of data frame

My first thought would've been to do this:

dt[, new_var := paste(sort(.SD), collapse = ", "), by = 1:nrow(dt)]

But you could make your function work with a couple of simple modifications:

f = function(...) paste(c(...)[order(c(...))],collapse=", ")

dt[, new_var := do.call(function(...) mapply(f, ...), .SD)]

Concatenate several columns across more than one row in pandas

Turning your input data into a csv file, I did the following, and it works well.

import pandas as pd

DF = pd.read_csv("CombinerData.csv")

print(DF)
print()

def combine_Columns_Into_New_Column(DF, columns_To_Combine, new_Column_Name):
    # Build the new column by appending each source column, space-separated
    DF[new_Column_Name] = ''
    for Col in columns_To_Combine:
        DF[new_Column_Name] += DF[Col].map(str) + ' '
    # Drop the source columns, then concatenate the strings within each group
    DF = DF.drop(columns_To_Combine, axis=1)
    DF = DF.groupby(by=['Identifier']).sum()

    return DF

DF = combine_Columns_Into_New_Column(DF, ['Op1','Op2','Op3'],'Ops')

print(DF)

OUTPUT:

                                                          Ops
Identifier
A str_1 str_2 str_3
B str_4 str_5 str_6 str_7 str_8 str_9 str_10 str...
C str_13 str_14 str_15 str_16 str_17 str_18

INPUT FILE:

Identifier,Op1,Op2,Op3
A,str_1,str_2,str_3
B,str_4,str_5,str_6
B,str_7,str_8,str_9
B,str_10,str_11,str_12
C,str_13,str_14,str_15
C,str_16,str_17,str_18
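
As a possible variation on the function above, the same result can be sketched with two joins instead of a running `+=`: first join the columns within each row, then join the rows within each group. This is only a sketch, not the original answer's method; the names `csv_text`, `ops_cols`, `row_text` and `combined` are made up for illustration, and the CSV is inlined so the example runs without the file. Unlike the original, it leaves no trailing space after each row's text.

```python
import io
import pandas as pd

# Same rows as the input file above, inlined so the sketch is self-contained.
csv_text = """Identifier,Op1,Op2,Op3
A,str_1,str_2,str_3
B,str_4,str_5,str_6
B,str_7,str_8,str_9
B,str_10,str_11,str_12
C,str_13,str_14,str_15
C,str_16,str_17,str_18
"""
df = pd.read_csv(io.StringIO(csv_text))
ops_cols = ['Op1', 'Op2', 'Op3']  # hypothetical name for the columns to combine

# Step 1: join the chosen columns within each row.
row_text = df[ops_cols].astype(str).apply(' '.join, axis=1)

# Step 2: join the per-row strings within each Identifier group.
combined = row_text.groupby(df['Identifier']).apply(' '.join).rename('Ops')

print(combined)
```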

In R, concatenate numeric columns into a string while inserting some text elements

You can use paste0, which is part of base R, to concatenate strings. For example:

result <- paste0(d$estimate, " (", d$low_95, ", ", d$high_95, ")")
print(result)

[1] "380.3 (281.6, 405.7)"

Concatenate strings from several rows using Pandas groupby

You can group by the 'name' and 'month' columns, then call transform, which returns data aligned to the original df, and apply a lambda that joins the text entries:

In [119]:

df['text'] = df[['name','text','month']].groupby(['name','month'])['text'].transform(lambda x: ','.join(x))
df[['name','text','month']].drop_duplicates()
Out[119]:
name text month
0 name1 hej,du 11
2 name1 aj,oj 12
4 name2 fin,katt 11
6 name2 mycket,lite 12

I subset the original df by passing a list of the columns of interest, df[['name','text','month']], and then call drop_duplicates.

EDIT: actually, I can just call apply and then reset_index:

In [124]:

df.groupby(['name','month'])['text'].apply(lambda x: ','.join(x)).reset_index()

Out[124]:
name month text
0 name1 11 hej,du
1 name1 12 aj,oj
2 name2 11 fin,katt
3 name2 12 mycket,lite

update

the lambda is unnecessary here:

In[38]:
df.groupby(['name','month'])['text'].apply(','.join).reset_index()

Out[38]:
name month text
0 name1 11 hej,du
1 name1 12 aj,oj
2 name2 11 fin,katt
3 name2 12 mycket,lite
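
For anyone who wants to run the snippets above, here is a small self-contained sketch. The input frame is an assumption: it is reconstructed from the outputs shown (two text rows per name and month), not taken from the original question, and the names `text_joined` and `joined` are made up for illustration.

```python
import pandas as pd

# Hypothetical input, reconstructed from the outputs shown above.
df = pd.DataFrame({
    'name':  ['name1'] * 4 + ['name2'] * 4,
    'text':  ['hej', 'du', 'aj', 'oj', 'fin', 'katt', 'mycket', 'lite'],
    'month': [11, 11, 12, 12, 11, 11, 12, 12],
})

# transform keeps the original row count: every row gets its group's joined text.
text_joined = df.groupby(['name', 'month'])['text'].transform(lambda x: ','.join(x))

# apply + reset_index collapses each group to a single row.
joined = df.groupby(['name', 'month'])['text'].apply(','.join).reset_index()

print(joined)
```

The transform form is the one to reach for when the joined text has to sit next to the original rows; the apply form is shorter when only one row per group is wanted.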

Concatenate all columns in a pandas dataframe

A solution with sum; the output is float, so converting to int and then to str is necessary:

df['new'] = df.sum(axis=1).astype(int).astype(str)

Another solution uses apply with join, but it is the slowest:

df['new'] = df.apply(''.join, axis=1)

Finally, a very fast numpy solution: convert to a numpy array and then sum:

df['new'] = df.values.sum(axis=1)

Timings:

df = pd.DataFrame({'A': ['1', '2', '3'], 'B': ['4', '5', '6'], 'C': ['7', '8', '9']})
#[30000 rows x 3 columns]
df = pd.concat([df]*10000).reset_index(drop=True)
#print (df)

cols = list('ABC')

#not_a_robot solution
In [259]: %timeit df['concat'] = pd.Series(df[cols].fillna('').values.tolist()).str.join('')
100 loops, best of 3: 17.4 ms per loop

In [260]: %timeit df['new'] = df[cols].astype(str).apply(''.join, axis=1)
1 loop, best of 3: 386 ms per loop

In [261]: %timeit df['new1'] = df[cols].values.sum(axis=1)
100 loops, best of 3: 6.5 ms per loop

In [262]: %timeit df['new2'] = df[cols].astype(str).sum(axis=1).astype(int).astype(str)
10 loops, best of 3: 68.6 ms per loop

EDIT: If the dtypes of some columns are not object (that is, not strings), cast first with DataFrame.astype:

df['new'] = df.astype(str).values.sum(axis=1)
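
To see the cast matter, here is a minimal sketch with a mixed-dtype frame. The frame and the names `as_text`, `via_numpy` and `via_apply` are made up for illustration. It uses `to_numpy(dtype=object)` rather than `.values`, on the assumption that forcing an object array keeps the element-wise `+` behaving as string concatenation regardless of the pandas string dtype in use.

```python
import pandas as pd

# Hypothetical frame: two string columns and one integer column.
df = pd.DataFrame({'A': ['1', '2', '3'], 'B': ['4', '5', '6'], 'C': [7, 8, 9]})

# Cast everything to str first, so that '+' means concatenation for every cell.
as_text = df.astype(str)

# Summing an object array along axis=1 adds the strings in each row together.
via_numpy = as_text.to_numpy(dtype=object).sum(axis=1)

# The slower apply/join route gives the same strings.
via_apply = as_text.apply(''.join, axis=1)

df['new'] = via_numpy
print(df)
```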

