Concatenate row-wise across specific columns of a data frame
Try
data$id <- paste(data$F, data$E, data$D, data$C, sep="_")
instead. The beauty of vectorized code is that you do not need row-by-row loops or loop-equivalent *apply functions.
Edit: Even better is
data <- within(data, id <- paste(F, E, D, C, sep="_"))
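For comparison, the same vectorized row-wise concatenation could be sketched in pandas (a hypothetical frame; the column names simply mirror the R snippet):

```python
import pandas as pd

# Hypothetical data; columns mirror the R example above.
data = pd.DataFrame({"F": ["a", "b"], "E": ["c", "d"],
                     "D": ["e", "f"], "C": ["g", "h"]})

# apply with axis=1 joins each row's values, like paste(..., sep="_").
data["id"] = data[["F", "E", "D", "C"]].apply("_".join, axis=1)
```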
Row-wise sort then concatenate across specific columns of data frame
My first thought would've been to do this:
dt[, new_var := paste(sort(.SD), collapse = ", "), by = 1:nrow(dt)]
But you could make your function work with a couple of simple modifications:
f = function(...) paste(c(...)[order(c(...))], collapse = ", ")
dt[, new_var := do.call(function(...) mapply(f, ...), .SD)]
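The same sort-within-row-then-join idea could be sketched in pandas (hypothetical frame and column names):

```python
import pandas as pd

# Hypothetical two-column frame of strings.
dt = pd.DataFrame({"a": ["banana", "apple"], "b": ["apple", "cherry"]})

# Sort the values within each row before joining, like f() above.
dt["new_var"] = dt[["a", "b"]].apply(lambda row: ", ".join(sorted(row)),
                                     axis=1)
```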
Concatenate several columns across more than one row in pandas
Turning your input data into a CSV file, I did the following (Python 3 syntax), and it works well.

import pandas as pd

DF = pd.read_csv("CombinerData.csv")
print(DF)
print()

def combine_Columns_Into_New_Column(DF, columns_To_Combine, new_Column_Name):
    # Append each source column (as a string) to the new column.
    DF[new_Column_Name] = ''
    for Col in columns_To_Combine:
        DF[new_Column_Name] += DF[Col].map(str) + ' '
    # Drop the originals, then concatenate rows within each Identifier.
    DF = DF.drop(columns_To_Combine, axis=1)
    DF = DF.groupby(by=['Identifier']).sum()
    return DF

DF = combine_Columns_Into_New_Column(DF, ['Op1', 'Op2', 'Op3'], 'Ops')
print(DF)
OUTPUT:
Ops
Identifier
A str_1 str_2 str_3
B str_4 str_5 str_6 str_7 str_8 str_9 str_10 str...
C str_13 str_14 str_15 str_16 str_17 str_18
INPUT FILE:
Identifier,Op1,Op2,Op3
A,str_1,str_2,str_3
B,str_4,str_5,str_6
B,str_7,str_8,str_9
B,str_10,str_11,str_12
C,str_13,str_14,str_15
C,str_16,str_17,str_18
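For what it's worth, the same result can likely be had without the helper function, by joining the columns within each row and then group-joining the rows (a sketch with the input file's data inlined):

```python
import pandas as pd
from io import StringIO

csv = """Identifier,Op1,Op2,Op3
A,str_1,str_2,str_3
B,str_4,str_5,str_6
B,str_7,str_8,str_9
B,str_10,str_11,str_12
C,str_13,str_14,str_15
C,str_16,str_17,str_18"""

DF = pd.read_csv(StringIO(csv))
cols = ['Op1', 'Op2', 'Op3']
# Join the three columns within each row, then rows within each group.
DF['Ops'] = DF[cols].apply(' '.join, axis=1)
out = DF.groupby('Identifier')['Ops'].agg(' '.join)
```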
In R, concatenate numeric columns into a string while inserting some text elements
You can use the paste0 function from base R to concatenate strings. For example:
result <- paste0(d$estimate, " (", d$low_95, ", ", d$high_95, ")")
print(result)
[1] "380.3 (281.6, 405.7)"
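A pandas counterpart of this paste0 call could look like the following (hypothetical one-row frame with matching column names):

```python
import pandas as pd

# Hypothetical frame matching the R columns above.
d = pd.DataFrame({"estimate": [380.3], "low_95": [281.6],
                  "high_95": [405.7]})

# Build "estimate (low, high)" per row, like paste0 in R.
result = [f"{e} ({lo}, {hi})"
          for e, lo, hi in zip(d["estimate"], d["low_95"], d["high_95"])]
```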
Concatenate strings from several rows using Pandas groupby
You can groupby the 'name' and 'month' columns, then call transform, which returns data aligned to the original df, and apply a lambda in which we join the text entries:
In [119]:
df['text'] = df[['name','text','month']].groupby(['name','month'])['text'].transform(lambda x: ','.join(x))
df[['name','text','month']].drop_duplicates()
Out[119]:
name text month
0 name1 hej,du 11
2 name1 aj,oj 12
4 name2 fin,katt 11
6 name2 mycket,lite 12
I subset the original df by passing a list of the columns of interest, df[['name','text','month']], and then call drop_duplicates.
EDIT: actually I can just call apply and then reset_index:
In [124]:
df.groupby(['name','month'])['text'].apply(lambda x: ','.join(x)).reset_index()
Out[124]:
name month text
0 name1 11 hej,du
1 name1 12 aj,oj
2 name2 11 fin,katt
3 name2 12 mycket,lite
Update: the lambda is unnecessary here:
In[38]:
df.groupby(['name','month'])['text'].apply(','.join).reset_index()
Out[38]:
name month text
0 name1 11 hej,du
1 name1 12 aj,oj
2 name2 11 fin,katt
3 name2 12 mycket,lite
Concatenate all columns in a pandas dataframe
A solution with sum, but the output is float, so converting to int and then str is necessary:
df['new'] = df.sum(axis=1).astype(int).astype(str)
Another solution with apply and the join function, but it is the slowest:
df['new'] = df.apply(''.join, axis=1)
Last, a very fast NumPy solution: convert to a NumPy array and then sum:
df['new'] = df.values.sum(axis=1)
Timings:
df = pd.DataFrame({'A': ['1', '2', '3'], 'B': ['4', '5', '6'], 'C': ['7', '8', '9']})
df = pd.concat([df]*10000).reset_index(drop=True)
#[30000 rows x 3 columns]
cols = list('ABC')
#not_a_robot solution
In [259]: %timeit df['concat'] = pd.Series(df[cols].fillna('').values.tolist()).str.join('')
100 loops, best of 3: 17.4 ms per loop
In [260]: %timeit df['new'] = df[cols].astype(str).apply(''.join, axis=1)
1 loop, best of 3: 386 ms per loop
In [261]: %timeit df['new1'] = df[cols].values.sum(axis=1)
100 loops, best of 3: 6.5 ms per loop
In [262]: %timeit df['new2'] = df[cols].astype(str).sum(axis=1).astype(int).astype(str)
10 loops, best of 3: 68.6 ms per loop
EDIT: If the dtypes of some columns are not object (i.e., not strings), cast with DataFrame.astype:
df['new'] = df.astype(str).values.sum(axis=1)
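Another option worth mentioning here is Series.str.cat, which concatenates string columns element-wise without a row-wise apply (a sketch with a small made-up frame):

```python
import pandas as pd

df = pd.DataFrame({'A': ['1', '2', '3'],
                   'B': ['4', '5', '6'],
                   'C': ['7', '8', '9']})

# str.cat accepts a list of Series and joins them element-wise.
df['new'] = df['A'].str.cat([df['B'], df['C']])
```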