Row-Wise Sort Then Concatenate Across Specific Columns of Data Frame

My first thought would've been to do this:

dt[, new_var := paste(sort(unlist(.SD)), collapse = ", "), by = 1:nrow(dt)]

But you could make your function work with a couple of simple modifications:

f = function(...) paste(c(...)[order(c(...))], collapse = ", ")

dt[, new_var := do.call(function(...) mapply(f, ...), .SD)]
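
For comparison, a rough pandas sketch of the same row-wise sort-then-paste; this is a translation, not part of the original answer, and the frame dt is hypothetical:

import pandas as pd

# Hypothetical frame; x, y, z are the columns to combine.
dt = pd.DataFrame({'x': ['b', 'c'], 'y': ['a', 'a'], 'z': ['c', 'b']})

# Sort the values within each row, then join them with ", ".
dt['new_var'] = dt.apply(lambda r: ', '.join(sorted(r)), axis=1)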

Concatenate row-wise across specific columns of dataframe

Try

 data$id <- paste(data$F, data$E, data$D, data$C, sep="_")

instead. The beauty of vectorized code is that you do not need row-by-row loops, or loop-equivalent *apply functions.

Edit: Even better is

 data <- within(data, id <- paste(F, E, D, C, sep="_"))

Concat two data frames row-wise which have different column names and different row values

IIUC, you want to combine the data frames and keep the matching values together in lists; for that you can do:

pd.concat([df,df2]).reset_index().groupby('index').agg(list).reset_index(drop=True)

        a           b       c           y
0  [1, 4]  [2.0, nan]  [3, 6]  [nan, 5.0]

Or, if you just want to stack them, pd.concat alone does it:

pd.concat([df,df2]).reset_index(drop=True)

   a    b  c    y
0  1  2.0  3  NaN
1  4  NaN  6  5.0
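
For a self-contained run, here is a minimal reconstruction of df and df2 inferred from the outputs above (the question's frames are not shown, so treat the values as assumptions):

import pandas as pd

# Inferred inputs: df lacks column y, df2 lacks column b.
df = pd.DataFrame({'a': [1], 'b': [2], 'c': [3]})
df2 = pd.DataFrame({'a': [4], 'c': [6], 'y': [5]})

# Grouping by the original row position collapses matching rows into lists.
out = (pd.concat([df, df2])
         .reset_index()
         .groupby('index')
         .agg(list)
         .reset_index(drop=True))
print(out)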

How to sort ascending row-wise in Pandas Dataframe

You can sort the rows with numpy.sort, flip the ordering for a descending sort with [:, ::-1], and pass the result to the DataFrame constructor if performance is important:

df = pd.DataFrame(np.sort(df, axis=1)[:, ::-1],
                  columns=df.columns,
                  index=df.index)
print (df)
      N1  N2  N3  N4  N5
0     48  45  21  20  12
1     41  36  32  29  16
2     42  41  34  13   9
3     39  37  33   7   4
4     39  32  21   3   1
...   ..  ..  ..  ..  ..
1313  42  36  27   5   1
1314  48  38  35  20  18
1315  42  38  37  34  12
1316  42  41  37  23  18
1317  35  34  18  10   2

Performance is a bit worse if you assign back:

df[:] = np.sort(df, axis=1)[:, ::-1]
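
For a reproducible run, here is a minimal sketch; the random frame is an assumption standing in for the question's data:

import numpy as np
import pandas as pd

# Stand-in data (the question's frame is not shown).
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.integers(1, 50, size=(5, 5)),
                  columns=['N1', 'N2', 'N3', 'N4', 'N5'])

# Sort each row ascending, then flip left-to-right for descending order.
out = pd.DataFrame(np.sort(df, axis=1)[:, ::-1],
                   columns=df.columns, index=df.index)
print(out)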

Performance:

#10k rows
df = pd.concat([df] * 1000, ignore_index=True)

#Ynjxsjmh sol
In [200]: %timeit df.apply(lambda row: list(reversed(sorted(row))), axis=1, result_type='expand')
595 ms ± 19.7 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

#Andrej Kesely sol1
In [201]: %timeit df[:] = np.fliplr(np.sort(df, axis=1))
559 µs ± 38.5 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

#Andrej Kesely sol2
In [202]: %timeit df.loc[:, ::-1] = np.sort(df, axis=1)
518 µs ± 11 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

#jezrael sol2
In [203]: %timeit df[:] = np.sort(df, axis=1)[:, ::-1]
491 µs ± 15.9 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

#jezrael sol1
In [204]: %timeit pd.DataFrame(np.sort(df, axis=1)[:, ::-1], columns=df.columns, index=df.index)
399 µs ± 2.31 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

Sort each row individually between two columns

You can use:

df[['column_01','column_02']] = (df[['column_01','column_02']]
                                 .apply(lambda x: sorted(x.values), axis=1)
                                 .tolist())  # lists assign positionally to the two columns
print (df)
  column_01 column_02  value
0       aaa       ccc      1
1       bbb       ddd     34
2       aaa       ddd     98

Other solutions:

df[['column_01','column_02']] = pd.DataFrame(np.sort(df[['column_01','column_02']].values),
                                             index=df.index, columns=['column_01','column_02'])

Or only with a NumPy array:

df[['column_01','column_02']] = np.sort(df[['column_01','column_02']].values)
print (df)
  column_01 column_02  value
0       aaa       ccc      1
1       bbb       ddd     34
2       aaa       ddd     98

The NumPy-based solutions are faster, because apply loops row by row in Python:

df = pd.concat([df]*1000).reset_index(drop=True)
In [177]: %timeit df[['column_01','column_02']] = pd.DataFrame(np.sort(df[['column_01','column_02']].values), index=df.index, columns=['column_01','column_02'])
1000 loops, best of 3: 1.36 ms per loop

In [182]: %timeit df[['column_01','column_02']] = np.sort(df[['column_01','column_02']].values)
1000 loops, best of 3: 1.54 ms per loop

In [178]: %timeit df[['column_01','column_02']] = (df[['column_01','column_02']].apply(lambda x: sorted(x.values), axis=1))
1 loop, best of 3: 291 ms per loop
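
To try the snippets above, one possible input is the hypothetical frame below (only the sorted output is shown in the answer, so the original row order is a guess):

import numpy as np
import pandas as pd

# Hypothetical unsorted input consistent with the sorted output above.
df = pd.DataFrame({'column_01': ['ccc', 'ddd', 'ddd'],
                   'column_02': ['aaa', 'bbb', 'aaa'],
                   'value': [1, 34, 98]})

# np.sort orders the two strings within each row alphabetically.
df[['column_01', 'column_02']] = np.sort(df[['column_01', 'column_02']].values)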

concatenate values across columns in data.table, row by row

You can use do.call(), with .SDcols supplying the columns.

x[, key_ := do.call(paste, c(.SD, sep = "_")), .SDcols = names(x)]

.SDcols = names(x) supplies all the columns of x. You can supply any vector of names or column numbers there.
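
For comparison, a rough pandas sketch of the same row-wise paste (my translation, not part of the original data.table answer; the frame x is hypothetical):

import pandas as pd

# Hypothetical frame; any set of columns works.
x = pd.DataFrame({'a': [1, 2], 'b': ['u', 'v'], 'c': [9, 8]})

# Join the string form of every column row by row, like do.call(paste, ...).
x['key_'] = x.astype(str).agg('_'.join, axis=1)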

Sort column values based on floats inside a string, then concat

You can try applying a customized function:

def concat(row):
    # use the first percentage found in each cell as a float sort key
    keys = row.str.extract(r'(\d+\.?\d*)%')[0].astype(float).tolist()
    row = [x for _, x in sorted(zip(keys, row.tolist()))]
    return ' '.join(row)

df['c'] = df.apply(concat, axis=1)
print(df)

                                                   a  \
0  some text (other text) : 56.3% (text again: 40%)
1      text (always text) : 26.6% (aaand text: 80%)

                                                      b  \
0  again text (not same text) : 33% (text text: 60.1%)
1    still text (too much text) : 86% (last text: 10%)

                                                                                                      c
0  again text (not same text) : 33% (text text: 60.1%) some text (other text) : 56.3% (text again: 40%)
1      text (always text) : 26.6% (aaand text: 80%) still text (too much text) : 86% (last text: 10%)
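
The input frame, reconstructed from the printed output above (treat it as an assumption, since the question's data is not shown):

import pandas as pd

df = pd.DataFrame({
    'a': ['some text (other text) : 56.3% (text again: 40%)',
          'text (always text) : 26.6% (aaand text: 80%)'],
    'b': ['again text (not same text) : 33% (text text: 60.1%)',
          'still text (too much text) : 86% (last text: 10%)'],
})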

Keep duplicated row from a specific data frame after append

new = (pd.concat([df1, df2])
         .drop_duplicates(subset=["ID", "City", "Year"],
                          keep="last",
                          ignore_index=True))

append will be gone in the near future (it was removed in pandas 2.0), so use pd.concat there instead. Then drop_duplicates over the said columns with keep="last":

In [376]: df1
Out[376]:
   ID    City  Year  Number
0   7  Berlin  2012      62
1   2   Paris  2000      43

In [377]: df2
Out[377]:
   ID    City  Year  Number
0   7  Berlin  2012      60
1   5  London  2019     100

In [378]: (pd.concat([df1, df2])
     ...:    .drop_duplicates(subset=["ID", "City", "Year"],
     ...:                     keep="last",
     ...:                     ignore_index=True))
Out[378]:
   ID    City  Year  Number
0   2   Paris  2000      43
1   7  Berlin  2012      60
2   5  London  2019     100

ignore_index=True makes the index 0, 1, 2 again after drop_duplicates disturbs it.
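
A self-contained version, with df1 and df2 rebuilt from the session above:

import pandas as pd

# Frames reconstructed from Out[376] and Out[377].
df1 = pd.DataFrame({"ID": [7, 2], "City": ["Berlin", "Paris"],
                    "Year": [2012, 2000], "Number": [62, 43]})
df2 = pd.DataFrame({"ID": [7, 5], "City": ["Berlin", "London"],
                    "Year": [2012, 2019], "Number": [60, 100]})

new = (pd.concat([df1, df2])
         .drop_duplicates(subset=["ID", "City", "Year"],
                          keep="last", ignore_index=True))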


