Row-wise sort then concatenate across specific columns of data frame
My first thought would've been to do this:
dt[, new_var := paste(sort(.SD), collapse = ", "), by = 1:nrow(dt)]
But you could make your function work with a couple of simple modifications:
f = function(...) paste(c(...)[order(c(...))],collapse=", ")
dt[, new_var := do.call(function(...) mapply(f, ...), .SD)]
Concatenate row-wise across specific columns of dataframe
Try
data$id <- paste(data$F, data$E, data$D, data$C, sep="_")
instead. The beauty of vectorized code is that you do not need row-by-row loops, or loop-equivalent *apply functions.
Edit: Even better is
data <- within(data, id <- paste(F, E, D, C, sep=""))
Concatenate two data frames row-wise with different column names and different row values
IIUC, you want to combine the dfs while keeping the values together in lists; for that you can do:
pd.concat([df,df2]).reset_index().groupby('index').agg(list).reset_index(drop=True)
a b c y
0 [1, 4] [2.0, nan] [3, 6] [nan, 5.0]
OR, if you just want to combine them then, pd.concat
does it
pd.concat([df,df2]).reset_index(drop=True)
a b c y
0 1 2.0 3 NaN
1 4 NaN 6 5.0
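A minimal runnable sketch, with `df` and `df2` inferred from the output shown above (column names and values are assumptions based on that result):

```python
import pandas as pd

# Hypothetical inputs reconstructed from the combined output above
df = pd.DataFrame({'a': [1], 'b': [2.0], 'c': [3]})
df2 = pd.DataFrame({'a': [4], 'c': [6], 'y': [5.0]})

# Stack the frames, then collect the values sharing each original
# row label into lists via groupby + agg(list)
combined = (pd.concat([df, df2])
              .reset_index()
              .groupby('index')
              .agg(list)
              .reset_index(drop=True))
print(combined)
```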
How to sort ascending row-wise in Pandas Dataframe
You can sort rows with numpy.sort
, swap the ordering to descending with [:, ::-1]
, and pass the result to the DataFrame constructor if performance is important:
df = pd.DataFrame(np.sort(df, axis=1)[:, ::-1],
columns=df.columns,
index=df.index)
print (df)
N1 N2 N3 N4 N5
0 48 45 21 20 12
1 41 36 32 29 16
2 42 41 34 13 9
3 39 37 33 7 4
4 39 32 21 3 1
1313 42 36 27 5 1
1314 48 38 35 20 18
1315 42 38 37 34 12
1316 42 41 37 23 18
1317 35 34 18 10 2
Performance is a bit worse if you assign back:
df[:] = np.sort(df, axis=1)[:, ::-1]
Performance:
#10k rows
df = pd.concat([df] * 1000, ignore_index=True)
#Ynjxsjmh sol
In [200]: %timeit df.apply(lambda row: list(reversed(sorted(row))), axis=1, result_type='expand')
595 ms ± 19.7 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
#Andrej Kesely sol1
In [201]: %timeit df[:] = np.fliplr(np.sort(df, axis=1))
559 µs ± 38.5 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
#Andrej Kesely sol2
In [202]: %timeit df.loc[:, ::-1] = np.sort(df, axis=1)
518 µs ± 11 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
#jezrael sol2
In [203]: %timeit df[:] = np.sort(df, axis=1)[:, ::-1]
491 µs ± 15.9 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
#jezrael sol1
In [204]: %timeit pd.DataFrame(np.sort(df, axis=1)[:, ::-1], columns=df.columns, index=df.index)
399 µs ± 2.31 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
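As a self-contained sketch of the fastest variant, with a small frame invented here (values chosen so the first two rows match the output shown above):

```python
import numpy as np
import pandas as pd

# Hypothetical input; column names follow the example above
df = pd.DataFrame({'N1': [12, 29], 'N2': [48, 41],
                   'N3': [21, 16], 'N4': [45, 36], 'N5': [20, 32]})

# np.sort sorts each row ascending; [:, ::-1] flips to descending
out = pd.DataFrame(np.sort(df, axis=1)[:, ::-1],
                   columns=df.columns, index=df.index)
print(out)
```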
Sort each row individually between two columns
You can use:
df[['column_01','column_02']] = (df[['column_01','column_02']]
                                 .apply(lambda x: sorted(x.values), axis=1))
print (df)
column_01 column_02 value
0 aaa ccc 1
1 bbb ddd 34
2 aaa ddd 98
Other solutions:
df[['column_01','column_02']] = pd.DataFrame(np.sort(df[['column_01','column_02']].values),
index=df.index, columns=['column_01','column_02'])
Or only with a NumPy array:
df[['column_01','column_02']] = np.sort(df[['column_01','column_02']].values)
print (df)
column_01 column_02 value
0 aaa ccc 1
1 bbb ddd 34
2 aaa ddd 98
The second solution is faster, because apply
uses loops:
df = pd.concat([df]*1000).reset_index(drop=True)
In [177]: %timeit df[['column_01','column_02']] = pd.DataFrame(np.sort(df[['column_01','column_02']].values), index=df.index, columns=['column_01','column_02'])
1000 loops, best of 3: 1.36 ms per loop
In [182]: %timeit df[['column_01','column_02']] = np.sort(df[['column_01','column_02']].values)
1000 loops, best of 3: 1.54 ms per loop
In [178]: %timeit df[['column_01','column_02']] = (df[['column_01','column_02']].apply(lambda x: sorted(x.values), axis=1))
1 loop, best of 3: 291 ms per loop
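A self-contained sketch of the NumPy variant, with unsorted rows invented here so that the sorted result matches the output shown above:

```python
import numpy as np
import pandas as pd

# Hypothetical unsorted input; np.sort works lexicographically on strings
df = pd.DataFrame({'column_01': ['ccc', 'bbb', 'ddd'],
                   'column_02': ['aaa', 'ddd', 'aaa'],
                   'value': [1, 34, 98]})

# Sort each row's pair of values in place, leaving 'value' untouched
df[['column_01','column_02']] = np.sort(df[['column_01','column_02']].values)
print(df)
```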
concatenate values across columns in data.table, row by row
You can use do.call(), using .SDcols to supply the columns.
x[, key_ := do.call(paste, c(.SD, sep = "_")), .SDcols = names(x)]
.SDcols = names(x) supplies all the columns of x. You can supply any vector of names or column numbers there.
Sort columns values based on floats inside a string, then concat
You can try applying a customized function:
def concat(row):
    # sort the row's cells by the first percentage found in each string
    keys = row.str.extract(r'(\d+\.?\d*)%')[0].astype(float).tolist()
    row = [x for _, x in sorted(zip(keys, row.tolist()))]
    return ' '.join(row)
df['c'] = df.apply(concat, axis=1)
print(df)
a \
0 some text (other text) : 56.3% (text again: 40%)
1 text (always text) : 26.6% (aaand text: 80%)
b \
0 again text (not same text) : 33% (text text: 60.1%)
1 still text (too much text) : 86% (last text: 10%)
c
0 again text (not same text) : 33% (text text: 60.1%) some text (other text) : 56.3% (text again: 40%)
1 text (always text) : 26.6% (aaand text: 80%) still text (too much text) : 86% (last text: 10%)
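Putting it together as a runnable sketch, with the input strings taken from the output shown and assumed to live in columns a and b:

```python
import pandas as pd

df = pd.DataFrame({
    'a': ['some text (other text) : 56.3% (text again: 40%)',
          'text (always text) : 26.6% (aaand text: 80%)'],
    'b': ['again text (not same text) : 33% (text text: 60.1%)',
          'still text (too much text) : 86% (last text: 10%)'],
})

def concat(row):
    # sort the row's cells by the first percentage found in each string
    keys = row.str.extract(r'(\d+\.?\d*)%')[0].astype(float).tolist()
    row = [x for _, x in sorted(zip(keys, row.tolist()))]
    return ' '.join(row)

df['c'] = df.apply(concat, axis=1)
print(df['c'])
```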
Keep duplicated row from a specific data frame after append
new = (pd.concat([df1, df2])
.drop_duplicates(subset=["ID", "City", "Year"],
keep="last",
ignore_index=True))
append
is deprecated and will be removed, so use pd.concat
here instead. Then drop_duplicates
over those columns with keep="last"
:
In [376]: df1
Out[376]:
ID City Year Number
0 7 Berlin 2012 62
1 2 Paris 2000 43
In [377]: df2
Out[377]:
ID City Year Number
0 7 Berlin 2012 60
1 5 London 2019 100
In [378]: (pd.concat([df1, df2])
...: .drop_duplicates(subset=["ID", "City", "Year"],
...: keep="last",
...: ignore_index=True))
Out[378]:
ID City Year Number
0 2 Paris 2000 43
1 7 Berlin 2012 60
2 5 London 2019 100
ignore_index resets the index to 0, 1, 2 again after drop_duplicates
disturbs it
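The whole example can be reproduced as a script, with the frames as shown in the session above:

```python
import pandas as pd

df1 = pd.DataFrame({'ID': [7, 2], 'City': ['Berlin', 'Paris'],
                    'Year': [2012, 2000], 'Number': [62, 43]})
df2 = pd.DataFrame({'ID': [7, 5], 'City': ['Berlin', 'London'],
                    'Year': [2012, 2019], 'Number': [60, 100]})

# Rows sharing ID/City/Year keep only their last (df2) version
new = (pd.concat([df1, df2])
         .drop_duplicates(subset=['ID', 'City', 'Year'],
                          keep='last', ignore_index=True))
print(new)
```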