Merge Two Data Frames While Keeping the Original Row Order

Check out the join function in the plyr package. It works like merge, but unlike merge it preserves the row order of x (the first data frame) whatever join type is used; if needed, unmatched rows from y are added at the bottom.

Using your example data, we would use join like this:

> library(plyr)
> join(df.2, df.1)
Joining by: class
  object class prob
1      A     2  0.7
2      B     1  0.5
3      D     2  0.7
4      F     3  0.3
5      C     1  0.5
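For readers doing the same thing in pandas, a left merge behaves like plyr::join with its default type = "left": pandas documents that how="left" preserves the key order of the left frame. The inputs below are reconstructed from the printed result above and are an assumption, since df.1 and df.2 themselves are not shown.

```python
import pandas as pd

# Inputs reconstructed from the printed join() result (assumed, not the originals).
df_2 = pd.DataFrame({"object": ["A", "B", "D", "F", "C"],
                     "class": [2, 1, 2, 3, 1]})
df_1 = pd.DataFrame({"class": [1, 2, 3], "prob": [0.5, 0.7, 0.3]})

# how="left" keeps every row of df_2, in df_2's original order.
result = pd.merge(df_2, df_1, how="left", on="class")
print(result)
```

Unlike R's merge, no re-sorting by the key happens here, so the row order of df_2 survives.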

Here are a couple of links describing fixes to the merge function for keeping the row order:

http://www.r-statistics.com/2012/01/merging-two-data-frame-objects-while-preserving-the-rows-order/

http://r.789695.n4.nabble.com/patching-merge-to-allow-the-user-to-keep-the-order-of-one-of-the-two-data-frame-objects-merged-td4296561.html

In R, merge 2 dataframes while maintaining the row order of the first dataframe

You could use join from plyr:

library(plyr)
plyr::join(df1, df2, by = 'global.player.id')

Unlike merge, join does not sort the result by the key, so the rows stay in the order of df1.

Join/merge dataframes and preserve the row-order

One quick way is:

df_2 = df_2.set_index(['A', 'B'])
temp = df_1.set_index(['A', 'B'])
# overwrite df_2's values in place wherever its (A, B) index matches temp
df_2.update(temp)
df_2.reset_index(inplace=True)

As discussed above with @jezrael, and unless I am missing something: if you do not need both C columns from the original dataframes, only a single column C holding the matching values, then .update() is the quickest way, since there are no unneeded columns to drop afterwards.
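A small self-contained run of the update approach. The frames below are hypothetical toy data (the question's df_1 and df_2 are not shown); the point is that df_2 keeps its own row order and only its C values change where (A, B) matches.

```python
import pandas as pd

# Hypothetical toy data: df_2 defines the row order we want to keep.
df_1 = pd.DataFrame({"A": [1, 2, 3], "B": ["x", "y", "z"], "C": [10, 20, 30]})
df_2 = pd.DataFrame({"A": [3, 9, 1], "B": ["z", "q", "x"], "C": [0, 0, 0]})

df_2 = df_2.set_index(["A", "B"])
temp = df_1.set_index(["A", "B"])
df_2.update(temp)               # in place; only matching (A, B) rows change
df_2.reset_index(inplace=True)
print(df_2)
```

Note that update only copies non-NA values from temp, and rows of df_1 with no counterpart in df_2 are ignored, so the result always has df_2's length.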

merge two DataFrame with two columns and keep the same order with original indexes in the result

When constructing the merged dataframe, keep the original index values of each dataframe: reset_index() turns them into an index column, which the merge suffixes to index_x and index_y.

merged_df = pd.merge(df1.reset_index(), df2.reset_index(), how="outer", on=['key1', 'key2'])

Use combine_first to combine index_x and index_y:

merged_df['combined_index'] = merged_df.index_x.combine_first(merged_df.index_y)

Sort by combined_index and then index_x, drop the helper columns, and reset the index:

output = merged_df.sort_values(
    ['combined_index', 'index_x']
).drop(
    ['index_x', 'index_y', 'combined_index'], axis=1
).reset_index(drop=True)

This results in the following output:

  key1 key2  Value1  Value2
0    K   a5   apple     NaN
1    K   a9     NaN   apple
2    K   a4   guava     NaN
3   A1   a7    kiwi    kiwi
4   A3   a9     NaN   grape
5   A2   a9   grape     NaN
6   B1   b2  banana  banana
7   C2   c7     NaN   guava
8   B9   b8   peach     NaN
9   C3   c1   berry  orange
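An end-to-end run of the steps above on hypothetical toy frames (not the question's data). It makes the reset_index() step explicit, which is what produces the index_x and index_y columns the later steps rely on.

```python
import pandas as pd

# Hypothetical toy inputs; we want rows interleaved by original position.
df1 = pd.DataFrame({"key1": ["K", "K", "A1"], "key2": ["a5", "a4", "a7"],
                    "Value1": ["apple", "guava", "kiwi"]})
df2 = pd.DataFrame({"key1": ["K", "A1", "C2"], "key2": ["a9", "a7", "c7"],
                    "Value2": ["apple", "kiwi", "guava"]})

# reset_index() adds an 'index' column to each frame; the merge
# suffixes them to index_x (from df1) and index_y (from df2).
merged_df = pd.merge(df1.reset_index(), df2.reset_index(),
                     how="outer", on=["key1", "key2"])
merged_df["combined_index"] = merged_df.index_x.combine_first(merged_df.index_y)

output = (merged_df.sort_values(["combined_index", "index_x"])  # NaN index_x sorts last
          .drop(["index_x", "index_y", "combined_index"], axis=1)
          .reset_index(drop=True))
print(output)
```

Sorting on index_x as the second key puts a df1 row ahead of a df2-only row that shares the same position, because NaN sorts last by default.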

How can I merge and maintain the row order of one input?

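You can do this with match, which returns for each element of samp the position of the matching row in key (NA if there is none), and then subset key by the result: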

bottles <- key[match(samp, key$num),]
# rownames are odd because they must be unique, clean them up
rownames(bottles) <- seq(NROW(bottles))
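A rough pandas analogue of key[match(samp, key$num), ], assuming key$num is unique. The key and samp values below are hypothetical, since the question's data is not shown. reindex returns one row per element of samp, in samp's order, with NaN rows for unmatched values much as match() yields NA rows.

```python
import pandas as pd

# Hypothetical lookup table and sample vector (names follow the R answer).
key = pd.DataFrame({"num": [1, 2, 3, 4],
                    "bottle": ["red", "green", "blue", "clear"]})
samp = [3, 1, 3, 4]

# Look up each element of samp in key, keeping samp's order and duplicates;
# reset_index() plays the role of cleaning up the row names.
bottles = key.set_index("num").reindex(samp).rename_axis("num").reset_index()
print(bottles)
```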

Merge data.tables while keeping original order in R

Solution using dplyr (data.table is only used to build the example data):

library(data.table)
library(dplyr)

set.seed(100)

dt <- data.table(g1 = c("A", "B", "C", "D", "E", "F", "L", "O", "P", "J"),
                 g2 = c("G", "D", "C", "H", "K", "J", "L", "U", "I", "R"),
                 value = rnorm(10))

ids <- data.table(labels = c("A", "B", "C", "D", "E", "F", "L", "O",
                             "P", "J", "G", "H", "K", "U", "I", "R"),
                  ids = c(1:16))

# left_join keeps every row of dt in its original order
dt %>%
  left_join(ids, by = c("g1" = "labels")) %>%
  mutate(label_match = g1 == g2)

Which returns:

   g1 g2       value ids label_match
1   A  G -0.50219235   1       FALSE
2   B  D  0.13153117   2       FALSE
3   C  C -0.07891709   3        TRUE
4   D  H  0.88678481   4       FALSE
5   E  K  0.11697127   5       FALSE
6   F  J  0.31863009   6       FALSE
7   L  L -0.58179068   7        TRUE
8   O  U  0.71453271   8       FALSE
9   P  I -0.82525943   9       FALSE
10  J  R -0.35986213  10       FALSE
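The same idea in pandas, sketched on the first four rows of the example (the random value column is left out). Because the key columns have different names, left_on and right_on are used; unlike dplyr, pandas keeps both key columns, so labels is dropped afterwards.

```python
import pandas as pd

# First four rows of the R example; the rnorm() 'value' column is omitted.
dt = pd.DataFrame({"g1": ["A", "B", "C", "D"], "g2": ["G", "D", "C", "H"]})
ids = pd.DataFrame({"labels": list("ABCDEFLOPJGHKUIR"), "ids": range(1, 17)})

out = (
    dt.merge(ids, how="left", left_on="g1", right_on="labels")  # keeps dt's row order
      .drop(columns="labels")                                  # redundant copy of g1
      .assign(label_match=lambda d: d["g1"] == d["g2"])
)
print(out)
```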

Merge two data frames while keeping a certain row

UPDATE:

In [139]: pd.concat([df[df.ColumnA.isin(df1.ColumnB)], df.loc[['row_to_keep']]])
Out[139]:
                   ColumnA  Stats
0                     Cake    872
1              Cheese Cake    912
3            Raspberry Jam     91
4                    Bacon    123
row_to_keep            NaN    999

Old answer:

Here is one solution:

In [126]: pd.concat([df.merge(df1, left_on="ColumnA", right_on="ColumnB"), df.loc[['row_to_keep']]])
Out[126]:
                   ColumnA  Stats        ColumnB
0                     Cake    872           Cake
1              Cheese Cake    912    Cheese Cake
2            Raspberry Jam     91  Raspberry Jam
3                    Bacon    123          Bacon
row_to_keep            NaN    999            NaN

Explanation:

df.loc[['row_to_keep']] selects the row whose index label is 'row_to_keep' (the double brackets return a one-row DataFrame rather than a Series), and pd.concat appends it to the filtered or merged DataFrame. DataFrame.append, which used to do this step, was removed in pandas 2.0.

I must admit though, there might be less ugly solutions...

Merge data frames while keeping length of one and values of other in R

We can use match to find, for each row name of X, its position among the row names of Y. The values of Y are put into a vector with a 0 appended at the end, and the nomatch argument points at the position of that 0, so rows of X with no counterpart in Y get 0. This returns Z as a vector:

vals <- c(unlist(Y, use.names=FALSE), 0)
Z <- vals[match(row.names(X), row.names(Y), nomatch=length(vals))]
Z
[1] 0 0 0 20 0 30 0 40 0 0

To get a data.frame:

Z <- data.frame(Z)
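A pandas sketch of the same "length of X, values of Y" lookup. The row names r1 to r10 and the column name v are hypothetical, chosen only to reproduce the shape of the output above. reindex aligns Y's values to X's row names, and fill_value=0 plays the role of the appended 0 that nomatch selects.

```python
import pandas as pd

# Hypothetical row names: X has ten rows, Y holds values for three of them.
X = pd.DataFrame(index=[f"r{i}" for i in range(1, 11)])
Y = pd.DataFrame({"v": [20, 30, 40]}, index=["r4", "r6", "r8"])

# One value per row of X, in X's order; rows missing from Y become 0.
Z = Y["v"].reindex(X.index, fill_value=0)
print(Z.tolist())  # [0, 0, 0, 20, 0, 30, 0, 40, 0, 0]
```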

Match 2 data frames based on common rows, and preserving the order of rownames

With data.table, you can do this:

library(data.table)
setDT(df2)[setDT(df1),,on="b"][is.na(a), a:=0][]

Output:

    a   b
1:  5 Ccd
2:  9 Kkl
3: 13 Sop
4:  0 Mnn
5:  5 Msg
6:  0 Xxy
7:  0 Zxz
8:  5 Ccd
9:  5 Msg

Or with dplyr:

library(dplyr)
left_join(df1,df2, by="b") %>% mutate(a=if_else(is.na(a),0,as.double(a)))

Output:

     b  a
1: Ccd  5
2: Kkl  9
3: Sop 13
4: Mnn  0
5: Msg  5
6: Xxy  0
7: Zxz  0
8: Ccd  5
9: Msg  5

Input:

df1 <- structure(list(b = c("Ccd", "Kkl", "Sop", "Mnn", "Msg", "Xxy", 
"Zxz", "Ccd", "Msg")), row.names = c(NA, -9L), class = "data.frame")

df2 <- structure(list(a = c(3L, 5L, 5L, 9L, 5L, 13L, 19L), b = c("Ab",
"Abc", "Ccd", "Kkl", "Msg", "Sop", "Klj")), row.names = c(NA,
-7L), class = "data.frame")
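For comparison, a pandas version using the same inputs as the structure() calls above: a left merge keeps df1's rows (duplicates included) in their original order, and the unmatched a values are then filled with 0, mirroring the is.na(a) step.

```python
import pandas as pd

# Same inputs as the R structure() calls above.
df1 = pd.DataFrame({"b": ["Ccd", "Kkl", "Sop", "Mnn", "Msg", "Xxy", "Zxz", "Ccd", "Msg"]})
df2 = pd.DataFrame({"a": [3, 5, 5, 9, 5, 13, 19],
                    "b": ["Ab", "Abc", "Ccd", "Kkl", "Msg", "Sop", "Klj"]})

out = df1.merge(df2, how="left", on="b")      # keeps df1's row order
out["a"] = out["a"].fillna(0).astype(int)     # no match -> 0, back to integers
print(out)
```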

Merge nth elements from two columns while keeping the original row order in R

We could do it with an ifelse() call that checks whether the row number is even or odd using the modulo operator %%:

library(dplyr)
df %>%
  mutate(col3 = ifelse((row_number() %% 2) == 0, col2, col1))

  col1 col2 col3
1    A    2    A
2    B    1    1
3    D    2    D
4    F    3    3
5    C    1    C
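The same alternation can be sketched in pandas with numpy.where. The input frame is reconstructed from the printed output above (an assumption). Since dplyr's row_number() is 1-based, even rows are positions 2, 4, and so on; col2 is converted to strings so that col3 holds a single type, as the character column in the R output does.

```python
import numpy as np
import pandas as pd

# Input reconstructed from the printed output above (assumed).
df = pd.DataFrame({"col1": ["A", "B", "D", "F", "C"], "col2": [2, 1, 2, 3, 1]})

# 1-based row numbers, as in dplyr's row_number()
even = (np.arange(len(df)) + 1) % 2 == 0
df["col3"] = np.where(even, df["col2"].astype(str), df["col1"])
print(df)
```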