Merge two data frames while keeping the original row order
Check out the join function in the plyr package. It's like merge, but it allows you to keep the row order of one of the data sets. Overall, it's more flexible than merge.
Using your example data, we would use join like this:
> join(df.2,df.1)
Joining by: class
object class prob
1 A 2 0.7
2 B 1 0.5
3 D 2 0.7
4 F 3 0.3
5 C 1 0.5
Here are a couple of links describing fixes to the merge function for keeping the row order:
http://www.r-statistics.com/2012/01/merging-two-data-frame-objects-while-preserving-the-rows-order/
http://r.789695.n4.nabble.com/patching-merge-to-allow-the-user-to-keep-the-order-of-one-of-the-two-data-frame-objects-merged-td4296561.html
In R, merge 2 dataframes while maintaining the row order of the first dataframe
You could use join from plyr:
library(plyr)
plyr::join(df1,df2, by='global.player.id')
The result is not re-sorted: join keeps the row order of the first data frame (df1).
Join/merge dataframes and preserve the row-order
One quick way is:
df_2=df_2.set_index(['A','B'])
temp = df_1.set_index(['A','B'])
df_2.update(temp)
df_2.reset_index(inplace=True)
As discussed with @jezrael above, and unless I am missing something: if you do not need both C columns from the original dataframes and only need the column C with the matching values, then .update() is the quickest way, since you do not have to drop the columns that you do not need.
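A minimal sketch of the .update() approach on made-up data, to show that the row order of df_2 is left untouched. The values below are illustrative only; the column names A, B and C follow the snippet above.

```python
import pandas as pd

# Hypothetical data: df_1 holds the values we want, df_2 defines the row order.
df_1 = pd.DataFrame({"A": [1, 2], "B": ["x", "y"], "C": [10.0, 20.0]})
df_2 = pd.DataFrame({"A": [2, 3, 1], "B": ["y", "z", "x"], "C": [0.0, 0.0, 0.0]})

# Align both frames on the key columns, then overwrite C in place
# wherever df_1 has a non-missing value for the same key.
df_2 = df_2.set_index(["A", "B"])
df_2.update(df_1.set_index(["A", "B"]))
df_2 = df_2.reset_index()

print(df_2)
```

Keys that exist only in df_2 (here the row with A equal to 3) keep their original C value, and no extra columns are created.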
Merge two DataFrames on two columns and keep the original index order in the result
When constructing the merged dataframe, get the index values from each dataframe. Calling reset_index() on both inputs turns each index into a column, which the merge then suffixes as index_x and index_y:
merged_df = pd.merge(df1.reset_index(), df2.reset_index(), how="outer", on=['key1', 'key2'])
Use combine_first to combine index_x & index_y:
merged_df['combined_index'] = merged_df.index_x.combine_first(merged_df.index_y)
Sort by combined_index & index_x, then drop the columns that are no longer needed and reset the index:
output = merged_df.sort_values(
    ['combined_index', 'index_x']
).drop(
    ['index_x', 'index_y', 'combined_index'], axis=1
).reset_index(drop=True)
This results in the following output:
key1 key2 Value1 Value2
0 K a5 apple NaN
1 K a9 NaN apple
2 K a4 guava NaN
3 A1 a7 kiwi kiwi
4 A3 a9 NaN grape
5 A2 a9 grape NaN
6 B1 b2 banana banana
7 C2 c7 NaN guava
8 B9 b8 peach NaN
9 C3 c1 berry orange
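The steps above can be sketched end to end on a small made-up example. This is only an illustration under stated assumptions: it uses a single key column, the names key, v1 and v2 are hypothetical, and both frames are assumed to carry their default integer index.

```python
import pandas as pd

# Hypothetical inputs; each frame has its own row order.
df1 = pd.DataFrame({"key": ["c", "a", "b"], "v1": [1, 2, 3]})
df2 = pd.DataFrame({"key": ["b", "d", "c"], "v2": [10, 20, 30]})

# reset_index() exposes each original index as a column named "index";
# the merge suffixes the two copies as index_x (df1) and index_y (df2).
merged_df = pd.merge(df1.reset_index(), df2.reset_index(), how="outer", on="key")

# Prefer the position in df1, fall back to the position in df2.
merged_df["combined_index"] = merged_df["index_x"].combine_first(merged_df["index_y"])

# Sort on the combined position, then drop the helper columns.
output = (
    merged_df.sort_values(["combined_index", "index_x"])
    .drop(["index_x", "index_y", "combined_index"], axis=1)
    .reset_index(drop=True)
)

print(output)
```

Rows that exist only in df2 are slotted in at their df2 position, after any df1 row with the same position, because missing index_x values sort last.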
How can I merge and maintain the row order of one input?
You can do this with match and subsetting key by the result:
bottles <- key[match(samp, key$num),]
# rownames are odd because they must be unique, clean them up
rownames(bottles) <- seq(NROW(bottles))
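For readers doing the same lookup in pandas rather than R, a rough analog of match plus subsetting is a left merge from a one-column frame built from the lookup vector. This is a sketch with made-up data: the bottle column and all values are hypothetical, and only the names key, samp, num and bottles mirror the R snippet.

```python
import pandas as pd

# Hypothetical lookup table and sample vector.
key = pd.DataFrame({"num": [1, 2, 3, 4], "bottle": ["a", "b", "c", "d"]})
samp = [3, 1, 3, 5]

# A left merge keeps the rows of the left frame in their original order,
# repeats matches for repeated values, and fills unmatched rows with NaN.
bottles = pd.DataFrame({"num": samp}).merge(key, on="num", how="left")

print(bottles)
```

As with match in R, a value that is absent from the lookup table (5 here) still produces a row, with missing values in the looked-up columns.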
Merge data.tables while keeping original order in R
Solution using dplyr:
library(data.table)
library(dplyr)
set.seed(100)
dt <- data.table(g1=c("A", "B", "C", "D", "E", "F", "L", "O", "P", "J"),
g2=c("G", "D", "C", "H", "K", "J", "L", "U", "I", "R"),
value= rnorm(10))
ids <- data.table(labels=c("A", "B", "C", "D", "E", "F", "L", "O",
"P", "J", "G", "H", "K", "U", "I", "R"),
ids=c(1:16))
dt %>%
left_join(ids, by= c("g1"="labels")) %>%
mutate(label_match = g1 == g2)
Which returns:
g1 g2 value ids label_match
1 A G -0.50219235 1 FALSE
2 B D 0.13153117 2 FALSE
3 C C -0.07891709 3 TRUE
4 D H 0.88678481 4 FALSE
5 E K 0.11697127 5 FALSE
6 F J 0.31863009 6 FALSE
7 L L -0.58179068 7 TRUE
8 O U 0.71453271 8 FALSE
9 P I -0.82525943 9 FALSE
10 J R -0.35986213 10 FALSE
Merge two data frames while keeping a certain row
UPDATE:
In [139]: df[df.ColumnA.isin(df1.ColumnB)].append(df.loc['row_to_keep'])
Out[139]:
ColumnA Stats
0 Cake 872
1 Cheese Cake 912
3 Raspberry Jam 91
4 Bacon 123
row_to_keep NaN 999
Old answer:
Here is one solution:
In [126]: df.merge(df1, left_on="ColumnA", right_on="ColumnB").append(df.loc['row_to_keep'])
Out[126]:
ColumnA Stats ColumnB
0 Cake 872 Cake
1 Cheese Cake 912 Cheese Cake
2 Raspberry Jam 91 Raspberry Jam
3 Bacon 123 Bacon
row_to_keep NaN 999 NaN
Explanation: df.loc['row_to_keep'] selects one row by index value ('row_to_keep'), and DF.append(row) appends it to the merged DF.
I must admit though, there might be less ugly solutions...
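Note that DataFrame.append was deprecated in pandas 1.4 and removed in pandas 2.0, so the snippets above only run on older versions. Below is a sketch of the same idea with pd.concat, using a small made-up frame (the values are illustrative, not the asker's full data):

```python
import pandas as pd

# Hypothetical data: one labelled row must survive the filtering.
df = pd.DataFrame(
    {"ColumnA": ["Cake", "Pie", "Bacon", None], "Stats": [872, 5, 123, 999]},
    index=[0, 1, 2, "row_to_keep"],
)
df1 = pd.DataFrame({"ColumnB": ["Cake", "Bacon"]})

# Keep only the rows whose ColumnA appears in df1.ColumnB ...
matched = df[df["ColumnA"].isin(df1["ColumnB"])]

# ... then add the protected row back. The double brackets make
# .loc return a one-row DataFrame, which pd.concat can stack directly.
result = pd.concat([matched, df.loc[["row_to_keep"]]])

print(result)
```

Selecting with a list in .loc avoids having to convert a Series back into a row before concatenating.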
Merge data frames while keeping length of one and values of other in R
We can use match to find the positions of the row names of Y that are found in X. The values of Y are put into a vector and concatenated with 0, so the 0 is the last element (the fourth one here). Setting nomatch = 4L makes every unmatched row name index that trailing 0. This returns Z as a vector:
Z <- c(unlist(Y, use.names=FALSE), 0)[match(row.names(X), row.names(Y), nomatch=4L)]
Z
[1] 0 0 0 20 0 30 0 40 0 0
To get a data.frame:
Z <- data.frame(Z)
Match 2 data frames based on common rows, and preserving the order of rownames
With data.table, you can do this:
library(data.table)
setDT(df2)[setDT(df1),,on="b"][is.na(a), a:=0][]
Output:
a b
1: 5 Ccd
2: 9 Kkl
3: 13 Sop
4: 0 Mnn
5: 5 Msg
6: 0 Xxy
7: 0 Zxz
8: 5 Ccd
9: 5 Msg
Or with dplyr:
library(dplyr)
left_join(df1,df2, by="b") %>% mutate(a=if_else(is.na(a),0,as.double(a)))
Output:
b a
1: Ccd 5
2: Kkl 9
3: Sop 13
4: Mnn 0
5: Msg 5
6: Xxy 0
7: Zxz 0
8: Ccd 5
9: Msg 5
Input:
df1 <- structure(list(b = c("Ccd", "Kkl", "Sop", "Mnn", "Msg", "Xxy",
"Zxz", "Ccd", "Msg")), row.names = c(NA, -9L), class = "data.frame")
df2 <- structure(list(a = c(3L, 5L, 5L, 9L, 5L, 13L, 19L), b = c("Ab",
"Abc", "Ccd", "Kkl", "Msg", "Sop", "Klj")), row.names = c(NA,
-7L), class = "data.frame")
Merge nth elements from two columns while keeping the original row order in R
We could do it with an ifelse statement that checks whether the row is even or odd with the modulo operator %%:
library(dplyr)
df %>%
mutate(col3 = ifelse((row_number() %% 2) == 0, col2, col1))
col1 col2 col3
1 A 2 A
2 B 1 1
3 D 2 D
4 F 3 3
5 C 1 C