Duplicating Rows in R Merge Function

Duplicating rows in R merge function

The problem is that the values you are merging on are not unique. Every merge then multiplies the matching rows, so you get more and more rows. Have a look at what you get when you run:

dt <- data.frame(Level12R$level1.2_are.out$parameters$stdyx.standardized[,1:2])

          paramHeader  param
34         Intercepts ASRREA
35 Residual.Variances ASRREA

You can see that the last two rows have the same variable (param) but come from different headers.
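A quick way to confirm this kind of problem is to test whether a column, or a combination of columns, identifies each row uniquely. The sketch below uses a small made-up data frame; `params` and the value `VAR1` are illustrative assumptions, not the real model output.

```r
# Hypothetical stand-in for the first two columns of the parameter table
params <- data.frame(
  paramHeader = c("Means", "Intercepts", "Residual.Variances"),
  param       = c("VAR1", "ASRREA", "ASRREA"),
  stringsAsFactors = FALSE
)

# param alone is not a unique key: ASRREA appears twice
dup_param <- anyDuplicated(params$param) > 0

# paramHeader and param together identify each row in this toy example
dup_pair <- anyDuplicated(params[, c("paramHeader", "param")]) > 0

# Count how often each param value occurs
table(params$param)
```

If the check on the combined columns still reports duplicates, the key needs another column.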

So we have to extend the join key so that each record is unique. Looking at the data, that takes three columns: 1, 2 and 8, renamed here to "header", "variable" and "betweenwithin". Then we can loop through every model without getting duplicate records. The resulting dt object has 35 records and 51 variables, with NAs where a model's results had 34 or even 25 records instead of 35.

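library(dplyr)

# Key columns: 1 = header, 2 = variable, 8 = between/within level
dt <- data.frame(Level12R$level1.2_are.out$parameters$stdyx.standardized[, c(1:2, 8)])
names(dt) <- c("header", "variable", "betweenwithin")
nomes <- names(Level12R)
for (i in seq_along(nomes)) {
  # Index the list by name instead of building code with eval(parse())
  df <- Level12R[[nomes[i]]]$parameters$stdyx.standardized
  # Keep the three key columns plus column 3, the values for this model
  df <- df[, c(1:3, 8)]
  names(df) <- c("header", "variable", toupper(substr(nomes[i], 10, 12)), "betweenwithin")
  # Join on all three key columns so rows are matched on the full key
  dt <- left_join(x = dt, y = df, by = c("header", "variable", "betweenwithin"))
}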

Normally I would collect the results in a list inside the loop and decide afterwards what to do with the data in the list. That prevents unintended side effects from joins, merges and the like.
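As a rough illustration of the list-based approach, the sketch below starts from a list that has already been filled and combines it in one step at the end. The list `results`, the model names `are` and `bel`, and all values are made up for the example; only base R is used.

```r
# Hypothetical per-model results, each keyed by header and variable
results <- list(
  are = data.frame(header   = c("Intercepts", "Residual.Variances"),
                   variable = c("ASRREA", "ASRREA"),
                   ARE      = c(0.10, 0.90)),
  bel = data.frame(header   = "Intercepts",
                   variable = "ASRREA",
                   BEL      = 0.20)
)

# Inspect the pieces first, e.g. how many rows each model returned
sapply(results, nrow)

# Combine once, at the end, merging on the full key
combined <- Reduce(
  function(x, y) merge(x, y, by = c("header", "variable"), all.x = TRUE),
  results
)
```

Because every piece stays available in the list, a model with missing or duplicated rows can be spotted before it affects the combined table.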

Why does using the merge function in R create duplicates?

We can get only the unique rows of DF1 and DF2 and then merge.

DF <- merge(unique(DF1), unique(DF2), by = c("Date", "Time"), all.x= TRUE) 
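One caveat worth checking: unique() only drops rows that are identical in every column. The sketch below, with made-up data, shows a case where the keys in DF2 are still duplicated after unique() because the rows differ in a `value` column (a hypothetical name), so the merge still returns an extra row.

```r
# Made-up data: DF2 has one exact duplicate and one conflicting row
DF1 <- data.frame(Date = c("2020-01-01", "2020-01-02"),
                  Time = c("10:00", "10:00"))
DF2 <- data.frame(Date  = c("2020-01-01", "2020-01-01", "2020-01-02", "2020-01-02"),
                  Time  = c("10:00", "10:00", "10:00", "10:00"),
                  value = c(1, 1, 2, 3))

DF <- merge(unique(DF1), unique(DF2), by = c("Date", "Time"), all.x = TRUE)

# The exact duplicate is gone, but 2020-01-02 10:00 still matches two rows
n_rows   <- nrow(DF)
dup_keys <- anyDuplicated(unique(DF2)[, c("Date", "Time")]) > 0
```

When `dup_keys` is TRUE, the remaining duplicates have to be resolved by deciding which row per key to keep, or by aggregating.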

merge values from one dataframe onto another without creating duplicates in R

If df2 has duplicates, we can use unique() to get rid of them, i.e.:

df2_clean <- unique(df2)

df1_and_df2 <- df1 %>% left_join(df2_clean)

Explanation for what caused the original problem:

If we join two data sets x and y where the common column is not unique in both of them, the join combines each observation in x with every matching observation in y, leading to many duplicated rows.
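A minimal sketch of that effect, using two made-up data frames in base R: a key that occurs twice in x and three times in y yields 2 * 3 = 6 rows for that key after the merge.

```r
# id 1 occurs twice in x and three times in y; id 2 occurs once in each
x <- data.frame(id = c(1, 1, 2), x_val = c("a", "b", "c"))
y <- data.frame(id = c(1, 1, 1, 2), y_val = c("p", "q", "r", "s"))

joined <- merge(x, y, by = "id")

# Rows per key: 2 * 3 for id 1, plus 1 * 1 for id 2
rows_per_id <- table(joined$id)
```

Comparing `nrow(joined)` with `nrow(x)` is a simple check for whether a join has multiplied rows.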

How to eliminate duplicate rows in R when using the merge function

Use data.frame(FL_ratio, time).

The merge(...) function is not meant for this. Since time and FL_ratio are vectors, merge(FL_ratio, time) will produce a cross-product: for each element of FL_ratio there will be rows for all the values of time. This is why you're getting 10,816 rows. You can see this below:

x <- 1:3
y <- 4:6
merge(x, y)
## x y
## 1 1 4
## 2 2 4
## 3 3 4
## 4 1 5
## 5 2 5
## 6 3 5
## 7 1 6
## 8 2 6
## 9 3 6

data.frame(x, y)
## x y
## 1 1 4
## 2 2 5
## 3 3 6
