How to do vlookup and fill down (like in Excel) in R?
If I understand your question correctly, here are four methods to do the equivalent of Excel's VLOOKUP
and fill down using R
:
# load sample data from Q
hous <- read.table(header = TRUE,
stringsAsFactors = FALSE,
text="HouseType HouseTypeNo
Semi 1
Single 2
Row 3
Single 2
Apartment 4
Apartment 4
Row 3")
# create a toy large table with a 'HouseType' column
# but no 'HouseTypeNo' column (yet)
largetable <- data.frame(HouseType = as.character(sample(unique(hous$HouseType), 1000, replace = TRUE)), stringsAsFactors = FALSE)
# create a lookup table to get the numbers to fill
# the large table
lookup <- unique(hous)
HouseType HouseTypeNo
1 Semi 1
2 Single 2
3 Row 3
5 Apartment 4
Here are four methods to fill the HouseTypeNo
in the largetable
using the values in the lookup
table:
First with merge
in base:
# 1. using base
base1 <- (merge(lookup, largetable, by = 'HouseType'))
A second method with named vectors in base:
# 2. using base and a named vector
housenames <- as.numeric(1:length(unique(hous$HouseType)))
names(housenames) <- unique(hous$HouseType)
base2 <- data.frame(HouseType = largetable$HouseType,
HouseTypeNo = (housenames[largetable$HouseType]))
Third, using the plyr
package:
# 3. using the plyr package
library(plyr)
plyr1 <- join(largetable, lookup, by = "HouseType")
Fourth, using the sqldf
package
# 4. using the sqldf package
library(sqldf)
sqldf1 <- sqldf("SELECT largetable.HouseType, lookup.HouseTypeNo
FROM largetable
INNER JOIN lookup
ON largetable.HouseType = lookup.HouseType")
If it's possible that some house types in largetable
do not exist in lookup
then a left join would be used:
sqldf("select * from largetable left join lookup using (HouseType)")
Corresponding changes to the other solutions would be needed too.
Is that what you wanted to do? Let me know which method you like and I'll add commentary.
Using R function similar to VLOOKUP in Excel
No need to use dplyr
, simple base-R is enough, with the match
-function.
largedataset$Val <- lookuptable$Val[match(largedataset$Lookup, lookuptable$Lookup)]
If you need to do the lookup more often and have a really large lookuptable, there may be some benefit in using the fastmatch
-package with the similar fmatch
-function, but only if regular match
is too slow.
R equivalent of Excel VLOOKUP
We can use deframe
to create a named vector and remap more efficiently
library(dplyr)
library(tibble)
tibble(
country = c("AE", "AF", "foo")
) %>% mutate(country_remap = deframe(test_mapping)[country])
# A tibble: 3 x 2
# country country_remap
# <chr> <chr>
#1 AE United Arab Emirates
#2 AF Afghanistan
#3 foo <NA>
Using Merge in R to replicate Vlookup
A dplyr solution.
library(dplyr)
dbase <- data.frame(machine_number = c("10","20","30","10","10","50"),
second_attribute=c("a","b","c","c","a","d"))
mbase <- data.frame(machine_number = c("10","20","30","40","50","60","70","80","90","100"),
rate=c(22,22,25,17,15,15,55,12,15,19))
left_join(dbase, mbase, by = "machine_number")
machine_number second_attribute rate
1 10 a 22
2 20 b 22
3 30 c 25
4 10 c 22
5 10 a 22
6 50 d 15
Fill down column based on values in another
You can use ave
to fill na
per group.
df$Tag <- with(df, ave(Tag, ID, FUN=function(x) {
i <- is.na(x)
x[i] <- x[!i][1]
x
}))
df
# ID Tag
#1 999 YES
#2 999 YES
#3 100 NO
#4 100 NO
#5 500 <NA>
Or using fill
library(dplyr)
library(tidyr)
df %>% group_by(ID) %>% fill(Tag, .direction = "downup")
## A tibble: 5 x 2
## Groups: ID [3]
# ID Tag
# <int> <fct>
#1 999 YES
#2 999 YES
#3 100 NO
#4 100 NO
#5 500 NA
Matching values in a column using R - excel vlookup
We can just use match
to find the index of those matching elements of 'y' with that of 'x' and use that to index to get the corresponding 'z'
dt[, Result1 := z[match(y,x)]]
dt
# x y z Result Result1
#1: 1 6 a na NA
#2: 2 5 b e e
#3: 3 4 c d d
#4: 4 3 d c c
#5: 5 2 e b b
#6: 5 1 f a a
VLookup type method in R
merge
them:
> merge(full_list, replace, by.x="Ticker", by.y="Symbol")
Ticker Long_Name Type Location Month
1 AC Ethanol -- CBOT F US U13
2 AIC DJ UBS Commodity Index -- CBOT F US U13
3 BBS South American Soybeans -- CBOT F US U13
4 BO Soybean Oil -- CBT F US V13
5 C Corn -- CBT F US U13
6 DF Dow Jones Industrial Average -- CBT F US U13
VLOOKUP in R Programming Language. Currently doing VLOOKUP in Excel, still cannot get the desired results in R
Here's a demonstration of doing this using dplyr
and magrittr
(for %>%
) packages
library(dplyr)
library(magrittr)
df1 <- data.frame(AccountID = 1:3, AccountName = c("Superman", "Batman", "Joker"))
df2 <- data.frame(AccountID = 1:2, AccountBalance = c(1000, 1000000000))
# joining (or vlookup) to get account balance for all accounts
left_join(df1, df2, by = "AccountID") %>% mutate(Flag = ifelse(is.na(AccountBalance), "N", "Y"))
# AccountID AccountName AccountBalance Flag
# 1 1 Superman 1e+03 Y
# 2 2 Batman 1e+09 Y
# 3 3 Joker NA N
Hope this is what you were looking for. If not, share some sample data and desired output.
How to turn simple Excel LOOKUP into R code?
This should do:
library(tidyverse)
dfmain <- dfmain %>% left_join(dfsub1) %>% left_join(dfsub2)
see here for details on left_join.
Related Topics
I Want to Split Street Address into Two Columns. One With Street Number Other With Street Name
How to Combine Multiple Variable Data to a Single Variable Data
Deleting Rows in R Based on Values Over Multiple Columns
Regex to Replace Comma to Dot Separator
Choose the Top Five Values from Each Group in R
Aggregate/Summarize Multiple Variables Per Group (E.G. Sum, Mean)
R - Getting Characters After Symbol
Calculate Row Means on Subset of Columns
How to Change the Default Colors in Plotly Chart
Removing All Empty Columns and Rows in Data.Frame When Rows Don't Go Away
Add Column Values Based on Other Columns in Data Frame Using for and If
Create Counter Within Consecutive Runs of Values
R: Error in Usemethod("Group_By_"):Applied to an Object of Class
Mapping Columns/Rows from One Dataframe to Another Based on Row Number
Using Ggplot2, How to Insert a Break in the Axis
Axis Labels on Two Lines With Nested X Variables (Year Below Months)
How to Convert Only Some Positive Numbers to Negative Numbers (Conditional Recoding)