agrep: only return best match(es)
The RecordLinkage package was removed from CRAN; use stringdist instead:
library(stringdist)
ClosestMatch2 = function(string, stringVector){
  # maxDist = Inf means amatch always returns the index of the closest entry
  stringVector[amatch(string, stringVector, maxDist=Inf)]
}
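If installing stringdist is not an option, a similar result can be sketched in base R by letting agrep collect candidates and then ranking them with adist. This is only a sketch under stated assumptions: closest_agrep is a hypothetical helper name, and it uses the partial (substring) edit distance that agrep works with, not the whole-string distance that amatch uses by default, so the two can disagree.

```r
# closest_agrep is a hypothetical helper name, not part of base R or stringdist
closest_agrep <- function(pattern, x, max.distance = 0.1) {
  # Step 1: let agrep collect every candidate within max.distance
  candidates <- agrep(pattern, x, ignore.case = TRUE, value = TRUE,
                      max.distance = max.distance)
  if (length(candidates) == 0) return(character(0))
  # Step 2: score each candidate with the partial (substring) edit distance
  d <- as.vector(adist(pattern, candidates, partial = TRUE, ignore.case = TRUE))
  # Step 3: keep only the candidate(s) with the smallest distance (ties are all returned)
  candidates[d == min(d)]
}

result <- closest_agrep("height", c("height", "weight", "width"), max.distance = 0.25)
print(result)
```

Ties are returned together, which matches the "best match(es)" wording of the question; take the first element if a single value is needed.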
Why does agrep in R not find the best match?
You can try adist, which computes the generalized Levenshtein (edit) distance between every pair of strings. In the result below, 'height' from example1 is closest to 'height' from example2, and likewise for 'weight':
adist(example1, example2)
#      [,1] [,2]
# [1,]    0    1
# [2,]    1    0
example2[apply(adist(example1, example2), 1, which.min)]
# [1] "height" "weight"
agrep function of R is not working for text matching
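It is because of your max.distance parameter; see ?agrep. A value below 1 is interpreted as a fraction of the pattern length (rounded up to a whole number of edits), while a value of 1 or more is taken as the maximum number of edits allowed.
For instance: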
agrep("ms sharda stone crusher prop rupa devi", x, ignore.case = TRUE, value = TRUE, max.distance = 0.2, useBytes = FALSE)
# [1] "sharda stone crusher prop rupa"
agrep("ms sharda stone crusher prop rupa devi", x, ignore.case = TRUE, value = TRUE, max.distance = 0.25, useBytes = FALSE)
# [1] "sharda stone crusher prop roopa" "sharda stone crusher prop rupa"
agrep("ms sharda stone crusher prop rupa devi", x, ignore.case = TRUE, value = TRUE, max.distance = 9, useBytes = FALSE)
# [1] "sharda stone crusher prop rupa"
agrep("ms sharda stone crusher prop rupa devi", x, ignore.case = TRUE, value = TRUE, max.distance = 10, useBytes = FALSE)
# [1] "sharda stone crusher prop roopa" "sharda stone crusher prop rupa"
If you want only the closest match, see "agrep: only return best match(es)" above.
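The fractional and integer calls above pair up because a max.distance below 1 is converted to an edit count. A minimal sketch of that arithmetic, assuming the rule described in ?agrep (fraction of the pattern length, rounded up, with the default unit edit costs). The vector x here is a hypothetical stand-in built from the two results shown, since the original x is not given:

```r
pattern <- "ms sharda stone crusher prop rupa devi"

# Pattern length that fractional max.distance values are scaled by
n <- nchar(pattern)

# Fractions below 1 become an edit count: ceiling(fraction * pattern length)
edits_20pct <- ceiling(0.20 * n)
edits_25pct <- ceiling(0.25 * n)
print(c(n = n, edits_20pct = edits_20pct, edits_25pct = edits_25pct))

# Hypothetical stand-in for x (not the poster's data)
x <- c("sharda stone crusher prop roopa", "sharda stone crusher prop rupa")

# Compare a fractional call with its integer counterpart
agrep(pattern, x, ignore.case = TRUE, value = TRUE, max.distance = 0.20)
agrep(pattern, x, ignore.case = TRUE, value = TRUE, max.distance = edits_20pct)
```

With a 38-character pattern, 0.2 works out to 8 edits and 0.25 to 10, which is consistent with max.distance = 9 behaving like 0.2 and max.distance = 10 behaving like 0.25 in the calls above.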
Use agrep to return a different variable
How about this:
personalfolders$DOBMatch <- lapply(personalfolders$DOB, function(y) allees2$PartPathMatch1[agrep(y, allees2$`Date Of Birth`, max.distance=1)])
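To see what that call produces, here is a small sketch on made-up data. The two data frames below are hypothetical stand-ins that only reuse the column names from the line above; the dates and paths are invented for illustration. Note that lapply returns a list, so each row can hold zero, one, or several matches.

```r
# Hypothetical stand-ins for the poster's data frames (values are invented)
personalfolders <- data.frame(DOB = c("1980-01-02", "1975-12-31"),
                              stringsAsFactors = FALSE)
allees2 <- data.frame(`Date Of Birth` = c("1980-01-03", "1975-12-31", "1990-05-05"),
                      PartPathMatch1 = c("a/path", "b/path", "c/path"),
                      check.names = FALSE, stringsAsFactors = FALSE)

# For each DOB, return PartPathMatch1 for rows whose date is within one edit
matches <- lapply(personalfolders$DOB, function(y)
  allees2$PartPathMatch1[agrep(y, allees2$`Date Of Birth`, max.distance = 1)])

# Store the result as a list column and check how many matches each row got
personalfolders$DOBMatch <- matches
print(lengths(matches))
```

If exactly one match per row is expected, checking lengths(matches) first is a cheap way to catch rows with none or several before flattening the list.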
agrep string matching in R
I have written a function for this. It is not the most optimized way to do it, but it will do the task. The inputs are vectors, not lists. Hope this helps.
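library(stringr)  # provides str_split()

stringMatch <- function(search.string, inputstring, pattern = " ") {
  # Split the search string into words and build an acronym from their first letters
  stringsplit <- unlist(str_split(search.string, pattern))
  firstletter <- c()
  for (i in seq(1, length(stringsplit))) {
    firstletter <- paste(firstletter, substring(stringsplit[i], 1, 1), sep = "")
  }
  search.string.l <- tolower(search.string)
  firstletter.l <- tolower(firstletter)
  # Match either the whole search string or its acronym as a whole word, ignoring case
  matchstring <- grep(paste("\\b", search.string.l, "\\b", "|", "\\b", firstletter.l, "\\b",
                            sep = ""), tolower(inputstring))
  return(matchstring)
}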
test1 <- c('hello p', 'helbbo', 'hello test', 'HP')
search.string <- 'HP'
stringMatch(search.string, test1)
# [1] 4