Convert Written Number to Number in R

Here's a start that should get you to hundreds of thousands.

word2num <- function(word){
  # Split the phrase into lower-case words
  wsplit <- strsplit(tolower(word)," ")[[1]]
  one_digits <- list(zero=0, one=1, two=2, three=3, four=4, five=5,
                     six=6, seven=7, eight=8, nine=9)
  teens <- list(eleven=11, twelve=12, thirteen=13, fourteen=14, fifteen=15,
                sixteen=16, seventeen=17, eighteen=18, nineteen=19)
  ten_digits <- list(ten=10, twenty=20, thirty=30, forty=40, fifty=50,
                     sixty=60, seventy=70, eighty=80, ninety=90)
  doubles <- c(teens,ten_digits)
  out <- 0
  i <- 1
  while(i <= length(wsplit)){
    j <- 1  # number of words consumed in this step
    # Value of the current word (temp keeps its previous value if nothing matches)
    if(i==1 && wsplit[i]=="hundred")
      temp <- 100
    else if(i==1 && wsplit[i]=="thousand")
      temp <- 1000
    else if(wsplit[i] %in% names(one_digits))
      temp <- as.numeric(one_digits[wsplit[i]])
    else if(wsplit[i] %in% names(teens))
      temp <- as.numeric(teens[wsplit[i]])
    else if(wsplit[i] %in% names(ten_digits))
      temp <- (as.numeric(ten_digits[wsplit[i]]))
    # Combine temp with the running total, depending on the next word
    if(i < length(wsplit) && wsplit[i+1]=="hundred"){
      if(i>1 && wsplit[i-1] %in% c("hundred","thousand"))
        out <- out + 100*temp
      else
        out <- 100*(out + temp)
      j <- 2  # skip over "hundred"
    }
    else if(i < length(wsplit) && wsplit[i+1]=="thousand"){
      if(i>1 && wsplit[i-1] %in% c("hundred","thousand"))
        out <- out + 1000*temp
      else
        out <- 1000*(out + temp)
      j <- 2  # skip over "thousand"
    }
    else if(i < length(wsplit) && wsplit[i+1] %in% names(doubles)){
      # Shorthand such as "four fifty seven": treat "four" as 400
      temp <- temp*100
      out <- out + temp
    }
    else{
      out <- out + temp
    }
    i <- i + j
  }
  return(list(word,out))
}

Results:

> word2num("fifty seven")
[[1]]
[1] "fifty seven"

[[2]]
[1] 57

> word2num("four fifty seven")
[[1]]
[1] "four fifty seven"

[[2]]
[1] 457

> word2num("six thousand four fifty seven")
[[1]]
[1] "six thousand four fifty seven"

[[2]]
[1] 6457

> word2num("forty six thousand four fifty seven")
[[1]]
[1] "forty six thousand four fifty seven"

[[2]]
[1] 46457

> word2num("forty six thousand four hundred fifty seven")
[[1]]
[1] "forty six thousand four hundred fifty seven"

[[2]]
[1] 46457

> word2num("three forty six thousand four hundred fifty seven")
[[1]]
[1] "three forty six thousand four hundred fifty seven"

[[2]]
[1] 346457

I can already tell you that this won't work for word2num("four hundred thousand fifty"), because the loop doesn't know how to handle consecutive "hundred" and "thousand" terms, but the algorithm could probably be modified to cope. Anyone should feel free to edit this if they have improvements, or to build on it in their own answer. I just thought this was a fun problem to play with (for a little while).
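One possible way to handle consecutive "hundred" and "thousand" terms is to keep two accumulators: one for the group currently being read and one for the completed thousands. The sketch below is not the code from the answer above; the name words_to_int and the handling of "and" and hyphens are assumptions made for illustration. It also drops the "four fifty seven" shorthand, so it only accepts fully spelled-out phrases below one million.

```r
# Hypothetical helper (not part of the original answer): parse number words
# using a running group value and a running total.
words_to_int <- function(phrase) {
  small <- c(zero = 0, one = 1, two = 2, three = 3, four = 4, five = 5,
             six = 6, seven = 7, eight = 8, nine = 9, ten = 10,
             eleven = 11, twelve = 12, thirteen = 13, fourteen = 14,
             fifteen = 15, sixteen = 16, seventeen = 17, eighteen = 18,
             nineteen = 19, twenty = 20, thirty = 30, forty = 40,
             fifty = 50, sixty = 60, seventy = 70, eighty = 80, ninety = 90)
  # Split on spaces or hyphens and ignore a connecting "and"
  tokens <- strsplit(tolower(trimws(phrase)), "[ -]+")[[1]]
  tokens <- tokens[tokens != "and"]
  total <- 0  # sum of completed thousands groups
  group <- 0  # value of the group currently being read
  for (tok in tokens) {
    if (tok %in% names(small)) {
      group <- group + small[[tok]]
    } else if (tok == "hundred") {
      # a bare "hundred" counts as one hundred
      group <- max(group, 1) * 100
    } else if (tok == "thousand") {
      # close the current group and start a new one
      total <- total + max(group, 1) * 1000
      group <- 0
    } else {
      stop("unrecognized word: ", tok)
    }
  }
  total + group
}

words_to_int("four hundred thousand fifty")
words_to_int("forty six thousand four hundred fifty seven")
```

Because "thousand" closes the current group instead of multiplying the whole running total, "four hundred thousand fifty" is read as 400 * 1000 + 50.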

Edit: Apparently Bill Venables has a package called english that may achieve this even better than the above code.

A function to convert words to numbers

The function works like this (note that you also need the stringr and english packages).

  1. First, it takes the string you pass in (e.g. "five" if you call words_to_numbers("five")).

  2. Then str_to_lower() normalizes it to lower case, which avoids problems if you typed "Five" or "FIVE" instead of "five".

  3. It then runs a loop (for some reason ending at 11), so i takes the values 1, 2, 3 and so on up to 11.

  4. Within the loop, str_replace_all() searches your string (e.g. "five") for a pattern. The pattern is words(i); for example, words(5) yields "five" when i == 5. In the english package, words() returns the English words for the number you give it; for instance, english::words(1000) returns "one thousand". When the pattern is found, it is replaced with as.character(i). as.character() converts the number i to a character string, since str_replace_all() requires a character replacement. If you need the return value to be numeric, you can use as.numeric(words_to_numbers("five")).

For some reason, the function stops at 11, meaning if you type words_to_numbers("twelve") it won't work (returns "twelve"). So you will need to adjust that number if you want to use the function for values > 11.
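The function itself is not shown here, so the following is only a sketch of the loop as described, rewritten in base R so that it runs without extra packages: tolower() stands in for str_to_lower(), gsub() for str_replace_all(), and a hard-coded vocabulary for english::words(). The names words_to_numbers_sketch, number_words and upper are made up for this example.

```r
# Hypothetical reconstruction; the original function is not shown in the answer.
# Hard-coded stand-in for english::words(1), ..., english::words(20)
number_words <- c("one", "two", "three", "four", "five", "six", "seven",
                  "eight", "nine", "ten", "eleven", "twelve", "thirteen",
                  "fourteen", "fifteen", "sixteen", "seventeen", "eighteen",
                  "nineteen", "twenty")

words_to_numbers_sketch <- function(s, upper = 11) {
  s <- tolower(s)                             # "Five" and "FIVE" become "five"
  upper <- min(upper, length(number_words))   # stay inside the vocabulary
  for (i in seq_len(upper)) {
    # \\b restricts the match to whole words, so "four" is not
    # replaced inside "fourteen"
    pattern <- paste0("\\b", number_words[i], "\\b")
    s <- gsub(pattern, as.character(i), s, perl = TRUE)
  }
  s
}

words_to_numbers_sketch("Five")
words_to_numbers_sketch("twelve")              # unchanged: the loop stops at 11
words_to_numbers_sketch("twelve", upper = 20)  # a larger limit picks it up
```

Making the limit an argument shows why the value 11 matters: anything above the limit is simply never searched for.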

Hope this helps and good luck learning R!

Convert integer to words

Option 1:

Use the as.english function from the 'english' package:

library(english)

as.english(2017)



Option 2:

Use the replace_number function from the 'qdap' package.

library(qdap)

replace_number(2017)



Option 3:

Use the numbers_to_words function from the 'xfun' package.

library(xfun)

numbers_to_words(2017)

Convert letters to numbers

Thanks for all the ideas, but I am a dumdum.

Here's what I did: I made a mapping from each letter to a specific number, then looked up the number for each letter.

df=data.frame(L=letters[1:26],N=rnorm(26))
df[df$L=='e',2]
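The same mapping can also be kept in a named vector, which makes the lookup a little shorter. This is just a sketch of an alternative, not what was done above; letter_values is a made-up name and set.seed() is only there so the illustrative random values are reproducible. If the number you want is simply the letter's position in the alphabet, match() against letters gives that directly.

```r
set.seed(1)  # assumption: only to make the example values reproducible

# Named vector: names are the letters a..z, values are the numbers
letter_values <- setNames(rnorm(26), letters)

letter_values[["e"]]                     # value for a single letter
unname(letter_values[c("c", "a", "b")])  # values for several letters at once

# Alphabet positions instead of arbitrary values
match(c("c", "a", "b"), letters)         # 3 1 2
```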

Numeric to Alphabetic Lettering Function in R

Here are some alternatives:

1) encode Let b be the base; here b = 26. There are b^k appendices with k letters, so the appendix with number x has n letters if n is the smallest integer for which b + b^2 + ... + b^n >= x. The LHS of this inequality is a geometric series and therefore has a closed form. Replacing the LHS with that expression and solving the resulting equation for n gives the formula for n in the code below. We then subtract from number all terms b^k with 0 <= k < n (the k = 0 term makes the offset zero-based) and apply an APL-like encode function (implementations can be found on the web; it is not defined in the code below). encode does the base conversion, giving digits, a vector of n digits in base base. Finally, add 1 to each digit and use the result as a lookup into LETTERS.
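For reference, the algebra behind the formula for n can be written out as follows (assuming b > 1); the last line shows that the expression inside log() in the code is the same quantity.

```latex
\begin{align*}
b + b^2 + \dots + b^n = \frac{b\,(b^n - 1)}{b - 1} &\ge x \\
b^{\,n+1} &\ge b + x\,(b - 1) \\
n &\ge \log_b\bigl(b + x\,(b - 1)\bigr) - 1 \\
n &= \bigl\lceil \log_b\bigl(b + x\,(b - 1)\bigr) \bigr\rceil - 1 \\[4pt]
\Bigl(\tfrac{1}{1 - b} - 1 - x\Bigr)(1 - b) &= 1 - (1 - b) - x\,(1 - b) = b + x\,(b - 1)
\end{align*}
```

For example, with b = 26 and x = 27 the argument of the logarithm is 26 + 27 * 25 = 701, which lies between 26^2 = 676 and 26^3, so the ceiling is 3 and n = 2, matching the two-letter label "AA".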

app2 <- function(number, base = 26) {
  n <- ceiling(log((1/(1 - base) - 1 - number) * (1 - base), base = base)) - 1
  digits <- encode(number - sum(base^seq(0, n-1)), rep(base, n))
  paste(LETTERS[digits + 1], collapse = "")
}

sapply(1:29, app2) # test

giving:

 [1] "A"  "B"  "C"  "D"  "E"  "F"  "G"  "H"  "I"  "J"  "K"  "L"  "M"  "N"  "O" 
[16] "P"  "Q"  "R"  "S"  "T"  "U"  "V"  "W"  "X"  "Y"  "Z"  "AA" "AB" "AC"

Another test to try is:

sapply(1:60, app2, base = 3)
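app2 calls encode, which is not defined in the code shown here. The following is a minimal sketch of what such a function might look like for a single number, assuming the usual APL meaning of encode (digits of a number in a mixed-radix system, most significant first). It is an illustration, not the implementation the answer refers to.

```r
# Assumed behaviour (APL-style encode, scalar version): return the digits of
# `number` for the radix vector `base`, most significant digit first.
encode <- function(number, base) {
  digits <- numeric(length(base))
  # Fill from the least significant position backwards
  for (i in rev(seq_along(base))) {
    digits[i] <- number %% base[i]   # digit at this position
    number <- number %/% base[i]     # carry the rest to the next position
  }
  digits
}

encode(5, c(2, 2, 2))     # 5 in binary: 1 0 1
encode(27, rep(26, 2))    # 27 = 1 * 26 + 1: 1 1
```

As in APL, anything that does not fit into the given number of positions is silently dropped, which is fine for app2 because the offset passed in is always below base^n.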

2) recursive solution Here is an alternative that works recursively. It computes the last letter of the Appendix number and then removes it and recursively computes the portion to its left.

app2r <- function(number, base = 26, suffix = "") {
  number1 <- number - 1
  last_digit <- number1 %% base
  rest <- number1 %/% base
  suffix <- paste0(LETTERS[last_digit + 1], suffix)
  if (rest > 0) Recall(rest, base, suffix) else suffix
}

# tests
identical(sapply(1:29, app2r), sapply(1:29, app2))
## [1] TRUE
identical(sapply(1:60, app2r, base = 3), sapply(1:60, app2, base = 3))
## [1] TRUE

