Convert Spelled Out Number to Number

Convert spelled out number to number

NSNumberFormatter can convert from text to numbers:

NSNumberFormatter *formatter = [[NSNumberFormatter alloc] init];
formatter.numberStyle = NSNumberFormatterSpellOutStyle;

NSLog(@"%@", [formatter numberFromString:@"thirty-four"]);
NSLog(@"%@", [formatter numberFromString:@"three point five"]);

formatter.locale = [[NSLocale alloc]initWithLocaleIdentifier:[NSLocale localeIdentifierFromComponents:@{NSLocaleLanguageCode: @"es"}]];

NSLog(@"%@", [formatter numberFromString:@"ocho"]);

There are serious limitations as to what it can handle (it doesn't auto detect languages, if you deviate from the expected format (e.g. "thirty four" instead of "thirty-four"), fractions, etc.), but for the narrow domain, it appears to do the job.

Convert written number to number in R

Here's a start that should get you to hundreds of thousands.

word2num <- function(word){
wsplit <- strsplit(tolower(word)," ")[[1]]
one_digits <- list(zero=0, one=1, two=2, three=3, four=4, five=5,
six=6, seven=7, eight=8, nine=9)
teens <- list(eleven=11, twelve=12, thirteen=13, fourteen=14, fifteen=15,
sixteen=16, seventeen=17, eighteen=18, nineteen=19)
ten_digits <- list(ten=10, twenty=20, thirty=30, forty=40, fifty=50,
sixty=60, seventy=70, eighty=80, ninety=90)
doubles <- c(teens,ten_digits)
out <- 0
i <- 1
while(i <= length(wsplit)){
j <- 1
if(i==1 && wsplit[i]=="hundred")
temp <- 100
else if(i==1 && wsplit[i]=="thousand")
temp <- 1000
else if(wsplit[i] %in% names(one_digits))
temp <- as.numeric(one_digits[wsplit[i]])
else if(wsplit[i] %in% names(teens))
temp <- as.numeric(teens[wsplit[i]])
else if(wsplit[i] %in% names(ten_digits))
temp <- (as.numeric(ten_digits[wsplit[i]]))
if(i < length(wsplit) && wsplit[i+1]=="hundred"){
if(i>1 && wsplit[i-1] %in% c("hundred","thousand"))
out <- out + 100*temp
else
out <- 100*(out + temp)
j <- 2
}
else if(i < length(wsplit) && wsplit[i+1]=="thousand"){
if(i>1 && wsplit[i-1] %in% c("hundred","thousand"))
out <- out + 1000*temp
else
out <- 1000*(out + temp)
j <- 2
}
else if(i < length(wsplit) && wsplit[i+1] %in% names(doubles)){
temp <- temp*100
out <- out + temp
}
else{
out <- out + temp
}
i <- i + j
}
return(list(word,out))
}

Results:

> word2num("fifty seven")
[[1]]
[1] "fifty seven"

[[2]]
[1] 57

> word2num("four fifty seven")
[[1]]
[1] "four fifty seven"

[[2]]
[1] 457

> word2num("six thousand four fifty seven")
[[1]]
[1] "six thousand four fifty seven"

[[2]]
[1] 6457

> word2num("forty six thousand four fifty seven")
[[1]]
[1] "forty six thousand four fifty seven"

[[2]]
[1] 46457

> word2num("forty six thousand four hundred fifty seven")
[[1]]
[1] "forty six thousand four hundred fifty seven"

[[2]]
[1] 46457

> word2num("three forty six thousand four hundred fifty seven")
[[1]]
[1] "three forty six thousand four hundred fifty seven"

[[2]]
[1] 346457

I can tell you already that this won't work for word2num("four hundred thousand fifty"), because it doesn't know how to handle consecutive "hundred" and "thousand" terms, but the algorithm can be modified probably. Anyone should feel free to edit this if they have improvements or build on them in their own answer. I just thought this was a fun problem to play with (for a little while).

Edit: Apparently Bill Venables has a package called english that may achieve this even better than the above code.

Convert integers to written numbers

This should work reasonably well:

public static class HumanFriendlyInteger
{
static string[] ones = new string[] { "", "One", "Two", "Three", "Four", "Five", "Six", "Seven", "Eight", "Nine" };
static string[] teens = new string[] { "Ten", "Eleven", "Twelve", "Thirteen", "Fourteen", "Fifteen", "Sixteen", "Seventeen", "Eighteen", "Nineteen" };
static string[] tens = new string[] { "Twenty", "Thirty", "Forty", "Fifty", "Sixty", "Seventy", "Eighty", "Ninety" };
static string[] thousandsGroups = { "", " Thousand", " Million", " Billion" };

private static string FriendlyInteger(int n, string leftDigits, int thousands)
{
if (n == 0)
{
return leftDigits;
}

string friendlyInt = leftDigits;

if (friendlyInt.Length > 0)
{
friendlyInt += " ";
}

if (n < 10)
{
friendlyInt += ones[n];
}
else if (n < 20)
{
friendlyInt += teens[n - 10];
}
else if (n < 100)
{
friendlyInt += FriendlyInteger(n % 10, tens[n / 10 - 2], 0);
}
else if (n < 1000)
{
friendlyInt += FriendlyInteger(n % 100, (ones[n / 100] + " Hundred"), 0);
}
else
{
friendlyInt += FriendlyInteger(n % 1000, FriendlyInteger(n / 1000, "", thousands+1), 0);
if (n % 1000 == 0)
{
return friendlyInt;
}
}

return friendlyInt + thousandsGroups[thousands];
}

public static string IntegerToWritten(int n)
{
if (n == 0)
{
return "Zero";
}
else if (n < 0)
{
return "Negative " + IntegerToWritten(-n);
}

return FriendlyInteger(n, "", 0);
}
}

(Edited to fix a bug w/ million, billion, etc.)

How to convert numeric words into numeric in python

For numbers to words, try "num2words" package:
https://pypi.python.org/pypi/num2words

For words to num, I tweaked the code slightly from the code here:
Is there a way to convert number words to Integers?

from num2words import num2words

def text2int(textnum, numwords={}):
if not numwords:
units = [
"zero", "one", "two", "three", "four", "five", "six", "seven", "eight",
"nine", "ten", "eleven", "twelve", "thirteen", "fourteen", "fifteen",
"sixteen", "seventeen", "eighteen", "nineteen",
]

tens = ["", "", "twenty", "thirty", "forty", "fifty", "sixty", "seventy", "eighty", "ninety"]

scales = ["hundred", "thousand", "million", "billion", "trillion"]

numwords["and"] = (1, 0)
for idx, word in enumerate(units): numwords[word] = (1, idx)
for idx, word in enumerate(tens): numwords[word] = (1, idx * 10)
for idx, word in enumerate(scales): numwords[word] = (10 ** (idx * 3 or 2), 0)

current = result = 0
for word in textnum.split():
if word not in numwords:
raise Exception("Illegal word: " + word)

scale, increment = numwords[word]
current = current * scale + increment
if scale > 100:
result += current
current = 0

return result + current

#### My update to incorporate decimals
num = 5000222223.28
fullText = num2words(num).replace('-',' ').replace(',',' ')
print fullText

decimalSplit = fullText.split('point ')

if len(decimalSplit) > 1:
decimalSplit2 = decimalSplit[1].split(' ')
decPart = sum([float(text2int(decimalSplit2[x]))/(10)**(x+1) for x in range(len(decimalSplit2))])
else:
decPart = 0

intPart = float(text2int(decimalSplit[0]))

Value = intPart + decPart

print Value

-> five billion two hundred and twenty two thousand two hundred and twenty three point two eight

-> 5000222223.28

Spelled-out number (string) to integer using ICU

After getting help from the comments here is my solution, in case anybody else needs it:

private val String.numberFromSpelledOut: Boolean
get() {
val esFormatter = RuleBasedNumberFormat(Locale.ENGLISH, RuleBasedNumberFormat.SPELLOUT)
return try {
return esFormatter.parse(this)
} catch (e: ParseException) {
""
}
}

This should not throw an exception for invalid spelled-out inputs.

Python: convert alphabetically spelled out numbers to numerics?

I wrote some code to do this for integers a while ago: http://github.com/ghewgill/text2num

Feel free to fork and hack.

convert spelled out numerals ( twenty two ) to digits ( 22 )

Fortunately, english numbering is resonably consistent, apart from the tens. You would need to define the strings for 'one' to 'nineteen', all the tens from 'twenty' to 'ninety', 'hundred', 'thousand', 'million', etc.

Then, for the parser, parse the string to tokens (basically split it). If you like, you can convert the tokens to numbers, or you can leave them as strings if you like. I use numbers here, because it's a bit shorter. :)

So, if you have a string of "four thousand hundred and eighty five", you could parse this to the tokens:

4 1000 100 80 5

(you can discard the word 'and', as well as commas and other noise.)

Then, you can process this string from the back in the following very simplified pseudo code.

Total = 0;
Multiplicator = 1;

repeat
Number = ReadNumber || 1;
Total += Multiplicator * Number;
Multiplicator = ReadMultiplicator;

The multiplicator are 10, 20, 30 .. 100, 1000, 1000000
The numbers are 1 .. 19. These are optional. If there is no number found, treat it as if one was read.

So in the end, you should treat the string as

4 1000 1 100 1 80 5

Which should be processed as

5 + 1 * 80 + 1 * 100 + 4 * 1000

I said the pseudo code is very simplified. First of all, your parsed needs to be able to 'peek' ahead. If the preceding token isn't a number, it must be a multiplicator, so it should not be skipped.
Furthermore, you want some extra checks. After all, numbers until 100 are special. You wouldn't want to parse the string "three eighty four" to 244.

And then you need some extra layer. What about "eighty five thousand"? It looks like you shouldn't read just one number before a multiplicator, but actually all tokens that are smaller than the multiplicator, and parse them together, so

one hundred sixty thousand

Should be parsed as

(1 100 1 60) 1000

You might add extra checks in that piece later, although you can do without at first. After all, "thousand million" is actually a "billion", but it will parse correctly and even give the right outcome.

So, I think parsing correct strings is quite doable. Checking for exceptions makes it a bit harder. "twenty three hundred" will work out of the box, but it's weird that you could also write "ten thousand twenty three hundred" and actually get an outcome of 12.300. But then again, checks like that are not strictly needed at first, I guess.

Numeric words in a string into numbers

I have tried to port a text2num Python library to PHP, mix it with a regex for matching English spelled out numbers, enhanced it to the decillion, and here is a result:

function text2num($s) {
// Enhanced the regex at http://www.rexegg.com/regex-trick-numbers-in-english.html#english-number-regex
$reg = <<<REGEX
(?x) # free-spacing mode
(?(DEFINE)
# Within this DEFINE block, we'll define many subroutines
# They build on each other like lego until we can define
# a "big number"

(?<one_to_9>
# The basic regex:
# one|two|three|four|five|six|seven|eight|nine
# We'll use an optimized version:
# Option 1: four|eight|(?:fiv|(?:ni|o)n)e|t(?:wo|hree)|
# s(?:ix|even)
# Option 2:
(?:f(?:ive|our)|s(?:even|ix)|t(?:hree|wo)|(?:ni|o)ne|eight)
) # end one_to_9 definition

(?<ten_to_19>
# The basic regex:
# ten|eleven|twelve|thirteen|fourteen|fifteen|sixteen|seventeen|
# eighteen|nineteen
# We'll use an optimized version:
# Option 1: twelve|(?:(?:elev|t)e|(?:fif|eigh|nine|(?:thi|fou)r|
# s(?:ix|even))tee)n
# Option 2:
(?:(?:(?:s(?:even|ix)|f(?:our|if)|nine)te|e(?:ighte|lev))en|
t(?:(?:hirte)?en|welve))
) # end ten_to_19 definition

(?<two_digit_prefix>
# The basic regex:
# twenty|thirty|forty|fifty|sixty|seventy|eighty|ninety
# We'll use an optimized version:
# Option 1: (?:fif|six|eigh|nine|(?:tw|sev)en|(?:thi|fo)r)ty
# Option 2:
(?:s(?:even|ix)|t(?:hir|wen)|f(?:if|or)|eigh|nine)ty
) # end two_digit_prefix definition

(?<one_to_99>
(?&two_digit_prefix)(?:[- ](?&one_to_9))?|(?&ten_to_19)|
(?&one_to_9)
) # end one_to_99 definition

(?<one_to_999>
(?&one_to_9)[ ]hundred(?:[ ](?:and[ ])?(?&one_to_99))?|
(?&one_to_99)
) # end one_to_999 definition

(?<one_to_999_999>
(?&one_to_999)[ ]thousand(?:[ ](?&one_to_999))?|
(?&one_to_999)
) # end one_to_999_999 definition

(?<one_to_999_999_999>
(?&one_to_999)[ ]million(?:[ ](?&one_to_999_999))?|
(?&one_to_999_999)
) # end one_to_999_999_999 definition

(?<one_to_999_999_999_999>
(?&one_to_999)[ ]billion(?:[ ](?&one_to_999_999_999))?|
(?&one_to_999_999_999)
) # end one_to_999_999_999_999 definition

(?<one_to_999_999_999_999_999>
(?&one_to_999)[ ]trillion(?:[ ](?&one_to_999_999_999_999))?|
(?&one_to_999_999_999_999)
) # end one_to_999_999_999_999_999 definition
# ==== MORE ====
(?<one_to_quadrillion>
(?&one_to_999)[ ]quadrillion(?:[ ](?&one_to_999_999_999_999_999))?|
(?&one_to_999_999_999_999_999)
) # end one_to_quadrillion definition
(?<one_to_quintillion>
(?&one_to_999)[ ]quintillion(?:[ ](?&one_to_quadrillion))?|
(?&one_to_quadrillion)
) # end one_to_quintillion definition
(?<one_to_sextillion>
(?&one_to_999)[ ]sextillion(?:[ ](?&one_to_quintillion))?|
(?&one_to_quintillion)
) # end one_to_sextillion definition
(?<one_to_septillion>
(?&one_to_999)[ ]septillion(?:[ ](?&one_to_sextillion))?|
(?&one_to_sextillion)
) # end one_to_septillion definition
(?<one_to_octillion>
(?&one_to_999)[ ]octillion(?:[ ](?&one_to_septillion))?|
(?&one_to_septillion)
) # end one_to_octillion definition
(?<one_to_nonillion>
(?&one_to_999)[ ]nonillion(?:[ ](?&one_to_octillion))?|
(?&one_to_octillion)
) # end one_to_nonillion definition
(?<one_to_decillion>
(?&one_to_999)[ ]decillion(?:[ ](?&one_to_nonillion))?|
(?&one_to_nonillion)
) # end one_to_decillion definition

(?<bignumber>
zero|(?&one_to_decillion)
) # end bignumber definition

(?<zero_to_9>
(?&one_to_9)|zero
) # end zero to 9 definition

# (?<decimals>
# point(?:[ ](?&zero_to_9))+
# ) # end decimals definition

) # End DEFINE

####### The Regex Matching Starts Here ########
\b(?:(?&ten_to_19)\s+hundred|(?&bignumber))\b
REGEX;
return preg_replace_callback('~' . trim($reg) . '~i', function ($x) {
return text2num_internal($x[0]);
}, $s);
}
function text2num_internal($s) {
// Port of https://github.com/ghewgill/text2num/blob/master/text2num.py
$Small = [
'zero'=> 0,
'one'=> 1,
'two'=> 2,
'three'=> 3,
'four'=> 4,
'five'=> 5,
'six'=> 6,
'seven'=> 7,
'eight'=> 8,
'nine'=> 9,
'ten'=> 10,
'eleven'=> 11,
'twelve'=> 12,
'thirteen'=> 13,
'fourteen'=> 14,
'fifteen'=> 15,
'sixteen'=> 16,
'seventeen'=> 17,
'eighteen'=> 18,
'nineteen'=> 19,
'twenty'=> 20,
'thirty'=> 30,
'forty'=> 40,
'fifty'=> 50,
'sixty'=> 60,
'seventy'=> 70,
'eighty'=> 80,
'ninety'=> 90
];

$Magnitude = [
'thousand'=> 1000,
'million'=> 1000000,
'billion'=> 1000000000,
'trillion'=> 1000000000000,
'quadrillion'=> 1000000000000000,
'quintillion'=> 1000000000000000000,
'sextillion'=> 1000000000000000000000,
'septillion'=> 1000000000000000000000000,
'octillion'=> 1000000000000000000000000000,
'nonillion'=> 1000000000000000000000000000000,
'decillion'=> 1000000000000000000000000000000000,
];

$a = preg_split("~[\s-]+(?:and[\s-]+)?~u", $s);
$a = array_map('strtolower', $a);
$n = 0;
$g = 0;
foreach ($a as $w) {
if (isset($Small[$w])) {
$g = $g + $Small[$w];
}
else if ($w == "hundred" && $g != 0) {
$g = $g * 100;
}
else {
$x = $Magnitude[$w];
if (strlen($x) > 0) {
$n =$n + $g * $x;
$g = 0;
}
else{
throw new Exception("Unknown number: " . $w);
}
}
}
return $n + $g;
}

echo text2num("one") . "\n"; // 1
echo text2num("twelve") . "\n"; // 12
echo text2num("seventy two") . "\n"; // 72
echo text2num("three hundred") . "\n"; // 300
echo text2num("twelve hundred") . "\n"; // 1200
echo text2num("twelve thousand three hundred four") . "\n"; // 12304
echo text2num("six million") . "\n"; // 6000000
echo text2num("six million four hundred thousand five") . "\n"; // 6400005
echo text2num("one hundred twenty three billion four hundred fifty six million seven hundred eighty nine thousand twelve") . "\n"; # // 123456789012
echo text2num("four decillion") . "\n"; // 4000000000000000000000000000000000
echo text2num("five hundred and thirty-seven") . "\n"; // 537
echo text2num("five hundred and thirty seven") . "\n"; // 537

See the PHP demo.

The regex can actually match either just big numbers or numbers like "eleven hundred", see \b(?:(?&ten_to_19)\s+hundred|(?&bignumber))\b. It can be further enhanced. E.g. word boundaries may be replaced with other boundary types (like (?<!\S) and (?!\S) to match in between whitespaces, etc.).

Decimal part in the regex is commented out since even if we match it, the num2text won't handle them.



Related Topics



Leave a reply



Submit