How to Convert Words to a Number

Converting words to numbers in Java

I hope the code below will do the job in most cases. However, some modifications might be required, as I have not tested it thoroughly yet.

Assumptions:

  1. Positive, negative, plus and minus are not allowed.
  2. Lac and crore are not allowed.
  3. Only English is supported.

If you need to support the first two points, you can do that very easily.

    import java.util.HashMap;
    import java.util.Map;

    public class WordsToNumber {

        // Numeric value of every allowed word.
        private static final Map<String, Long> VALUES = new HashMap<>();

        static {
            String[] units = {
                "zero", "one", "two", "three", "four", "five", "six", "seven",
                "eight", "nine", "ten", "eleven", "twelve", "thirteen", "fourteen",
                "fifteen", "sixteen", "seventeen", "eighteen", "nineteen"
            };
            for (int i = 0; i < units.length; i++) {
                VALUES.put(units[i], (long) i);
            }
            String[] tens = {"twenty", "thirty", "forty", "fifty", "sixty", "seventy", "eighty", "ninety"};
            for (int i = 0; i < tens.length; i++) {
                VALUES.put(tens[i], (i + 2) * 10L);
            }
            VALUES.put("hundred", 100L);
            VALUES.put("thousand", 1_000L);
            VALUES.put("million", 1_000_000L);
            VALUES.put("billion", 1_000_000_000L);
            VALUES.put("trillion", 1_000_000_000_000L);
        }

        public static void main(String[] args) {
            boolean isValidInput = true;
            long result = 0;      // value of the group currently being read
            long finalResult = 0; // sum of the groups closed by thousand, million, ...

            String input = "One hundred two thousand and thirty four";

            if (input != null && input.length() > 0) {
                // "thirty-four" -> "thirty four"; the word "and" carries no value
                input = input.replace("-", " ").toLowerCase().replaceAll("\\band\\b", " ");
                String[] splittedParts = input.trim().split("\\s+");

                for (String str : splittedParts) {
                    if (!VALUES.containsKey(str)) {
                        isValidInput = false;
                        System.out.println("Invalid word found : " + str);
                        break;
                    }
                }

                if (isValidInput) {
                    for (String str : splittedParts) {
                        long value = VALUES.get(str);
                        if (value == 100) {
                            result *= 100;          // "hundred" scales the current group
                        } else if (value >= 1000) {
                            result *= value;        // thousand, million, ... close the group
                            finalResult += result;
                            result = 0;
                        } else {
                            result += value;        // zero ... ninety
                        }
                    }
                    finalResult += result;
                    System.out.println(finalResult); // prints 102034
                }
            }
        }
    }

A function to convert words to numbers

The function works like this (note that you also need the stringr package, plus the english package for words()).

  1. First, it takes the string you pass in (e.g. "five" if you called words_to_numbers("five")).

  2. Then, str_to_lower() normalizes it to all lower case (so that "Five" or "FIVE" is treated the same as "five").

  3. It then iterates over a loop (for some reason ending at 11), so i takes the value 1, then 2, then 3, all the way up to 11.

  4. Within the loop, str_replace_all() takes your string (e.g. "five") and looks for a matching pattern. Here, the pattern is words(i); for example, when i == 5, words(5) yields the pattern "five". (In the english package, words() returns the English words for a number; for instance, english::words(1000) returns "one thousand".) Once it finds the pattern, it replaces it with as.character(i). The as.character() function converts the number i to a character string, since str_replace_all() requires a character replacement. If you need the return value to be numeric, you could use as.numeric(words_to_numbers("five")).

For some reason, the loop stops at 11, so words_to_numbers("twelve") won't work (it returns "twelve" unchanged). You will need to raise that upper bound if you want to use the function for values greater than 11.
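For comparison, the loop described above might look like this in Python. This is only a rough analogue, not the R function: the WORDS list is a hard-coded stand-in (an assumption) for english::words(1) to english::words(11), and re.sub with \b word boundaries takes the place of str_replace_all() so that a number word cannot match inside a longer word.

```python
import re

# Hard-coded stand-in for english::words(1) ... english::words(11) (assumption).
WORDS = ["one", "two", "three", "four", "five", "six",
         "seven", "eight", "nine", "ten", "eleven"]


def words_to_numbers(s):
    """Replace each number word 1..11 in s with its digits; other text is left as-is."""
    s = s.lower()  # same role as str_to_lower()
    for i, word in enumerate(WORDS, start=1):
        # \b keeps "one" from matching inside another word
        s = re.sub(r"\b" + word + r"\b", str(i), s)
    return s


print(words_to_numbers("Five"))    # 5
print(words_to_numbers("twelve"))  # twelve (outside the 1..11 range)
```

Like the R version, the result is still a string; wrap it in int() if you need a number.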

Hope this helps and good luck learning R!

Converting words to numbers

Here, I did it in Python. It may help you or someone else from an algorithmic perspective.

    #!/usr/bin/env python3

    __author__ = 'tomcat'

    # Value of every number word. Python ints are unbounded, so the large scales are safe.
    all_numbers = {
        "one": 1, "two": 2, "three": 3, "four": 4, "five": 5,
        "six": 6, "seven": 7, "eight": 8, "nine": 9, "ten": 10,
        "eleven": 11, "twelve": 12, "thirteen": 13, "fourteen": 14, "fifteen": 15,
        "sixteen": 16, "seventeen": 17, "eighteen": 18, "nineteen": 19,
        "twenty": 20, "thirty": 30, "forty": 40, "fifty": 50,
        "sixty": 60, "seventy": 70, "eighty": 80, "ninety": 90,
        "hundred": 100,
        "thousand": 10**3,
        "million": 10**6,
        "billion": 10**9,
        "trillion": 10**12,
        "quadrillion": 10**15,
        "quintillion": 10**18,
        "sextillion": 10**21,
        "septillion": 10**24,
        "octillion": 10**27,
        "nonillion": 10**30,
    }

    # Scale words that close a group of up to three digits ("hundred" is handled separately).
    splitter = {word: value for word, value in all_numbers.items() if value >= 1000}

    inputnumber = input("Please enter string number : ")
    tokens = inputnumber.lower().split()

    result = 0
    partial_result = 0
    for token in tokens:
        if token in splitter:
            # A bare scale word such as "thousand" means "one thousand".
            if partial_result == 0:
                partial_result = 1
            partial_result *= all_numbers[token]
            result += partial_result
            partial_result = 0
        elif token == "hundred":
            if partial_result == 0:
                partial_result = 1
            partial_result *= all_numbers[token]
        else:
            partial_result += all_numbers[token]

    result += partial_result

    print(result)

How to convert numeric words into numeric in python

For numbers to words, try "num2words" package:
https://pypi.python.org/pypi/num2words

For words to numbers, I slightly tweaked the code from this question:
Is there a way to convert number words to Integers?

    from num2words import num2words

    def text2int(textnum, numwords={}):
        # numwords is a mutable default on purpose: the word table is built on the
        # first call and reused afterwards.
        if not numwords:
            units = [
                "zero", "one", "two", "three", "four", "five", "six", "seven", "eight",
                "nine", "ten", "eleven", "twelve", "thirteen", "fourteen", "fifteen",
                "sixteen", "seventeen", "eighteen", "nineteen",
            ]

            tens = ["", "", "twenty", "thirty", "forty", "fifty", "sixty", "seventy", "eighty", "ninety"]

            scales = ["hundred", "thousand", "million", "billion", "trillion"]

            # Each word maps to (scale, increment)
            numwords["and"] = (1, 0)
            for idx, word in enumerate(units): numwords[word] = (1, idx)
            for idx, word in enumerate(tens): numwords[word] = (1, idx * 10)
            for idx, word in enumerate(scales): numwords[word] = (10 ** (idx * 3 or 2), 0)

        current = result = 0
        for word in textnum.split():
            if word not in numwords:
                raise Exception("Illegal word: " + word)

            scale, increment = numwords[word]
            current = current * scale + increment
            if scale > 100:
                # thousand and above close the current group
                result += current
                current = 0

        return result + current

    #### My update to incorporate decimals
    num = 5000222223.28
    fullText = num2words(num).replace('-', ' ').replace(',', ' ')
    print(fullText)

    decimalSplit = fullText.split('point ')

    if len(decimalSplit) > 1:
        # Digits after "point" are spelled one by one: "two eight" -> 2/10 + 8/100
        decimalSplit2 = decimalSplit[1].split()
        decPart = sum(float(text2int(decimalSplit2[x])) / 10 ** (x + 1) for x in range(len(decimalSplit2)))
    else:
        decPart = 0

    intPart = float(text2int(decimalSplit[0]))

    Value = intPart + decPart

    print(Value)

-> five billion two hundred and twenty two thousand two hundred and twenty three point two eight

-> 5000222223.28
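If you would rather avoid the num2words round trip and float arithmetic (adding terms such as 2/10 and 8/100 in binary floating point can leave tiny rounding errors), the same idea can be sketched with Python's decimal module. This is a minimal sketch, not the code above: words_to_int and words_to_decimal are hypothetical names, the scales only go up to trillion like text2int, and it assumes every word after "point" is a single digit (zero to nine), as in the output above.

```python
from decimal import Decimal

UNITS = ["zero", "one", "two", "three", "four", "five", "six", "seven", "eight",
         "nine", "ten", "eleven", "twelve", "thirteen", "fourteen", "fifteen",
         "sixteen", "seventeen", "eighteen", "nineteen"]
TENS = ["", "", "twenty", "thirty", "forty", "fifty", "sixty", "seventy", "eighty", "ninety"]
SCALES = {"thousand": 10**3, "million": 10**6, "billion": 10**9, "trillion": 10**12}


def words_to_int(text):
    """Integer part: the same current/result accumulation that text2int uses."""
    current = result = 0
    for word in text.split():
        if word == "and":
            continue
        if word in UNITS:
            current += UNITS.index(word)
        elif word in TENS[2:]:
            current += TENS.index(word) * 10
        elif word == "hundred":
            current *= 100
        elif word in SCALES:
            result += current * SCALES[word]  # a scale word closes the group
            current = 0
        else:
            raise ValueError("Illegal word: " + word)
    return result + current


def words_to_decimal(text):
    """Words such as 'three point one four' -> Decimal('3.14')."""
    text = text.lower().replace("-", " ").replace(",", " ")
    whole, _, frac = text.partition("point")
    digits = ""
    for word in frac.split():
        if word not in UNITS[:10]:
            raise ValueError("Expected a single digit after 'point': " + word)
        digits += str(UNITS.index(word))
    # Build the value from a digit string, so no binary float rounding is involved
    return Decimal(str(words_to_int(whole)) + "." + (digits or "0"))


print(words_to_decimal("three point one four"))  # 3.14
```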

How would I convert words to numbers in python 3 (own keys and values)?

You will have to handle punctuation, but otherwise you just need to sum the values of each word's letters (l_n is your own letter-to-number dict) and group the words by that sum, which you can do with a defaultdict:

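    from collections import defaultdict
    from string import ascii_lowercase

    # l_n is the letter -> number mapping from the question, which is not shown here.
    # Assumption: a=1 ... z=26, for both lower- and uppercase letters
    # (this reproduces the output below).
    l_n = {ch: i for i, ch in enumerate(ascii_lowercase, 1)}
    l_n.update({ch.upper(): i for ch, i in l_n.items()})

    lines = """am writing a Python script that will take words in a text file and convert them into numbers (my own, not ASCII, so no ord function).
    I have assigned each letter to an integer and would like each word to be the sum of its letters' numerical value.
    The goal is to group each word with the same numerical value into a dictionary.
    I am having great trouble recombining the split words as numbers and adding them together"""

    d = defaultdict(list)
    for line in lines.splitlines():
        for word in line.split():
            # characters not in l_n (punctuation) count as 0
            d[sum(l_n.get(ch, 0) for ch in word)].append(word)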

Output:

from pprint import pprint as pp

pp(dict(d))
{1: ['a', 'a', 'a'],
7: ['be'],
9: ['I', 'I'],
14: ['am', 'am'],
15: ['an'],
17: ['each', 'each', 'each'],
19: ['and', 'and', 'and'],
20: ['as'],
21: ['of'],
23: ['in'],
28: ['is'],
29: ['no'],
32: ['file'],
33: ['the', 'The', 'the', 'the'],
34: ['so'],
35: ['to', 'to', 'goal', 'to'],
36: ['have'],
37: ['take', 'ord', 'like'],
38: ['(my', 'same'],
39: ['adding'],
41: ['ASCII,'],
46: ['them', 'them'],
48: ['its'],
49: ['that', 'not'],
51: ['great'],
52: ['own,'],
53: ['sum'],
56: ['will'],
58: ['into', 'into'],
60: ['word', 'word', 'with'],
61: ['value.', 'value', 'having'],
69: ['text'],
75: ['would'],
76: ['split'],
77: ['group'],
78: ['assigned', 'integer'],
79: ['words', 'words'],
80: ['letter'],
85: ['script'],
92: ['numbers', 'numbers'],
93: ['trouble'],
96: ['numerical', 'numerical'],
97: ['convert'],
98: ['Python', 'together'],
99: ["letters'"],
100: ['writing'],
102: ['function).'],
109: ['recombining'],
118: ['dictionary.']}

sum(l_n.get(ch,0) for ch in word) adds up the values of all the letters in the word (characters missing from l_n, such as punctuation, count as 0). We use that sum as the key and append the word to the list stored under it. The defaultdict creates an empty list the first time a key is seen, so we end up with all the words that have the same sum grouped in lists.

Also, as John commented, you can store only the lowercase letters in the dict and call .lower() on each word: sum(l_n.get(ch,0) for ch in word.lower())

If you want to remove all punctuation you can use str.translate:

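    from collections import defaultdict
    from string import punctuation

    # In Python 3, str.translate takes a table; maketrans' third argument lists the characters to delete
    remove_punct = str.maketrans('', '', punctuation)

    d = defaultdict(list)
    for line in lines.splitlines():
        for word in line.split():
            word = word.translate(remove_punct)
            d[sum(l_n.get(ch, 0) for ch in word)].append(word)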

Which would output:

{1: ['a', 'a', 'a'],
7: ['be'],
9: ['I', 'I'],
14: ['am', 'am'],
15: ['an'],
17: ['each', 'each', 'each'],
19: ['and', 'and', 'and'],
20: ['as'],
21: ['of'],
23: ['in'],
28: ['is'],
29: ['no'],
32: ['file'],
33: ['the', 'The', 'the', 'the'],
34: ['so'],
35: ['to', 'to', 'goal', 'to'],
36: ['have'],
37: ['take', 'ord', 'like'],
38: ['my', 'same'],
39: ['adding'],
41: ['ASCII'],
46: ['them', 'them'],
48: ['its'],
49: ['that', 'not'],
51: ['great'],
52: ['own'],
53: ['sum'],
56: ['will'],
58: ['into', 'into'],
60: ['word', 'word', 'with'],
61: ['value', 'value', 'having'],
69: ['text'],
75: ['would'],
76: ['split'],
77: ['group'],
78: ['assigned', 'integer'],
79: ['words', 'words'],
80: ['letter'],
85: ['script'],
92: ['numbers', 'numbers'],
93: ['trouble'],
96: ['numerical', 'numerical'],
97: ['convert'],
98: ['Python', 'together'],
99: ['letters'],
100: ['writing'],
102: ['function'],
109: ['recombining'],
118: ['dictionary']}

If you don't want duplicate words to appear, use a set:

d = defaultdict(set)
....
d[sum(l_n.get(ch,0) for ch in word)].add(word)

Converting words to numbers in PHP

There are lots of pages discussing the conversion from numbers to words, but not so many for the reverse direction. The best I could find was some pseudo-code on Ask Yahoo; see http://answers.yahoo.com/question/index?qid=20090216103754AAONnDz for a nice algorithm:

Well, overall you are doing two things: finding tokens (words that translate to numbers) and applying grammar. In short, you are building a parser for a very limited language.

The tokens you would need are:

POWER: thousand, million, billion

HUNDRED: hundred

TEN: twenty, thirty... ninety

UNIT: one, two, three, ... nine

SPECIAL: ten, eleven, twelve, ... nineteen

(Drop any "and"s, as they are meaningless. Break hyphens into two tokens; that is, sixty-five should be processed as "sixty" "five".)

Once you've tokenized your string, move from RIGHT TO LEFT.

  1. Grab all the tokens from the RIGHT until you hit a POWER or the whole string.

  2. Parse the tokens after the stop point for these patterns:

    SPECIAL

    TEN

    UNIT

    TEN UNIT

    UNIT HUNDRED

    UNIT HUNDRED SPECIAL

    UNIT HUNDRED TEN

    UNIT HUNDRED UNIT

    UNIT HUNDRED TEN UNIT

    (This assumes that "seventeen hundred" is not allowed in this grammar)

    This gives you the last three digits of your number.

  3. If you stopped at the whole string you are done.

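  4. If you stopped at a POWER, start again at step 1 with the tokens to its left, grabbing until you reach a higher POWER or the whole string. The three digits from that pass are multiplied by the POWER you stopped at.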
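The steps above can be sketched in Python as follows. This is a minimal sketch, not PHP and not the linked answer's code: tokenize, parse_group and words_to_number are hypothetical names, only the POWER words listed (thousand, million, billion) are handled, and step 4's "higher POWER" is read as requiring each POWER to be larger than the one to its right.

```python
POWER = {"thousand": 10**3, "million": 10**6, "billion": 10**9}
TEN = {"twenty": 20, "thirty": 30, "forty": 40, "fifty": 50,
       "sixty": 60, "seventy": 70, "eighty": 80, "ninety": 90}
UNIT = {"one": 1, "two": 2, "three": 3, "four": 4, "five": 5,
        "six": 6, "seven": 7, "eight": 8, "nine": 9}
SPECIAL = {"ten": 10, "eleven": 11, "twelve": 12, "thirteen": 13, "fourteen": 14,
           "fifteen": 15, "sixteen": 16, "seventeen": 17, "eighteen": 18, "nineteen": 19}


def tokenize(text):
    # Break hyphens into two tokens and drop every "and".
    words = text.lower().replace("-", " ").split()
    return [w for w in words if w != "and"]


def parse_group(tokens):
    """Value (0-999) of one group, accepting only the patterns listed above."""
    value = 0
    rest = list(tokens)
    # Optional leading UNIT HUNDRED
    if len(rest) >= 2 and rest[0] in UNIT and rest[1] == "hundred":
        value += UNIT[rest[0]] * 100
        rest = rest[2:]
    # Then SPECIAL, TEN, UNIT, TEN UNIT, or nothing
    if len(rest) == 1 and rest[0] in SPECIAL:
        value += SPECIAL[rest[0]]
    elif len(rest) == 1 and rest[0] in TEN:
        value += TEN[rest[0]]
    elif len(rest) == 1 and rest[0] in UNIT:
        value += UNIT[rest[0]]
    elif len(rest) == 2 and rest[0] in TEN and rest[1] in UNIT:
        value += TEN[rest[0]] + UNIT[rest[1]]
    elif rest:
        raise ValueError("Unexpected tokens: " + " ".join(rest))
    return value


def words_to_number(text):
    tokens = tokenize(text)
    total = 0
    multiplier = 1  # scale of the group being parsed: 1, then the POWER we stopped at
    end = len(tokens)
    while True:
        # Step 1: grab tokens from the right until a POWER or the start of the string.
        start = end
        while start > 0 and tokens[start - 1] not in POWER:
            start -= 1
        group = tokens[start:end]
        # Step 2: the group gives the three digits, scaled by the POWER to its right.
        if multiplier > 1 and not group:
            raise ValueError("Missing number before a power word")
        total += parse_group(group) * multiplier
        # Step 3: stopped at the whole string, so we are done.
        if start == 0:
            return total
        # Step 4: stopped at a POWER; continue leftwards with it as the new scale.
        power = POWER[tokens[start - 1]]
        if power <= multiplier:
            raise ValueError("Powers must increase from right to left")
        multiplier = power
        end = start - 1


print(words_to_number("One hundred two thousand and thirty four"))  # 102034
```

Because parse_group accepts only the listed patterns, "seventeen hundred" raises a ValueError, in line with the assumption stated in the grammar.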
