Measure the Pronounceability of a Word

Measure the pronounceability of a word?

Here is a function that should work for most common words. It returns a score between 0 and 1, where 1 means perfect pronounceability according to its rules.

The function is far from perfect (for example, it scores Tsunami at only 0.857), but it should be fairly easy to tweak for your needs.

<?php
// Score: 1
echo pronounceability('namelet') . "\n";

// Score: 0.71428571428571
echo pronounceability('nameoic') . "\n";

function pronounceability($word) {
    static $vowels = array('a', 'e', 'i', 'o', 'u', 'y');

    // Letter groups that are accepted as one pronounceable unit
    static $composites = array('mm', 'll', 'th', 'ing');

    if (!is_string($word)) return false;

    // Remove non-letters and convert to lowercase
    $word = preg_replace('/[^a-z]/i', '', $word);
    $word = strtolower($word);

    // Special case
    if ($word == 'a') return 1;

    $len = strlen($word);

    // Let's not parse an empty string
    if ($len == 0) return 0;

    $score = 0;
    $pos = 0;

    while ($pos < $len) {
        // Check whether an allowed composite starts at this position
        foreach ($composites as $comp) {
            $complen = strlen($comp);

            // <= (not <) so a composite at the very end of the word, such as "-ing", also matches
            if (($pos + $complen) <= $len) {
                $check = substr($word, $pos, $complen);

                if ($check == $comp) {
                    $score += $complen;
                    $pos += $complen;
                    continue 2;
                }
            }
        }

        // Is it a vowel? If so, score it only when the previous letter is not a vowel too.
        // Note: a vowel at the very start of the word has no previous letter and scores nothing.
        if (in_array($word[$pos], $vowels)) {
            if (($pos - 1) >= 0 && !in_array($word[$pos - 1], $vowels)) {
                $score += 1;
                $pos += 1;
                continue;
            }
        } else { // Not a vowel: check whether the next letter is one, or whether this is the end of the word
            if (($pos + 1) < $len && in_array($word[$pos + 1], $vowels)) {
                $score += 2;
                $pos += 2;
                continue;
            } elseif (($pos + 1) == $len) {
                $score += 1;
                break;
            }
        }

        $pos += 1;
    }

    // Fraction of letters that belong to a pronounceable unit
    return $score / $len;
}

Is there any logic to validate whether a group of letters could be considered a phonetic word?

First, reject any letter repeated three or more times in a row; for example, ccc would be invalid. (You could apply this to consonants only, so that aaaaa, eeeee, and uuuuu remain acceptable.) Then, if you want to validate real words, check the candidate against a list of existing words in your language. If you are only generating pronounceable words, you probably don't need the word list.
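The repeated-letter rule can be sketched with a single regular expression. This is only an illustration: the function name passes_repeat_rule is made up, and it assumes the consonant-only variant of the rule, with y treated as a vowel.

```php
<?php
// Hypothetical helper: returns true when the word passes the repeated-letter rule.
// Assumption: a run of three or more identical consonants is rejected,
// while runs of vowels (a, e, i, o, u, y) are tolerated.
function passes_repeat_rule(string $word): bool
{
    $word = strtolower($word);

    // ([b-df-hj-np-tv-xz]) captures one consonant; \1{2,} requires the same
    // letter at least two more times directly after it.
    return preg_match('/([b-df-hj-np-tv-xz])\1{2,}/', $word) !== 1;
}

var_dump(passes_repeat_rule('hello'));  // bool(true): "ll" is only two in a row
var_dump(passes_repeat_rule('hcccat')); // bool(false): "ccc"
var_dump(passes_repeat_rule('aaaaa'));  // bool(true): vowels are exempt here
```

To apply the rule to every letter instead, the character class could be replaced with [a-z].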

Please also check these: pronounceability algorithm, http://10000ideas.blogspot.fr/2011/07/what-makes-word-pronounceable.html, and Measure the pronounceability of a word?

How can I check if a string can be pronounced?

You might have some success by first splitting the word into syllables. This question on SO might help. Of course, this only works for languages that, like English, are written with an alphabet that has vowel letters.
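As a very rough sketch of the syllable idea, the snippet below cuts a word into chunks of "optional consonants followed by one or more vowels". It is not a real syllabification algorithm; the function name naive_syllables and the vowel set (a, e, i, o, u, y) are assumptions made for illustration. A string that yields no chunks contains no vowel at all, which is a strong hint that it cannot be pronounced.

```php
<?php
// Hypothetical helper: naive "syllable" chunks, each made of optional leading
// consonants plus a vowel group. Trailing consonants are attached to the last chunk.
function naive_syllables(string $word): array
{
    // Keep letters only, in lowercase
    $word = strtolower(preg_replace('/[^a-z]/i', '', $word));

    // [^aeiouy]*       leading consonants (may be empty)
    // [aeiouy]+        one or more vowels
    // (?:[^aeiouy]*\z)? consonants that run to the end of the word, if any
    preg_match_all('/[^aeiouy]*[aeiouy]+(?:[^aeiouy]*\z)?/', $word, $matches);

    return $matches[0];
}

print_r(naive_syllables('banana'));  // ba, na, na
print_r(naive_syllables('tsunami')); // tsu, na, mi
print_r(naive_syllables('xkcd'));    // empty array: no vowel found
```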

How do I determine if a random string sounds like English?

You can build a Markov chain from a large English text.

Afterwards, you can feed words into the Markov chain and check how high the probability is that each word is English.

See here: http://en.wikipedia.org/wiki/Markov_chain

At the bottom of that page you can see a Markov text generator. What you want is exactly the reverse of it.

In a nutshell: for each character, the Markov chain stores the probability of each character that can follow it. You can extend this idea to sequences of two or three characters if you have enough memory.
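A minimal sketch of that idea, assuming a character-level model over single-letter transitions. The function names train_bigrams and score_word, the start/end markers ^ and $, the add-one smoothing, and the tiny training list are all illustrative choices, not part of the answer above; a usable model would be trained on a large English text.

```php
<?php
// Count how often each character is followed by each other character.
// $counts[$a][$b] = number of times $b came directly after $a.
function train_bigrams(array $words): array
{
    $counts = [];
    foreach ($words as $word) {
        $chars = '^' . strtolower($word) . '$'; // ^ and $ mark word start and end
        for ($i = 0, $n = strlen($chars) - 1; $i < $n; $i++) {
            $a = $chars[$i];
            $b = $chars[$i + 1];
            $counts[$a][$b] = ($counts[$a][$b] ?? 0) + 1;
        }
    }
    return $counts;
}

// Average log-probability per transition; values closer to 0 mean the word
// looks more like the training text.
function score_word(array $counts, string $word, int $symbols = 27): float
{
    $chars = '^' . strtolower($word) . '$';
    $n = strlen($chars) - 1;
    $logProb = 0.0;
    for ($i = 0; $i < $n; $i++) {
        $row = $counts[$chars[$i]] ?? [];
        $seen = $row[$chars[$i + 1]] ?? 0;
        // Add-one smoothing over an assumed 27 possible next symbols (a-z plus
        // the end marker), so unseen transitions keep a small probability.
        $logProb += log(($seen + 1) / (array_sum($row) + $symbols));
    }
    return $logProb / $n;
}

// Toy training data, for illustration only
$model = train_bigrams(['name', 'game', 'tame', 'mane', 'gate', 'mate']);

var_dump(score_word($model, 'nate') > score_word($model, 'tqzx')); // bool(true)
```

Scores are only meaningful relative to each other, so in practice you would pick a cutoff by scoring a sample of known English words and a sample of random strings.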


