Measure the pronounceability of a word?
Here is a function that should work for most common words. It returns a score between 0 and 1, where 1 means perfect pronounceability according to its rules.
The function is far from perfect (for example, it scores Tsunami at only 0.857), but it should be fairly easy to tweak for your needs.
<?php
// Score: 1
echo pronounceability('namelet') . "\n";
// Score: 0.71428571428571
echo pronounceability('nameoic') . "\n";

function pronounceability($word) {
    static $vowels = array('a', 'e', 'i', 'o', 'u', 'y');
    static $composites = array('mm', 'll', 'th', 'ing');

    if (!is_string($word)) return false;

    // Remove non-letters and convert to lowercase
    $word = preg_replace('/[^a-z]/i', '', $word);
    $word = strtolower($word);

    // Special case
    if ($word == 'a') return 1;

    $len = strlen($word);

    // Let's not parse an empty string
    if ($len == 0) return 0;

    $score = 0;
    $pos = 0;

    while ($pos < $len) {
        // Check for an allowed composite at the current position
        foreach ($composites as $comp) {
            $complen = strlen($comp);
            // Use <= (the original used <) so that a composite ending
            // exactly at the end of the word is also matched
            if (($pos + $complen) <= $len) {
                $check = substr($word, $pos, $complen);
                if ($check == $comp) {
                    $score += $complen;
                    $pos += $complen;
                    continue 2;
                }
            }
        }

        // Is it a vowel? If so, score it only when the previous letter wasn't a vowel too.
        if (in_array($word[$pos], $vowels)) {
            if (($pos - 1) >= 0 && !in_array($word[$pos - 1], $vowels)) {
                $score += 1;
                $pos += 1;
                continue;
            }
        } else { // Not a vowel: check whether the next letter is one, or whether this is the end of the word
            if (($pos + 1) < $len && in_array($word[$pos + 1], $vowels)) {
                $score += 2;
                $pos += 2;
                continue;
            } elseif (($pos + 1) == $len) {
                $score += 1;
                break;
            }
        }

        // No rule matched: skip this letter without scoring it
        $pos += 1;
    }

    return $score / $len;
}
Is there any logic to validate whether a group of letters could be considered a phonetic word?
First, reject words in which the same letter repeats too many times in a row (say, three or more, so that ccc is invalid). You could apply this rule to every letter except vowels, so that aaaaa, eeeee and uuuuu stay valid. Then, only if you want to validate an existing string, check it against a list of real words in your language. If you are generating a pronounceable word rather than checking one, I don't think you need the word list.
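The repeated-letter rule can be sketched with a backreference in a regular expression. The function name `hasLongLetterRun`, the default limit of three, and the choice to treat y as a vowel are assumptions made for this example, not part of the answer above.

```php
<?php
// Hypothetical helper: true when a letter occurs $limit or more times in a row.
// With $allowVowelRuns = true, runs of a, e, i, o, u and y are ignored.
function hasLongLetterRun(string $word, int $limit = 3, bool $allowVowelRuns = true): bool
{
    $word = strtolower($word);
    // Letters subject to the rule: consonants only, or every letter.
    $class = $allowVowelRuns ? '[b-df-hj-np-tv-xz]' : '[a-z]';
    // One letter followed by at least ($limit - 1) copies of itself.
    $pattern = '/(' . $class . ')\1{' . ($limit - 1) . ',}/';
    return preg_match($pattern, $word) === 1;
}

var_dump(hasLongLetterRun('ccc'));             // bool(true)
var_dump(hasLongLetterRun('aaaaa'));           // bool(false)
var_dump(hasLongLetterRun('aaaaa', 3, false)); // bool(true)
var_dump(hasLongLetterRun('balloon'));         // bool(false)
```

A word that passes this check is not necessarily pronounceable; the check only filters out the most obvious failures before any further scoring.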
Please also check these: pronounceability algorithm, http://10000ideas.blogspot.fr/2011/07/what-makes-word-pronounceable.html, and Measure the pronounceability of a word?
How can I check if a string can be pronounced?
You might have some success by first splitting the word into syllables. This question on SO might help. Of course, this will only work for languages that, like English, are written in an alphabet with letters for vowel sounds.
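As a very rough illustration of the syllable idea, the sketch below cuts a word into chunks that each contain one run of vowels. It is a naive approximation, not real syllabification, and the function name `naiveSyllables` and the vowel set (a, e, i, o, u, y) are assumptions for this example. A word that yields no chunks at all contains no vowels and is a likely candidate for "cannot be pronounced".

```php
<?php
// Hypothetical helper: split a word into rough syllable-like chunks.
// Each chunk is optional leading consonants plus a run of vowels;
// trailing consonants are attached to the last chunk.
function naiveSyllables(string $word): array
{
    // Keep letters only, in lowercase.
    $word = preg_replace('/[^a-z]/', '', strtolower($word));
    preg_match_all('/[^aeiouy]*[aeiouy]+(?:[^aeiouy]*$)?/', $word, $matches);
    return $matches[0];
}

echo implode('-', naiveSyllables('banana')), "\n";  // ba-na-na
echo implode('-', naiveSyllables('tsunami')), "\n"; // tsu-na-mi
var_dump(naiveSyllables('xzq') === []);             // bool(true)
```

From here you could, for instance, reject chunks with too many leading consonants, but where to draw that line depends on the language.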
How do I determine if a random string sounds like English?
You can build a Markov chain from a large body of English text.
Afterwards, you can feed words into the Markov chain and check how high the probability is that each word is English.
See here: http://en.wikipedia.org/wiki/Markov_chain
At the bottom of that page you can see the Markov text generator. What you want is exactly the reverse of it.
In a nutshell: the Markov chain stores, for each character, the probability of each character that may follow it. You can extend this idea to two or three characters of context if you have enough memory.
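Here is a minimal sketch of that idea with one character of context. The function names `trainMarkov` and `markovScore`, the `^` and `$` word-boundary markers, the add-one smoothing, and the tiny sample text are all assumptions for illustration; a usable model would need to be trained on a large English corpus.

```php
<?php
// Count character-to-character transitions in a training text.
// '^' and '$' mark the start and end of each word (markers chosen for this sketch).
function trainMarkov(string $text): array
{
    $counts = [];
    preg_match_all('/[a-z]+/', strtolower($text), $m);
    foreach ($m[0] as $word) {
        $chars = str_split('^' . $word . '$');
        for ($i = 0; $i < count($chars) - 1; $i++) {
            $from = $chars[$i];
            $to = $chars[$i + 1];
            $counts[$from][$to] = ($counts[$from][$to] ?? 0) + 1;
        }
    }
    return $counts;
}

// Average log-probability per transition; values closer to 0 mean "more English-like".
function markovScore(array $counts, string $word): float
{
    $word = preg_replace('/[^a-z]/', '', strtolower($word));
    if ($word === '') {
        return -INF;
    }
    $chars = str_split('^' . $word . '$');
    $alphabetSize = 27; // 26 letters plus the end marker
    $steps = count($chars) - 1;
    $logProb = 0.0;
    for ($i = 0; $i < $steps; $i++) {
        $row = $counts[$chars[$i]] ?? [];
        $seen = $row[$chars[$i + 1]] ?? 0;
        // Add-one smoothing keeps unseen transitions from producing log(0).
        $logProb += log(($seen + 1) / (array_sum($row) + $alphabetSize));
    }
    return $logProb / $steps;
}

// Stand-in training text; replace with a large English corpus in practice.
$sample = 'the quick brown fox jumps over the lazy dog and then '
        . 'something interesting happens in the morning';
$model = trainMarkov($sample);

// 'thing' should score higher (closer to 0) than 'xqzkv'.
echo markovScore($model, 'thing'), "\n";
echo markovScore($model, 'xqzkv'), "\n";
```

Dividing by the number of transitions keeps long words from being penalised just for their length. To decide whether a string "sounds like English" you would still need to pick a threshold, for example by scoring a set of known English words and a set of random strings and choosing a cut-off between them.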