How to Detect Strings Like Putjbtghguhjjjanika

How to tell whether a string is randomly generated or plausibly an English word?

If you mean some kind of rule of thumb that distinguishes an English word from random text, there is none. For reasonable accuracy you will need to query an external source, whether that is the Web, a dictionary, or a service.

If you only need to check whether the word exists, I would suggest WordNet. It is pretty simple to use, and there is a nice Java API for it called JWNL, which makes querying the WordNet dictionary a breeze.
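
The existence check described above can be sketched without committing to any particular dictionary. The snippet below is a minimal illustration in JavaScript, not the JWNL API: `SAMPLE_DICTIONARY` and `isKnownWord` are hypothetical names, and the five-word set is only a stand-in for a real source such as WordNet or a word list.

```javascript
// Hypothetical stand-in for a real dictionary source (WordNet, a word list file, a web service).
const SAMPLE_DICTIONARY = new Set(["detect", "english", "random", "string", "word"]);

// Returns true when the candidate, ignoring case and surrounding spaces, is in the dictionary.
function isKnownWord(candidate, dictionary = SAMPLE_DICTIONARY) {
  const normalized = String(candidate).trim().toLowerCase();
  return normalized.length > 0 && dictionary.has(normalized);
}

console.log(isKnownWord("Random"));
console.log(isKnownWord("Putjbtghguhjjjanika"));
```

With a real dictionary loaded into the set, the same lookup applies; only the data source changes.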

PHP function to detect whether a string is meaningful text

Here is something to check out:
https://github.com/DaveChild/Text-Statistics

This project describes itself as:

The PHP Text Statistics class will help you to identify issues with your website content, especially with readability. It allows you to measure the readability of text using common scoring systems, including:

  • Flesch Kincaid Reading Ease
  • Flesch Kincaid Grade Level
  • Gunning Fog Score
  • Coleman Liau Index
  • SMOG Index
  • Automated Readability Index

The code that generates the statistics is in a simple class structure. There are also several unit test classes to ensure that changes made don't break existing functionality. There is also a live version of this tool.
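
As a rough illustration of one of the scores listed above, the sketch below computes Flesch Reading Ease with the standard formula 206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words). It is not the Text-Statistics implementation: the function names are made up for this example, and the syllable counter is a crude vowel-group heuristic, so the numbers will differ from what a proper library reports.

```javascript
// Rough syllable estimate: count runs of consecutive vowels (a heuristic, not exact).
function countSyllables(word) {
  const groups = word.toLowerCase().match(/[aeiouy]+/g);
  return groups ? groups.length : 1;
}

// Flesch Reading Ease; higher scores mean easier text. Returns null for empty input.
function fleschReadingEase(text) {
  const sentences = text.split(/[.!?]+/).filter(s => s.trim().length > 0);
  const words = text.match(/[A-Za-z]+/g) || [];
  if (sentences.length === 0 || words.length === 0) return null;
  const syllables = words.reduce((sum, w) => sum + countSyllables(w), 0);
  return 206.835
    - 1.015 * (words.length / sentences.length)
    - 84.6 * (syllables / words.length);
}

console.log(fleschReadingEase("The cat sat on the mat."));
console.log(fleschReadingEase("Organizational heterogeneity necessitates comprehensive administrative reconsideration."));
```

Short, plain sentences score high, while long words push the score down sharply.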

Is using 'like' a possibility in TestComplete?

Use regular expressions.

// Case-insensitive match: "abc", optionally followed by 1 or 2, anywhere in the string
var label = 'ABC'
if (label.match(/abc[12]?/i)) console.log('yes')
else console.log('no')
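
If you already have SQL-style `LIKE` patterns, one option is to translate them into regular expressions. The helper below is a plain JavaScript sketch, not a TestComplete API: `likeToRegExp` is a hypothetical name, and it assumes `%` means any run of characters, `_` means exactly one character, and matching is case-insensitive. Unlike the unanchored pattern above, it anchors the expression so the whole string has to match.

```javascript
// Convert a LIKE-style pattern into an anchored, case-insensitive RegExp.
function likeToRegExp(pattern) {
  const source = pattern
    .split("")
    .map(ch => {
      if (ch === "%") return "[\\s\\S]*"; // any run of characters, including none
      if (ch === "_") return "[\\s\\S]";  // exactly one character
      // Escape everything that has a special meaning in a regular expression.
      return ch.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
    })
    .join("");
  return new RegExp("^" + source + "$", "i");
}

console.log(likeToRegExp("abc%").test("ABC123"));
console.log(likeToRegExp("abc_").test("abc12"));
```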

Is there a way for PHP (or jQuery) to check if a string is human readable?

This can be done using something called Markov Chains.

Essentially, you read through a large chunk of text in a given language (English, French, Russian, etc.) and determine the probability of each character following another.

For example, a "q" has a much lower probability of occurring after a "z" than a vowel such as "a" does.

At a lower level, this is actually implemented as a state machine.
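
To make the idea concrete, here is a minimal character-level sketch in JavaScript. It is not the PHP version mentioned in this answer; the function names, the add-one smoothing, and the tiny `SAMPLE_CORPUS` are all assumptions made for illustration. A usable model would be trained on a much larger body of text, and any cut-off score would have to be tuned by hand.

```javascript
// Tiny stand-in corpus (assumption): real use needs far more text.
const SAMPLE_CORPUS =
  "The quick brown fox jumps over the lazy dog. " +
  "This is a sample of plain English text that the model can learn from. " +
  "A reader can tell whether a string is readable or just random letters.";

// Lowercase the text and collapse every run of non-letters into one space.
function normalize(text) {
  return text.toLowerCase().replace(/[^a-z]+/g, " ").trim();
}

// Count how often each character follows another in the corpus.
function trainBigramModel(corpus) {
  const text = normalize(corpus);
  const counts = new Map(); // "ab" -> times "b" followed "a"
  const totals = new Map(); // "a"  -> times "a" was followed by anything
  for (let i = 0; i + 1 < text.length; i++) {
    const pair = text.slice(i, i + 2);
    counts.set(pair, (counts.get(pair) || 0) + 1);
    totals.set(text[i], (totals.get(text[i]) || 0) + 1);
  }
  return { counts, totals, alphabetSize: 27 }; // 26 letters plus space
}

// Average log-probability per transition, with add-one smoothing.
// Values closer to zero mean the string looks more like the corpus.
function averageLogProb(model, input) {
  const text = normalize(input);
  if (text.length < 2) return null;
  let sum = 0;
  for (let i = 0; i + 1 < text.length; i++) {
    const seen = model.counts.get(text.slice(i, i + 2)) || 0;
    const total = model.totals.get(text[i]) || 0;
    sum += Math.log((seen + 1) / (total + model.alphabetSize));
  }
  return sum / (text.length - 1);
}

const model = trainBigramModel(SAMPLE_CORPUS);
console.log(averageLogProb(model, "the reader can learn"));
console.log(averageLogProb(model, "Putjbtghguhjjjanika"));
```

The English phrase should score noticeably closer to zero than the random-looking string, because almost none of the latter's character pairs occur in the corpus.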

As per Mike's comment, a PHP version of this can be found here.

For flavor, see an amusing Daily WTF article on Markov chains.

PHP judge a string as a human name or other text

This is a scoring approach, loosely Bayesian in spirit, that I use for filtering, with quite a bit of success, on a contact submission form and a request-for-quote form. The forms use scoring and handle requests from all over the world in various languages. Only if a submission fails 3 or 4 tests across various fields do I mark it as a spam attempt. Obviously, things like '123456' throw up a red flag instantly for a phone number. BBCode in the comments is also a dead giveaway.

<?php
// Returns a suspicion score for a submitted name: the higher, the less likely it is a real name.
function nameCheck($var) {
    $nameScore = 0;
    $chars_count = strlen($var);
    // Count consonants and vowels (Y is treated as a vowel).
    $consonants = preg_replace('![^BCDFGHJKLMNPQRSTVWXZ]!i', '', $var);
    $consonant_count = strlen($consonants);
    $vowels = preg_replace('![^AEIOUY]!i', '', $var);
    $vowel_count = strlen($vowels);

    // We're expecting first and last name: fewer than 4 characters scores +3.
    if ($chars_count < 4) {
        $nameScore += 3;
    }

    // More than 4 characters and no spaces scores +4.
    if ($chars_count > 4 && !preg_match('![ ]!', $var)) {
        $nameScore += 4;
    }

    // More than 4 characters with no consonants or no vowels scores +5.
    if ($chars_count > 4 && ($consonant_count == 0 || $vowel_count == 0)) {
        $nameScore += 5;
    }

    // More than 4 characters and a vowel-to-consonant ratio below 1/8 scores +5.
    if ($consonant_count > 0 && $vowel_count > 0 && $chars_count > 4 && $vowel_count / $consonant_count < 1 / 8) {
        $nameScore += 5;
    }

    // Needs at least 1 letter, otherwise +10.
    if (!preg_match('![A-Za-z]!', $var)) {
        $nameScore += 10;
    }

    return $nameScore;
}

// Added for testing: the value to score is read from the "email" query parameter.
$var = isset($_GET['email']) ? $_GET['email'] : '';
echo nameCheck($var);
?>

Even if someone flunks, I have it copy me on the attempt so I can adjust my scoring. There are a few false positives, usually for Chinese or Korean names, but for the most part anyone who completes the form in English will pass. Names like "Wu Xi" do exist.
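
The "fail 3 or 4 tests before flagging" rule described in this answer can be sketched as a simple count over per-field checks. The JavaScript below is only an illustration of that idea, not the code running on the forms: the field names, the individual checks, and the threshold of 3 are all hypothetical.

```javascript
// Hypothetical per-field checks; each returns true when the field looks suspicious.
const FIELD_CHECKS = [
  f => !/[A-Za-z]/.test(f.name),                          // name has no letters at all
  f => f.name.length > 4 && !f.name.includes(" "),        // longer name with no space
  f => f.phone.replace(/\D/g, "").startsWith("123456"),   // obvious filler phone number
  f => /\[\/?(?:url|img|b|i)\b[^\]]*\]/i.test(f.comments) // BBCode tags in the comments
];

// Count how many checks a submission fails; missing fields count as empty strings.
function countFailedChecks(form) {
  const fields = {
    name: String(form.name || ""),
    phone: String(form.phone || ""),
    comments: String(form.comments || "")
  };
  return FIELD_CHECKS.filter(check => check(fields)).length;
}

// Flag a submission only when it fails at least `threshold` checks.
function isLikelySpam(form, threshold = 3) {
  return countFailedChecks(form) >= threshold;
}

console.log(isLikelySpam({ name: "12345", phone: "123456", comments: "[url=http://x]buy[/url]" }));
console.log(isLikelySpam({ name: "Wu Xi", phone: "555 0142 88", comments: "Please send a quote." }));
```

Requiring several failures before flagging is what keeps a single odd-looking field, such as a short name, from rejecting a legitimate submission.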


