How to Correctly Prefix a Word with "A" and "An"

How can I correctly prefix a word with a and an?

  1. Download Wikipedia
  2. Unzip it and write a quick filter program that spits out only article text (the download is generally in XML format, along with non-article metadata too).
  3. Find all instances of a(n).... and make an index on the following word and all of its prefixes (you can use a simple prefix trie for this). This should be case-sensitive, and you'll need a maximum word length (15 letters, say).
  4. (optional) Discard all those prefixes which occur fewer than 5 times or where "a" vs. "an" achieves less than a 2/3 majority (or some other thresholds - tweak here). Preferably keep the empty prefix to avoid corner cases.
  5. You can optimize your prefix database by discarding all those prefixes whose parent shares the same "a" or "an" annotation.
  6. When determining whether to use "A" or "AN", find the longest matching prefix and follow its lead. If you didn't discard the empty prefix in step 4, there will always be a matching prefix (namely the empty prefix); otherwise you may need a special case for a completely non-matching string (such input should be very rare).
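
Steps 3, 4 and 6 could be sketched roughly like this in Python. This is only an illustration, not the AvsAn implementation: it assumes the (article, word) pairs have already been extracted from the article text, it uses a plain dict keyed by prefix instead of a trie, it skips the step 5 optimization, and the names build_prefix_table and choose_article are made up for the example.

```python
from collections import defaultdict

MAX_PREFIX_LEN = 15   # step 3: cap on indexed prefix length
MIN_COUNT = 5         # step 4: assumed minimum number of observations
MIN_MAJORITY = 2 / 3  # step 4: assumed majority threshold


def build_prefix_table(pairs):
    """Map each word prefix to 'a' or 'an' from (article, word) observations."""
    counts = defaultdict(lambda: [0, 0])  # prefix -> [count of "a", count of "an"]
    for article, word in pairs:
        slot = 1 if article.lower() == "an" else 0
        # Index the empty prefix and every prefix up to the length cap.
        for end in range(min(len(word), MAX_PREFIX_LEN) + 1):
            counts[word[:end]][slot] += 1

    table = {}
    for prefix, (a_count, an_count) in counts.items():
        total = a_count + an_count
        winner = "an" if an_count > a_count else "a"
        majority = max(a_count, an_count) / total
        # Step 4: drop rare or ambiguous prefixes, but always keep the empty one.
        if prefix == "" or (total >= MIN_COUNT and majority >= MIN_MAJORITY):
            table[prefix] = winner
    return table


def choose_article(table, word):
    """Step 6: follow the longest prefix of `word` present in the table."""
    for end in range(min(len(word), MAX_PREFIX_LEN), -1, -1):
        article = table.get(word[:end])
        if article is not None:
            return article
    return "a"  # only reached if the empty prefix was discarded
```

With enough observations of "an hour" and "a house", the ambiguous prefixes "h", "ho" and "hou" get discarded, so "hourly" follows "hour" and "household" follows "house", while an unseen word falls back to whatever the empty prefix says.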

You probably can't get much better than this - and it'll certainly beat most rule-based systems.

Edit: I've implemented this in JS/C#. You can try it in your browser, or download the small, reusable javascript implementation it uses. The .NET implementation is package AvsAn on nuget. The implementations are trivial, so it should be easy to port to any other language if necessary.

Turns out the "rules" are quite a bit more complex than I thought:

  • it's an unanticipated result but it's a unanimous vote
  • it's an honest decision but a honeysuckle shrub
  • Symbols: It's an 0800 number, or an ∞ of oregano.
  • Acronyms: It's a NASA scientist, but an NSA analyst; a FIAT car but an FAA policy.

...which just goes to underline that a rule-based system would be tricky to build!

C# Start with a vs an

Let's assume that any word that begins with a vowel will be preceded by "an" and that all other words will be preceded by "a".

string getArticle(string forWord)
{
    var vowels = new List<char> { 'A', 'E', 'I', 'O', 'U' };

    var firstLetter = forWord[0];
    var firstLetterCapitalized = char.ToUpper(firstLetter);
    var beginsWithVowel = vowels.Contains(firstLetterCapitalized);

    if (beginsWithVowel)
        return "an";

    return "a";
}

Could this be simplified and improved? Of course. However, it should serve as a starting point.

Less readable but shorter versions exist, such as:

string getArticle(string forWord) => (new List<char> { 'A', 'E', 'I', 'O', 'U' }).Contains(char.ToUpper(forWord[0])) ? "an" : "a";

However, both of these ignore edge cases such as forWord being null or empty.
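
One way to close that gap, sketched here in Python rather than C#: the heuristic is the same vowel-letter check, with a guard in front of it. Raising an error on missing input is an assumption on my part; returning a default article would be just as defensible, depending on the caller.

```python
VOWELS = frozenset("aeiou")


def get_article(for_word):
    """Return "an" if the word starts with a vowel letter, otherwise "a"."""
    if not for_word:
        # Covers both None and the empty string.
        raise ValueError("for_word must be a non-empty string")
    return "an" if for_word[0].lower() in VOWELS else "a"
```

Note that this still goes by spelling, not by sound, so it gets words like "hour" and "university" wrong.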

Replace incorrect use of a and an in text input

Following the flippant answer to How can I correctly prefix a word with "a" and "an"?, Eamon Nerbonne followed the given advice and produced an efficient algorithm that accurately identifies the correct indefinite article to use before any following text. So thanks @JayMEE for the pointer, it did actually help.

Implementation of the algorithm is outside the scope of basic Q & A - you can read about it in Eamon's blog entry and GitHub repository. However, it's dead simple to use!

Here's how fixArticles() can be modified to use the simple, minified version of Eamon's code, AvsAn-simple.min.js. See the JSFiddle Demo.

function fixArticles(txt) {
    var valTxt = txt.replace(/\b(a|an) ([\s\(\"'“‘-]?\w*)\b/gim, function(match, article, following) {
        var input = following.replace(/^[\s\(\"'“‘-]+|\s+$/g, ""); //strip initial punctuation symbols
        var res = AvsAnSimple.query(input);
        var newArticle = res.replace(/^a/i, article.charAt(0));
        if (newArticle !== article) {
            newArticle = "<span class='changed'>" + newArticle + "</span>";
        }
        return newArticle + ' ' + following;
    });

    document.getElementById('output-text').innerHTML = valTxt.replace(/\n/gm, '<br/>');
}
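
For anyone not working in the browser, the same find-and-replace idea might look something like this in Python. It is a simplified sketch: naive_article is a hypothetical stand-in for AvsAnSimple.query that only looks at the first letter, the pattern does not handle the leading punctuation that the JavaScript version strips, and no HTML highlighting is added.

```python
import re

# Matches "a" or "an" as a whole word, the whitespace after it, and the next word.
ARTICLE_RE = re.compile(r"\b(an?)(\s+)(\w+)", re.IGNORECASE)


def naive_article(word):
    """Hypothetical stand-in for AvsAnSimple.query: vowel-letter heuristic only."""
    return "an" if word[0].lower() in "aeiou" else "a"


def fix_articles(text, query=naive_article):
    """Replace each a/an with whatever `query` picks for the following word."""

    def replace(match):
        article, gap, following = match.groups()
        new_article = query(following)
        # Keep the capitalisation of the article's first letter, as the JS does.
        if article[0].isupper():
            new_article = new_article.capitalize()
        return new_article + gap + following

    return ARTICLE_RE.sub(replace, text)
```

Passing a better lookup as the query argument (a prefix table or a pronunciation dictionary, for example) leaves the replacement logic unchanged.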

Add prefix (a or an) to a word using php

You can simply implement the following code:

$word = "elephant"; // put word for which you want prefix
$vowelArry = array('a','e','i','o','u'); // array of vowel
$prefix = in_array(strtolower(substr($word ,0,1)),$vowelArry)? "an" : "a"; // logic to add prefix
$updated_word = $prefix." ".$word; // updated word

I hope this will help you.

Regex to change a to an depending on the next word

There was a very good thread about this on StackOverflow a while ago: How can I correctly prefix a word with "a" and "an"?

Basically the consensus was that the best way involves a large dataset from which to learn, and the second-best way involves a pronunciation dictionary such as the CMU dict designed for speech synthesis.

To give an example from the CMU dict:

University comes out as:
Y UW N AH V ER S AH T IY .

Umbrella is rendered as:
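AH M B R EH L AH .

The first phoneme decides the article: Y is a consonant sound, so it is "a university", while AH is a vowel sound, so it is "an umbrella".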
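
A minimal sketch of the pronunciation-dictionary approach in Python, assuming the entries have already been parsed into lists of phonemes. The lexicon below contains only the two entries quoted above, and the function name article_from_phonemes is made up for the example; a real program would load the full CMU dict file instead.

```python
# ARPAbet vowel phonemes used by the CMU Pronouncing Dictionary (stress digits removed).
ARPABET_VOWELS = frozenset(
    "AA AE AH AO AW AY EH ER EY IH IY OW OY UH UW".split()
)

# Tiny illustrative lexicon holding only the two entries quoted above.
PRONUNCIATIONS = {
    "university": "Y UW N AH V ER S AH T IY".split(),
    "umbrella": "AH M B R EH L AH".split(),
}


def article_from_phonemes(word, lexicon=PRONUNCIATIONS):
    """Return 'an' if the first phoneme is a vowel, 'a' if not, None if unknown."""
    phonemes = lexicon.get(word.lower())
    if not phonemes:
        return None
    first = phonemes[0].rstrip("012")  # drop a stress marker such as AH0
    return "an" if first in ARPABET_VOWELS else "a"
```

Returning None for unknown words leaves the caller free to fall back on another method, since no dictionary covers every acronym, symbol or number.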

Programmatically determine whether to describe an object with a or an?

What you want is to determine the appropriate indefinite article. Lingua::EN::Inflect is a Perl module that does a great job. I've extracted the relevant code and pasted it below. It's just a bunch of cases and some regular expressions, so it shouldn't be difficult to port to PHP. A friend ported it to Python here if anyone is interested.

# 2. INDEFINITE ARTICLES

# THIS PATTERN MATCHES STRINGS OF CAPITALS STARTING WITH A "VOWEL-SOUND"
# CONSONANT FOLLOWED BY ANOTHER CONSONANT, AND WHICH ARE NOT LIKELY
# TO BE REAL WORDS (OH, ALL RIGHT THEN, IT'S JUST MAGIC!)

my $A_abbrev = q{
(?! FJO | [HLMNS]Y. | RY[EO] | SQU
| ( F[LR]? | [HL] | MN? | N | RH? | S[CHKLMNPTVW]? | X(YL)?) [AEIOU])
[FHLMNRSX][A-Z]
};

# THIS PATTERN CODES THE BEGINNINGS OF ALL ENGLISH WORDS BEGINING WITH A
# 'y' FOLLOWED BY A CONSONANT. ANY OTHER Y-CONSONANT PREFIX THEREFORE
# IMPLIES AN ABBREVIATION.

my $A_y_cons = 'y(b[lor]|cl[ea]|fere|gg|p[ios]|rou|tt)';

# EXCEPTIONS TO EXCEPTIONS

my $A_explicit_an = enclose join '|',
(
"euler",
"hour(?!i)", "heir", "honest", "hono",
);

my $A_ordinal_an = enclose join '|',
(
"[aefhilmnorsx]-?th",
);

my $A_ordinal_a = enclose join '|',
(
"[bcdgjkpqtuvwyz]-?th",
);

sub A {
my ($str, $count) = @_;
my ($pre, $word, $post) = ( $str =~ m/\A(\s*)(?:an?\s+)?(.+?)(\s*)\Z/i );
return $str unless $word;
my $result = _indef_article($word,$count);
return $pre.$result.$post;
}

sub AN { goto &A }

sub _indef_article {
my ( $word, $count ) = @_;

$count = $persistent_count
if !defined($count) && defined($persistent_count);

return "$count $word"
if defined $count && $count!~/^($PL_count_one)$/io;

# HANDLE USER-DEFINED VARIANTS

my $value;
return "$value $word"
if defined($value = ud_match($word, @A_a_user_defined));

# HANDLE ORDINAL FORMS

$word =~ /^($A_ordinal_a)/i and return "a $word";
$word =~ /^($A_ordinal_an)/i and return "an $word";

# HANDLE SPECIAL CASES

$word =~ /^($A_explicit_an)/i and return "an $word";
$word =~ /^[aefhilmnorsx]$/i and return "an $word";
$word =~ /^[bcdgjkpqtuvwyz]$/i and return "a $word";

# HANDLE ABBREVIATIONS

$word =~ /^($A_abbrev)/ox and return "an $word";
$word =~ /^[aefhilmnorsx][.-]/i and return "an $word";
$word =~ /^[a-z][.-]/i and return "a $word";

# HANDLE CONSONANTS

$word =~ /^[^aeiouy]/i and return "a $word";

# HANDLE SPECIAL VOWEL-FORMS

$word =~ /^e[uw]/i and return "a $word";
$word =~ /^onc?e\b/i and return "a $word";
$word =~ /^uni([^nmd]|mo)/i and return "a $word";
$word =~ /^ut[th]/i and return "an $word";
$word =~ /^u[bcfhjkqrst][aeiou]/i and return "a $word";

# HANDLE SPECIAL CAPITALS

$word =~ /^U[NK][AIEO]?/ and return "a $word";

# HANDLE VOWELS

$word =~ /^[aeiou]/i and return "an $word";

# HANDLE y... (BEFORE CERTAIN CONSONANTS IMPLIES (UNNATURALIZED) "i.." SOUND)

$word =~ /^($A_y_cons)/io and return "an $word";

# OTHERWISE, GUESS "a"
return "a $word";
}

Regular expression for word with specific prefix/suffix

If I understand correctly, this should do:

(?:^|\s)\S?ring\S?(?:\s|$)
  • (?:^|\s) - this non-capturing group makes sure that the pattern is preceded by whitespace or starts at the beginning of the string

  • \S? matches zero or one non-whitespace character

  • ring matches literal ring

  • (?:\s|$) - this non-capturing group makes sure the match is followed by whitespace or is at the end of the string

Example:

In [92]: l = ['ring ', ' ringt', ' ringt ', ' ring ', \
'tringt ', 'tringt ', 'ttring', 'ringttt', 'ttringtt']

In [93]: list(filter(lambda s: re.search(r'(?:^|\s)\S?ring\S?(?:\s|$)', s), l))
Out[93]: ['ring ', ' ringt', ' ringt ', ' ring ', 'tringt ', 'tringt ']

Add prefix and suffix to all occurrences of a word in a string in JavaScript

Regex

Match every occurrence of "this"

Simple replace with a capturing group:

const str = "Hello This is a test string this is a test THIS is just a test ok? Can we solve This? Idk, maybe thiS, is just impossible.";

const result = str.replace(/(this)/gi, "Foo$1Bar")
console.log(result)

