Regular Expression to Match A, AB, ABC, but Not AC ("Starts With")

Regular expression to match A, AB, ABC, but not AC. (starts with)

Try this regular expression:

^(A(B(C)?)?)?$

The pattern extends naturally to ABCD and ABCDE:

^(A(B(C(D)?)?)?)?$
^(A(B(C(D(E)?)?)?)?)?$

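Now each part depends on the preceding parts (B depends on A, C depends on B, etc.). Note that the outermost ? also lets the pattern match the empty string; drop it if at least A is required.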
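
The nesting can also be generated rather than typed by hand. Here is a minimal Python sketch of that idea; the helper name prefix_pattern is made up for illustration, and it uses non-capturing groups (?:...) instead of the capturing ones above, which does not change what is matched.

```python
import re

def prefix_pattern(word: str) -> str:
    """Build a nested-optional pattern such as ^(?:A(?:B(?:C)?)?)?$ for "ABC"."""
    nested = ""
    # Wrap from the last character outwards so each letter depends on the one before it.
    for ch in reversed(word):
        nested = f"(?:{re.escape(ch)}{nested})?"
    return f"^{nested}$"

pattern = re.compile(prefix_pattern("ABC"))
for candidate in ["", "A", "AB", "ABC", "AC", "BC", "ABCD"]:
    # fullmatch requires the whole candidate to be consumed by the pattern
    print(repr(candidate), bool(pattern.fullmatch(candidate)))
```

Only "", "A", "AB" and "ABC" are accepted; "AC" is rejected because C is only reachable through B.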

Regular expression not allowing a and c to be next to each other

Here is a much simpler, more straightforward regex. Rather than trying to exclude the pattern, you can match the pattern and ignore the strings that match, as in the following example:

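public class AdjacentAC {
    public static void main(String[] args) {
        // Sample inputs of one to three characters
        String[] str = {
            "a", "b", "c", "ba", "ca", "ab", "cb", "ac", "bc", "baa",
            "caa", "aba", "cba", "aca", "bca", "bab", "cab", "abb",
            "cbb", "acb", "bcb", "bac", "cac", "abc", "cbc", "acc", "bcc"
        };

        for (int i = 0; i < str.length; ++i) {
            // matches() must consume the whole string, so this regex
            // only covers inputs of up to three characters
            if (str[i].matches("ac.?|.?ac|ca.?|.?ca")) {
                System.out.println("MATCH: " + str[i]);
            } else {
                System.out.println(str[i]);
            }
        }
    }
}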

This produces the following output:

a
b
c
ba
MATCH: ca
ab
cb
MATCH: ac
bc
baa
MATCH: caa
aba
cba
MATCH: aca
MATCH: bca
bab
MATCH: cab
abb
cbb
MATCH: acb
bcb
MATCH: bac
MATCH: cac
abc
cbc
MATCH: acc
bcc
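
The regex above has to consume the whole string, so it only covers inputs of up to three characters. For strings of any length, searching for the adjacent pair anywhere in the input is enough. A small Python sketch of that variant (the name has_adjacent_a_c is just for illustration):

```python
import re

# "a" directly followed by "c", or "c" directly followed by "a"
ADJACENT_A_C = re.compile(r"ac|ca")

def has_adjacent_a_c(text: str) -> bool:
    """Return True if 'a' and 'c' appear next to each other anywhere in text."""
    return ADJACENT_A_C.search(text) is not None

samples = [
    "a", "b", "c", "ba", "ca", "ab", "cb", "ac", "bc", "baa",
    "caa", "aba", "cba", "aca", "bca", "bab", "cab", "abb",
    "cbb", "acb", "bcb", "bac", "cac", "abc", "cbc", "acc", "bcc",
]
flagged = [s for s in samples if has_adjacent_a_c(s)]
print(flagged)
```

On the sample list this flags the same ten strings as the output above, and it keeps working for longer inputs such as "bbbacbbb".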

Regular expression for only characters a-z, A-Z

/^[a-zA-Z]*$/

Change the * to + if you don't want to allow empty matches.

References:

Character classes ([...]), Anchors (^ and $), Repetition (+, *)

The / characters are just delimiters: they mark the start and the end of the regex literal. One benefit is that you can append modifiers (flags) after the closing delimiter.
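
For comparison, here is how the same check might look in Python, where there are no / delimiters and a flag argument to re.compile plays the role of a modifier. This is only a sketch; the constant names are invented for the example.

```python
import re

LETTERS_OR_EMPTY = re.compile(r"^[a-zA-Z]*$")   # * allows the empty string
LETTERS_REQUIRED = re.compile(r"^[a-zA-Z]+$")   # + needs at least one letter
# The IGNORECASE flag lets [a-z] cover upper-case letters as well
LETTERS_IGNORECASE = re.compile(r"^[a-z]+$", re.IGNORECASE)

for text in ["", "Hello", "abc1", "hello world"]:
    print(
        repr(text),
        bool(LETTERS_OR_EMPTY.fullmatch(text)),
        bool(LETTERS_REQUIRED.fullmatch(text)),
        bool(LETTERS_IGNORECASE.fullmatch(text)),
    )
```

fullmatch is used here because Python's $ also matches just before a trailing newline, which would otherwise let "abc\n" through.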

JavaScript Regular Expressions /ab*c/

"*" means "Matches the preceding expression 0 or more times". So /ab*c/ matches any string that contains an "a", followed by zero or more "b", followed by a "c": for example "ac" (b 0 times in this case), "abc", or "abbbc".

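A quick way to convince yourself is to run the pattern against a few inputs. A minimal Python sketch (the pattern behaves the same way as in JavaScript for this simple case):

```python
import re

AB_STAR_C = re.compile(r"ab*c")

for text in ["ac", "abc", "xxabbbcxx", "abd", "bc"]:
    # search() looks for the pattern anywhere in the string
    found = AB_STAR_C.search(text)
    print(text, "->", found.group(0) if found else None)
```

"abd" and "bc" are rejected: the first has no "c" after the run of "b", the second has no leading "a".
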
Regular expression for a-b, a-c but not a-a?

Note that \w already matches digits (\d) and the underscore, so \w[\w\d_]+ is equivalent to \w{2,}.

You can capture the first "word" (before ::) and check with a negative lookahead that the "word" after :: is not equal to it:

\b(\w+)::(?!\b\1\b)\w+\b

Explanation:

  • \b - leading word boundary
  • (\w+) - Group 1: one or more alphanumeric and underscore characters
  • :: - 2 consecutive colons
  • (?!\b\1\b) - the next "word" cannot be the same as the value in Group 1
  • \w+\b - one or more alphanumeric and underscore characters followed by a trailing word boundary.

If you are not looking to match 1-character "words", you can use

\b(\w{2,})::(?!\b\1\b)\w{2,}\b
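
To see the backreference and the negative lookahead working together, here is a short Python sketch using the first pattern. Python's re module supports both constructs; the helper name first_pair is made up for the example.

```python
import re

# Group 1 captures the first word; the lookahead rejects an identical second word
DIFFERENT_WORDS = re.compile(r"\b(\w+)::(?!\b\1\b)\w+\b")

def first_pair(text: str):
    """Return the first word::word pair whose two sides differ, or None."""
    found = DIFFERENT_WORDS.search(text)
    return found.group(0) if found else None

for text in ["a::b", "a::c", "a::a", "foo::foobar", "foo::foo"]:
    print(text, "->", first_pair(text))
```

"foo::foobar" is accepted because the \b inside the lookahead requires the whole second word to equal Group 1, not just its beginning.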

Regex that matches xa?b?c? but not x alone

Here’s the shortest version:

(a)?(b)?(c)?(?(1)|(?(2)|(?(3)|(*FAIL))))

If you need to keep around the match in a separate group, write this:

((a)?(b)?(c)?)(?(2)|(?(3)|(?(4)|(*FAIL))))

But that isn’t very robust in case a, b, or c contain capture groups. So instead write this:

(?<A>a)?(?<B>b)?(?<C>c)?(?(<A>)|(?(<B>)|(?(<C>)|(*FAIL))))

And if you need a group for the whole match, then write this:

(?<M>(?<A>a)?(?<B>b)?(?<C>c)?(?(<A>)|(?(<B>)|(?(<C>)|(*FAIL)))))

And if like me you prefer multi-lettered identifiers and also think this sort of thing is insane without being in /x mode, write this:

(?x)
(?<Whole_Match>
    (?<Group_A> a) ?
    (?<Group_B> b) ?
    (?<Group_C> c) ?

    (?(<Group_A>)             # Succeed
      | (?(<Group_B>)         # Succeed
          | (?(<Group_C>)     # Succeed
              | (*FAIL)
            )
        )
    )
)

And here is the full testing program to prove that those all work:

#!/usr/bin/perl
use 5.010_000;

my @pats = (
    qr/(a)?(b)?(c)?(?(1)|(?(2)|(?(3)|(*FAIL))))/,
    qr/((a)?(b)?(c)?)(?(2)|(?(3)|(?(4)|(*FAIL))))/,
    qr/(?<A>a)?(?<B>b)?(?<C>c)?(?(<A>)|(?(<B>)|(?(<C>)|(*FAIL))))/,
    qr/(?<M>(?<A>a)?(?<B>b)?(?<C>c)?(?(<A>)|(?(<B>)|(?(<C>)|(*FAIL)))))/,
    qr{
        (?<Whole_Match>

            (?<Group_A> a) ?
            (?<Group_B> b) ?
            (?<Group_C> c) ?

            (?(<Group_A>)             # Succeed
              | (?(<Group_B>)         # Succeed
                  | (?(<Group_C>)     # Succeed
                      | (*FAIL)
                    )
                )
            )

        )
    }x,
);

for my $pat (@pats) {
    say "\nTESTING $pat";
    $_ = "i can match bad crabcatchers from 34 bc and call a cab";
    while (/$pat/g) {
        say "$`<$&>$'";
    }
}

All five versions produce this output:

i <c>an match bad crabcatchers from 34 bc and call a cab
i c<a>n match bad crabcatchers from 34 bc and call a cab
i can m<a>tch bad crabcatchers from 34 bc and call a cab
i can mat<c>h bad crabcatchers from 34 bc and call a cab
i can match <b>ad crabcatchers from 34 bc and call a cab
i can match b<a>d crabcatchers from 34 bc and call a cab
i can match bad <c>rabcatchers from 34 bc and call a cab
i can match bad cr<abc>atchers from 34 bc and call a cab
i can match bad crabc<a>tchers from 34 bc and call a cab
i can match bad crabcat<c>hers from 34 bc and call a cab
i can match bad crabcatchers from 34 <bc> and call a cab
i can match bad crabcatchers from 34 bc <a>nd call a cab
i can match bad crabcatchers from 34 bc and <c>all a cab
i can match bad crabcatchers from 34 bc and c<a>ll a cab
i can match bad crabcatchers from 34 bc and call <a> cab
i can match bad crabcatchers from 34 bc and call a <c>ab
i can match bad crabcatchers from 34 bc and call a c<ab>

Sweet, eh?
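
If you are not on Perl, the same trick carries over to engines that support conditionals. Below is a rough Python port of the shortest version, offered as a sketch rather than a drop-in equivalent: Python's re module has (?(1)yes|no) conditionals but no (*FAIL), so the empty negative lookahead (?!), which can never succeed, stands in for it.

```python
import re

# (a)?(b)?(c)? followed by nested conditionals: succeed if group 1, 2 or 3
# took part in the match, otherwise hit (?!), which always fails.
AT_LEAST_ONE = re.compile(r"(a)?(b)?(c)?(?(1)|(?(2)|(?(3)|(?!))))")

sentence = "i can match bad crabcatchers from 34 bc and call a cab"
matches = [m.group(0) for m in AT_LEAST_ONE.finditer(sentence)]
print(matches)
```

The list of matches corresponds line for line to the bracketed parts of the Perl output above.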

EDIT: To handle the leading x, just put whatever x you want at the start of the pattern, before the very first optional capture group (the one for a), like this:

x(a)?(b)?(c)?(?(1)|(?(2)|(?(3)|(*FAIL))))

or like this

(?x)                        # enable non-insane mode

(?<Whole_Match>
    x                       # first match some leader string

    # now match a, b, and c, in that order, and each optional
    (?<Group_A> a ) ?
    (?<Group_B> b ) ?
    (?<Group_C> c ) ?

    # now make sure we got at least one of a, b, or c
    (?(<Group_A>)             # SUCCEED!
      | (?(<Group_B>)         # SUCCEED!
          | (?(<Group_C>)     # SUCCEED!
              | (*FAIL)
            )
        )
    )
)

The test sentence was constructed without the x part, so it won’t work for that, but I think I’ve shown how I mean to go at this. Note that all of x, a, b, and c can be arbitrarily complex patterns (yes, even recursive), not merely single letters, and it doesn’t matter if they use numbered capture groups of their own, even.

If you want to go at this with lookaheads, you can do this:

(?x)

(?(DEFINE)
    (?<Group_A> a)
    (?<Group_B> b)
    (?<Group_C> c)
)

x

(?= (?&Group_A)
  | (?&Group_B)
  | (?&Group_C)
)

(?&Group_A) ?
(?&Group_B) ?
(?&Group_C) ?

And here is what to add to the @pats array in the test program to show that this approach also works:

qr{
    (?(DEFINE)
        (?<Group_A> a)
        (?<Group_B> b)
        (?<Group_C> c)
    )

    (?= (?&Group_A)
      | (?&Group_B)
      | (?&Group_C)
    )

    (?&Group_A) ?
    (?&Group_B) ?
    (?&Group_C) ?
}x

You’ll notice please that I still manage never to repeat any of a, b, or c, even with the lookahead technique.

Do I win? ☺
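
The lookahead idea also works in engines without (?(DEFINE)...) and (?&name). As a sketch, Python's re module has neither, so this version builds the pattern string from a list of sub-patterns instead; each sub-pattern is still written only once in the source. The function name leader_then_some is invented for the example.

```python
import re
from typing import Sequence

def leader_then_some(leader: str, parts: Sequence[str]) -> re.Pattern:
    """Leader, then the parts in order, each optional, at least one present."""
    # The lookahead insists that some part can start right after the leader
    lookahead = "|".join(f"(?:{part})" for part in parts)
    # Then every part is matched optionally, in the given order
    optional = "".join(f"(?:{part})?" for part in parts)
    return re.compile(f"{leader}(?={lookahead}){optional}")

with_x = leader_then_some("x", ["a", "b", "c"])
for text in ["x", "xa", "xb", "xac", "xabc", "xca"]:
    print(text, bool(with_x.fullmatch(text)))

# With an empty leader, this reproduces the matches on the test sentence
sentence = "i can match bad crabcatchers from 34 bc and call a cab"
print(leader_then_some("", ["a", "b", "c"]).findall(sentence))
```

"x" alone and "xca" are rejected as whole strings; everything of the form xa?b?c? with at least one of a, b, c goes through.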

Partial matching a string against a regex

Looks like you're in luck: I've already implemented this in JS (it works for most patterns, which may be enough for you). See my answer here. You'll also find a working demo there.

There's no need to duplicate the full code here, I'll just state the overall process:

  • Parse the input regex, and perform some replacements. There's no need for error handling as you can't have an invalid pattern in a RegExp object in JS.
  • Replace abc with (?:a|$)(?:b|$)(?:c|$)
  • Do the same for any "atoms". For instance, a character group [a-c] would become (?:[a-c]|$)
  • Keep anchors as-is
  • Keep negative lookaheads as-is

Had JavaScript had more advanced regex features, this transformation might not have been possible. But with its limited feature set, it can handle most input regexes. It will yield incorrect results on regexes with backreferences, though, if your input string ends in the middle of a backreference match (like matching ^(\w+)\s+\1$ against hello hel).
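
To make the core replacement step concrete, here is a deliberately tiny Python sketch. It is not the JS implementation mentioned above: it skips the parsing step entirely and assumes the regex has already been split into a flat list of atoms (literals, character classes, escapes), with no groups, anchors, lookaheads or quantifiers. The names partial_pattern and is_partial_match are hypothetical.

```python
import re
from typing import Sequence

def partial_pattern(atoms: Sequence[str]) -> str:
    """Turn atoms a, b, c into (?:a|$)(?:b|$)(?:c|$)."""
    # Each atom may either match normally or be skipped once the input has ended
    return "".join(f"(?:{atom}|$)" for atom in atoms)

def is_partial_match(atoms: Sequence[str], text: str) -> bool:
    """True if text could still grow into a full match of the atoms in order."""
    return re.fullmatch(partial_pattern(atoms), text) is not None

print(partial_pattern(["a", "b", "c"]))
for text in ["", "a", "ab", "abc", "ac", "abcd"]:
    print(repr(text), is_partial_match(["a", "b", "c"], text))
```

"ac" fails because after a the input has not ended and the next character is not b; "abcd" fails because nothing is left to consume the d.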


