Regex Plus VS Star Difference

Regex plus vs star difference?

They are called quantifiers.

* 0 or more of the preceding expression

+ 1 or more of the preceding expression

By default a quantifier is greedy, which means it matches as many characters as possible.

A ? after a quantifier makes that quantifier "ungreedy" (lazy), meaning it will match as few characters as possible.

Example greedy/ungreedy

For example on the string "abab"

a.*b will match the whole string "abab" (preg_match_all will return one match, "abab"),

while a.*?b will match only the leading "ab" (preg_match_all will return two matches, each of them "ab").
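
The same behaviour can be checked in JavaScript. This is only a minimal sketch: the example above uses PHP's preg_match_all, and String.prototype.match with the g flag is assumed here as a rough stand-in for collecting all matches.

```javascript
// Greedy vs. lazy matching on the string "abab".
const greedyMatches = "abab".match(/a.*b/g);  // .* takes as much as it can
const lazyMatches = "abab".match(/a.*?b/g);   // .*? takes as little as it can

console.log(greedyMatches); // [ 'abab' ]
console.log(lazyMatches);   // [ 'ab', 'ab' ]
```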

You can test your regexes online, e.g. on Regexr.

Regular expression, plus vs asterisk

Your problem is the g (global) mode, which is probably not set. If you enable it, you will see that the expected substring is matched.

The pattern (0*)(\d*) does match, but in g mode it returns more groups than the two you expect, because both sub-patterns are *-quantified, which allows zero-length matches.

The + quantifier requires at least one occurrence of the preceding token, so it only matches where something is actually present. As a result, it does not return zero-length matches.

Your third try, (0*)(\d*$), behaves like the + version because the $ anchor prevents a zero-length match from occurring before the run of digits that ends the input string. With this regex, however, there is still a zero-length match at the very end when g mode is on.
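
A small sketch of the three cases follows. The original question's input is not shown above, so the string "abc00123" and the (0+)(\d+) variant are assumptions made purely for illustration.

```javascript
// Hypothetical input; the original question's string is not shown above.
const input = "abc00123";

const starAll = input.match(/(0*)(\d*)/g);      // every position can match emptily
const plusAll = input.match(/(0+)(\d+)/g);      // needs at least one 0 and one more digit
const anchoredAll = input.match(/(0*)(\d*$)/g); // $ rules out the early empty matches

console.log(starAll);     // [ '', '', '', '00123', '' ]
console.log(plusAll);     // [ '00123' ]
console.log(anchoredAll); // [ '00123', '' ]
```

With the g flag, match returns the full matches rather than the groups, which is enough to show where the zero-length matches come from.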

GREP - Regex +(plus) vs. *(star) performance

These two expressions:

rtmp.*?\b/
rtmp.+?\b/

match different things. * means "zero or more of the previous expression" (i.e. any number), + means "one or more of the previous expression". So .*? matches any number of any characters in non-greedy mode and .+? matches any positive number of any characters in non-greedy mode.

The speed difference is immaterial; use the expression that matches your intent.
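
To see that the two expressions really do match different things, here is a minimal JavaScript sketch. The test strings are hypothetical, and the trailing slash has to be escaped as \/ inside a JavaScript regex literal.

```javascript
// Hypothetical test strings, chosen to separate the two patterns.
const starPattern = /rtmp.*?\b\//;
const plusPattern = /rtmp.+?\b\//;

// "rtmp/" has nothing between "rtmp" and the slash.
console.log(starPattern.test("rtmp/"));  // true:  .*? may match zero characters
console.log(plusPattern.test("rtmp/"));  // false: .+? needs at least one character

// "rtmps/" has one character in between, so both patterns match.
console.log(starPattern.test("rtmps/")); // true
console.log(plusPattern.test("rtmps/")); // true
```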

Difference between regex quantifiers plus and star

The * quantifier matches zero or more occurrences.

In practice, this means that

\d*

will match every possible input, including the empty string. So your regex matches at the start of the input string and returns the empty string.
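
A quick way to observe this, assuming a hypothetical input that does not start with a digit:

```javascript
// Hypothetical input that does not start with a digit.
const text = "abc123";

const starResult = text.match(/\d*/); // succeeds immediately at index 0
const plusResult = text.match(/\d+/); // keeps scanning until it finds a digit

console.log(JSON.stringify(starResult[0]), starResult.index); // "" 0
console.log(JSON.stringify(plusResult[0]), plusResult.index); // "123" 3
```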

What is the meaning of + in a regex?

+ can actually have two meanings, depending on context.

As the other answers mentioned, + is usually a repetition operator, and causes the preceding token to repeat one or more times. a+ would be expressed as aa* in formal language theory, and could also be expressed as a{1,} (match a minimum of 1 time and a maximum of infinitely many times).
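
The equivalence of the three spellings can be spot-checked with a short sketch. The sample inputs are hypothetical, and the patterns are anchored with ^ and $ so that the whole string has to match.

```javascript
// Three spellings of "one or more a", anchored so the whole string must match.
const spellings = [/^a+$/, /^aa*$/, /^a{1,}$/];
const samples = ["", "a", "aaa", "ab"]; // hypothetical sample inputs

// For each sample, record what each spelling says about it.
const verdicts = samples.map((s) => spellings.map((re) => re.test(s)));

samples.forEach((s, i) => console.log(JSON.stringify(s), verdicts[i]));
// Every row holds three identical booleans:
// false for "" and "ab", true for "a" and "aaa".
```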


However, + can also make other quantifiers possessive if it follows a repetition operator (i.e. ?+, *+, ++ or {m,n}+). A possessive quantifier is an advanced feature of some regex flavours (PCRE, Java and the JGsoft engine) which tells the engine not to backtrack once a match has been made.

To understand how this works, we need to understand two concepts of regex engines: greediness and backtracking. Greediness means that, in general, regexes will try to consume as many characters as they can. Let's say our pattern is .* (the dot is a special construct in regexes which means any character [1]; the star means match zero or more times), and the target is aaaaaaaab. The entire string will be consumed, because the entire string is the longest match that satisfies the pattern.

However, let's say we change the pattern to .*b. Now, when the regex engine tries to match against aaaaaaaab, the .* will again consume the entire string. Since the engine has then reached the end of the string and the pattern is not yet satisfied (the .* consumed everything but the pattern still has to match b afterwards), it will backtrack, one character at a time, and try to match b. The first backtrack will make the .* consume aaaaaaaa, then b can consume b, and the pattern succeeds.

Possessive quantifiers are also greedy, but as mentioned, once they return a match, the engine can no longer backtrack past that point. So if we change our pattern to .*+b (match any character zero or more times, possessively, followed by a b), and try to match aaaaaaaab, the .*+ will again consume the whole string, but since it is possessive, the backtracking information is discarded, the b cannot be matched, and the pattern fails.


[1] In most engines, the dot will not match a newline character unless the /s ("singleline" or "dotall") modifier is specified.
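
The backtracking step can be observed in JavaScript, as sketched below. Note that JavaScript's built-in RegExp has no possessive quantifiers, so the second half only imitates one with the lookahead-plus-backreference idiom (?=(.*))\1. That is a swapped-in technique, not the .*+ syntax of PCRE or Java.

```javascript
const target = "aaaaaaaab"; // eight a characters followed by one b

// Greedy .* first consumes everything, then backtracks one character
// so that the trailing b can match.
const greedy = target.match(/(.*)b/);
console.log(greedy[1]); // aaaaaaaa

// Imitated possessive .*+ : the lookahead captures the greedy match,
// \1 consumes exactly that text, and the engine cannot backtrack into
// the lookahead, so nothing is left for b.
const possessiveLike = /(?=(.*))\1b/;
console.log(possessiveLike.test(target)); // false
```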

regex: plus sign vs asterisk

As far as I can tell, it doesn't. With GNU grep versions 2.5.3, 2.6.3, 2.10, and 2.12, I get:

$ echo "ABC ddd kkk DDD" | grep -Eo "[A-Z]+"
ABC
DDD
$ echo "ABC ddd kkk DDD" | grep -Eo "[A-Z]*"
ABC
DDD

Please double-check your second example. If you can confirm that you get only one line of output, it might be a bug in your grep. If you're using GNU grep, what's the output of grep --version? If not, what OS are you using, and (if you know) what grep implementation?

UPDATE:

I just built and installed GNU grep 2.5.1 (the version you're using) from source, and I confirm your output. It appears to be a bug in that version of grep, apparently corrected between 2.5.1a and 2.5.3. GNU grep 2.5.1 is about 12 years old; can you install a newer version? Looking through the ChangeLog for 2.5.3, I suspect this may have been the fix:

2005-08-24  Charles Levert  <charles_levert@gna.org>

* src/grep.c (print_line_middle): In case of an empty match,
make minimal progress and continue instead of aborting process
of the remainder of the line, in case there's still an upcoming
non-empty match.
* tests/foad1.sh: Add two tests for this.
* doc/grep.texi, doc/grep.1: Document this behavior, since
--only-matching and --color are GNU extensions which are
otherwise unspecified by POSIX or other standards.

Even if you don't have full access on the machine you're using, you should still be able to download the source tarball from ftp://ftp.gnu.org/gnu/grep/ and install it under your home directory (assuming your system has a working compiler and associated tools).

Difference between * and + in regex javascript

You are asking the same question again. So let me explain.

var str = "oofo fooloo";
var StarSymbol = str.match(/fo*/g); // f followed by zero or more o
var PlusSymbol = str.match(/fo+/g); // f followed by one or more o
console.log(StarSymbol); // ["fo", "foo"]
console.log(PlusSymbol); // ["fo", "foo"]

Yes, both give the same result for this input, but fo* would match f alone, whereas fo+ would not. * repeats the previous token zero or more times, whereas + repeats the previous token one or more times, so + expects the previous token to occur at least once.

Example:

> var str= "f"
undefined
> str.match(/fo*/g);
[ 'f' ]
> str.match(/fo+/g);
null
>

How .* (dot star) works?

Each case is different:

.*([a-m\/]*).*

The first .* will probably match the whole string, because [a-m/] is not required to be present, and the first * is greedy and comes first.

.*([a-m\/]+).*

The first .* will match the whole string up to the last character that matches [a-m/] since only one is required, and the first * is greedy and comes first.

.*?([a-m\/]*).*

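The first .*? will match nothing at all: *? is not greedy, so it tries the shortest option first, and because [a-m/]* can itself match the empty string, the engine never has to extend it. [a-m/]* will then match all it can at the very start of the string (possibly nothing), because * is greedy, and the last .* will match the rest of the string. Only with + in place of that * would the .*? be forced to run up to the FIRST character that matches [a-m/].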
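
The cases can be compared by looking at what the capture group ends up holding. This is a sketch on a hypothetical input; the fourth pattern, which uses + together with the lazy .*?, is an added variant for contrast and is not one of the cases listed above.

```javascript
const sample = "xyz/abc-xyz/def"; // hypothetical input

// Return what the capture group holds for a given pattern.
const group1 = (re) => sample.match(re)[1];

const greedyStar = group1(/.*([a-m\/]*).*/);  // first .* took everything
const greedyPlus = group1(/.*([a-m\/]+).*/);  // first .* gave back one character
const lazyStar = group1(/.*?([a-m\/]*).*/);   // group may match nothing at index 0
const lazyPlus = group1(/.*?([a-m\/]+).*/);   // .*? runs up to the first class character

console.log(JSON.stringify(greedyStar)); // ""
console.log(JSON.stringify(greedyPlus)); // "f"
console.log(JSON.stringify(lazyStar));   // ""
console.log(JSON.stringify(lazyPlus));   // "/abc"
```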


