What Do ^ and $ Mean in a Regular Expression

What do ^ and $ mean in a regular expression?

^ means "Match the start of the string" (more exactly, the position before the first character in the string, so it does not match an actual character).

$ means "Match the end of the string" (the position after the last character in the string).

Both are called anchors. Used together, they ensure that the entire string is matched instead of just a substring.

So in your example, the first regex (the one without anchors) will report a match on email@address.com.uk, but the matched text will be only email@address.com, which is probably not what you expected. The second regex (the one with ^ and $) will simply fail.

Be careful, as some regex implementations implicitly anchor the regex at the start/end of the string (for example Java's .matches(), if you're using that).

If the multiline option is set (using the (?m) flag, for example, or by doing Pattern.compile("^\\w+@\\w+[.]\\w+$", Pattern.MULTILINE)), then ^ and $ also match at the start and end of a line.
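The behaviour described above can be checked with a short sketch. It uses Python's re module rather than Java, and it assumes the unanchored pattern is simply the anchored one from the text with ^ and $ removed. The sample strings are made up for illustration; re.fullmatch plays roughly the role of Java's .matches() here.

```python
import re

# Assumption: the "first regex" is the anchored pattern minus ^ and $.
unanchored = re.compile(r"\w+@\w+[.]\w+")
anchored = re.compile(r"^\w+@\w+[.]\w+$")

address = "email@address.com.uk"
partial = unanchored.search(address)      # matches only a substring
whole = anchored.search(address)          # fails: ".uk" is left over
implicit = unanchored.fullmatch(address)  # fullmatch anchors implicitly

# With MULTILINE, ^ and $ also match at the start and end of each line.
block = "first line\nuser@example.org\nlast line"
single_mode = anchored.search(block)
multi_mode = re.compile(r"^\w+@\w+[.]\w+$", re.MULTILINE).search(block)

print(partial.group())     # email@address.com
print(whole, implicit)     # None None
print(single_mode)         # None
print(multi_mode.group())  # user@example.org
```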

Regex Explanation ^.*$

  • ^ matches position just before the first character of the string
  • $ matches position just after the last character of the string
  • . matches a single character. Does not matter what character it is, except newline
  • * matches preceding match zero or more times

So, ^.*$ means - match, from beginning to end, any sequence of zero or more characters. Basically, that means - match everything from start to end of the string (provided the string contains no newline, since . does not match one). This regex pattern is not very useful on its own.

Let's take a regex pattern that may be a bit more useful. Let's say I have two strings: "The bat of Matt Jones" and "Matthew's last name is Jones". The pattern ^Matt.*Jones$ will match only "Matthew's last name is Jones". Why? The pattern says - the string should start with Matt and end with Jones, and there can be zero or more characters (any characters) in between them. The first string ends with Jones but does not start with Matt, so it does not match.
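A minimal sketch of the two patterns above, using Python's re module (an assumption, since the answer does not name a language). The variable names are only for illustration.

```python
import re

pattern = re.compile(r"^Matt.*Jones$")
strings = ["The bat of Matt Jones", "Matthew's last name is Jones"]

# True only for the string that starts with Matt and ends with Jones.
results = {s: bool(pattern.search(s)) for s in strings}
print(results)

# ^.*$ matches a whole single-line string...
everything = re.search(r"^.*$", "any text at all")
print(everything.group())

# ...but not a string with an inner newline, because . stops at "\n".
with_newline = re.search(r"^.*$", "a\nb")
print(with_newline)
```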

Feel free to use an online tool like https://regex101.com/ to test out regex patterns and strings.

What does ?: in a regular expression mean?

It means that it is a non-capturing group. After a successful match, the first (\d*) will be captured in $1 and the second in $2, while (?: \D.*?) will not be captured at all.

$string =~ m/^(\d*)(?: \D.*?)(\d*)$/

From perldoc perlretut

Non-capturing groupings

A group that is required to bundle a set of alternatives may or may not be useful as a capturing group. If it isn't, it just creates a superfluous addition to the set of available capture group values, inside as well as outside the regexp. Non-capturing groupings, denoted by (?:regexp), still allow the regexp to be treated as a single unit, but don't establish a capturing group at the same time.
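To see the difference in group numbering, here is a small sketch in Python's re module (not Perl, so groups are read with .groups() instead of $1 and $2). The input string "123 abc456" is a made-up example.

```python
import re

non_capturing = re.compile(r"^(\d*)(?: \D.*?)(\d*)$")
capturing = re.compile(r"^(\d*)( \D.*?)(\d*)$")

sample = "123 abc456"  # hypothetical input

# Only the two (\d*) groups are numbered.
nc_groups = non_capturing.match(sample).groups()
# The middle group now takes a number of its own.
c_groups = capturing.match(sample).groups()

print(non_capturing.groups, nc_groups)  # 2 ('123', '456')
print(capturing.groups, c_groups)       # 3 ('123', ' abc', '456')
```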

What's the meaning of ^ and $ symbols in regular expression?

^ is the beginning of the input and $ is the end of it.

e.g.

  • ^[0-9] - matches any input that starts with a digit
  • [0-9]$ - matches any input that ends with a digit

And a little bit more detailed description from wiki:

  • ^ Matches the starting position within the string. In line-based tools, it matches the starting position of any line.
  • $ Matches the ending position of the string or the position just before a string-ending newline. In line-based tools, it matches the ending position of any line.
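The two example patterns can be tried out as follows. This is a sketch in Python's re module with made-up sample strings; the last check shows the "just before a string-ending newline" case, which holds in Python but may differ in other flavours.

```python
import re

starts_with_digit = re.compile(r"^[0-9]")
ends_with_digit = re.compile(r"[0-9]$")

start_hit = starts_with_digit.search("7 apples")
start_miss = starts_with_digit.search("apples 7")
end_hit = ends_with_digit.search("apples 7")
end_miss = ends_with_digit.search("7 apples")

# In Python, $ also matches just before a newline that ends the string.
before_newline = ends_with_digit.search("apples 7\n")

print(bool(start_hit), bool(start_miss))  # True False
print(bool(end_hit), bool(end_miss))      # True False
print(bool(before_newline))               # True
```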

What is the meaning of + in a regex?

+ can actually have two meanings, depending on context.

As the other answers mentioned, + is usually a repetition operator, and causes the preceding token to repeat one or more times. a+ would be expressed as aa* in formal language theory, and could also be written as a{1,} (match at least once, with no upper limit).
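As a quick check of that equivalence, the sketch below (Python's re module, with arbitrary sample strings) runs the three spellings against the same inputs and compares the results.

```python
import re

patterns = [r"a+", r"aa*", r"a{1,}"]
samples = ["", "a", "aaa", "baa"]

# For each sample, record whether each pattern matches the whole string.
table = {s: [bool(re.fullmatch(p, s)) for p in patterns] for s in samples}

for sample, row in table.items():
    print(repr(sample), row)
```

Every row contains three identical values, since the three patterns accept exactly the same strings.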


However, + can also make other quantifiers possessive if it follows a repetition operator (i.e. ?+, *+, ++ or {m,n}+). A possessive quantifier is an advanced feature of some regex flavours (PCRE, Java and the JGsoft engine) which tells the engine not to backtrack into the quantified token once it has matched.

To understand how this works, we need to understand two concepts of regex engines: greediness and backtracking. Greediness means that in general regexes will try to consume as many characters as they can. Let's say our pattern is .* (the dot is a special construct in regexes which means any character¹; the star means match zero or more times), and your target is aaaaaaaab. The entire string will be consumed, because the entire string is the longest match that satisfies the pattern.

However, let's say we change the pattern to .*b. Now, when the regex engine tries to match against aaaaaaaab, the .* will again consume the entire string. However, since the engine will have reached the end of the string and the pattern is not yet satisfied (the .* consumed everything but the pattern still has to match b afterwards), it will backtrack, one character at a time, and try to match b. The first backtrack will make the .* consume aaaaaaaa, and then b can match b, and the pattern succeeds.

Possessive quantifiers are also greedy, but as mentioned, once they return a match, the engine can no longer backtrack past that point. So if we change our pattern to .*+b (match any character zero or more times, possessively, followed by a b), and try to match aaaaaaaab, again the .* will consume the whole string, but then since it is possessive, backtracking information is discarded, and the b cannot be matched so the pattern fails.
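The three cases can be reproduced with a rough sketch in Python's re module. It assumes a short sample target, aaaaaaaab. Note that Python only accepts possessive quantifiers from version 3.11 onwards, so that part is guarded and skipped on older interpreters.

```python
import re
import sys

target = "aaaaaaaab"  # assumed sample target

# Greedy .* alone consumes the whole string.
greedy_all = re.match(r".*", target)

# .*b: .* first takes everything, then backtracks one character
# so that the final b can match.
greedy_b = re.match(r".*b", target)

# .*+b: the possessive .*+ never gives characters back, so b cannot match.
if sys.version_info >= (3, 11):
    possessive = re.match(r".*+b", target)
else:
    possessive = None  # possessive syntax is not supported before 3.11

print(greedy_all.group())  # aaaaaaaab
print(greedy_b.group())    # aaaaaaaab
print(possessive)          # None
```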


¹ In most engines, the dot will not match a newline character, unless the /s ("singleline" or "dotall") modifier is specified.

What does +? mean in regex?

The ? makes the + "lazy" instead of "greedy". This means it tries to match as few times as possible, instead of trying to match as many times as possible.
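A small sketch of the difference, using Python's re module and a made-up HTML-like string:

```python
import re

text = "<b>one</b><b>two</b>"  # hypothetical input

# Greedy: .+ runs to the last ">" in the string.
greedy = re.search(r"<.+>", text).group()
# Lazy: .+? stops at the first ">" it can.
lazy = re.search(r"<.+?>", text).group()

print(greedy)  # <b>one</b><b>two</b>
print(lazy)    # <b>
```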

What does ?= mean in a regular expression?

?= is a positive lookahead, a type of zero-width assertion. It says that the match must be followed by whatever is within the parentheses, but that part is only checked: it is not consumed and does not become part of the match.

Your example means the match needs to be followed by zero or more characters and then a digit (but again, that part isn't included in the match).
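The original example is not shown here, so the sketch below uses a hypothetical pattern with the same kind of lookahead, (?=.*\d), in Python's re module. It shows that the lookahead has to succeed but adds nothing to the matched text.

```python
import re

# Hypothetical pattern: letters that are followed, somewhere later, by a digit.
pattern = re.compile(r"[a-z]+(?=.*\d)")

hit = pattern.search("abc def 7")
miss = pattern.search("abc def")

print(hit.group(), hit.end())  # abc 3
print(miss)                    # None
```

The match ends at index 3: the " def 7" that satisfied the lookahead is not part of it.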

What do ++ and *+ mean?

++

From "What is double plus in regular expressions?":

That's a Possessive Quantifier.

It basically means that if the regex engine fails matching later, it will not go back and try to undo the matches it made here. In most cases, it allows the engine to fail much faster, and can give you some control where you need it - which is very rare for most uses.

*+

*+ is the possessive quantifier for the * quantifier.

What does [^.]* mean in regular expression?

Within the [], the . is just a literal dot, and the leading ^ negates the class: it means "anything but ...".

So [^.]* matches zero or more non-dots.
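For instance, in Python's re module (the sample strings are made up):

```python
import re

non_dots = re.compile(r"[^.]*")

# Everything up to, but not including, the first dot.
first_label = non_dots.match("example.com").group()
# Zero non-dots is still a successful (empty) match.
empty = non_dots.match(".hidden").group()

print(repr(first_label))  # 'example'
print(repr(empty))        # ''
```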


