What Do 'i' and '-i' in Regex Mean

What does /i at the end of a regex mean?

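The /i flag tells the regex engine to ignore case, so letters in the pattern match both their uppercase and lowercase forms in the given string. This is usually called case-insensitive matching, as pointed out in the comment.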
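
As a minimal sketch in Python (chosen here only for illustration): Python has no /pattern/i literal syntax, so the re.IGNORECASE flag plays the role of /i. The pattern and sample strings are made up.

```python
import re

# re.IGNORECASE (short form: re.I) is Python's counterpart of the /i flag.
pattern = re.compile(r"hello", re.IGNORECASE)

matches_upper = pattern.search("Say HELLO") is not None
matches_mixed = pattern.search("Say HeLlo") is not None

# Without the flag, the same pattern is case-sensitive.
matches_without_flag = re.search(r"hello", "Say HELLO") is not None

print(matches_upper, matches_mixed, matches_without_flag)
```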

What do `?i` and `?-i` in regex mean?

(?i) starts case-insensitive mode

(?-i) turns off case-insensitive mode

More information at the "Turning Modes On and Off for Only Part of The Regular Expression" section of this page:

Modern regex flavors allow you to apply modifiers to only part of the
regular expression. If you insert the modifier (?ism) in the middle of
the regex, the modifier only applies to the part of the regex to the
right of the modifier. You can turn off modes by preceding them with a
minus sign. All modes after the minus sign will be turned off. E.g.
(?i-sm) turns on case insensitivity, and turns off both single-line
mode and multi-line mode.

Not all regex flavors support this. JavaScript and Python apply all
mode modifiers to the entire regular expression. They don't support
the (?-ismx) syntax, since turning off an option is pointless when
mode modifiers apply to the whole regular expressions. All options are
off by default.

You can quickly test how the regex flavor you're using handles mode
modifiers. The regex (?i)te(?-i)st should match test and TEst, but not
teST or TEST.
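
The same check can be sketched in Python, with one caveat: Python's re module does not accept a bare (?-i) in the middle of a pattern, but since version 3.6 it accepts scoped groups such as (?i:...), which limit a flag to the enclosed part. The pattern (?i:te)st below is an assumed equivalent of (?i)te(?-i)st, not the syntax quoted above.

```python
import re

# (?i:te) matches "te" case-insensitively; the trailing "st" stays case-sensitive.
pattern = re.compile(r"(?i:te)st")

results = {
    s: pattern.fullmatch(s) is not None
    for s in ["test", "TEst", "teST", "TEST"]
}
print(results)
```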

Regex Explanation ^.*$

  • ^ matches the position just before the first character of the string
  • $ matches the position just after the last character of the string
  • . matches any single character except a newline
  • * matches the preceding token zero or more times

So ^.*$ means: match, from beginning to end, any character, zero or more times. In other words, match everything from the start to the end of the string. On its own, this pattern is not very useful.
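
A short Python sketch of that behaviour, with invented sample strings. It also shows the newline exception for the dot, which Python lets you lift with the re.DOTALL flag.

```python
import re

pattern = re.compile(r"^.*$")

# A single-line string is matched in full, and so is the empty string.
whole_line = pattern.search("any text at all").group()
empty = pattern.search("").group()

# "." does not match a newline by default, so a two-line string fails...
two_lines = pattern.search("first\nsecond")
# ...unless re.DOTALL lets "." match newlines as well.
two_lines_dotall = re.search(r"^.*$", "first\nsecond", re.DOTALL).group()
```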

Let's take a pattern that is a bit more useful. Say I have two strings: "The bat of Matt Jones" and "Matthew's last name is Jones". The pattern ^Matt.*Jones$ matches only "Matthew's last name is Jones". Why? The pattern says the string must start with Matt and end with Jones, with zero or more characters (any characters) in between. The first string ends with Jones but starts with The, so it does not match.
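
The two strings can be checked with a few lines of Python (a sketch; any regex tester would show the same result):

```python
import re

pattern = re.compile(r"^Matt.*Jones$")

# Starts with "The", so the ^Matt anchor fails.
first = pattern.search("The bat of Matt Jones")
# Starts with "Matt" and ends with "Jones", so the whole string matches.
second = pattern.search("Matthew's last name is Jones")

print(first, second)
```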

Feel free to use an online tool like https://regex101.com/ to test out regex patterns and strings.

What do (?i) and ?@ in this regex mean

Demo here: https://regex101.com/r/hE9gB4/1

(?i)<.*?@(?P<domain>\w+\.\w+)(?=>)

It extracts the domain name from the email address:

(?i) makes the match case-insensitive, and

?@ is not a token of its own; the @ simply matches the character @ literally.

The ? in your ?@ belongs to .*?, which is a lazy quantifier. Here it matches the text between the < and the first @ that lets the rest of the pattern match.

If you leave out the ? after the .*, the quantifier becomes greedy: it matches as much as it can, so on a line with several addresses it runs from the first < to the last @ that still allows a match.
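
A small Python sketch of the difference, using the pattern from the question (Python's re module supports the same (?P<domain>...) named-group syntax). The sample line with two addresses is invented for illustration.

```python
import re

# Hypothetical input: two bracketed addresses on one line.
line = "<alice@Example.com> <bob@example.org>"

# Lazy .*? stops at the first @ that lets the rest of the pattern match.
lazy = re.search(r"(?i)<.*?@(?P<domain>\w+\.\w+)(?=>)", line)
# Greedy .* runs as far as it can, so it ends at the last usable @.
greedy = re.search(r"(?i)<.*@(?P<domain>\w+\.\w+)(?=>)", line)

lazy_domain = lazy.group("domain")
greedy_domain = greedy.group("domain")
print(lazy_domain, greedy_domain)
```

Note that the lookahead (?=>) only checks for the closing >, so it is not part of the match or of the domain group.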

What does the regex [^\s]*? mean?

Alright, so to answer your first question, I'll break down [^\s]*?.

  • The square brackets ([]) indicate a character class. A character class basically means that you want to match anything in the class, at that position, one time. [abc] will match the strings a, b, and c. In this case, your character class is negated using the caret (^) at the beginning - this inverts its meaning, making it match anything but the characters in it.

  • \s is fairly simple - it's a common shorthand in many regex flavours for "any whitespace character". This includes spaces, tabs, and newlines.

  • *? is a little harder to explain. The * quantifier is fairly simple - it means "match this token (the character class in this case) zero or more times". The ?, when applied to a quantifier, makes it lazy - it will match as little as it can, going from left to right one character at a time.

In this case, what the whole pattern snippet [^\s]*? means is "match any sequence of non-whitespace characters, including the empty string". As mentioned in the comments, this can more succinctly be written as \S*?.
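
To see the laziness in action, here is a short Python sketch with a made-up input string. On its own the lazy form is satisfied by the empty string; it only consumes characters when a following token forces it to.

```python
import re

text = "a.png.png rest"

# With nothing after it, the lazy form is satisfied by the empty string...
lazy_alone = re.match(r"[^\s]*?", text).group()
# ...while the greedy form takes the whole run of non-whitespace characters.
greedy_alone = re.match(r"[^\s]*", text).group()

# Followed by another token, lazy stops at the first place the rest can match,
# and greedy stops at the last one. \S is used here as shorthand for [^\s].
lazy_png = re.match(r"\S*?\.png", text).group()
greedy_png = re.match(r"\S*\.png", text).group()
```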

To answer the second part of your question, I'll compare the two regexes you give:

http:[^\s]*?(\.jpg|\.png|\.gif)
http://.*?(\.jpg|\.png|\.gif)

They both start the same way: attempting to match the protocol at the beginning of a URL and the subsequent colon (:) character. The first then matches any string that does not contain any whitespace and ends with the specified file extensions. The second, meanwhile, will match two literal slash characters (/) before matching any sequence of characters followed by a valid extension.

Now, it's obvious that both patterns are meant to match a URL, but both are incorrect. The first pattern, for instance, will match strings like

http:foo.bar.png
http:.png

Both of these are invalid URLs. Likewise, the second pattern permits spaces, allowing strings like these:

http:// .jpg
http://foo bar.png

These are equally illegal in URLs. A better regex for this (though I caution strongly against trying to match URLs with regexes) might look like:

https?://\S+\.(jpe?g|png|gif)

This matches URLs that start with either http or https, requires at least one non-whitespace character after the slashes, and accepts both the jpg and jpeg spellings of the extension.
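
The comparison can be reproduced with a short Python sketch. The four bad inputs are the ones listed above; the https://example.com/cat.jpeg URL is an assumed example of a well-formed input.

```python
import re

first = re.compile(r"http:[^\s]*?(\.jpg|\.png|\.gif)")
second = re.compile(r"http://.*?(\.jpg|\.png|\.gif)")
better = re.compile(r"https?://\S+\.(jpe?g|png|gif)")

bad_inputs = ["http:foo.bar.png", "http:.png", "http:// .jpg", "http://foo bar.png"]

# Which of the invalid strings does each pattern accept in full?
first_accepts = [s for s in bad_inputs if first.fullmatch(s)]
second_accepts = [s for s in bad_inputs if second.fullmatch(s)]
# search() is the more lenient test: no part of any bad input may match.
better_accepts = [s for s in bad_inputs if better.search(s)]

good = better.fullmatch("https://example.com/cat.jpeg") is not None
print(first_accepts, second_accepts, better_accepts, good)
```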

Find regex match including current line of text

If you want to include the whole match, you have to change the positions of the parentheses:

(- class: pipe.steps.validate.Validate.*?id: validate)

