What Is '-Mix' in a Ruby Regular Expression

What is '?-mix' in a Ruby Regular Expression

Here, mix is not the English word "mix"; it is the list of Regexp options m, i, and x.

See Regexp#to_s:

Returns a string containing the regular expression and its options (using the (?opts:source) notation).

In your example, m is for multiline mode, i is for case insensitive, and x is for extended mode. Options before the dash are on, those after are off (default). The question's example, ?-mix, has all options off.

You can turn them on like:

puts(/^a$/mix)
# prints (?mix:^a$)
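
To see how the notation changes as options are switched on, here is a small sketch using Regexp#to_s directly. The variable names are only illustrative.

```ruby
# Regexp#to_s wraps the source in (?on-off:...): enabled options come
# before the dash, disabled options after it.
plain  = /^a$/.to_s     # no options set
nocase = /^a$/i.to_s    # only i set
all_on = /^a$/mix.to_s  # m, i and x all set

puts plain   # (?-mix:^a$)
puts nocase  # (?i-mx:^a$)
puts all_on  # (?mix:^a$)
```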

What does ?-mix: mean in regex

Assuming a Perl context, (?-mix) would:

  • -m disable multiline matching
  • -i disable case insensitive matching
  • -x disable extended regex whitespace

See here: http://www.regular-expressions.info/modifiers.html

Not all regex flavors support this. JavaScript and Python apply all mode modifiers to the entire regular expression. They don't support the (?-ismx) syntax, since turning off an option is pointless when mode modifiers apply to the whole regular expression. All options are off by default.
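
Ruby is one of the flavors that does allow options to be switched on or off for part of a pattern. A minimal sketch, with illustrative variable names:

```ruby
# The whole regex is case-insensitive, but the inner group turns i off.
mixed = /a(?-i:b)c/i

p mixed.match?("AbC")  # true:  a and c ignore case, b matches literally
p mixed.match?("ABC")  # false: B does not match the case-sensitive b

# The reverse: turn i on for just one part of a case-sensitive regex.
partial = /a(?i:b)c/

p partial.match?("aBc")  # true
p partial.match?("ABc")  # false
```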

How to combine these two Ruby string tests in one regular expression?

The most important improvement you can make is to also test that the word and the parentheses have the correct relationship. If I understand correctly, "link(url link_name)" should be a match but "(url link_name)link" or "link stuff (url link_name)" should not. So match "link", the parentheses, and their contents, and capture the contents, all at once:

"stuff link(url link_name) more stuff".match(/link\((\S+?) (\S+?)\)/)&.captures
=> ["url", "link_name"]

(The &. safe navigation operator requires Ruby 2.3 or later; in older versions, use Rails' .try(:captures) instead.)

Side note: string.scan(regex).present? is more concisely written as string =~ regex.
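
The same idea can be wrapped in a small helper that works without &. or Rails. This is only a sketch; link_parts is a hypothetical name, not something from the question.

```ruby
# Returns [url, link_name] when the text contains link(url link_name),
# or nil when "link" and the parentheses are not directly adjacent.
def link_parts(text)
  match = text.match(/link\((\S+?) (\S+?)\)/)
  match && match.captures
end

p link_parts("stuff link(url link_name) more stuff")  # ["url", "link_name"]
p link_parts("(url link_name)link")                   # nil
p link_parts("link stuff (url link_name)")            # nil
```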

How do I use Regexp.union within another regular expression?

Solution

Regexp.new("[[:space:]]+(#{Regexp.union(LETTERS).source})", Regexp::IGNORECASE)

You could use this regex:

LETTERS = ["a","b"]
#=> ["a","b"]
regex = Regexp.new("[[:space:]]+#{Regexp.union(LETTERS)}", Regexp::IGNORECASE)
#=> /[[:space:]]+(?-mix:a|b)/i
data = ["asdf f", "sdfsdf x"]
#=> ["asdf f", "sdfsdf x"]
data.grep(regex)
#=> []
data = ["asdf f", "sdfsdf a"]
#=> ["asdf f", "sdfsdf a"]
data.grep(regex)
#=> ["sdfsdf a"]

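But the innermost regular expression will not ignore case: interpolating a Regexp calls to_s, which embeds it as (?-mix:...) and so turns the i option off for that group. @EricDuminil's solution makes the mistake easy to see.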
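
A side-by-side sketch of the two approaches, using a local letters array in place of the LETTERS constant and illustrative variable names:

```ruby
letters = ["a", "b"]
union   = Regexp.union(letters)

# Interpolating the Regexp itself calls to_s and keeps its (off) options.
with_to_s = Regexp.new("[[:space:]]+#{union}", Regexp::IGNORECASE)
# Interpolating only the source lets the outer IGNORECASE flag apply.
with_source = Regexp.new("[[:space:]]+(#{union.source})", Regexp::IGNORECASE)

p with_to_s    # /[[:space:]]+(?-mix:a|b)/i
p with_source  # /[[:space:]]+(a|b)/i

data = ["asdf f", "sdfsdf A"]
p data.grep(with_to_s)    # []
p data.grep(with_source)  # ["sdfsdf A"]
```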

Regex: Match all hyphens or underscores not at the beginning or the end of the string

The regex

^(?![_-])(\w+)[_-](\w+)(?<![_-])$

does not match the second hyphen in "eslint-global-path" because it is anchored and contains only one [_-], so it can account for a single separator at most. This regex reads, "Match the beginning of the line, not followed by a hyphen or underscore; then capture one or more word characters (which include underscores), match a hyphen or underscore, and capture one or more word characters. Lastly, require that the end of the line is not preceded by a hyphen or underscore."

The fact that an underscore (but not a hyphen) is a word (\w) character completely messes up the regex. In general, rather than using \w, you might want to use \p{Alpha} or \p{Alnum} (or POSIX [[:alpha:]] or [[:alnum:]]).
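
A quick illustration of the difference, using a made-up sample string:

```ruby
sample = "foo_bar-baz"

# \w treats the underscore as a word character, so foo_bar stays together.
p sample.scan(/\w+/)           # ["foo_bar", "baz"]

# \p{Alnum} and [[:alpha:]] exclude both the underscore and the hyphen.
p sample.scan(/\p{Alnum}+/)    # ["foo", "bar", "baz"]
p sample.scan(/[[:alpha:]]+/)  # ["foo", "bar", "baz"]
```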

Try this.

r = /
  (?<=        # begin a positive lookbehind
    [^_-]     # match a character other than an underscore or hyphen
  )           # end positive lookbehind
  (           # begin capture group 1
    (?:       # begin a non-capture group
      -+      # match one or more hyphens
      |       # or
      _+      # match one or more underscores
    )         # end non-capture group
    [^_-]     # match any character other than an underscore or hyphen
  )           # end capture group 1
/x            # free-spacing regex definition mode

'_cats_have--nine_lives--'.gsub(r) { |s| s[-1].upcase }
#=> "_catsHaveNineLives--"

This regex is conventionally written as follows.

r = /(?<=[^_-])((?:-+|_+)[^_-])/

If all the letters are lower case one could alternatively write

'_cats_have--nine_lives--'.split(/(?<=[^_-])(?:_+|-+)(?=[^_-])/).
  map(&:capitalize).join
#=> "_catsHaveNineLives--"

where

'_cats_have--nine_lives--'.split(/(?<=[^_-])(?:_+|-+)(?=[^_-])/)
#=> ["_cats", "have", "nine", "lives--"]

Here (?=[^_-]) is a positive lookahead that requires the characters on which the split is made to be followed by a character other than an underscore or hyphen.
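
To see what the lookahead contributes, compare the split with and without it. This is a small sketch with illustrative variable names; without the lookahead, the trailing hyphens are consumed as a separator and disappear from the result.

```ruby
str = '_cats_have--nine_lives--'

with_lookahead    = str.split(/(?<=[^_-])(?:_+|-+)(?=[^_-])/)
without_lookahead = str.split(/(?<=[^_-])(?:_+|-+)/)

p with_lookahead     # ["_cats", "have", "nine", "lives--"]
p without_lookahead  # ["_cats", "have", "nine", "lives"]
```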

Inspection of 'Regexp.union'

What you're seeing is a representation of options on sub-regexes. The options to the left of the hyphen are on, and the options to the right of the hyphen are off. It's smart to explicitly set each option as on or off to ensure the right behavior if this regex ever became part of a larger one.

In your example, (?-mix:dogs) means that the m, i, and x options are all off whereas in (?i-mx:cats), the i option is on and thus that subexpression is case-insensitive.

See the Ruby docs on Regexp Options:

The end delimiter for a regexp can be followed by one or more single-letter options which control how the pattern can match.

  • /pat/i - Ignore case
  • /pat/m - Treat a newline as a character matched by .
  • /pat/x - Ignore whitespace and comments in the pattern
  • /pat/o - Perform #{} interpolation only once

i, m, and x can also be applied on the subexpression level with the (?on-off) construct, which enables options on, and disables options off for the expression enclosed by the parentheses.
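
The dogs and cats example can be reproduced as follows. This is a sketch; combined is just an illustrative name.

```ruby
# Each sub-regex keeps its own options inside the union.
combined = Regexp.union(/dogs/, /cats/i)

p combined                 # /(?-mix:dogs)|(?i-mx:cats)/
p combined.match?("CATS")  # true:  the cats branch ignores case
p combined.match?("DOGS")  # false: the dogs branch is case-sensitive
```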


