Split Ruby Regex Over Multiple Lines

Split Ruby regex over multiple lines

You need to use the /x modifier, which enables free-spacing mode.

In your case:

"bar" =~ /(foo|
bar)/x
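
As a minimal sketch of what free-spacing mode changes (the variable names here are illustrative, not from the question): whitespace and `#` comments inside the pattern are ignored, so a literal space has to be escaped.

```ruby
# Free-spacing mode: whitespace and newlines inside the pattern are ignored,
# and text after an unescaped # up to the end of the line is a comment.
pattern = /
  (foo   # first alternative
  |
  bar)   # second alternative
/x

foo_index = "foo" =~ pattern   # index of the match, or nil when there is none
bar_index = "bar" =~ pattern

# Under /x a literal space must be escaped (or written as \s).
with_space    = /foo\ bar/x
without_space = /foo bar/x     # behaves like /foobar/

space_match   = with_space.match?("foo bar")
nospace_match = without_space.match?("foo bar")
joined_match  = without_space.match?("foobar")

puts foo_index, bar_index, space_match, nospace_match, joined_match
```

Note that `match?` assumes Ruby 2.4 or later; on older versions `!!(string =~ regex)` does the same job.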

How can I break a regex into multiple lines?

The /x free-spacing modifier lets you spread a regex over multiple lines. Because /x ignores unescaped whitespace in the pattern, literal spaces must be escaped, e.g.

address !~ /^Recipient\ Name:(.*)$\n
^Address\ Line\ 1:(.*)$\n
(^Address\ Line\ 2:(.*)$\n)?
^City:(.*)$\n
^State:(.*)$\n
^ZIP\ Code:(.*)$/x

Regex, how to match multiple lines?

You can use the /m modifier to enable multiline mode (i.e. to allow . to match newlines), and you can append ? to a quantifier such as * to make it non-greedy:

message = <<-MSG
Random Line 1
Random Line 2
From: person@example.com
Date: 01-01-2011
To: friend@example.com
Subject: This is the subject line
Random Line 3
Random Line 4
MSG

message.match(/(From:.*Subject.*?)\n/m)[1]
=> "From: person@example.com\nDate: 01-01-2011\nTo: friend@example.com\nSubject: This is the subject line"

See http://ruby-doc.org/core/Regexp.html and search for "multiline mode" and "greedy by default".
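
To see the effect of /m and of the non-greedy ? in isolation, here is a small sketch on a shorter, made-up string (the sample text and variable names are assumptions for illustration):

```ruby
text = "From: a@example.com\nSubject: hello\nBody line\n"

# Without /m the dot stops at a newline, so the match cannot span two lines.
single_line = text.match(/From:.*Subject/)

# With /m the dot also matches "\n".
greedy = text.match(/From:.*\n/m)[0]    # .* runs up to the last newline
lazy   = text.match(/From:.*?\n/m)[0]   # .*? stops at the first newline

p single_line
p greedy
p lazy
```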

ruby regex, splitting to multiple lines

I can think of three ways to make your code more readable. Use:

  1. The /x modifier, which lets you add comments with #.
  2. Inline comments with the (?#comment_here) construct.
  3. Named groups; for example, (?<year>\d{2,4}) is useful for backreferencing or manipulating values afterwards.

More information:
http://www.ruby-doc.org/core-1.9.3/Regexp.html
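
A short sketch combining the three techniques listed above; the date and time patterns and all variable names are hypothetical examples, not taken from the question:

```ruby
# /x with # comments, plus named groups.
date_pattern = /
  (?<year>\d{4})    # four-digit year
  -
  (?<month>\d{2})   # two-digit month
  -
  (?<day>\d{2})     # two-digit day
/x

m = date_pattern.match("released on 2011-01-31")
year  = m[:year]
month = m[:month]
day   = m[:day]

# The same idea on one line, using (?#...) inline comments instead of /x.
inline = /(?<hour>\d{2})(?#hours):(?<minute>\d{2})(?#minutes)/
t = inline.match("at 09:45")

puts "#{year}/#{month}/#{day} #{t[:hour]}h#{t[:minute]}"
```

Named groups are read back from the MatchData by symbol (or string), which tends to be easier to follow than numbered groups once a pattern grows.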

Ruby - Regex Multiple lines

Check this example:

string = <<EOF
#++
## app_name/dir/dir/filename
## $Id$
##--

foo bar
EOF

puts /#\+\+.*\n##.*\n##.*\n##--/.match(string)

The pattern matches a line starting with #++, two lines starting with ##, and a line starting with ##--, including those boundary lines in the match. If I got the question right, this should be what you want.

You can generalize the pattern to match everything between the first #++ and the first ##-- (including them) using the following pattern:

puts /#\+\+.*?##--/m.match(string)

How do I match any character across multiple lines in a regular expression?

It depends on the language, but there should be a modifier that you can add to the regex pattern. In PHP it is:

/(.*)<FooBar>/s

The s at the end causes the dot to match all characters including newlines.
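
In Ruby, the flag that makes the dot match newlines is /m rather than /s. A minimal sketch, using a made-up input string:

```ruby
html = "line one\nline two<FooBar>"

# Without /m the dot cannot cross the newline, so only the last line is captured.
without_m = html.match(/(.*)<FooBar>/)[1]

# With /m the dot matches "\n" too (the counterpart of PHP's /s).
with_m = html.match(/(.*)<FooBar>/m)[1]

# [\s\S] matches any character regardless of flags.
portable = html.match(/([\s\S]*)<FooBar>/)[1]

p without_m
p with_m
p portable
```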

How to match multi-line strings in Ruby using Regular Expressions to be used in an Inverted Index?

You are reading the file line by line, in which case the /m multiline modifier has no effect, because a match can never span more than one line. Read the entire file first, then scan it for whatever you want:

content = File.read('test.txt')
content.scan(/\.T(.*?)\.B/m) { |mtch|
  puts mtch
}

UPD
To put the scan results into a hash as in the example, you need either the array's flatten method:

content = File.read('test.txt')
# flatten the array ⇓⇓⇓⇓⇓⇓⇓
words = content.scan(/\.T(.*?)\.B/m).flatten
words.each …

or a block passed to the scan method:

content = File.read('test.txt')
freqs = Hash.new(0)
content.scan(/\.T(.*?)\.B/m) { |mtch|
  # mtch is a one-element array holding the captured group
  freqs[mtch.first] += 1
}

UPD2 To split the resulting array of sentences into an array of words:

arr = ["Preliminary Report International", "Fingers or Fists"]   
arr.map {|e| e.split(' ')}.flatten.map(&:downcase)
# ⇒ ["preliminary", "report", "international", "fingers", "or", "fists"]

Here the first map iterates over the array elements and turns each one into an array of its words, flatten turns the resulting array of arrays into a plain array, and, finally, downcase is there because you requested downcased words in your example.
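
Putting the pieces together, a rough sketch of counting word frequencies might look like this. The inline sample string stands in for the contents of test.txt, and its exact layout (.T and .B markers on their own lines) is an assumption:

```ruby
# Hypothetical sample standing in for File.read('test.txt').
content = ".T\nPreliminary Report\n.B\n.T\nFingers or Fists\n.B\n.T\nReport\n.B\n"

# Capture everything between .T and .B, across newlines, and trim whitespace.
titles = content.scan(/\.T(.*?)\.B/m).flatten.map(&:strip)

# Split each title into downcased words.
words = titles.flat_map { |title| title.split(' ') }.map(&:downcase)

# Count occurrences; Hash.new(0) makes missing keys start at zero.
freqs = Hash.new(0)
words.each { |word| freqs[word] += 1 }

p titles
p freqs
```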

Hope it helps.

Ruby String split with regex

I think this would do it:

a.split(/\.(?=[\w])/)

I don't know how much you know about regex, but the (?=[\w]) is a lookahead that says "only match the dot if the next character is a word character (a letter, digit, or underscore)". A lookahead won't actually consume the text it matches. It just "looks". So the result is exactly what you're looking for:

> a.split(/\.(?=[\w])/)
=> ["foo", "bar", "size", "split('.')", "last"]
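
For a self-contained version, here is a sketch in which the input string is an assumption, inferred from the output shown above, compared against a plain split on every dot:

```ruby
# Assumed input, reconstructed from the expected output.
a = "foo.bar.size.split('.').last"

# Split only on dots that are followed by a word character.
parts = a.split(/\.(?=\w)/)

# A plain split on every dot also breaks up the quoted '.' argument.
plain = a.split('.')

p parts
p plain
```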

split one line regex in a multiline regexp in perl

What the x flag does is very simply say 'ignore whitespace'.

So literal space characters in the pattern no longer match, and you have to use \s or similar instead.

So you can write:

if ( m/
        ^
        \d+\s+
        fish:\w+\s+
        $
    /x ) {
    print "Matched\n";
}

You can test regular expressions with various websites but one example is https://regex101.com/

So to take your example: https://regex101.com/r/eG5jY8/1

But how is yours not working?

This matches:

my $string = q{* Code "l;k""dfsakd;.*[])_lkaDald"};

my $firstRegexpr = qr/^\s*
\*
\s*
Code\s+
\"
(?<Code>((\")*[^\"]+)+)
\"
/x;

print "Compiled_Regex: $firstRegexpr\n";
print "Matched\n" if ( $string =~ m/$firstRegexpr/ );

And as for not having $] - there are two answers: either use \ to escape it, or use \Q...\E.

Regex multiline match

For a whole-string regex, you can use

regex = /\A(\w+:\d+\s*\n?)+\z/

A simple solution

!!string.match(regex)

The !! forces the answer to a boolean true or false.

Implementing your pseudocode exactly

You can monkey-patch the String class to add your test method:

class String
  def test(regex)
    !!self.match(regex)
  end
end

Output:

valid_example_string.test(regex)    # => true
invalid_example_string.test(regex) # => false
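
If you would rather not monkey-patch String, a sketch using the built-in match? (Ruby 2.4 or later) gives the same boolean result. The two sample strings below are hypothetical, since the question's actual inputs are not shown here:

```ruby
regex = /\A(\w+:\d+\s*\n?)+\z/

# Hypothetical inputs: every line must look like "word:number".
valid_example_string   = "foo:1\nbar:22\n"
invalid_example_string = "foo:1\nbar baz\n"

# match? returns true or false directly, without building a MatchData object.
valid   = regex.match?(valid_example_string)
invalid = regex.match?(invalid_example_string)

p valid
p invalid
```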

