How to Use One Line Regular Expression to Get Matched Content

You need the Regexp#match method. If you write /\[(.*?)\](.*)/.match('[ruby] regex'), this will return a MatchData object. If we call that object matches, then, among other things:

  • matches[0] returns the whole matched string.
  • matches[n] returns the nth capturing group ($n).
  • matches.to_a returns an array consisting of matches[0] through matches[N].
  • matches.captures returns an array consisting of just the capturing groups (matches[1] through matches[N]).
  • matches.pre_match returns everything before the matched string.
  • matches.post_match returns everything after the matched string.

There are more methods, which correspond to other special variables, etc.; you can check MatchData's docs for more. Thus, in this specific case, all you need to write is

tag, keyword = /\[(.*?)\](.*)/.match('[ruby] regex').captures

Edit 1: Alright, for your harder task, you're going to instead want the String#scan method, which @Theo used; however, we're going to use a different regex. The following code should work:

# You could inline the regex, but comments would probably be nice.
tag_and_text = / \[([^\]]*)\] # Match a bracket-delimited tag,
                 \s*          # ignore spaces,
                 ([^\[]*) /x  # and match non-tag search text.
input = '[ruby] [regex] [rails] one line [foo] [bar] baz'
tags, texts = input.scan(tag_and_text).transpose

The input.scan(tag_and_text) will return a list of tag–search-text pairs:

[ ["ruby", ""], ["regex", ""], ["rails", "one line "]
, ["foo", ""], ["bar", "baz"] ]

The transpose call flips that, so that you have a pair consisting of a tag list and a search-text list:

[["ruby", "regex", "rails", "foo", "bar"], ["", "", "one line ", "", "baz"]]

You can then do whatever you want with the results. I might suggest, for instance

search_str = texts.join(' ').strip.gsub(/\s+/, ' ')

This will concatenate the search snippets with single spaces, get rid of leading and trailing whitespace, and replace runs of multiple spaces with a single space.

Regular expression to match a line that doesn't contain a word

The notion that regex doesn't support inverse matching is not entirely true. You can mimic this behavior by using negative look-arounds:

^((?!hede).)*$

Non-capturing variant:

^(?:(?!hede).)*$

The regex above will match any string, or line without a line break, not containing the (sub)string 'hede'. As mentioned, this is not something regex is "good" at (or should do), but still, it is possible.
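
As a quick check, here is a minimal Python sketch that uses the non-capturing pattern as a line filter. The sample lines are made up for illustration, and Python's re module simply stands in for whichever regex engine you are using.

```python
import re

# Non-capturing form: at every position, "hede" must not start there.
NO_HEDE = re.compile(r'^(?:(?!hede).)*$')

# Hypothetical sample lines, not taken from the original question.
lines = ["hoho", "hihi", "haha", "hede", "ABhedeCD", ""]

# Keep only the lines in which "hede" never appears.
kept = [line for line in lines if NO_HEDE.match(line)]
print(kept)
```

Note that the empty string is kept as well, since the group may repeat zero times.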

And if you need to match line break chars as well, use the DOT-ALL modifier (the trailing s in the following pattern):

/^((?!hede).)*$/s

or use it inline:

/(?s)^((?!hede).)*$/

(where the /.../ are the regex delimiters, i.e., not part of the pattern)

If the DOT-ALL modifier is not available, you can mimic the same behavior with the character class [\s\S]:

/^((?!hede)[\s\S])*$/
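
The following Python sketch compares the three variants on a made-up two-line input. It assumes re.DOTALL as the equivalent of the s modifier; the variable names are only for illustration.

```python
import re

with_flag = re.compile(r'^((?!hede).)*$', re.DOTALL)  # dot also matches "\n"
with_class = re.compile(r'^((?!hede)[\s\S])*$')       # no flag needed
without_flag = re.compile(r'^((?!hede).)*$')          # dot stops at "\n"

clean = "foo\nbar"    # hypothetical two-line input without "hede"
dirty = "foo\nhede"   # same shape, but "hede" is on the second line

flag_results = (bool(with_flag.match(clean)), bool(with_flag.match(dirty)))
class_results = (bool(with_class.match(clean)), bool(with_class.match(dirty)))
plain_results = (bool(without_flag.match(clean)), bool(without_flag.match(dirty)))

print(flag_results, class_results, plain_results)
```

Without the flag, the plain dot cannot get past the line break, so the pattern rejects both multi-line inputs regardless of their content.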

Explanation

A string is just a list of n characters. Before and after each character, there's an empty string. So a list of n characters will have n+1 empty strings. Consider the string "ABhedeCD":

    ┌──┬───┬──┬───┬──┬───┬──┬───┬──┬───┬──┬───┬──┬───┬──┬───┬──┐
S = │e1│ A │e2│ B │e3│ h │e4│ e │e5│ d │e6│ e │e7│ C │e8│ D │e9│
    └──┴───┴──┴───┴──┴───┴──┴───┴──┴───┴──┴───┴──┴───┴──┴───┴──┘

index    0      1      2      3      4      5      6      7

where the e's are the empty strings. The regex (?!hede). looks ahead to see if there's no substring "hede" to be seen, and if that is the case (so something else is seen), then the . (dot) will match any character except a line break. Look-arounds are also called zero-width-assertions because they don't consume any characters. They only assert/validate something.

So, in my example, every empty string is first validated to see if there's no "hede" up ahead, before a character is consumed by the . (dot). The regex (?!hede). will do that only once, so it is wrapped in a group, and repeated zero or more times: ((?!hede).)*. Finally, the start- and end-of-input are anchored to make sure the entire input is consumed: ^((?!hede).)*$

As you can see, the input "ABhedeCD" will fail because on e3, the regex (?!hede) fails (there is "hede" up ahead!).
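
The walk over the empty strings can be imitated without a regex engine. This Python sketch is only a stand-in for what (?!hede) decides at each position e1 through e9, using str.startswith rather than an actual lookahead.

```python
text = "ABhedeCD"

# For each empty-string position e1..e9, mimic the decision of (?!hede):
# the lookahead succeeds when "hede" does NOT start at that position.
lookahead_ok = [not text.startswith("hede", i) for i in range(len(text) + 1)]

for i, ok in enumerate(lookahead_ok):
    print(f"e{i + 1}: {'ok' if ok else 'fails, hede is up ahead'}")

# 1-based position of the first failure, to match the e1..e9 labels.
first_failure = lookahead_ok.index(False) + 1
```

The only failing position is e3, which is why the repeated group cannot consume the whole input.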

How to match regex pattern on single line only?

Remove the s or DOTALL flag and change your regex to the following:

^.*?((\byo\b.*?(cut me:)[\s\S]*))

With the DOTALL flag enabled, . matches newline characters, so your match can span multiple lines, including lines before yo or between yo and cut me. Removing the flag ensures that you only match a line containing both yo and cut me. The .* at the end is then changed to [\s\S]*, which matches any character including newlines, so that the match still runs to the end of the string.

http://regex101.com/r/sX2kL0

Edit: Note that this takes a slightly different approach than the other answer. It matches the portion of the string that you want deleted, so you can replace that portion with an empty string to remove it.
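
To see the effect of the flag, here is a rough Python sketch on made-up input (the original question's text is not reproduced here). It assumes that yo is meant to be bounded by \b on both sides, and it assumes re.MULTILINE so that ^ can anchor at the start of any line.

```python
import re

# Hypothetical input: "yo" also appears on an earlier line without "cut me:".
text = (
    "yo there\n"
    "some other line\n"
    "hey yo keep this cut me: remove this\n"
    "and this too\n"
)

PATTERN = r'^.*?((\byo\b.*?(cut me:)[\s\S]*))'

# Without DOTALL, "yo" and "cut me:" must sit on the same line.
# MULTILINE is an assumption here, so that ^ can anchor at any line start.
same_line = re.search(PATTERN, text, re.MULTILINE)

# With DOTALL, the dot crosses newlines and the match starts at the first "yo".
spanning = re.search(PATTERN, text, re.MULTILINE | re.DOTALL)

print(repr(same_line.group(1)))
print(repr(spanning.group(1)))
```

In this sketch, group 1 starts at the yo on the third line when DOTALL is off, and at the very first yo when it is on.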

Regex to match only the first line?

That sounds more like a job for the filehandle buffer.

You should be able to match the first line with:

/^(.*)$/m

(as always, this is PCRE syntax)

The /m modifier makes ^ and $ match at embedded newlines as well. Since there's no /g modifier, it will just process the first occurrence, which is the first line, and then stop.
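
For comparison, a minimal Python sketch of the same idea, assuming re.MULTILINE as the counterpart of /m and a made-up three-line input:

```python
import re

text = "first line\nsecond line\nthird line"  # hypothetical input

# re.MULTILINE plays the role of /m; re.search stops at the first match,
# which mirrors running the pattern without /g.
match = re.search(r'^(.*)$', text, re.MULTILINE)
first_line = match.group(1)
print(first_line)
```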

If you're using a shell, use:

head -n1 file

or as a filter:

commandmakingoutput | head -n1

Please clarify your question, in case this is not what you're looking for.

how to use regular expression to match strings followed by some keyword and multiple lines

You can use this regex,

^(?s).*keyword1.*?(keyword2 yyyy).*$

Explanation:

  • ^ --> start of string
  • (?s) --> Enables dot to match new lines
  • .*keyword1.*? --> matches keyword1 preceded by any characters (greedy) and followed by as few characters as possible (non-greedy)
  • (keyword2 yyyy) --> matches the string of your interest
  • .*$ --> followed by any characters and finally end of input
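
The breakdown above can be tried out with a small Python sketch. The input text is invented for illustration, and the (?s) part is passed as re.DOTALL because Python's re module expects global inline flags at the very start of the pattern.

```python
import re

# Hypothetical multi-line input; only the keywords come from the pattern.
text = "header\nkeyword1 aaa\nfiller\nkeyword2 yyyy\ntrailer\n"

# re.DOTALL stands in for (?s): the dot may cross line breaks.
pattern = re.compile(r'^.*keyword1.*?(keyword2 yyyy).*$', re.DOTALL)
match = pattern.search(text)
captured = match.group(1)

# Same pattern without DOTALL: the dot cannot leave the first line.
no_dotall = re.search(r'^.*keyword1.*?(keyword2 yyyy).*$', text)

print(captured, no_dotall)
```

Without the flag the search comes back empty here, because keyword1 is not on the first line.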

How do I match any character across multiple lines in a regular expression?

It depends on the language, but there should be a modifier that you can add to the regex pattern. In PHP it is:

/(.*)<FooBar>/s

The s at the end causes the dot to match all characters including newlines.
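
In Python, for example, the corresponding modifier is re.DOTALL. A minimal sketch on made-up input shows what the capture group holds with and without it:

```python
import re

text = "line one\nline two\n<FooBar>"  # hypothetical input

# With DOTALL the dot crosses line breaks, so the group spans both lines.
with_s = re.search(r'(.*)<FooBar>', text, re.DOTALL).group(1)

# Without it, the dot stops at "\n" and the group can only match
# what sits on the same line as <FooBar>.
without_s = re.search(r'(.*)<FooBar>', text).group(1)

print(repr(with_s), repr(without_s))
```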

Regex match line with string AND without another string

Try: (?=^.*await)(?!^.+ConfigureAwait).+

Explanation:

(?=^.*await) - positive lookahead: assert that what follows is ^, the beginning of a line, followed by zero or more of any character (due to .*) and the word await. Concisely: assert that there is await in the line.

(?!^.+ConfigureAwait) - negative lookahead: similarly to above, but negated :) assert that the line doesn't contain ConfigureAwait (preceded by at least one character, due to .+)

.+ - match one or more of any character (except new line)
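
A short Python sketch of the pattern applied line by line. The C#-style sample lines are invented for illustration, and each line is tested as its own string, so no multiline flag is needed.

```python
import re

AWAIT_NOT_CONFIGURED = re.compile(r'(?=^.*await)(?!^.+ConfigureAwait).+')

# Hypothetical source lines, one per list entry.
lines = [
    "var a = await FooAsync();",
    "var b = await BarAsync().ConfigureAwait(false);",
    "var c = Baz();",
]

# Keep lines that contain "await" but not "ConfigureAwait".
flagged = [line for line in lines if AWAIT_NOT_CONFIGURED.search(line)]
print(flagged)
```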


