Ruby Regex - Does gsub Store What It Matches?

Ruby Regexp gsub, replace instances of second matching character

You need to capture \w to be able to refer to the submatch.

Use

"this-is-a-string".gsub(/-(\w)/) {$~[1].upcase}
# => "thisIsAString"


Note that $~[1] inside the {$~[1].upcase} block is the text captured with (\w): $~ is the MatchData object that gsub sets for each match, and [1] is the index of the first capturing group, defined with a pair of unescaped parentheses.

See more details about capturing groups in the Use Parentheses for Grouping and Capturing section at regular-expressions.info.
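As a small sketch (the variable names are only illustrative), the same capture can be read inside the block in three equivalent ways: through $~, through the $1 shortcut, or through Regexp.last_match(1). All three read the MatchData that gsub sets before each block call.

```ruby
# Three equivalent ways to read capture group 1 inside a gsub block.
# gsub sets $~ (the current MatchData) before each call to the block.
s = "this-is-a-string"

via_tilde      = s.gsub(/-(\w)/) { $~[1].upcase }
via_dollar_one = s.gsub(/-(\w)/) { $1.upcase }
via_last_match = s.gsub(/-(\w)/) { Regexp.last_match(1).upcase }

p via_tilde       # => "thisIsAString"
p via_dollar_one  # => "thisIsAString"
p via_last_match  # => "thisIsAString"
```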

Ruby using gsub with regex

/\bword\b/ looks for the literal word "word", not for the string held in the variable word. Just change this regex to /\b#{pattern}\b/, or /\b#{pattern}\b/i for a case-insensitive match, and your method will work.
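A minimal sketch of the difference, with made-up sample text. The Regexp.escape step goes a little beyond the original answer: it is only needed if the interpolated value might contain regex metacharacters such as ".".

```ruby
word = "you"
text = "the word you"

# Literal pattern: matches the characters w-o-r-d.
literal = text.gsub(/\bword\b/, "X")
# Interpolated pattern: matches whatever the variable currently holds.
interpolated = text.gsub(/\b#{word}\b/, "X")

# If the variable may contain regex metacharacters, escape it first.
dotted = "a.b"
unescaped = "a.b axb".gsub(/\b#{dotted}\b/, "X")                 # "." matches any char
escaped   = "a.b axb".gsub(/\b#{Regexp.escape(dotted)}\b/, "X")  # "." matches only "."

p literal       # => "the X you"
p interpolated  # => "the word X"
p unescaped     # => "X X"
p escaped       # => "X axb"
```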

This substitutor returns a new array without changing the original one:

def substitutor(strings, rules)
  rules.inject(strings) do |result, (pattern, replace)|
    result.map do |string|
      string.gsub(/\b#{pattern}\b/i, replace)
    end
  end
end

puts substitutor(array,rules)

# Hey guys, can anyone teach me how 2 b cool?
# I really want 2 b the best @ everything,
# u know what I mean? Tweeting is super fun u guys!!!!
# OMG u guys, u won't believe how sweet my kitten is.
# My kitten is like super cuddly & 2 cute 2 b believed right?
# I'm running out of example tweets 4 u guys, which is weird,
# because I'm a writer & this is just writing & I tweet all day.
# 4 real, u guys. 4 real.
# GUISEEEEE this is so fun! I'm tweeting 4 u guys & this tweet is
# SOOOO long it's gonna b way more than u would think twitter can handle,
# so shorten it up u know what I mean? I just can never tell how long 2 keep typing!
# New game. Middle aged tweet followed by #youngPeopleHashTag Example:
# Gotta get my colonoscopy & mammogram soon. Prevention is key! #swag
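The array and rules used above are not shown, so here is a small self-contained run with hypothetical data (the tweets and the rules hash are made up for illustration). It assumes rules is a Hash, which inject yields as [pattern, replacement] pairs, and it checks that the input array is left untouched.

```ruby
def substitutor(strings, rules)
  rules.inject(strings) do |result, (pattern, replace)|
    result.map { |string| string.gsub(/\b#{pattern}\b/i, replace) }
  end
end

# Hypothetical sample data, not the data from the original question.
tweets = ["You guys are too cute", "Be there for you"]
rules  = { "you" => "u", "too" => "2", "for" => "4", "be" => "b" }

shortened = substitutor(tweets, rules)
p shortened  # => ["u guys are 2 cute", "b there 4 u"]
p tweets     # => ["You guys are too cute", "Be there for you"]
```

Because map builds a new array at each step and gsub (without the bang) returns new strings, the original tweets array is never modified.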

How to use ruby gsub Regexp with many matches?

Your regex needs to be a little more bold, in case the quotes occur at the start of the first value, or at the end of the last value:

csv = <<ENDCSV
test,first,line,"you are a "kind" man",thanks
again,second,li,"my "boss" is you",good
more,""Someone" said that you're "cute"",yay
"watch out for this",and,also,"this test case"
ENDCSV

puts csv.gsub(/(?<!^|,)"(?!,|$)/,'""')
#=> test,first,line,"you are a ""kind"" man",thanks
#=> again,second,li,"my ""boss"" is you",good
#=> more,"""Someone"" said that you're ""cute""",yay
#=> "watch out for this",and,also,"this test case"

The regex above uses negative lookbehind and negative lookahead assertions (zero-width anchors); lookbehind requires Ruby 1.9 or later.

  • (?<!^|,) — immediately preceding this spot there must not be either a start of line (^) or a comma
  • " — find a double quote
  • (?!,|$) — immediately following this spot there must not be either a comma or end of line ($)

As a bonus, since you didn't actually capture the characters on either side, you don't need to worry about using \1 correctly in your replacement string.

For more information, see the section "Anchors" in the official Ruby regex documentation.
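A short sketch on a made-up one-line input shows that the lookarounds are zero-width: each match is a single double-quote character, so the replacement '""' needs no backreferences.

```ruby
# The pattern from above: a double quote that is not at the start of a
# line or after a comma, and not before a comma or at the end of a line.
INNER_QUOTE = /(?<!^|,)"(?!,|$)/

line = 'x,"say "hi" now",y'

# Each match is one quote character; the lookarounds test the
# neighbouring characters without consuming them.
matches = line.scan(INNER_QUOTE)
escaped = line.gsub(INNER_QUOTE, '""')

p matches     # => ["\"", "\""]
puts escaped  # x,"say ""hi"" now",y
```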


However, for cases where you do need to reuse captured text in the replacement, you can use any of the following:

"hello".gsub(/([aeiou])/, '<\1>')        #=> "h<e>ll<o>"
"hello".gsub(/([aeiou])/, "<\\1>")       #=> "h<e>ll<o>"
"hello".gsub(/([aeiou])/) { "<#{$1}>" }  #=> "h<e>ll<o>"

You can't use String interpolation in the replacement string, as you did:

"hello".gsub(/([aeiou])/, "<#{$1}>")
#=> "h<previousmatch>ll<previousmatch>"

…because that string interpolation happens once, before the gsub has been run. Using the block form of gsub re-invokes the block for each match, at which point the global $1 has been appropriately populated and is available for use.
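The difference can be reproduced in a few lines. This sketch first gives $1 a value from an earlier, unrelated match (the string "previousmatch" is just a placeholder) so that the eager evaluation becomes visible.

```ruby
# Give $1 a value from an earlier match so the eager evaluation is visible.
"previousmatch" =~ /(\w+)/

# String argument: "<#{$1}>" is built once, before gsub starts.
eager = "hello".gsub(/([aeiou])/, "<#{$1}>")

# Block form: the block runs once per match, after gsub has set $1.
lazy = "hello".gsub(/([aeiou])/) { "<#{$1}>" }

p eager  # => "h<previousmatch>ll<previousmatch>"
p lazy   # => "h<e>ll<o>"
```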


Edit: For Ruby 1.8 (why on earth are you using that?), which does not support lookbehind, you can use this for the CSV case:

puts csv.gsub(/([^,\n\r])"([^,\n\r])/,'\1""\2')
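As a sanity check on a modern Ruby (the squiggly heredoc needs Ruby 2.3+), both versions produce the expected output on the sample CSV from above.

```ruby
csv = <<~CSV
  test,first,line,"you are a "kind" man",thanks
  again,second,li,"my "boss" is you",good
  more,""Someone" said that you're "cute"",yay
  "watch out for this",and,also,"this test case"
CSV

expected = <<~CSV
  test,first,line,"you are a ""kind"" man",thanks
  again,second,li,"my ""boss"" is you",good
  more,"""Someone"" said that you're ""cute""",yay
  "watch out for this",and,also,"this test case"
CSV

# Ruby 1.9+ version: lookarounds, no captures needed.
with_lookaround = csv.gsub(/(?<!^|,)"(?!,|$)/, '""')

# Ruby 1.8 version: capture the neighbours and put them back with \1 and \2.
with_captures = csv.gsub(/([^,\n\r])"([^,\n\r])/, '\1""\2')

p with_lookaround == expected  # => true
p with_captures == expected    # => true
```

One caveat with the capture-based version: it consumes the character on each side of a quote, so in a one-character quoted fragment such as "b" the b is already used up by the first match and the closing quote is not doubled. The lookaround version does not have this problem.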

How to use Ruby gsub with regex to do partial string substitution

You can replace the first occurrence of 8 digits between pipes, if the string starts with H, using:

s = "H||CUSTCHQH2H||PHPCCIPHP|1010032000|28092017|25001853||||"
p s.gsub(/\A(H.*?\|)[0-9]{8}(?=\|)/, '\100000000')
# or
p s.gsub(/\AH.*?\|\K[0-9]{8}(?=\|)/, '00000000')

Here, the value is replaced with 8 zeros.

Pattern details

  • \A - start of string (^ is the start of a line in Ruby)
  • (H.*?\|) - Capturing group 1 (you do not need it when using the variation with \K): H, then any 0+ chars other than line breaks, as few as possible, and then a |
  • \K - match reset operator that discards the text matched so far
  • [0-9]{8} - eight digits
  • (?=\|) - the next char must be |, but it is not added to the match value since it is a positive lookahead that does not consume text.

The \1 in the first gsub is a replacement backreference to the value in Group 1.
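Two variations, sketched here as alternatives rather than corrections: because \A can only match once, sub is enough, and a named group (the name prefix is arbitrary) makes it explicit where the backreference ends and the literal zeros begin. Ruby reads '\100000000' as \1 followed by eight literal zeros, since numbered replacement backreferences are single digits.

```ruby
s = "H||CUSTCHQH2H||PHPCCIPHP|1010032000|28092017|25001853||||"

# \A can only match once, so sub gives the same result as gsub here.
numbered = s.sub(/\A(H.*?\|)[0-9]{8}(?=\|)/, '\100000000')

# Named group: \k<prefix> clearly ends before the zeros.
named = s.sub(/\A(?<prefix>H.*?\|)[0-9]{8}(?=\|)/, '\k<prefix>00000000')

# Strings that do not start with H are returned unchanged.
untouched = "X|12345678|".sub(/\A(H.*?\|)[0-9]{8}(?=\|)/, '\100000000')

p numbered   # => "H||CUSTCHQH2H||PHPCCIPHP|1010032000|00000000|25001853||||"
p named      # same as numbered
p untouched  # => "X|12345678|"
```

Note that the lazy .*? keeps extending past 1010032000, whose first eight digits are not followed by a pipe, so the first replaced value is 28092017.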

How to understand gsub(/^.*\//, '') or the regex

Your general understanding is correct. The entire regex will match abc/def/, and String#gsub will replace it with an empty string.

However, note that String#gsub doesn't change the string in place. This means that str will still contain the original value ("abc/def/ghi.rb") after the substitution. To change it in place, use String#gsub!.


As to how .* works: the regex engine uses an algorithm called backtracking. Since .* is greedy (it tries to match as many characters as possible), you can picture something like this happening:

Step 1: .* matches the entire string abc/def/ghi.rb. Afterwards \/ tries to match a forward slash, but fails (nothing is left to match). .* has to backtrack.

Step 2: .* matches the entire string except the last character (abc/def/ghi.r). Afterwards \/ tries to match a forward slash, but fails (/ != b). .* has to backtrack.

Step 3: .* matches the entire string except the last two characters (abc/def/ghi.). Afterwards \/ tries to match a forward slash, but fails (/ != r). .* has to backtrack.

...
Step n: .* matches abc/def. Afterwards \/ tries to match a forward slash and succeeds. The matching ends here.
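A short sketch ties the steps together: the greedy .* ends up at the last slash, a lazy .*? (shown only for contrast) stops at the first one, and gsub leaves the original string alone while gsub! modifies it.

```ruby
str = "abc/def/ghi.rb".dup  # unfrozen copy so gsub! can modify it

greedy = str[/^.*\//]    # .* backtracks from the end to the LAST slash
lazy   = str[/^.*?\//]   # .*? grows from empty to the FIRST slash

result = str.gsub(/^.*\//, '')  # returns a new string
p greedy  # => "abc/def/"
p lazy    # => "abc/"
p result  # => "ghi.rb"
p str     # => "abc/def/ghi.rb" (unchanged)

str.gsub!(/^.*\//, '')          # modifies str in place
p str     # => "ghi.rb"
```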

Ruby regex - gsub only captured group

You can't. gsub replaces the entire match; it does not do anything with the captured groups. It will not make any difference whether the groups are captured or not.

To achieve the result, use lookbehind and lookahead, which check the neighbouring digits without making them part of the match:

"5,214".gsub(/(?<=\d),(?=\d)/, '.')
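For comparison, a sketch with a made-up input "1,2,3": capturing the digits and re-inserting them with \1 and \2 also works for "5,214", but because the captured digits are consumed, a digit shared by two commas can only be used once.

```ruby
# Lookarounds: only the comma is part of the match.
lookaround = "1,2,3".gsub(/(?<=\d),(?=\d)/, '.')

# Capturing the digits works too, but they become part of the match and
# have to be re-inserted with \1 and \2.
reinsert = "1,2,3".gsub(/(\d),(\d)/, '\1.\2')

single = "5,214".gsub(/(?<=\d),(?=\d)/, '.')

p single      # => "5.214"
p lookaround  # => "1.2.3"
p reinsert    # => "1.2,3"
```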

