How to Use Ruby Regexp to Substitute String with a "Callback Function"-Like Manipulation

How to use ruby regexp to substitute string with a callback function-like manipulation

You can probably pass a block to the gsub method, and the block then acts as the callback. I am not sure what you had in mind, but it could be something like:

s.gsub(/^(#+)\w+/) {|m| m.gsub("#", "=") }
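As a minimal sketch of the same idea, the block receives each match and its return value becomes the replacement. The helper name and the sample text are made up for illustration, and the pattern is simplified to match only the leading run of # characters.

```ruby
# Hypothetical helper (not from the original answer): rewrites the run of '#'
# at the start of each line as a run of '=' with the same length.
def convert_heading_markers(text)
  # The block is called once per match; whatever it returns replaces the match.
  text.gsub(/^#+/) { |match| "=" * match.length }
end

puts convert_heading_markers("## Title\nbody # not a heading\n### Sub")
# == Title
# body # not a heading
# === Sub
```

Because the block is ordinary Ruby, any string manipulation can go inside it, which is what makes it work like a callback.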

Replace each pattern in regexp

To replace each leading comma with a double space, use the \G anchor, i.e. .gsub(/\G,/, '  '). \G tells the regex engine to match at the start of the string and then only at the position where the previous match ended. As a result, only the consecutive commas at the beginning of the string are replaced.

Then, you can add other replacements:

s.gsub(/\G,/, '  ').sub(/,+\z/, ']').sub(/:,+/, ':  [')

Full example:

s = ",,,,C3:,D3,E3,F3,,"
# Two spaces per leading comma, and two spaces after the colon, to match the output below
puts s.gsub(/\G,/, '  ').sub(/,+\z/, ']').sub(/:,+/, ':  [')

Output:

        C3:  [D3,E3,F3]
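To see what \G contributes, the following sketch compares it with two other anchoring choices on a made-up input. The helper name, the underscore replacement and the sample string are illustrative assumptions.

```ruby
# Hypothetical helper: applies three comma patterns to the same input so the
# effect of \G can be compared side by side.
def compare_comma_patterns(input)
  {
    leading_run: input.gsub(/\G,/, "_"), # only the run of commas at the start
    first_only:  input.gsub(/\A,/, "_"), # only a comma at position 0
    every_comma: input.gsub(/,/, "_")    # every comma in the string
  }
end

p compare_comma_patterns(",,a,b")
```

With ",,a,b" the three results are "__a,b", "_,a,b" and "__a_b" respectively.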

How to replace something with n duplicates of itself with regex?

There are no backreference multiplication operators in replacement patterns, nor can you use limiting quantifiers (which are regex pattern constructs, not replacement pattern constructs) as you did to achieve that result.

The only proper "plain regex" way is to repeat $1 thirty times.

In programming languages, though, there are occasions when you can pass the match object to a callback function, where you can use the string manipulation provided by the programming language, e.g. in Python:

import re
text = "Hi, Friend!"
print( re.sub(r'Hi', lambda z: z.group() * 30, text) )
# => HiHiHiHiHiHiHiHiHiHiHiHiHiHiHiHiHiHiHiHiHiHiHiHiHiHiHiHiHiHi, Friend!
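The same approach carries over to Ruby, where the callback is a block passed to gsub. This is a small sketch; the helper name and its parameters are assumptions made for illustration.

```ruby
# Hypothetical helper: replaces each match of `pattern` with `times` copies of itself.
def repeat_matches(text, pattern, times)
  # String#* repeats the matched text; the block's result is the replacement.
  text.gsub(pattern) { |match| match * times }
end

puts repeat_matches("Hi, Friend!", /Hi/, 3)
# HiHiHi, Friend!
```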

Getting regex to output a fixed string as group 1

For the record, some time ago I happened to ask Jan Goyvaerts, who is the author of RegexBuddy and knowledgeable about a myriad regex flavors, whether he happened to know a regex flavor that allows inserting arbitrary text in the replacement—not as a callback or lambda. For instance, a conditional allowing you to insert foo when Group 1 is matched. Or a trick to capture text that isn't there, perhaps with some kind of weird lookaround-fu. He replied that he does not know any flavor allowing this. I believe this is what you were looking for.

I'm sure you're aware that programming languages allow you to manipulate your replacement text in anonymous functions and callbacks... But that doesn't seem to be what you were after.

How do (*SKIP) or (*F) work on regex?

These two backtracking control verbs are implemented only in Perl, PCRE and the pypi regex module.

The idea of the (*SKIP)(*FAIL) trick is to consume characters that you want to avoid, and that must not be a part of the match result.

A classic pattern that uses this trick looks like this:

What_I_want_to_avoid(*SKIP)(*FAIL)|What_I_want_to_match

A regex engine processes a string like this:

  • the first token of the pattern is tested against each character from left to right (this is the usual default, but some regex engines can be set to work from right to left; .NET can do this, if I remember correctly)

  • if the first token matches, the regex engine tests the next token of the pattern against the following characters (after the first token's match), and so on

  • when a token fails, the regex engine gives back the characters matched by the last token and tries another way to make the pattern succeed (if that doesn't work either, the regex engine does the same with the previous token, and so on)

When the regex engine reaches the (*SKIP) verb (all previous tokens have necessarily succeeded at that point), it may no longer backtrack into the tokens to its left. If the pattern later fails to the right of (*SKIP), the engine may not retry any of the characters matched so far, up to and including the last matched character, either with another branch of the pattern or from a later starting position inside that span.

The role of (*FAIL) is to force the pattern to fail. Thus all the characters matched on the left of (*SKIP) are skipped and the regex engine continues its job after these characters.

The only possibility for the pattern to succeed in the example pattern is that the first branch fails before (*SKIP) to allow the second branch to be tested.

About Java and other regex engines that don't have these two features

Backtracking control verbs are not implemented in other regex engines, and there is no equivalent.

However, there are several ways to get the same result (more precisely, to avoid something that could otherwise be matched by another part of the pattern).

The use of capture groups:

way 1:

What_I_want_to_avoid|(What_I_want_to_match)

You only need to extract capture group 1 (or test whether it exists), since it is what you are looking for. If you use the pattern to perform a replacement, you can use the properties of the match result (offset, length, capture group) to build the replacement with classical string functions. Other languages such as JavaScript and Ruby allow you to use a callback function as the replacement.
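A rough Ruby sketch of way 1, assuming the thing to avoid is a double-quoted string and the thing to match is a literal word. The helper name and the sample input are hypothetical.

```ruby
# Hypothetical helper: replaces `word` everywhere except inside double quotes.
def replace_outside_quotes(text, word, replacement)
  # Branch 1 consumes a quoted string (the part to avoid);
  # branch 2 captures the wanted word in group 1.
  pattern = /"[^"]*"|(#{Regexp.escape(word)})/
  text.gsub(pattern) do
    match = Regexp.last_match
    # Group 1 is set only when the second branch matched;
    # otherwise the quoted text is put back unchanged.
    match[1] ? replacement : match[0]
  end
end

puts replace_outside_quotes('cat "cat" cat', "cat", "dog")
# dog "cat" dog
```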

way 2:

((?>To_avoid|Other_things_that_can_be_before_what_i_want)*)(What_I_want)

This is the easiest way to do a replacement: no callback function is needed, and the replacement string only needs to begin with \1 (or $1).
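A sketch of way 2 in Ruby, again assuming quoted strings are the part to avoid. The helper name and inputs are hypothetical. Note one addition that is not in the pattern above: the sketch anchors with \G so that a later attempt cannot start in the middle of a quoted string.

```ruby
# Hypothetical helper: replaces `word` outside double quotes without a block.
def replace_outside_quotes_prefix(text, word, replacement)
  escaped = Regexp.escape(word)
  # Group 1 swallows quoted strings and any character that does not start `word`;
  # group 2 is the wanted word. \G (an assumption added here) forces each match
  # to begin where the previous one ended.
  pattern = /\G((?>"[^"]*"|(?!#{escaped}).)*)(#{escaped})/
  # The replacement string starts with \1 to put the skipped prefix back.
  # Assumes `replacement` itself contains no backslashes.
  text.gsub(pattern, '\1' + replacement)
end

puts replace_outside_quotes_prefix('cat "cat" cat', "cat", "dog")
# dog "cat" dog
```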

The use of lookarounds:

For example, say you want to find a word that is not embedded between two other words (let's say S_word and E_word, which are different (see Qtax's comment)):

(the edge cases S_word E_word word E_word and S_word word S_word E_word are allowed in this example.)

The backtracking control verb way will be:

S_word not_S_word_or_E_word E_word(*SKIP)(*F)|word

To use lookarounds for this, the regex engine needs to allow variable-length lookbehinds to a certain extent. With .NET or the new regex module for Python this is no problem: lookbehinds can have a totally variable length. It is possible with Java too, but the length must be bounded (example: (?<=.{1,1000})).

The Java equivalent will be:

word(?:(?!not_S_word_or_E_word E_word)|(?<!S_word not_E_word{0,1000} word))

Note that in some cases only the lookahead is necessary. Note too that starting a pattern with a literal character is more efficient than starting with a lookbehind, which is why I put it after the word (even though I need to write the word one more time in the assertion).

How do I use a regular expression to select liquid tags within quotation marks, but not necessarily following/followed by quotation marks

My guess is that you are trying to write an expression similar to:

(?=.*")-%|(?=.*")%-

and replace it with %.

(?=.*") checks that at least one " appears later on the line; you can change that condition as needed.
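A small Ruby sketch of this substitution. The helper name and the sample lines are made up for illustration.

```ruby
# Hypothetical helper: turns "-%" and "%-" into "%" wherever a double quote
# still follows on the same line.
def strip_liquid_trim_markers(line)
  line.gsub(/(?=.*")-%|(?=.*")%-/, "%")
end

puts strip_liquid_trim_markers('"{%- tag -%}"')
# "{% tag %}"
puts strip_liquid_trim_markers('{%- tag -%}')
# {%- tag -%}
```

The lookahead only tests for a quote somewhere after the current position, so this is an approximation of "inside quotation marks" rather than a strict check.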

How do I replace specific characters idiomatically in Rust?

You can replace all occurrences of one string within another with str::replace:

let result = str::replace("Hello World!", "!", "?");
// Equivalently (a new binding, since `result` above is not declared `mut`):
let result = "Hello World!".replace("!", "?");
println!("{}", result); // => Hello World?

For more complex cases, you can use regex::Regex::replace_all from regex:

use regex::Regex;
let re = Regex::new(r"[A-Za-z]").unwrap();
let result = re.replace_all("Hello World!", "x");
println!("{}", result); // => "xxxxx xxxxx!"

Editing params nested hash

still keep the output as a params hash (still containing nested hashes and arrays)

Sure.

You'll have to manipulate the params hash, which is done in the controller.

While I don't have lots of experience with this, I just spent a bunch of time testing. You can combine the ActionController::Parameters class with gsub!, like this:

# app/controllers/your_controller.rb
class YourController < ApplicationController
  def create
    # Params are passed from the browser request
    @model = Model.new params_hash
  end

  private

  # Permits the params, then rewrites each permitted string value in place
  def params_hash
    params.require(:x).permit(:y).each do |k, v|
      v.gsub!(/[regex]/, 'string')
    end
  end
end

I tested this on one of our test apps, and it worked perfectly.

--

There are several important points.

Firstly, when you use strong params, params.permit creates a new hash out of the passed params. This means you can't just modify the passed params with params[:description] = etc. You have to modify the permitted params instead.

Secondly, I could only get the .each block working with a bang method (gsub!), as this changes the value in place, whereas plain gsub returns a new string that the block discards. I'd have to spend more time to work out how to do more elaborate changes.
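The difference between gsub and gsub! inside an each block can be sketched with a plain Hash standing in for the permitted params. The helper names, keys and values are hypothetical, and a real ActionController::Parameters object may behave differently in details.

```ruby
# Hypothetical helpers operating on a plain Hash of string values.

# gsub returns a new string, which the block throws away: nothing changes.
def scrub_with_gsub(attrs)
  attrs.each { |_key, value| value.gsub("-", "_") }
end

# gsub! mutates each string in place, so the hash values change.
def scrub_with_gsub_bang(attrs)
  attrs.each { |_key, value| value.gsub!("-", "_") }
end

# A non-mutating alternative: build a new hash with the cleaned values.
def scrub_with_transform(attrs)
  attrs.transform_values { |value| value.gsub("-", "_") }
end

attrs = { "title" => "a-b".dup }
scrub_with_gsub(attrs)
puts attrs["title"] # a-b
scrub_with_gsub_bang(attrs)
puts attrs["title"] # a_b
```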

--

Update

If you wanted to include nested hashes, you'd have to call another loop:

def params_hash
  params.require(:x).permit(:y).each do |k, v|
    if k =~ /_attributes/
      # Nested attributes: rewrite each value of the inner hash
      v.each do |deep_k, deep_v|
        deep_v.gsub!(/[regex]/, 'string')
      end
    else
      v.gsub!(/[regex]/, 'string')
    end
  end
end

MongoDB Regular Expression Search - Starts with using javascript driver and NodeJS

You almost have it. You keep ending up with a regex inside a string, and looking for the string '/^94404/' isn't going to find anything unless you have some strange-looking zip codes.

The easiest way to build a regex object from a string in JavaScript is to use new RegExp(...):

var query = { Zip: new RegExp('^' + zipCode) };

Then you can:

collection.find(query).toArray(...)

That sort of thing works in the MongoDB shell and similar things work in the Ruby interface so it should work in the JavaScript interface as well.
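For the Ruby side, a comparable sketch builds the regex object from a string with Regexp.new. The helper name is hypothetical and no MongoDB driver call is shown. The Regexp.escape step is an addition not present in the JavaScript snippet above; it guards against regex metacharacters in the input.

```ruby
# Hypothetical helper: builds a "starts with" regex object from a plain string.
def starts_with_regexp(prefix)
  # Escape the prefix so characters like "." are matched literally.
  Regexp.new("^" + Regexp.escape(prefix))
end

zip_pattern = starts_with_regexp("94404")
puts zip_pattern.match?("94404-1234") # true
puts zip_pattern.match?("194404")     # false
```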


