Is There an Efficient Way to Perform Hundreds of Text Substitutions in Ruby

Is there an efficient way to perform hundreds of text substitutions in Ruby?

An alternative approach, if your input data is already split into separate words, is simply to build a hash table of {error => correction} and look each word up in it.

Hash table lookup is fast, so if you can bend your input data to this format, it will almost certainly be fast enough.
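A minimal sketch of the hash-table idea, assuming the input is already an array of words. The CORRECTIONS table and the correct_words helper are made-up names for illustration, not part of the original answer:

```ruby
# Hypothetical correction table: misspelling => correction.
CORRECTIONS = {
  "teh" => "the",
  "recieve" => "receive",
  "adn" => "and"
}.freeze

# Replace each word that has an entry in the table; leave the rest untouched.
def correct_words(words)
  words.map { |word| CORRECTIONS.fetch(word, word) }
end

puts correct_words(%w[teh cat will recieve milk adn fish]).join(" ")
# prints "the cat will receive milk and fish"
```

Each word costs one hash lookup, so the running time depends on the number of words rather than on the number of corrections in the table.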

How to replace words inside template placeholders

In your regex, you have added the \A and \z anchors. These ensure that the regex matches only if the string consists of exactly <%= Name %>, with nothing before or after it.

To match your pattern anywhere in the string, simply remove the anchors:

parsed_body = body.gsub(/<%= Name %>/, "Some person")
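To see the difference the anchors make, here is a small sketch. The sample body text and the fill_name helper are assumptions made for illustration:

```ruby
# Hypothetical helper: replace the placeholder wherever it occurs.
def fill_name(body, name)
  body.gsub(/<%= Name %>/, name)
end

body = "Hello <%= Name %>, welcome back."

# With \A and \z the pattern must span the entire string, so nothing is replaced.
puts body.gsub(/\A<%= Name %>\z/, "Some person")
# prints "Hello <%= Name %>, welcome back."

# Without the anchors the placeholder is replaced in place.
puts fill_name(body, "Some person")
# prints "Hello Some person, welcome back."
```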

Combine conditions in Ruby

Yes, there is:

if %w(new create).include?(a)
  # code for when a is "new" or "create"
else
  # code for every other value
end
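As a runnable illustration of the same membership test, here is a sketch wrapped in a predicate method. The method name form_action? is hypothetical and only serves the example:

```ruby
# Hypothetical predicate: true when the action is one of the listed values.
def form_action?(action)
  %w[new create].include?(action)
end

puts form_action?("new")    # prints "true"
puts form_action?("create") # prints "true"
puts form_action?("edit")   # prints "false"
```

Adding another accepted value then only means adding a word to the array, instead of chaining one more || comparison.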

How to match a string in array, regardless of the string size in Ruby

Here's where I'd start with this sort of task. These are great building blocks for human interfaces on the web or in applications:

require 'regexp_trie'

saxophone_section = ["alto 1", "alto 2", "tenor 1", "tenor 2", "bari sax"]
RegexpTrie.union saxophone_section # => /(?:alto\ [12]|tenor\ [12]|bari\ sax)/

The output of RegexpTrie.union is a pattern that will match all of the strings in saxophone_section. The pattern is concise and efficient, and best of all, doesn't have to be generated by hand.

Applying that pattern to the string being created will tell you when you have a match, but only once enough of the string exists to match one of the entries in full.
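The "only when there's enough of the string" behaviour can be sketched without the gem. This example swaps in Ruby's built-in Regexp.union for RegexpTrie.union, which produces a less compact pattern but matches the same strings. The constant names and the sax_part helper are assumptions:

```ruby
SAXOPHONE_SECTION = ["alto 1", "alto 2", "tenor 1", "tenor 2", "bari sax"].freeze

# Built-in stand-in for RegexpTrie.union: escapes each string and joins them with "|".
SAX_PATTERN = Regexp.union(SAXOPHONE_SECTION)

# Return the first section name found in the text, or nil if there is none yet.
def sax_part(text)
  text[SAX_PATTERN]
end

p sax_part("ten")      # prints nil, not enough of the string to match
p sax_part("tenor")    # prints nil, still incomplete
p sax_part("tenor 2")  # prints "tenor 2"
```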

That's where a regular Trie is very useful. When you're trying to find what possible hits you could have, prior to having a full match, a Trie can find all the possibilities:

require 'trie'

trie = Trie.new
saxophone_section = ["alto 1", "alto 2", "tenor 1", "tenor 2", "bari sax"]

saxophone_section.each { |w| trie.add(w) }
trie.children('a') # => ["alto 1", "alto 2"]
trie.children('alto') # => ["alto 1", "alto 2"]
trie.children('alto 2') # => ["alto 2"]
trie.children('bari') # => ["bari sax"]

Blend those together and see what you come up with.
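If the trie gem is not available, the prefix lookup can be approximated in plain Ruby. The sketch below is not the gem's implementation. PrefixTrie is a hypothetical class built on nested hashes that mimics only the add and children calls used above:

```ruby
# Minimal prefix trie built from nested hashes (illustrative, not the gem's API).
class PrefixTrie
  def initialize
    @root = {}
  end

  # Walk or create one node per character, then mark the end of a word.
  def add(word)
    node = @root
    word.each_char { |ch| node = (node[ch] ||= {}) }
    node[:end] = true
  end

  # Return every stored word that starts with the given prefix.
  def children(prefix)
    node = @root
    prefix.each_char do |ch|
      node = node[ch]
      return [] if node.nil?
    end
    collect(node, prefix)
  end

  private

  # Depth-first walk below a node, gathering complete words.
  def collect(node, prefix)
    words = []
    words << prefix if node[:end]
    node.each do |key, child|
      next if key == :end
      words.concat(collect(child, prefix + key))
    end
    words
  end
end

trie = PrefixTrie.new
["alto 1", "alto 2", "tenor 1", "tenor 2", "bari sax"].each { |w| trie.add(w) }

p trie.children("a")      # prints ["alto 1", "alto 2"]
p trie.children("alto 2") # prints ["alto 2"]
p trie.children("x")      # prints []
```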

Remove excess junk words from string or array of strings

Dealing with stopwords is easy, but I'd suggest you do it BEFORE you split the string into the component words.

Building a fairly simple regular expression can make short work of the words:

STOPWORDS = /\b(?:#{ %w[to and or the a].join('|') })\b/i
# => /\b(?:to|and|or|the|a)\b/i

clean_string = 'to into and sandbar or forest the thesis a algebra'.gsub(STOPWORDS, '')
# => " into sandbar forest thesis algebra"

clean_string.split
# => ["into", "sandbar", "forest", "thesis", "algebra"]

How do you handle them if you get them already split? I'd join(' ') the array to turn it back into a string, then run the above code, which returns the array again.

incoming_array = [
"14000",
"Things",
"to",
"Be",
"Happy",
"About",
]

STOPWORDS = /\b(?:#{ %w[to and or the a].join('|') })\b/i
# => /\b(?:to|and|or|the|a)\b/i

incoming_array = incoming_array.join(' ').gsub(STOPWORDS, '').split
# => ["14000", "Things", "Be", "Happy", "About"]

You could try to use Array's set operations, but you'll run afoul of the case sensitivity of the words, forcing you to iterate over the stopwords and the arrays which will run a LOT slower.
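The case-sensitivity problem is easy to demonstrate. In this sketch the constant and method names are made up for the example, while the pattern is built the same way as STOPWORDS above:

```ruby
STOPWORD_LIST = %w[to and or the a].freeze
STOPWORD_PATTERN = /\b(?:#{STOPWORD_LIST.join('|')})\b/i

# Array difference compares strings exactly, so "To" is not treated as "to".
def strip_by_difference(words)
  words - STOPWORD_LIST
end

# The case-insensitive pattern removes stopwords regardless of capitalization.
def strip_by_pattern(words)
  words.join(' ').gsub(STOPWORD_PATTERN, '').split
end

words = %w[To the Moon and back]

p strip_by_difference(words) # prints ["To", "Moon", "back"]
p strip_by_pattern(words)    # prints ["Moon", "back"]
```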

Take a look at these two answers for some added tips on how you can build very powerful patterns making it easy to match thousands of strings:

  • "How do I ignore file types in a web crawler?"
  • "Is there an efficient way to perform hundreds of text substitutions in Ruby?"

How do I write a regular expression that will match characters in any order?

Here is your solution

^(?:([act])(?!.*\1)){3}$

See it here on Regexr

^            # match the start of the string
(?:          # open a non-capturing group
  ([act])    # capture one of the allowed characters
  (?!.*\1)   # negative lookahead: fail if that same character occurs again later
){3}         # repeat the group three times, once per character
$            # match the end of the string

The only special thing is the lookahead assertion, which ensures that no character is repeated.

^ and $ are anchors that match the start and the end of the string. In Ruby they match at line boundaries, so use \A and \z if the whole string must match.
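A quick way to check the pattern in Ruby is sketched below. It uses \A and \z instead of ^ and $, because Ruby treats ^ and $ as line anchors. The constant and method names are assumptions for the example:

```ruby
# Each of the three characters must come from [act] and must not occur again later.
ANY_ORDER = /\A(?:([act])(?!.*\1)){3}\z/

# Hypothetical helper: true when the word uses c, a and t exactly once each.
def letters_in_any_order?(word)
  word.match?(ANY_ORDER)
end

puts letters_in_any_order?("cat") # prints "true"
puts letters_in_any_order?("tca") # prints "true"
puts letters_in_any_order?("cca") # prints "false", c is repeated
puts letters_in_any_order?("ca")  # prints "false", too short
```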


