How to Split a String of Repeated Characters with Uneven Amounts? Ruby

You can use a regex with a back reference and the scan() method:

str = "aabbbbccdddeffffgg"
groups = []
str.scan(/((.)\2*)/) { |x| groups.push(x[0]) }

groups will look like this afterwards:

["aa", "bbbb", "cc", "ddd", "e", "ffff", "gg"]
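To see why the block picks x[0], it can help to look at what scan returns when called without a block. This is a minimal sketch on the same string; the variable name pairs is only illustrative:

```ruby
str = "aabbbbccdddeffffgg"

# With two capture groups, scan returns one [outer, inner] pair per match:
# the outer group is the whole run, the inner group is its first character.
pairs = str.scan(/((.)\2*)/)
# pairs.first(2) is [["aa", "a"], ["bbbb", "b"]]

# Keeping only the outer group gives the runs without a temporary array.
groups = pairs.map(&:first)
```

The backreference \2 refers to the inner group, which is why the pattern needs both sets of parentheses.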

Split string in ruby when character is different to previous character

Here is a one-liner using chunk, map and join:

"aabbbc226%%*".chars.chunk(&:itself).map{|_,c| c.join}
# => ["aa", "bbb", "c", "22", "6", "%%", "*"]
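A possible alternative, assuming Ruby 2.2 or later where Enumerable#slice_when is available, expresses the "different to previous character" rule directly. This is a sketch, not the original answer's method:

```ruby
str = "aabbbc226%%*"

# slice_when starts a new slice wherever the block returns true,
# here whenever two neighbouring characters differ.
runs = str.chars.slice_when { |prev, curr| prev != curr }.map(&:join)
```

With the sample string this should give the same runs as the chunk version.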

Using String#split Method

Updated Answer

Since the earlier answer didn't take care of all the cases as rightly pointed out in the comments, I'm updating the answer with another solution.

This approach replaces each valid (top-level) comma with a separator character, |, and then splits the string on that separator using String#split.

class TokenArrayParser
  SPLIT_CHAR = '|'.freeze

  def initialize(str)
    @str = str
  end

  def parse
    separate_on_valid_comma.split(SPLIT_CHAR)
  end

  private

  def separate_on_valid_comma
    dup = @str.dup
    paren_count = 0
    dup.length.times do |idx|
      case dup[idx]
      when '(' then paren_count += 1
      when ')' then paren_count -= 1
      when ',' then dup[idx] = SPLIT_CHAR if paren_count.zero?
      end
    end

    dup
  end
end

%w(
id,name,title(first_name,last_name)
id,name,title(first_name,last_name,address(street,pincode(id,code)))
first_name,last_name,address(street,pincode(id,code)),city(name)
a,b(c(d),e,f)
id,name,title(first_name,last_name),pub(name,address)
).each {|str| puts TokenArrayParser.new(str).parse.inspect }

# =>
# ["id", "name", "title(first_name,last_name)"]
# ["id", "name", "title(first_name,last_name,address(street,pincode(id,code)))"]
# ["first_name", "last_name", "address(street,pincode(id,code))", "city(name)"]
# ["a", "b(c(d),e,f)"]
# ["id", "name", "title(first_name,last_name)", "pub(name,address)"]

I'm sure this can be optimized more.
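One possible simplification, sketched here under assumptions rather than taken from the original answer, is to collect the tokens directly in a single pass. That way no sentinel character is needed, so input that already contains | is not a problem. The method name split_top_level is hypothetical, and edge cases such as an empty string or a trailing comma behave differently from String#split:

```ruby
# Hypothetical helper: split on commas that are not inside parentheses.
def split_top_level(str)
  tokens = [+""]   # +"" gives an unfrozen empty string to append to
  depth = 0

  str.each_char do |ch|
    if ch == "," && depth.zero?
      tokens << +""            # top-level comma: start a new token
    else
      depth += 1 if ch == "("
      depth -= 1 if ch == ")"
      tokens.last << ch        # everything else belongs to the current token
    end
  end

  tokens
end

result = split_top_level("a,b(c(d),e,f)")
```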

How to find word with the greatest number of repeated letters

I'd do it as below:

s = "aabcc ddeeteefef iijjfff" 
# intermediate calculation that's happening in the final code
s.split(" ").map { |w| w.chars.max_by { |e| w.count(e) } }
# => ["a", "e", "f"] # getting the max count character from each word
s.split(" ").map { |w| w.count(w.chars.max_by { |e| w.count(e) }) }
# => [2, 5, 3] # getting the max count character's count from each word
# final code
s.split(" ").max_by { |w| w.count(w.chars.max_by { |e| w.count(e) }) }
# => "ddeeteefef"

Update

In the benchmark below, the each_with_object version runs faster than the group_by version, and the count-based version is the fastest of the three.

require 'benchmark'

s = "aabcc ddeeteefef iijjfff"

def phrogz(s)
  s.scan(/\w+/).max_by { |word| word.chars.group_by(&:to_s).values.map(&:size).max }
end

def arup_v1(s)
  max_string = s.split.max_by do |w|
    h = w.chars.each_with_object(Hash.new(0)) do |e, hsh|
      hsh[e] += 1
    end
    h.values.max
  end
end

def arup_v2(s)
  s.split.max_by { |w| w.count(w.chars.max_by { |e| w.count(e) }) }
end

n = 100_000
Benchmark.bm do |x|
  x.report("Phrogz:")  { n.times { |i| phrogz s } }
  x.report("arup_v2:") { n.times { |i| arup_v2 s } }
  x.report("arup_v1:") { n.times { |i| arup_v1 s } }
end

Output

              user     system      total        real
Phrogz:   1.981000   0.000000   1.981000 (  1.979198)
arup_v2:  0.874000   0.000000   0.874000 (  0.878088)
arup_v1:  1.684000   0.000000   1.684000 (  1.685168)
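On newer Rubies the per-word counting can also be written with Enumerable#tally. The sketch below assumes Ruby 2.7 or later; the helper name max_repeat is hypothetical, and this variant was not part of the benchmark above, so no claim is made about its speed:

```ruby
s = "aabcc ddeeteefef iijjfff"

# tally builds a { character => count } hash in one pass over the word;
# the largest value is the count of the most repeated letter.
def max_repeat(word)
  word.chars.tally.values.max
end

winner = s.split.max_by { |w| max_repeat(w) }
```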

Empty strings at the beginning and end of split

After reading AWK's specification, as mu is too short suggested, I came to feel that the original intention of split in AWK was to extract substrings that correspond to fields, each of which is terminated by a punctuation mark such as , or ., with the separator treated as something like an "end of field" character. The intention was not to split a string symmetrically into the left and right sides of each separator position, but to terminate a substring on the left side of a separator position. Under this conception, it makes sense to always have some string (even if it is empty) on the left of the separator, but not necessarily on the right. Ruby may have inherited this behavior via Perl.
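The asymmetry is easy to observe in Ruby itself. A small sketch, with a made-up input string:

```ruby
line = ",a,b,,"   # hypothetical input with empty fields at both ends

# By default the leading empty field is kept, but trailing ones are dropped.
default_fields = line.split(",")

# A negative limit tells split to keep the trailing empty fields as well.
all_fields = line.split(",", -1)
```

Here default_fields comes out as ["", "a", "b"], while all_fields also contains the two trailing empty strings.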

Regular Expression group repeated letters

Just use another capturing group to catch the repeated characters.

s.scan(/((\w)\2*)/).map(&:first)
# => ["aaaaaaa", "", "c"]
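As a small self-contained illustration, here is the same pattern on a made-up input (the original string is not shown here, so the value of s below is an assumption). Because the inner group is \w, runs of non-word characters are skipped rather than grouped:

```ruby
s = "aaa--bb c"   # hypothetical input

# The outer group captures the whole run; \2 repeats the inner \w match.
# scan returns [outer, inner] pairs, so map(&:first) keeps the runs.
runs = s.scan(/((\w)\2*)/).map(&:first)
```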

How to split the first and the last element on a Ruby string?

Is this what you want to do?

first, *_, last = "now is the time for all".split
first #=> "now"
last #=> "all"
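One edge case may be worth knowing about: with a single-word string, the destructuring form leaves last as nil. A short sketch comparing it with Array#first and Array#last (the variable names are illustrative):

```ruby
words = "now".split   # a one-element array

# Destructuring: nothing is left for the trailing variable, so it is nil.
first, *_, last = words

# Array#first and Array#last both return the single element instead.
head = words.first
tail = words.last
```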

How to extract numbers from string containing numbers+characters into an array in Ruby?

Try using String#scan, like this:

str.scan(/\d+/)
#=> ["123", "84", "3", "98"]

If you want integers instead of strings, just add map to it:

str.scan(/\d+/).map(&:to_i)
#=> [123, 84, 3, 98]
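For a runnable version, the sketch below uses a made-up input, since the original value of str is not shown here; any string with the same digit runs would behave the same way:

```ruby
str = "abc123de84f3g98"   # hypothetical input

# \d+ matches each maximal run of digits; scan returns them as strings.
digits = str.scan(/\d+/)

# Convert each digit string to an Integer.
numbers = digits.map(&:to_i)
```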

Split string into a list, but keeping the split pattern

Thanks to Mark Wilkins for the inspiration, but here's a shorter bit of code for doing it:

irb(main):015:0> s = "split on the word on okay?"
=> "split on the word on okay?"
irb(main):016:0> b=[]; s.split(/(on)/).each_slice(2) { |s| b << s.join }; b
=> ["split on", " the word on", " okay?"]

or:

s.split(/(on)/).each_slice(2).map(&:join)

An explanation follows.

Here's how this works. First, we split on "on", but wrap it in parentheses to make it into a match group. When there's a match group in the regular expression passed to split, Ruby will include that group in the output:

s.split(/(on)/)
# => ["split ", "on", " the word ", "on", " okay?"]

Now we want to join each instance of "on" with the preceding string. each_slice(2) helps by passing two elements at a time to its block. Let's invoke each_slice(2) on its own to see the result. Since each_slice returns an Enumerator when invoked without a block, we'll apply to_a to it so we can see what the Enumerator will enumerate over:

s.split(/(on)/).each_slice(2).to_a
# => [["split ", "on"], [" the word ", "on"], [" okay?"]]

We're getting close. Now all we have to do is join the words together. And that gets us to the full solution above. I'll unwrap it into individual lines to make it easier to follow:

b = []
s.split(/(on)/).each_slice(2) do |s|
b << s.join
end
b
# => ["split on", " the word on", " okay?"]

But there's a nifty way to eliminate the temporary b and shorten the code considerably:

s.split(/(on)/).each_slice(2).map do |a|
a.join
end

map passes each element of its input array to the block; the result of the block becomes the new element at that position in the output array. In MRI >= 1.8.7, you can shorten it even more, to the equivalent:

s.split(/(on)/).each_slice(2).map(&:join)
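A different technique, not the one used in this answer, is to split on a zero-width lookbehind so that the pattern is never removed in the first place. A minimal sketch on the same string:

```ruby
s = "split on the word on okay?"

# (?<=on) matches the empty position right after each "on", so nothing is
# consumed and every "on" stays attached to the piece on its left.
parts = s.split(/(?<=on)/)
```

For this input, parts should match the result of the each_slice version.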

Extract values from a long string

ids, answers = s.scan(/ID:(\d+)_([^_]+)/).transpose

The idea of the regex is:

  1. Each id is preceded by the literal prefix ID: - ID:
  2. The actual ids are numbers - (\d+)
  3. They are separated from the answers by an underscore - _
  4. The answers themselves are a sequence of non-underscore characters - ([^_]+)

String#scan will return an array of [id, answer] pairs, so we transpose it to get two arrays: one with the ids and one with the answers. Then we use multiple assignment, which unpacks the outer array.
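The steps can be traced on a small made-up input. The string below is an assumption chosen only to fit the pattern, since the original value of s is not shown here:

```ruby
s = "ID:1_yes_ID:22_no"   # hypothetical input

# scan returns one [id, answer] pair per match.
pairs = s.scan(/ID:(\d+)_([^_]+)/)

# transpose turns the list of pairs into [all ids, all answers],
# and multiple assignment unpacks that outer array.
ids, answers = pairs.transpose
```

Note that the ids are still strings at this point; map(&:to_i) would convert them if integers are needed.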


