How to split a string of repeated characters with uneven amounts? Ruby
You can use a regex with a back reference and the scan() method:
str = "aabbbbccdddeffffgg"
groups = []
str.scan(/((.)\2*)/) { |x| groups.push(x[0]) }
groups
will look like this afterwards:
["aa", "bbbb", "cc", "ddd", "e", "ffff", "gg"]
Split string in ruby when character is different to previous character
Here is a one liner using chunk, map and join:
"aabbbc226%%*".chars.chunk(&:itself).map{|_,c| c.join}
# => ["aa", "bbb", "c", "22", "6", "%%", "*"]
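A related option, sketched here as an alternative rather than as part of the original answer, is chunk_while, which groups adjacent elements for as long as the block returns true. It assumes Ruby 2.3 or later, where Enumerable#chunk_while is available.

```ruby
# Group adjacent equal characters with chunk_while (assumes Ruby 2.3+).
str = "aabbbc226%%*"

# A chunk keeps growing while each neighbouring pair of characters is equal.
runs = str.chars.chunk_while { |a, b| a == b }.map(&:join)

p runs
# => ["aa", "bbb", "c", "22", "6", "%%", "*"]
```

Unlike chunk, this yields the groups directly, so there is no key to discard in the map block.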
Using String#split Method
Updated Answer
Since the earlier answer didn't handle all the cases, as rightly pointed out in the comments, I'm updating the answer with another solution.
This approach replaces every valid comma (one that is not nested inside parentheses) with a separator, |, and then splits the string on that separator using String#split.
class TokenArrayParser
  SPLIT_CHAR = '|'.freeze

  def initialize(str)
    @str = str
  end

  def parse
    separate_on_valid_comma.split(SPLIT_CHAR)
  end

  private

  def separate_on_valid_comma
    dup = @str.dup
    paren_count = 0
    dup.length.times do |idx|
      case dup[idx]
      when '(' then paren_count += 1
      when ')' then paren_count -= 1
      when ',' then dup[idx] = SPLIT_CHAR if paren_count.zero?
      end
    end
    dup
  end
end
%w(
id,name,title(first_name,last_name)
id,name,title(first_name,last_name,address(street,pincode(id,code)))
first_name,last_name,address(street,pincode(id,code)),city(name)
a,b(c(d),e,f)
id,name,title(first_name,last_name),pub(name,address)
).each {|str| puts TokenArrayParser.new(str).parse.inspect }
# =>
# ["id", "name", "title(first_name,last_name)"]
# ["id", "name", "title(first_name,last_name,address(street,pincode(id,code)))"]
# ["first_name", "last_name", "address(street,pincode(id,code))", "city(name)"]
# ["a", "b(c(d),e,f)"]
# ["id", "name", "title(first_name,last_name)", "pub(name,address)"]
I'm sure this can be optimized more.
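One possible simplification, offered as a sketch and not as the original author's code, is to build the tokens in a single pass instead of rewriting the string and splitting it afterwards. The method name split_top_level is hypothetical. It also avoids relying on | never appearing in the input.

```ruby
# Hypothetical helper (not from the original answer): split on commas that
# are not nested inside parentheses, walking the string once.
def split_top_level(str)
  tokens = ["".dup] # start with one empty, mutable token
  depth = 0         # current parenthesis nesting level

  str.each_char do |ch|
    case ch
    when "(" then depth += 1
    when ")" then depth -= 1
    end

    if ch == "," && depth.zero?
      tokens << "".dup   # top-level comma: start a new token
    else
      tokens.last << ch  # anything else belongs to the current token
    end
  end

  tokens
end

p split_top_level("a,b(c(d),e,f)")
# => ["a", "b(c(d),e,f)"]
```

Note one behavioural difference: String#split drops trailing empty fields by default, whereas this sketch keeps them, and it returns [""] for an empty input.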
How to find word with the greatest number of repeated letters
I'd do it as below:
s = "aabcc ddeeteefef iijjfff"
# intermediate calculation that's happening in the final code
s.split(" ").map { |w| w.chars.max_by { |e| w.count(e) } }
# => ["a", "e", "f"] # getting the max count character from each word
s.split(" ").map { |w| w.count(w.chars.max_by { |e| w.count(e) }) }
# => [2, 5, 3] # getting the max count character's count from each word
# final code
s.split(" ").max_by { |w| w.count(w.chars.max_by { |e| w.count(e) }) }
# => "ddeeteefef"
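Update: in the benchmark below, the each_with_object version (arup_v1) runs faster than the group_by version (Phrogz), and the count-based version (arup_v2) is the fastest of the three.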
require 'benchmark'
s = "aabcc ddeeteefef iijjfff"
def phrogz(s)
  s.scan(/\w+/).max_by{ |word| word.chars.group_by(&:to_s).values.map(&:size).max }
end

def arup_v1(s)
  max_string = s.split.max_by do |w|
    h = w.chars.each_with_object(Hash.new(0)) do |e,hsh|
      hsh[e] += 1
    end
    h.values.max
  end
end

def arup_v2(s)
  s.split.max_by { |w| w.count(w.chars.max_by { |e| w.count(e) }) }
end

n = 100_000
Benchmark.bm do |x|
  x.report("Phrogz:") { n.times {|i| phrogz s } }
  x.report("arup_v2:"){ n.times {|i| arup_v2 s } }
  x.report("arup_v1:"){ n.times {|i| arup_v1 s } }
end
output
               user     system      total        real
Phrogz:    1.981000   0.000000   1.981000 (  1.979198)
arup_v2:   0.874000   0.000000   0.874000 (  0.878088)
arup_v1:   1.684000   0.000000   1.684000 (  1.685168)
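On Ruby 2.7 or later, Enumerable#tally can do the per-word counting in one call. The sketch below is an additional variant, not one of the benchmarked methods, and the method name most_repeated_word is hypothetical. I have not timed it against the three versions above.

```ruby
# Hypothetical variant (assumes Ruby 2.7+ for Enumerable#tally).
def most_repeated_word(s)
  # tally builds a { char => count } hash; the word whose highest count
  # is the largest wins.
  s.split.max_by { |word| word.chars.tally.values.max }
end

s = "aabcc ddeeteefef iijjfff"
p most_repeated_word(s)
# => "ddeeteefef"
```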
Empty strings at the beginning and end of split
After reading AWK's specification, as suggested by mu is too short, I came to feel that the original intention for split in AWK was to extract substrings that correspond to fields, each of which is terminated by a punctuation mark like , or . and that the separator was considered something like an "end of field character". The intention was not to split a string symmetrically into the left and the right side of each separator position, but to terminate a substring on the left side of a separator position. Under this conception, it makes sense to always have some string (even if it is empty) on the left of the separator, but not necessarily on the right side of the separator. This may have been inherited by Ruby via Perl.
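The asymmetry is easy to see in a short sketch. The sample string is made up for illustration. A negative limit argument tells String#split to keep the trailing empty fields as well.

```ruby
str = ",a,b,," # made-up sample with empty fields at both ends

default_fields = str.split(",")
all_fields     = str.split(",", -1)

p default_fields # leading empty field kept, trailing ones dropped
# => ["", "a", "b"]
p all_fields     # negative limit keeps the trailing empty fields too
# => ["", "a", "b", "", ""]
```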
Regular Expression group repeated letters
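Just use another capturing group to catch the repeated characters: the inner group (\w) captures a single character, \2 refers back to it, and the outer group captures the whole run.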
s.scan(/((\w)\2*)/).map(&:first)
# => ["aaaaaaa", "", "c"]
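To see what the two groups capture, here is a small sketch with an assumed input string (the original question's string is not shown here). When the pattern contains groups, scan returns one array per match, with one entry per group.

```ruby
s = "aaaaaaabbc" # assumed sample input, not the original question's string

pairs = s.scan(/((\w)\2*)/)
p pairs
# => [["aaaaaaa", "a"], ["bb", "b"], ["c", "c"]]

# The first entry of each pair is the outer group, i.e. the whole run.
runs = pairs.map(&:first)
p runs
# => ["aaaaaaa", "bb", "c"]
```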
How to split the first and the last element on a Ruby string?
Is this what you want to do?
first, *_, last = "now is the time for all".split
first #=> "now"
last #=> "all"
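One edge case worth knowing about, shown as a sketch with made-up inputs: with a single word, the splat form leaves last as nil, whereas Array#first and Array#last both return that word.

```ruby
first, *_, last = "now is the time for all".split
p first # => "now"
p last  # => "all"

# With only one word there is nothing left over for `last`.
only_first, *_, only_last = "now".split
p only_first # => "now"
p only_last  # => nil

# Array#first / Array#last return the same element in that case.
words = "now".split
p [words.first, words.last] # => ["now", "now"]
```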
How to extract numbers from string containing numbers+characters into an array in Ruby?
Try using String#scan, like this:
str.scan(/\d+/)
#=> ["123", "84", "3", "98"]
If you want integers instead of strings, just add map to it:
str.scan(/\d+/).map(&:to_i)
#=> [123, 84, 3, 98]
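For a runnable version, here is a sketch with an assumed value for str, since the original input string is not shown. It was chosen so that it yields the same numbers as above.

```ruby
str = "abc123def84ghi3jkl98" # assumed input, not from the original question

digit_runs = str.scan(/\d+/)    # every maximal run of digits, as strings
numbers    = digit_runs.map(&:to_i)

p digit_runs # => ["123", "84", "3", "98"]
p numbers    # => [123, 84, 3, 98]
```

Note that \d+ only matches unsigned digit runs, so a string like "-4.5" would yield 4 and 5 as separate numbers.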
Split string into a list, but keeping the split pattern
Thanks to Mark Wilkins for inspiration, but here's a shorter bit of code for doing it:
irb(main):015:0> s = "split on the word on okay?"
=> "split on the word on okay?"
irb(main):016:0> b=[]; s.split(/(on)/).each_slice(2) { |s| b << s.join }; b
=> ["split on", " the word on", " okay?"]
or:
s.split(/(on)/).each_slice(2).map(&:join)
See below the fold for an explanation.
Here's how this works. First, we split on "on", but wrap it in parentheses to make it into a match group. When there's a match group in the regular expression passed to split, Ruby will include that group in the output:
s.split(/(on)/)
# => ["split ", "on", " the word ", "on", " okay?"]
Now we want to join each instance of "on" with the preceding string. each_slice(2) helps by passing two elements at a time to its block. Let's just invoke each_slice(2) to see what results. Since each_slice, when invoked without a block, returns an Enumerator, we'll apply to_a to the Enumerator so we can see what it will enumerate over:
s.split(/(on)/).each_slice(2).to_a
# => [["split ", "on"], [" the word ", "on"], [" okay?"]]
We're getting close. Now all we have to do is join the words together. And that gets us to the full solution above. I'll unwrap it into individual lines to make it easier to follow:
b = []
s.split(/(on)/).each_slice(2) do |pair|
  b << pair.join
end
b
# => ["split on", " the word on", " okay?"]
But there's a nifty way to eliminate the temporary b and shorten the code considerably:
s.split(/(on)/).each_slice(2).map do |a|
  a.join
end
map passes each element of its input to the block; the result of the block becomes the new element at that position in the output array. In MRI >= 1.8.7, you can shorten it even more, to the equivalent:
s.split(/(on)/).each_slice(2).map(&:join)
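A different technique that reaches the same result, sketched here as an alternative and not as part of the original answer, is to split on a zero-width lookbehind. The split point is the position right after each "on", so nothing is consumed and no re-joining is needed. This assumes Ruby 1.9 or later, where lookbehind is supported.

```ruby
s = "split on the word on okay?"

# (?<=on) matches the empty position immediately after "on".
parts = s.split(/(?<=on)/)

p parts
# => ["split on", " the word on", " okay?"]
```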
Extract values from a long string
ids, answers = s.scan(/ID:(\d+)_([^_]+)/).transpose
The idea of the regex is:
- Ids are preceded by a literal prefix: ID:
- The actual ids are numbers: (\d+)
- They are separated from the answers by an underscore: _
- The answers themselves are a sequence of non-underscore characters: ([^_]+)
String#scan will return an array of [id, answer] pairs, so we transpose it to get two arrays, one with ids and one with answers. Then we use multiple assignment, which unpacks the outer array.
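Here is a runnable sketch with an assumed input string (the original long string is not shown), laid out to match the format the regex expects.

```ruby
s = "ID:1_yes_ID:22_no_ID:333_maybe" # assumed sample in the expected format

pairs = s.scan(/ID:(\d+)_([^_]+)/)
p pairs
# => [["1", "yes"], ["22", "no"], ["333", "maybe"]]

# transpose turns the list of pairs into one array per capture group.
ids, answers = pairs.transpose
p ids     # => ["1", "22", "333"]
p answers # => ["yes", "no", "maybe"]
```

If nothing matches, scan returns an empty array, transpose returns an empty array too, and both ids and answers end up nil, so it may be worth guarding against that case.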