Why Is Counting Letters Faster Using String#Count Than Using String#Chars in Ruby

Ruby Counting chars in a sequence not using regex

Split the string into chars, group consecutive identical chars into chunks, then count the chars in each chunk:

def word str
  str
  .chars
  .chunk{ |e| e }
  .map{|(e,ar)| [e, ar.length] }
end

p word "aaabbcbbaaa"
p word("aaaaaaaaaa")
p word ""

Result:

[["a", 3], ["b", 2], ["c", 1], ["b", 2], ["a", 3]]
[["a", 10]]
[]
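The same run-length grouping can also be sketched with Enumerable#chunk_while (available since Ruby 2.3), which starts a new chunk whenever two neighboring characters differ. The method name `runs` is hypothetical and only used for illustration:

```ruby
# Hypothetical helper: run-length pairs built with chunk_while.
# each_char yields characters one at a time, chunk_while keeps extending
# the current chunk while neighboring characters are equal, and map turns
# each chunk into a [char, run length] pair.
def runs(str)
  str.each_char.chunk_while { |a, b| a == b }.map { |run| [run.first, run.size] }
end

p runs("aaabbcbbaaa") # => [["a", 3], ["b", 2], ["c", 1], ["b", 2], ["a", 3]]
p runs("")            # => []
```

Unlike `chunk`, which groups by a key returned from the block, `chunk_while` compares adjacent elements directly, so the "same character as the previous one" rule is stated explicitly.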

Letter count in a string, Ruby

Use upcase first if you want the letters in uppercase.

Use each_with_object instead of inject. With inject, the value of the block becomes the accumulator for the next iteration, so you have to explicitly return the hash at the end of the block. each_with_object passes the same object to every iteration and returns it automatically.

string = "Hello hElLo"
# \w also matches digits and "_"; use /[[:alpha:]]/ to keep letters only
hash = string.upcase.scan(/\w/).each_with_object(Hash.new(0)) do |char, counts|
  counts[char] += 1
end
puts hash
# => {"H"=>2, "E"=>2, "L"=>4, "O"=>2}
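To make the inject/each_with_object difference concrete, here is a sketch of the same count written three ways: with `inject` (note the explicit `counts` at the end of the block), with `each_with_object`, and with `Enumerable#tally`, which exists only on Ruby 2.7 and later. The variable names are illustrative:

```ruby
letters = "Hello hElLo".upcase.scan(/\w/)

# inject: the block's value becomes the next accumulator, so the block
# has to end by returning the hash (otherwise it would become an Integer).
with_inject = letters.inject(Hash.new(0)) do |counts, char|
  counts[char] += 1
  counts
end

# each_with_object: the same hash is passed to every iteration and is
# returned at the end; the block's own value is ignored.
with_object = letters.each_with_object(Hash.new(0)) do |char, counts|
  counts[char] += 1
end

# tally (Ruby 2.7+): counts each distinct element in a single call.
with_tally = letters.tally

p(with_inject == with_object && with_object == with_tally) # => true
```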

To print each letter and its count on its own line, iterate over the hash:

hash.each do |key, value|
  puts "#{key} => #{value}"
end

# H => 2
# E => 2
# L => 4
# O => 2

How to count occurrences of a substring within string fast with Ruby

I think you could approach this problem differently.

You do not need to scan the file so many times. Instead, you could create a database (in MongoDB or MySQL, for example) and, for each word you find, look it up in the database and increment its "counter" field.
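The core of this suggestion is to make one pass over the text and increment a counter for every word found, instead of scanning once per word. A minimal in-memory sketch, assuming a Ruby Hash is an acceptable stand-in for the MongoDB/MySQL counter table and that a "word" is a run of \w characters; `count_words` is a hypothetical name:

```ruby
require "stringio"

# Hypothetical helper: one pass over the input, one counter per word.
# A Hash stands in for the database table; in a real database each
# increment would be an update of that word's "counter" field.
def count_words(io)
  counter = Hash.new(0)
  io.each_line do |line|
    # Assumption: a "word" is a run of \w characters.
    line.scan(/\w+/) { |word| counter[word] += 1 }
  end
  counter
end

counts = count_words(StringIO.new("foo bar\nfoo baz foo\n"))
p counts["foo"] # => 3
```

If the real file has line breaks, calling `File.open(path) { |f| count_words(f) }` (with `path` as a placeholder) reads it one line at a time instead of holding the whole string in memory.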

You might ask, "but then I will have to query my database a lot, and that could take much longer." It won't take more time, because databases are optimized for I/O, and you can always add an index.

EDIT: Is there no way to delimit the words at all? Suppose that where you have the Word.name string, you actually hold a (non-trivial) regex. Could the regex match \n? If the regex can match anything, estimate the maximum length of string the regex can match, double it, and scan the file in windows of that many chars, moving the cursor forward by the maximum length (half a window) each time.

Let's say you estimate that the longest match your regex could produce is about 20 chars, and your file has 30000 chars. You run each regex over chars 0 to 40, then again over chars 20 to 60, then 40 to 80, and so on.

You should also record the positions of the matches you have already found, so that a match lying in the overlap of two windows is not counted twice.
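The windowing scheme can be sketched as follows. This is a minimal illustration, not a tuned implementation: `windowed_count` is a hypothetical helper, the text is assumed to already be in memory as a String, and the pattern is assumed to have matches no longer than `max_len` that cannot overlap one another (a literal word such as "foo" qualifies). Windows are `2 * max_len` chars wide and advance by `max_len`, and a Set of absolute start offsets keeps a match seen in two windows from being counted twice:

```ruby
require "set"

# Hypothetical helper illustrating the overlapping-window idea.
# Assumes every match is at most max_len chars long and that matches
# cannot overlap one another (true for a literal word such as "foo").
def windowed_count(text, regex, max_len)
  window = 2 * max_len # each window is twice the longest possible match
  seen = Set.new       # absolute start offsets already counted
  start = 0
  loop do
    chunk = text[start, window]
    break if chunk.nil? || chunk.empty?

    pos = 0
    while (m = regex.match(chunk, pos))
      seen << start + m.begin(0)           # a match seen in two windows is stored once
      pos = [m.end(0), m.begin(0) + 1].max # step past the match (and past empty matches)
    end

    break if start + window >= text.length # this window already reached the end
    start += max_len                       # advance by half a window
  end
  seen.size
end

text = "foo bar " * 10 + "foofoo"
p windowed_count(text, /foo/, 20) == text.scan(/foo/).length # => true
```

For patterns whose matches can vary in length or overlap, different windows can disagree about where a match starts, so deduplicating by start offset is no longer enough.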

Finally, this solution may not be worth the effort, and your problem may have a better solution depending on what those regexes are, but it will be faster than invoking scan Words.count times over your 300 MB string.

The string count() method

"hello world".count("lo") returns five. It has matched the third, fourth, fifth, eighth, and tenth characters. Let's call this set one.

"hello world".count("o") returns two. It has matched the fifth and eighth characters. Let's call this set two.

"hello world".count("lo", "o") counts the intersection of sets one and two.

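The intersection is a third set containing all of the elements of set two that are also in set one. In our example, both sets one and two contain the fifth and eighth characters from the string. That's two characters total. So, count returns two. (Strictly speaking, Ruby intersects the character sets given as arguments, here "lo" and "o", which share only "o", and then counts the characters of the string that belong to that intersection; the result is the same.)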
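These calls can be checked directly. The sketch also shows two other features of `String#count`'s character-set arguments that are standard Ruby but not mentioned above: ranges such as "a-z" and negation with a leading "^":

```ruby
s = "hello world"

p s.count("lo")       # => 5  ("l", "l", "o", "o", "l")
p s.count("o")        # => 2
p s.count("lo", "o")  # => 2  (only "o" is in both sets)
p s.count("lo", "^o") # => 3  (in "lo" but not in "o", so just "l")
p s.count("a-z")      # => 10 (every lowercase letter; the space is skipped)
p s.count("^l")       # => 8  (everything except "l", including the space)
```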

Counting capital letters in Ruby

String#scan searches the entire string and returns every match, so counting the matches should work for your use case:

your_string = "Hello World"
capital_count = your_string.scan(/[A-Z]/).length
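The same count can be taken with `String#count`, which accepts the character range "A-Z" directly. That ties back to the title question: a likely reason `String#count` is faster than going through `String#chars` is that `chars` first allocates an Array holding one new String per character, while `count` scans the string in C without building intermediate objects. The rough timing below is a sketch on a made-up input; absolute numbers depend on the machine and Ruby version, so no output is claimed for it:

```ruby
require "benchmark"

your_string = "Hello World"

# Three ways to count uppercase ASCII letters; all give 2 here.
by_count = your_string.count("A-Z")                             # character range, no intermediate array
by_scan  = your_string.scan(/[A-Z]/).length                     # array of matched strings
by_chars = your_string.chars.count { |c| c.between?("A", "Z") } # array of one-char strings

p [by_count, by_scan, by_chars] # => [2, 2, 2]

# Rough timing on a larger, made-up input (about 1.1 million chars).
big = your_string * 100_000
t_count = Benchmark.realtime { big.count("A-Z") }
t_chars = Benchmark.realtime { big.chars.count { |c| c.between?("A", "Z") } }
puts format("count: %.4fs  chars: %.4fs", t_count, t_chars)
```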