Ruby Counting chars in a sequence not using regex
Split the string into characters, group consecutive identical characters into chunks, then count the characters in each chunk:
def word(str)
  str
    .chars
    .chunk { |e| e }
    .map { |(e, ar)| [e, ar.length] }
end
p word "aaabbcbbaaa"
p word "aaaaaaaaaa"
p word ""
Result:
[["a", 3], ["b", 2], ["c", 1], ["b", 2], ["a", 3]]
[["a", 10]]
[]
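A minimal sketch of the same run-length idea using chunk_while, which keeps adjacent elements in one group for as long as the block returns true. The method name run_lengths is made up for illustration, and chunk_while assumes Ruby 2.3 or later.

```ruby
# Hypothetical helper: returns [char, run length] pairs for consecutive runs.
def run_lengths(str)
  str
    .each_char
    .chunk_while { |a, b| a == b }         # extend the run while neighbours are equal
    .map { |run| [run.first, run.size] }   # run is an array such as ["a", "a", "a"]
end

p run_lengths("aaabbcbbaaa")
# => [["a", 3], ["b", 2], ["c", 1], ["b", 2], ["a", 3]]
p run_lengths("")
# => []
```

Both versions count consecutive runs, which is why "a" and "b" each appear twice in the first result; neither produces per-character totals for the whole string.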
letter count in a string, Ruby
Use upcase first if you want the letters in uppercase.
Use each_with_object instead of inject. With inject, the return value of the block becomes the next accumulator, so you have to return the hash explicitly at the end of the block. each_with_object passes the same initial hash to every iteration and returns it when it finishes.
string = "Hello hElLo"
hash = string.upcase.scan(/\w/).each_with_object(Hash.new(0)) do |char, hash|
  hash[char] += 1
end
puts hash
# => {"H"=>2, "E"=>2, "L"=>4, "O"=>2}
To output individual letters and their count on a line each, iterate the hash:
hash.each do |key, value|
  puts "#{key} => #{value}"
end
# H => 2
# E => 2
# L => 4
# O => 2
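For comparison, here is a small sketch of the inject form next to each_with_object. The variable names with_inject and with_object are only illustrative; the point is the extra line that inject needs.

```ruby
string = "Hello hElLo"

# inject: the block's return value becomes the next accumulator,
# so the hash has to be returned on the last line of the block.
with_inject = string.upcase.scan(/\w/).inject(Hash.new(0)) do |counts, char|
  counts[char] += 1
  counts
end

# each_with_object: the same hash is passed in every time and returned at the end.
with_object = string.upcase.scan(/\w/).each_with_object(Hash.new(0)) do |char, counts|
  counts[char] += 1
end

p with_inject == with_object
# => true
```

Note that the block arguments come in the opposite order: inject yields the accumulator first, each_with_object yields it last.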
How to count occurrences of a substring within string fast with Ruby
I think you could approach this problem differently.
You do not need to scan the file this many times. You could create a database, in MongoDB or MySQL for example, and for each word you find, look it up in the database and increment a counter field on it.
You might object that you would then have to query the database a lot and that it could take even longer. It won't, because databases are optimized for I/O, and you can always add an index.
EDIT: Is there no way to delimit the words at all? Let's say that where you have a Word.name string you actually hold a (non-trivial) regex. Could the regex contain \n? If the regex can match anything, estimate the maximum length of string the regex can match, double it, and scan the file in windows of that many chars, moving the cursor forward by the estimated maximum (half a window) each time.
Let's say you estimate that your regex can match at most 20 chars, and your file runs from char 0 to char 30000. You run each regex over chars 0 to 40, then 20 to 60, then 40 to 80, and so on.
You should also record the positions of the matches you have already found, so that a match falling in the overlap between two windows is not counted twice.
In the end this solution may not be worth the effort, and there may be a better one depending on what those regexes are, but it will be faster than invoking scan Words.count times on your 300 MB string.
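The windowing idea from the edit could be sketched roughly as follows. This is only an illustration under several assumptions: the helper name count_in_windows and its max_len parameter are hypothetical, it works on an in-memory string rather than reading a file in pieces, and for patterns with anchors, lookarounds, or overlapping matches the result can differ from a single scan over the whole string.

```ruby
require "set"

# Hypothetical helper: counts matches of `pattern` in `text` by scanning
# overlapping windows of 2 * max_len chars, advancing max_len chars each step.
# max_len is the caller's estimate of the longest possible match.
def count_in_windows(text, pattern, max_len)
  seen = Set.new   # absolute start positions of matches already counted
  offset = 0

  while offset < text.length
    window = text[offset, max_len * 2]
    window.scan(pattern) do
      # Record the match by its position in the full text, so a match that
      # appears in two overlapping windows is only counted once.
      seen << offset + Regexp.last_match.begin(0)
    end
    offset += max_len
  end

  seen.size
end

p count_in_windows("abxxabxxab", /ab/, 2)
# => 3
```

In this sketch each window is scanned with one pattern; to follow the suggestion above, the inner scan would be repeated for every regex per window, so the large string is walked once instead of once per word.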
The string count() method
"hello world".count("lo")
returns five. It has matched the third, fourth, fifth, eighth, and tenth characters. Let's call this set one.
"hello world".count("o")
returns two. It has matched the fifth and eighth characters. Let's call this set two.
"hello world".count("lo", "o")
counts the intersection of sets one and two.
The intersection is a third set containing all of the elements of set two that are also in set one. In our example, both sets one and two contain the fifth and eighth characters from the string. That's two characters total, so count returns two.
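To make the set picture concrete, here is a small sketch that computes the matched positions by hand and compares them with what count returns. The names s, set_one, and set_two are only illustrative, and positions are 1-based to match the wording above.

```ruby
s = "hello world"

# 1-based positions of the characters matched by each selector string.
set_one = s.each_char.with_index(1).select { |ch, _| "lo".include?(ch) }.map(&:last)
set_two = s.each_char.with_index(1).select { |ch, _| "o".include?(ch) }.map(&:last)

p set_one              # => [3, 4, 5, 8, 10]
p set_two              # => [5, 8]
p set_one & set_two    # => [5, 8]
p s.count("lo", "o")   # => 2
```

The size of the array intersection equals the value returned by count with both arguments.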
Counting capital letters in Ruby
string.scan() applies to the entire string and should work for your use case:
your_string = "Hello World"
capital_count = your_string.scan(/[A-Z]/).length
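Two variations, offered as a sketch rather than a recommendation: String#count accepts a character range such as "A-Z" and avoids building an intermediate array, and /\p{Upper}/ also matches uppercase letters outside ASCII, which /[A-Z]/ does not. The accented sample string is made up for illustration.

```ruby
your_string = "Hello World"

scan_count  = your_string.scan(/[A-Z]/).length   # builds an array of matches
range_count = your_string.count("A-Z")           # counts directly, no array

p scan_count    # => 2
p range_count   # => 2

# "\u00C9" is an uppercase E with an acute accent.
accented = "\u00C9cole Normale"
ascii_only   = accented.scan(/[A-Z]/).length
unicode_caps = accented.scan(/\p{Upper}/).length

p ascii_only    # => 1
p unicode_caps  # => 2
```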