How to extract a single character (as a string) from a larger string in Ruby?
In Ruby 1.9, it's easy. Strings are encoding-aware sequences of characters, so you can just index into one and you will get a single-character string back:
'µsec'[0] # => 'µ'
However, in Ruby 1.8, Strings are sequences of bytes and thus completely unaware of the encoding. If you index into a string and that string uses a multibyte encoding, you risk indexing right into the middle of a multibyte character (in this example, the 'µ' is encoded in UTF-8):
'µsec'[0] # => 194
'µsec'[0].chr # => Garbage
'µsec'[0,1] # => Garbage
However, Regexps and some specialized string methods support at least a small subset of popular encodings, among them some Japanese encodings (e.g. Shift-JIS) and (in this example) UTF-8:
'µsec'.split('')[0] # => 'µ'
'µsec'.split(//u)[0] # => 'µ'
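On Ruby 1.9 or later you can see both views of the same string side by side. The sketch below uses only standard String methods (chars, bytes, byteslice, valid_encoding?) and assumes a UTF-8 source file; the byte value 194 (0xC2) is the first byte of the UTF-8 encoding of 'µ', which is what Ruby 1.8 returned for 'µsec'[0].

```ruby
# encoding: utf-8
s = 'µsec'

# Character-oriented access (Ruby 1.9+): each index is a whole character.
first_char = s[0]          # "µ"
all_chars  = s.chars       # ["µ", "s", "e", "c"]

# Byte-oriented access: 'µ' is two bytes in UTF-8 (0xC2 0xB5).
first_byte = s.bytes[0]         # 194, the value Ruby 1.8's 'µsec'[0] returned
half_char  = s.byteslice(0, 1)  # only the first byte, like 1.8's 'µsec'[0,1]

puts first_char
puts s.length                   # 4 characters
puts s.bytesize                 # 5 bytes
puts half_char.valid_encoding?  # false: half a character is not valid UTF-8
```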
How to extract a text from a large string and change it
You can try this out:
---(?:[\n\r]|.)*?(?<=title: )([^\n\r]+)(?:[\n\r]|.)*?---
As demonstrated here: https://regex101.com/r/9O99Fz/1/
Explanation:
(?:[\n\r]|.)*? - after matching '---', the regex lazily matches any characters, including newlines, until the next part of the pattern can match.
(?<=title: ) - a positive lookbehind: the text that follows must be preceded by "title: ".
([^\n\r]+) - since the title is a single line, this group (capturing group 1) matches the title itself, i.e. one or more characters that are neither a newline nor a carriage return.
(?:[\n\r]|.)*?--- - matches the rest of the 'details' section up to the closing '---'.
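The pattern can also be tried directly in Ruby. This sketch assumes a hypothetical input in which the section is delimited by '---' lines and contains a "title:" line; String#[] with a group index extracts group 1, and String#sub with the replacement '\1' (an assumed replacement string, chosen for illustration) swaps the whole block for the title.

```ruby
# Hypothetical input: a block delimited by '---' lines with a "title:" field.
text = "intro\n---\nid: 7\ntitle: Hello World\nauthor: someone\n---\noutro"

# The pattern from the answer as a Ruby Regexp literal.
# (?:[\n\r]|.) is used because '.' does not match newlines without the /m flag.
pattern = /---(?:[\n\r]|.)*?(?<=title: )([^\n\r]+)(?:[\n\r]|.)*?---/

title    = text[pattern, 1]         # capturing group 1 only
replaced = text.sub(pattern, '\1')  # whole '---' block replaced by the title

puts title
puts replaced
```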
Also, in the substitution part, \1 is replaced by the title in capturing group 1, so the code should execute correctly :)
Select all characters in a string until a specific character Ruby
You can avoid creating an unnecessary Array (as String#split does) or using a Regexp (as String#gsub does) by combining String#index with String#[]:
a = "2.452811139617034,42.10874821716908|3.132087902867818,42.028314077306646|-0.07934861041448178,41.647538468746916|-0.07948265046522918,41.64754863599606"
a[0,a.index('|')]
#=> "2.452811139617034,42.10874821716908"
This selects the characters from position 0 up to, but not including, the first pipe (|). Technically, a[0, n] starts at position 0 and takes n characters, where n is the index of the pipe; this works because Ruby uses 0-based indexing, so the index of the pipe equals the number of characters before it.
As @CarySwoveland astutely pointed out, the string may not contain a pipe, in which case the solution needs to change to one of the following:
# to return the entire string
a[0,a.index('|') || a.size]
# or (parenthesize the assignment so b is set before a[0,b] is evaluated)
(b = a.index(?|)) ? a[0,b] : a
# or to return empty string
a[0,a.index('|').to_i]
# or to return nil
a[0,a.index(?|) || -1]
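If a small temporary array is acceptable, String#partition is another option (not part of the answer above): it splits at the first occurrence only and, when the separator is absent, returns the whole string as its first element, so no nil check is needed. The shortened sample strings below are illustrative.

```ruby
a       = "2.452811139617034,42.10874821716908|3.132087902867818,42.028314077306646"
no_pipe = "2.452811139617034,42.10874821716908"

# partition returns [before, separator, after]; with no separator it
# returns [whole_string, "", ""], so .first covers both cases.
first_pair = a.partition('|').first
fallback   = no_pipe.partition('|').first

# The index-based version from the answer, for comparison.
index_based = a[0, a.index('|') || a.size]

puts first_pair
puts fallback
```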
Ruby - How to select some characters from string
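Try foo[0...100], which returns at most the first 100 characters; any range will do. Ranges can also use negative indices, which count back from the end of the string. This is well explained in the Ruby documentation for String#[].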
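A few range forms on a sample string illustrate the behaviour described above; the endless range foo[7..] requires Ruby 2.6 or later.

```ruby
foo = "Hello, world"

p foo[0...5]    # "Hello"  (exclusive range: indices 0-4)
p foo[0..4]     # "Hello"  (inclusive range)
p foo[-5..-1]   # "world"  (negative indices count back from the end)
p foo[7..]      # "world"  (endless range, Ruby 2.6+)
p foo[0...100]  # "Hello, world" (a range running past the end is clipped)
p foo[20..25]   # nil      (start beyond the end of the string)
```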
Extracting the last n characters from a ruby string
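Here is a one-liner; the number can be greater than the size of the string:
"123".split(//).last(5).to_s
That works in Ruby 1.8, where Array#to_s concatenates the elements. In Ruby 1.9+, Array#to_s returns the inspected form instead, so join the elements:
"123".split(//).last(5).join("").to_s
Since join already returns a string and its default separator is empty, this can be shortened to:
"123".split(//).last(5).join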
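The split/last/join chain builds an intermediate array of characters. A slicing-based alternative, sketched here as a hypothetical helper named last_chars, clamps the start index so that an n larger than the string returns the whole string. Clamping also handles n = 0 correctly, which a naive str[-n..-1] would not, because -0 is 0 and would return the entire string.

```ruby
# Hypothetical helper: the last n characters of str (whole string if n >= length).
def last_chars(str, n)
  raise ArgumentError, "n must be non-negative" if n.negative?
  start = [str.length - n, 0].max  # clamp so a large n starts at index 0
  str[start..-1]
end

p last_chars("123", 5)     # "123"
p last_chars("123456", 2)  # "56"
p last_chars("123", 0)     # ""
p last_chars("µsec", 3)    # "sec" (character-based in Ruby 1.9+)
```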
Extract first line from a (possibly multiline) string
You can use s.split("\n", 2)[0].
Passing a limit of 2 makes split stop at the first newline, so the string is split into at most two parts: the first line and the rest. Element [0] is then the first line.
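For comparison, a few other standard ways to get the first line, plus edge cases where they differ. Note in particular that the split version returns nil for an empty string, because splitting "" yields an empty array.

```ruby
s = "first line\nsecond line\nthird line"

via_split = s.split("\n", 2)[0]   # splits at the first newline only
via_lines = s.lines.first.chomp   # lines keeps the "\n"; chomp drops it
via_regex = s[/.*/]               # '.' stops at the first newline

# Edge cases worth knowing about:
empty_split = "".split("\n", 2)[0]             # nil: "".split gives []
empty_regex = ""[/.*/]                          # "": .* matches the empty string
crlf_line   = "one\r\ntwo".lines.first.chomp   # "one": chomp also removes "\r\n"

p via_split, via_lines, via_regex
```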
How to extract string from large file only if specific string appears previous using Ruby?
I think this may be what you are looking for, but if not, let me know and I will change it. Look especially at the very end to see if that is the sort of output you want (for input having two records, both with an "MH" field). I will also add an "explanation" section at the end once I have understood your question correctly.
I have assumed that each record begins with the line
*NEW RECORD
and that you wish to identify all lines beginning "MH" whose field is one of the elements of:
candidate_descriptor_keys =
["Body Weight", "Obesity", "Thinness", "Informed Consent"]
and, for each match, print the contents of the lines in the same record that begin with "FX", "AN" and "MS".
Code
NEW_RECORD_MARKER = "*NEW RECORD"

def getem(fname, candidate_descriptor_keys)
  line = 0          # 0-based line counter used in the messages
  found_mh = false  # true once a matching MH line is seen in the current record
  # File.foreach reads line by line and closes the file when done
  File.foreach(fname) do |file_line|
    file_line = file_line.strip
    case
    when file_line == NEW_RECORD_MARKER
      puts # space between records
      found_mh = false
    when found_mh == false
      candidate_descriptor_keys.each do |cand_term|
        if file_line =~ /^MH\s=\s(#{cand_term})$/
          found_mh = true
          puts "MH from line #{line} of file is: #{cand_term}"
          break
        end
      end
    when found_mh
      ["FX", "AN", "MS"].each do |des|
        if file_line =~ /^#{des}\s=\s(.*)$/
          see_also = $1
          puts " Line #{line} of file is: #{des}: #{see_also}"
        end
      end
    end
    line += 1
  end
end
Example
Let's begin by creating a file, starting with a "here document" that contains two records:
records =<<_
*NEW RECORD
RECTYPE = D
MH = Informed Consent
AQ = ES HI LJ PX SN ST
ENTRY = Consent, Informed
MN = N03.706.437.650.312
MN = N03.706.535.489
FX = Disclosure
FX = Mental Competency
FX = Therapeutic Misconception
FX = Treatment Refusal
ST = T058
ST = T078
AN = competency to consent
PI = Jurisprudence (1966-1970)
PI = Physician-Patient Relations (1966-1970)
MS = Voluntary authorization
*NEW RECORD
MH = Obesity
AQ = ES HI LJ PX SN ST
ENTRY = Obesity
MN = N03.706.437.650.312
MN = N03.706.535.489
FX = 1st FX
FX = 2nd FX
AN = Only AN
PI = Jurisprudence (1966-1970)
PI = Physician-Patient Relations (1966-1970)
MS = Only MS
_
If you puts records, you will see that it is just a string. (You'll see that I shortened both records.) Now write it to a file:
File.write('mesh_descriptor', records)
If you wish to confirm the file contents, you could do this:
puts File.read('mesh_descriptor')
We also need to define the array candidate_descriptor_keys:
candidate_descriptor_keys =
["Body Weight", "Obesity", "Thinness", "Informed Consent"]
We can now execute the method getem:
getem('mesh_descriptor', candidate_descriptor_keys)
MH from line 2 of file is: Informed Consent
Line 7 of file is: FX: Disclosure
Line 8 of file is: FX: Mental Competency
Line 9 of file is: FX: Therapeutic Misconception
Line 10 of file is: FX: Treatment Refusal
Line 13 of file is: AN: competency to consent
Line 16 of file is: MS: Voluntary authorization
MH from line 18 of file is: Obesity
Line 23 of file is: FX: 1st FX
Line 24 of file is: FX: 2nd FX
Line 25 of file is: AN: Only AN
Line 28 of file is: MS: Only MS
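A different structure, sketched here as an alternative rather than a fix to getem: first group the lines into records, turn each record into a hash from field name to values, and then filter. The method names (records_from, parse_record, matching_fields) are hypothetical; the sketch assumes the same "*NEW RECORD" marker and "KEY = value" lines as the example, needs Ruby 2.7+ for filter_map, and gives up the line numbers that getem prints.

```ruby
RECORD_MARKER = "*NEW RECORD"  # same marker assumed by getem

# Group stripped lines into records, dropping marker and blank lines.
def records_from(text)
  text.each_line.map(&:strip)
      .slice_before { |l| l == RECORD_MARKER }
      .map { |rec| rec.reject { |l| l == RECORD_MARKER || l.empty? } }
      .reject(&:empty?)
end

# Turn ["MH = Obesity", "FX = 1st FX", ...] into {"MH" => ["Obesity"], "FX" => [...]}.
def parse_record(lines)
  lines.each_with_object(Hash.new { |h, k| h[k] = [] }) do |line, fields|
    key, value = line.split(" = ", 2)
    fields[key] << value if value
  end
end

# For each record whose MH value is in keys, return [mh, {"FX" => [...], ...}].
def matching_fields(text, keys, wanted = %w[FX AN MS])
  records_from(text).filter_map do |lines|
    fields = parse_record(lines)
    mh = fields["MH"].find { |v| keys.include?(v) }
    next unless mh
    [mh, wanted.to_h { |k| [k, fields[k]] }]
  end
end

sample = <<~TEXT
  *NEW RECORD
  MH = Obesity
  FX = 1st FX
  AN = Only AN
  MS = Only MS
  *NEW RECORD
  MH = Something Else
  FX = ignored
TEXT

result = matching_fields(sample, ["Obesity", "Thinness"])
result.each do |mh, fields|
  puts "MH: #{mh}"
  fields.each { |k, vals| vals.each { |v| puts "  #{k}: #{v}" } }
end
```

With the file created above, matching_fields(File.read('mesh_descriptor'), candidate_descriptor_keys) would return one entry per record whose MH value is in the list.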
Extract data from one big string with regex
# -*- coding: utf-8 -*-
string = "A — N° 1 2 janvier 2013
TABLE OF CONTENT
Topic à one ......... 30 Second Topic .......... 33
Third - one ......... 3 Topic.with.dots .......... 33
One more line ......................... 27 last topic ...... 34"
puts string.scan(/(\p{l}[\p{l} \.-]*)\s+\.+\s+\d+/i).flatten
This does what you want. It also matches single letter titles.
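If the page numbers are needed as well, one option (an extension, not part of the answer) is to add a second capture group around the digits; scan then returns [title, page] pairs. Keeping the result as an array of pairs, rather than a hash, preserves duplicate titles.

```ruby
string = "A — N° 1 2 janvier 2013
TABLE OF CONTENT
Topic à one ......... 30 Second Topic .......... 33
Third - one ......... 3 Topic.with.dots .......... 33
One more line ......................... 27 last topic ...... 34"

# Group 1: the title (starts with a letter; may contain letters, spaces, dots, hyphens).
# Group 2: the page number after the dot leader.
toc = string.scan(/(\p{L}[\p{L} .-]*)\s+\.+\s+(\d+)/)
            .map { |title, page| [title, page.to_i] }

toc.each { |title, page| puts "#{title} -> #{page}" }
```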