How to Extract a Single Character (As a String) from a Larger String in Ruby

How to extract a single character (as a string) from a larger string in Ruby?

In Ruby 1.9 and later, it's easy. Strings are encoding-aware sequences of characters, so you can just index into one and you will get a single-character string back:

'µsec'[0] # => 'µ'

However, in Ruby 1.8, Strings are sequences of bytes and thus completely unaware of the encoding. If you index into a string and that string uses a multibyte encoding, you risk indexing right into the middle of a multibyte character (in this example, the 'µ' is encoded in UTF-8):

'µsec'[0] # => 194
'µsec'[0].chr # => Garbage
'µsec'[0,1] # => Garbage

However, Regexps and some specialized string methods support at least a small subset of popular encodings, among them some Japanese encodings (e.g. Shift-JIS) and (in this example) UTF-8:

'µsec'.split('')[0] # => 'µ'
'µsec'.split(//u)[0] # => 'µ'
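On Ruby 1.9 and later, the difference between characters and bytes can be checked directly. The following is only a small sketch: the "\u00B5" escape stands in for the literal 'µ' so that the snippet does not depend on the file's encoding, and the variable names are made up for illustration.

```ruby
s = "\u00B5sec"              # same text as 'µsec', encoded as UTF-8

first_char = s[0]            # indexing returns a one-character String
first_byte = s.bytes.first   # raw first byte of the UTF-8 encoding (194)
char_count = s.length        # counts characters: 4
byte_count = s.bytesize      # counts bytes: 5, because 'µ' takes two bytes

p [first_char, first_byte, char_count, byte_count]
```

The value 194 is the same number Ruby 1.8 returns for 'µsec'[0], which is why indexing by byte lands in the middle of the character there.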

How to extract a text from a large string and change it

You can try this out:

---(?:[\n\r]|.)*?(?<=title: )([^\n\r]+)(?:[\n\r]|.)*?---

As demonstrated here: https://regex101.com/r/9O99Fz/1/

Explanation -

(?:[\n\r]|.)*? - after matching '---', the regex matches all characters until the next condition in the regex:

(?<=title: ) - this is a positive lookbehind that tells the regex to match the text which is preceded by title:
([^\n\r]+) - since the title will be one sentence, this group matches the actual title you want by saying that it should not have a newline or carriage-return (this is the capturing group 1)

(?:[\n\r]|.)*?--- just matches the last part of the 'details' section


Also, in the substitution part, \1 is replaced by the title in the capturing group 1, and so the code should execute correctly :)
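For a Ruby version of the same idea, here is a rough sketch. The sample text and the replacement string '# \1' are assumptions made up for illustration, since the original input is not shown; only the pattern itself comes from the answer above.

```ruby
# The pattern from the answer, unchanged.
FRONT_MATTER = /---(?:[\n\r]|.)*?(?<=title: )([^\n\r]+)(?:[\n\r]|.)*?---/

# Hypothetical input: a '---' delimited details block followed by body text.
text = "---\nlayout: post\ntitle: Hello World\ndate: 2020-01-01\n---\nBody text\n"

# Capturing group 1 holds the title.
title = text[FRONT_MATTER, 1]

# Replace the whole details block; \1 in the replacement is the captured title.
result = text.sub(FRONT_MATTER, '# \1')

p title
p result
```

Single quotes around the replacement keep \1 as a backreference instead of an escape sequence.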

Select all characters in a string until a specific character Ruby

You can avoid creating an unnecessary Array (as String#split would) or using a Regexp (as with String#gsub) by combining String#index with String#[]:

a = "2.452811139617034,42.10874821716908|3.132087902867818,42.028314077306646|-0.07934861041448178,41.647538468746916|-0.07948265046522918,41.64754863599606"

a[0,a.index('|')]
#=>"2.452811139617034,42.10874821716908"

This selects the characters from position 0 up to, but not including, the first pipe (|). Strictly speaking, it starts at position 0 and takes n characters, where n is the index of the pipe. That works because Ruby uses 0-based indexing, so the pipe's index equals the number of characters before it.

As @CarySwoveland astutely pointed out the string may not contain a pipe in which case my solution would need to change to

#to return entire string
a[0,a.index('|') || a.size]
# or
(b = a.index(?|)) ? a[0,b] : a
# or to return empty string
a[0,a.index('|').to_i]
# or to return nil
a[0,a.index(?|) || -1]
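The variants above could be wrapped in one small helper. This is just a sketch: the method name prefix_before and the if_missing option are invented here for illustration and are not part of Ruby.

```ruby
# Returns the part of str before the first occurrence of delimiter.
# if_missing chooses what to return when the delimiter is absent:
#   :whole -> the entire string, :empty -> "", anything else -> nil
def prefix_before(str, delimiter, if_missing: :whole)
  idx = str.index(delimiter)
  return str[0, idx] if idx

  case if_missing
  when :whole then str
  when :empty then ''
  end
end

a = "2.45,42.10|3.13,42.02|-0.07,41.64"
p prefix_before(a, '|')
p prefix_before("no pipe here", '|')
p prefix_before("no pipe here", '|', if_missing: :empty)
p prefix_before("no pipe here", '|', if_missing: :none)
```

Like the one-liners, it builds no intermediate Array and uses no Regexp.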

Ruby - How to select some characters from string

Try foo[0...100]; any range will do. Range endpoints can also be negative, in which case they count from the end of the string. This is explained in the Ruby documentation for String#[].
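A few range forms side by side, as a quick illustration (the sample string is made up):

```ruby
foo = "Hello, world"

first_five  = foo[0...5]    # exclusive end: characters 0 through 4
same_five   = foo[0..4]     # inclusive end: the same five characters
last_five   = foo[-5..-1]   # negative indices count from the end
from_seven  = foo[7..-1]    # from index 7 to the end of the string
too_long    = foo[0...100]  # a range past the end is simply cut off
out_of_range = foo[20..30]  # a start beyond the string yields nil

p first_five, same_five, last_five, from_seven, too_long, out_of_range
```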

Extracting the last n characters from a ruby string

Here is a one-liner; the number may be greater than the size of the string. On Ruby 1.8, where Array#to_s concatenates the elements:

"123".split(//).last(5).to_s

For Ruby 1.9+, where Array#to_s no longer concatenates the elements, join them explicitly:

"123".split(//).last(5).join("").to_s

Since join already returns a String, the trailing to_s can be dropped:

"123".split(//).last(5).join
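An alternative that avoids the intermediate Array is to index from the end and fall back to the whole string. A minimal sketch, assuming n is not negative; the method name last_chars is made up for illustration:

```ruby
# Returns the last n characters of str, or all of str if it is shorter than n.
# str[-n, n] is nil when n exceeds the string's length, hence the fallback.
def last_chars(str, n)
  str[-n, n] || str
end

p last_chars("123", 5)
p last_chars("12345", 2)
p last_chars("abc", 0)
```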

Extract first line from a (possibly multiline) string

You can use s.split("\n", 2)[0].
This splits the string at the first newline and takes the first element of the resulting array. The limit parameter (2) ensures the string is split only once, so everything after the first newline is left unsplit.
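One thing to watch for is the empty string, where split returns an empty array and [0] gives nil. The sketch below shows the split result and a possible wrapper that always returns a String; the name first_line is invented for illustration.

```ruby
text = "first line\nsecond line\nthird line"

# With a limit of 2 the remainder stays in one piece.
parts = text.split("\n", 2)

# Hypothetical helper: read only the first line and drop its line ending.
# each_line.first is nil for an empty string, so to_s turns that into "".
def first_line(str)
  str.each_line.first.to_s.chomp
end

p parts
p first_line(text)
p "".split("\n", 2)[0]
p first_line("")
```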

How to extract string from large file only if specific string appears previous using Ruby?

I think this may be what you are looking for, but if not, let me know and I will change it. Look especially at the very end to see whether that is the sort of output you want (for input having two records, both with an "MH" field). I will also add an "explanation" section at the end once I have understood your question correctly.

I have assumed that each record begins

*NEW RECORD

and you wish to identify all lines beginning "MH" whose field is one of the elements of:

candidate_descriptor_keys =
["Body Weight", "Obesity", "Thinness", "Informed Consent"]

and for each match, you would like to print the contents of the lines for the same record that begin with "FX", "AN" and "MS".

Code

NEW_RECORD_MARKER = "*NEW RECORD"

def getem(fname, candidate_descriptor_keys)
  line = 0 # zero-based line counter
  found_mh = false
  # File.foreach closes the file when it is done reading
  File.foreach(fname) do |file_line|
    file_line = file_line.strip
    case
    when file_line == NEW_RECORD_MARKER
      puts # space between records
      found_mh = false
    when found_mh == false
      # still looking for a matching "MH" line in this record
      candidate_descriptor_keys.each do |cand_term|
        # Regexp.escape keeps special characters in a key from being read as regex syntax
        if file_line =~ /^MH\s=\s(#{Regexp.escape(cand_term)})$/
          found_mh = true
          puts "MH from line #{line} of file is: #{cand_term}"
          break
        end
      end
    when found_mh
      # a matching "MH" was seen, so report the fields of interest
      ["FX", "AN", "MS"].each do |des|
        if file_line =~ /^#{des}\s=\s(.*)$/
          see_also = $1
          puts " Line #{line} of file is: #{des}: #{see_also}"
        end
      end
    end
    line += 1
  end
end

Example

Let's begin by creating a file, starting with a "here document" that contains two records:

records =<<_
*NEW RECORD
RECTYPE = D
MH = Informed Consent
AQ = ES HI LJ PX SN ST
ENTRY = Consent, Informed
MN = N03.706.437.650.312
MN = N03.706.535.489
FX = Disclosure
FX = Mental Competency
FX = Therapeutic Misconception
FX = Treatment Refusal
ST = T058
ST = T078
AN = competency to consent
PI = Jurisprudence (1966-1970)
PI = Physician-Patient Relations (1966-1970)
MS = Voluntary authorization
*NEW RECORD
MH = Obesity
AQ = ES HI LJ PX SN ST
ENTRY = Obesity
MN = N03.706.437.650.312
MN = N03.706.535.489
FX = 1st FX
FX = 2nd FX
AN = Only AN
PI = Jurisprudence (1966-1970)
PI = Physician-Patient Relations (1966-1970)
MS = Only MS
_

If you puts records you will see it is just a string. (You'll see that I shortened two of them.) Now write it to a file:

File.write('mesh_descriptor', records)

If you wish to confirm the file contents, you could do this:

puts File.read('mesh_descriptor')

We also need to define the array candidate_descriptor_keys:

candidate_descriptor_keys =
["Body Weight", "Obesity", "Thinness", "Informed Consent"]

We can now execute the method getem:

getem('mesh_descriptor', candidate_descriptor_keys)

MH from line 2 of file is: Informed Consent
Line 7 of file is: FX: Disclosure
Line 8 of file is: FX: Mental Competency
Line 9 of file is: FX: Therapeutic Misconception
Line 10 of file is: FX: Treatment Refusal
Line 13 of file is: AN: competency to consent
Line 16 of file is: MS: Voluntary authorization

MH from line 18 of file is: Obesity
Line 23 of file is: FX: 1st FX
Line 24 of file is: FX: 2nd FX
Line 25 of file is: AN: Only AN
Line 28 of file is: MS: Only MS
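If you would rather get the data back than print it, one possible variation is to group the lines into records first. This is only a sketch under the same assumptions as above (records start with "*NEW RECORD", fields look like "KEY = value"); the method names parse_records and matching_fields and the shortened sample are made up for illustration, and unlike getem it reads the text from a string and does not track line numbers.

```ruby
NEW_RECORD_MARKER = "*NEW RECORD"

# Splits the text into records; each record maps a key to an array of values.
def parse_records(text)
  lines = text.each_line.map(&:strip).reject(&:empty?)
  lines.slice_before { |line| line == NEW_RECORD_MARKER }.map do |chunk|
    record = Hash.new { |hash, key| hash[key] = [] }
    chunk.each do |line|
      key, value = line.split(/\s=\s/, 2)
      record[key] << value if value # the marker line has no value and is skipped
    end
    record
  end
end

# Keeps records whose "MH" is one of wanted_mh and returns the chosen fields.
def matching_fields(text, wanted_mh, fields = %w[FX AN MS])
  parse_records(text)
    .select { |rec| rec.key?("MH") && (rec["MH"] & wanted_mh).any? }
    .map do |rec|
      { "MH" => rec["MH"].first }.merge(fields.map { |f| [f, rec.fetch(f, [])] }.to_h)
    end
end

sample = <<~TEXT
  *NEW RECORD
  MH = Obesity
  FX = 1st FX
  FX = 2nd FX
  AN = Only AN
  MS = Only MS
  *NEW RECORD
  MH = Something Else
  FX = ignored
TEXT

result = matching_fields(sample, ["Obesity", "Thinness"])
p result
```

For a really large file, the line-by-line approach of getem is probably the better fit, since this version holds all lines in memory at once.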

Extract data from one big string with regex

# -*- coding: utf-8 -*-
string = "A — N° 1 2 janvier 2013

TABLE OF CONTENT

Topic à one ......... 30 Second Topic .......... 33
Third - one ......... 3 Topic.with.dots .......... 33
One more line ......................... 27 last topic ...... 34"
puts string.scan(/(\p{l}[\p{l} \.-]*)\s+\.+\s+\d+/i).flatten

This does what you want. It also matches single letter titles.
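If the page numbers are wanted too, a second capturing group can pick them up. A sketch only: the shortened sample and the names toc, ENTRY and entries are made up for illustration, and "\u00E0" stands in for 'à' so the snippet does not depend on the file's encoding.

```ruby
toc = "Topic \u00E0 one ......... 30 Second Topic .......... 33\n" \
      "Third - one ......... 3 Topic.with.dots .......... 33"

# Group 1: the title (a letter, then letters, spaces, dots or hyphens).
# Group 2: the page number after the run of dots.
ENTRY = /(\p{L}[\p{L} .-]*)\s+\.+\s+(\d+)/

# scan returns one [title, page] pair per entry; convert the page to an Integer.
entries = toc.scan(ENTRY).map { |title, page| [title, page.to_i] }

p entries
```

The title group is greedy but backtracks until the remaining pattern (whitespace, dots, whitespace, digits) fits, which is why "Topic.with.dots" survives with its inner dots.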


