Why Does Ruby String#Split Not Treat Consecutive Trailing Delimiters as Separate Entities

You need to pass a negative value as the second parameter to split. This prevents it from suppressing trailing null fields:

"w$x$$\r\n".chomp.split('$', -1)
# => ["w", "x", "", ""]

See the docs on split.

ruby string split with terminal strings empty

You need to say:

string.split(',',-1)

to avoid omitting the trailing blanks.

This follows the answer to "Why does Ruby String#split not treat consecutive trailing delimiters as separate entities?"

The second parameter is the "limit" parameter, documented at http://ruby-doc.org/core-2.0.0/String.html#method-i-split as follows:

If the "limit" parameter is omitted, trailing null fields are
suppressed. If limit is a positive number, at most that number of
fields will be returned (if limit is 1, the entire string is returned
as the only entry in an array). If negative, there is no limit to the
number of fields returned, and trailing null fields are not
suppressed.
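The documented cases can be checked with a short sketch. The input string is the one from the first answer above (after chomp); the variable names are only illustrative.

```ruby
line = "w$x$$"

default_fields = line.split('$')      # limit omitted: trailing empty fields are dropped
limited_fields = line.split('$', 2)   # positive limit: at most 2 fields, remainder left unsplit
whole_string   = line.split('$', 1)   # limit of 1: the entire string is the only entry
all_fields     = line.split('$', -1)  # negative limit: no cap, trailing empty fields kept

p default_fields  #=> ["w", "x"]
p limited_fields  #=> ["w", "x$$"]
p whole_string    #=> ["w$x$$"]
p all_fields      #=> ["w", "x", "", ""]
```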

Trying to split a sentence into words and delimiters using matchers /\w/ and /\W/

Between start of line /^/ and first occurrence of - there is "First".

So it splits on "First" obtaining an empty string "" and -.
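The original input string is not shown here, so the following is only a sketch with an assumed input, "First-Second". It reproduces the leading empty string described above and shows one possible alternative, String#scan, that keeps both words and delimiters.

```ruby
sentence = "First-Second"  # assumed input, not taken from the original question

# Splitting on word characters removes the words and leaves what lies between them.
# Nothing precedes "First", so the first field is an empty string.
between_words = sentence.split(/\w+/)

# scan returns every match in order, so words and delimiters are both kept.
tokens = sentence.scan(/\w+|\W+/)

p between_words  #=> ["", "-"]
p tokens         #=> ["First", "-", "Second"]
```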

I have a number of questions about this (beginner) example code

Let's begin by creating a csv file named "actsascsv.txt".

IO.write('actsascsv.txt', "Name, Age, IQ\nWilber, 33, 86\nBianca, 18, 143\nBluto, 83, 55")
#=> 58 (characters written)

Let's look at that file.

puts IO.read('actsascsv.txt')
Name, Age, IQ
Wilber, 33, 86
Bianca, 18, 143
Bluto, 83, 55

Now consider each of the instance methods of the class ActsAsCSV, where

acts = ActsAsCSV.new
#=> #<ActsAsCSV:0x000058857127ac28 @result=[["Wilber", "33", "86"],
# ["Bianca", "18", "143"], ["Bluto", "83", "55"]], @headers=["Name", "Age", "IQ"]>

def headers
  @headers
end

def csv_contents
  @result
end

acts.headers
#=> ["Name", "Age", "IQ"]
acts.csv_contents
#=> [["Wilber", "33", "86"], ["Bianca", "18", "143"], ["Bluto", "83", "55"]]

These two (instance) methods are called getters, as they return the values of instance variables. Here the instance variables are @headers and @result. The first of these methods is typical, in that the name of the method is the same as the name of the instance variable without the leading "@". It's curious that the second was not named result, or that the instance variable was not named @csv_contents.

The first of these methods is normally created using the method Module#attr_reader by writing:

attr_reader :headers

I've covered these two methods first because getters and setters are customarily written at the beginning of a class definition, using one of the three attr_... methods [1].
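As a rough illustration of the three attr_... methods, here is a small sketch. The class Sheet and its attribute names are hypothetical and are not part of the code under discussion.

```ruby
class Sheet  # hypothetical class, for illustration only
  attr_reader   :headers  # defines the getter #headers
  attr_writer   :title    # defines the setter #title=
  attr_accessor :rows     # defines both #rows and #rows=

  def initialize
    @headers = ["Name", "Age", "IQ"]
    @rows = []
  end
end

sheet = Sheet.new
sheet.rows << ["Wilber", "33", "86"]

p sheet.headers                 #=> ["Name", "Age", "IQ"]
p sheet.rows                    #=> [["Wilber", "33", "86"]]
p sheet.respond_to?(:headers=)  #=> false (attr_reader creates no setter)
p sheet.respond_to?(:title)     #=> false (attr_writer creates no getter)
p sheet.respond_to?(:title=)    #=> true
```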

def initialize
  @result = []
  read
end

initialize (a private instance method) is invoked when the method new is called on the class. Here it initializes the instance variable @result to an empty array and calls the method read. initialize is customarily the first instance method appearing in the class definition.

def read
  file = File.new(self.class.to_s.downcase + '.txt')
  @headers = file.gets.chomp.split(', ')

  file.each do |row|
    @result << row.chomp.split(', ')
  end
end

This method [2][3] initially executes:

file = File.new(self.class.to_s.downcase + '.txt')

The class method File::new is called here with a single argument, the file name (including the path). That argument is built up as follows [4]:

a = self
#=> acts
b = a.class
#=> ActsAsCSV
c = b.to_s
#=> "ActsAsCSV"
d = c.downcase
#=> "actsascsv"
e = d + '.txt'
#=> "actsascsv.txt"
file = File.new(e)
#=> #<File:actsascsv.txt>

Next,

f = file.gets
#=> "Name, Age, IQ\n"
g = f.chomp
#=> "Name, Age, IQ"
@headers = g.split(', ')
#=> ["Name", "Age", "IQ"]

See IO#gets, String#chomp and String#split. Then,

file.each do |row|
  @result << row.chomp.split(', ')
end
#=> [["Wilber", "33", "86"], ["Bianca", "18", "143"], ["Bluto", "83", "55"]]
acts.csv_contents
#=> [["Wilber", "33", "86"], ["Bianca", "18", "143"], ["Bluto", "83", "55"]]

See IO#each [5].
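Putting the pieces above together, the class might be sketched as follows. This is a variant, not the original code: it assumes a temporary working directory so the example is self-contained, uses the block form of File.open so the file is closed (the original uses File.new), and marks read as private.

```ruby
require 'tmpdir'

class ActsAsCSV
  attr_reader :headers

  def initialize
    @result = []
    read
  end

  def csv_contents
    @result
  end

  private

  # Reads "<downcased class name>.txt" from the current directory.
  def read
    File.open(self.class.to_s.downcase + '.txt') do |file|
      @headers = file.gets.chomp.split(', ')                # first line holds the headers
      file.each { |row| @result << row.chomp.split(', ') }  # remaining lines are data rows
    end
  end
end

# Create the sample file in a throwaway directory and build the object there.
acts = Dir.mktmpdir do |dir|
  Dir.chdir(dir) do
    File.write('actsascsv.txt', "Name, Age, IQ\nWilber, 33, 86\nBianca, 18, 143\nBluto, 83, 55")
    ActsAsCSV.new
  end
end

p acts.headers       #=> ["Name", "Age", "IQ"]
p acts.csv_contents  #=> [["Wilber", "33", "86"], ["Bianca", "18", "143"], ["Bluto", "83", "55"]]
```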

class RubyCsv < ActsAsCSV
end

RubyCsv.superclass
#=> ActsAsCSV

This merely creates a subclass of ActsAsCSV which inherits the latter's constants and methods.

m = RubyCsv.new
#=> Errno::ENOENT (No such file or directory @ rb_sysopen - rubycsv.txt)

As indicated, an exception is raised because there is no file rubycsv.txt.

The conventional way of reading and writing CSV files with Ruby is to use methods of the class CSV.
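For comparison, here is a minimal sketch using the CSV class. It assumes the csv library can be loaded with require 'csv' (it ships with Ruby, as a bundled gem in recent versions) and that the data has no spaces after the commas.

```ruby
require 'csv'

# Same data as above, but without spaces around the separators.
text = "Name,Age,IQ\nWilber,33,86\nBianca,18,143\nBluto,83,55\n"

# headers: true treats the first line as the header row and returns a CSV::Table.
table = CSV.parse(text, headers: true)

headers  = table.headers        # column names
rows     = table.map(&:fields)  # each data row as an array of strings
first_iq = table[0]["IQ"]       # look up a field by row index and header name

p headers   #=> ["Name", "Age", "IQ"]
p rows      #=> [["Wilber", "33", "86"], ["Bianca", "18", "143"], ["Bluto", "83", "55"]]
p first_iq  #=> "86"
```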

[1] See also Module#attr_writer and Module#attr_accessor.

[2] Assuming this method is called only from other instance methods of the class (here initialize), it generally would be defined as a private method, so it could not be called from outside the class.

[3] It is customary, but not required, for CSV file names to have the suffix "csv" (i.e., 'actsascsv.csv'). Moreover, when writing CSV files it is best to avoid adding spaces on either side of the field separators (commas unless otherwise specified).

[4] The explicit receiver self is used here. When there is no explicit receiver within an instance method the receiver defaults to self, so it generally is not necessary to write "self.". Writing class.to_s.downcase without it is a syntax error, however, because Ruby interprets class as the keyword that begins a class definition. This is one of a handful of situations where "self." is required within an instance method.

[5] IO class methods are often written with File as the receiver. That is permissible because File is a subclass of IO and therefore inherits the latter's methods.
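The relationship between File and IO can be checked directly. This is only a sketch; the file name demo.txt and the temporary directory are assumptions made for the example.

```ruby
require 'tmpdir'

file_parent = File.superclass               # the direct superclass of File
inherits_io = File.ancestors.include?(IO)

# Reading the same file through either receiver gives the same result.
same_content = Dir.mktmpdir do |dir|
  path = File.join(dir, 'demo.txt')         # hypothetical file name
  File.write(path, "Name, Age, IQ\n")
  File.read(path) == IO.read(path)
end

p file_parent   #=> IO
p inherits_io   #=> true
p same_content  #=> true
```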

Why doesn't iterating over each index of a string function correctly?

By enclosing the range in [..], you have created an Array containing a single Range object. In other words, the length of your Array is one. You just want:

(0..(str.length - 2)).each do |index|
  # do something with index
end
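The difference is easy to see on a small input. The string "hello" and the two-character slices are assumptions chosen for illustration; the original question's loop body is not shown.

```ruby
str = "hello"  # assumed sample input

wrapped = [0..(str.length - 2)]  # an Array holding one Range object
range   = (0..(str.length - 2))  # the Range itself

wrapped_size = wrapped.length    # only one element, so each would run once
indices      = range.to_a        # every index from 0 up to length - 2

# Example use: collect each adjacent pair of characters.
pairs = range.map { |index| str[index, 2] }

p wrapped_size  #=> 1
p indices       #=> [0, 1, 2, 3]
p pairs         #=> ["he", "el", "ll", "lo"]
```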

Finding a non consecutive element in an array of numbers: Ruby

What does it mean to be "non-consecutive"?

It means that the first number plus one is less than the second number or the difference of the two elements is not one, or …. There are many different ways to express this. So, you can simply search for the first element that satisfies that condition:

arr.each_cons(2).find {|a, b| b - a != 1 }&.last
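To see what that expression does step by step, here is a short sketch on assumed sample arrays. The safe-navigation operator (&.) covers the case where no gap exists and find returns nil.

```ruby
arr = [1, 2, 3, 4, 6, 7, 8]  # assumed sample data with a gap after 4

# each_cons(2) yields every adjacent pair: [1, 2], [2, 3], [3, 4], [4, 6], ...
pairs = arr.each_cons(2).to_a

# find returns the first pair whose difference is not 1; last picks its second element.
first_gap = arr.each_cons(2).find { |a, b| b - a != 1 }&.last

# With no gap, find returns nil and &. short-circuits instead of raising NoMethodError.
no_gap = [1, 2, 3].each_cons(2).find { |a, b| b - a != 1 }&.last

p pairs.first(4)  #=> [[1, 2], [2, 3], [3, 4], [4, 6]]
p first_gap       #=> 6
p no_gap          #=> nil
```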

Split string with delimiters in C

You can use the strtok() function to split a string (and specify the delimiter to use). Note that strtok() will modify the string passed into it. If the original string is required elsewhere make a copy of it and pass the copy to strtok().

EDIT:

Example (note it does not handle consecutive delimiters, "JAN,,,FEB,MAR" for example):

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <assert.h>

char** str_split(char* a_str, const char a_delim)
{
    char** result    = 0;
    size_t count     = 0;
    char* tmp        = a_str;
    char* last_comma = 0;
    char delim[2];
    delim[0] = a_delim;
    delim[1] = 0;

    /* Count how many elements will be extracted. */
    while (*tmp)
    {
        if (a_delim == *tmp)
        {
            count++;
            last_comma = tmp;
        }
        tmp++;
    }

    /* Add space for trailing token. */
    count += last_comma < (a_str + strlen(a_str) - 1);

    /* Add space for terminating null string so caller
       knows where the list of returned strings ends. */
    count++;

    result = malloc(sizeof(char*) * count);

    if (result)
    {
        size_t idx  = 0;
        char* token = strtok(a_str, delim);

        while (token)
        {
            assert(idx < count);
            *(result + idx++) = strdup(token);
            token = strtok(0, delim);
        }
        assert(idx == count - 1);
        *(result + idx) = 0;
    }

    return result;
}

int main()
{
    char months[] = "JAN,FEB,MAR,APR,MAY,JUN,JUL,AUG,SEP,OCT,NOV,DEC";
    char** tokens;

    printf("months=[%s]\n\n", months);

    tokens = str_split(months, ',');

    if (tokens)
    {
        int i;
        for (i = 0; *(tokens + i); i++)
        {
            printf("month=[%s]\n", *(tokens + i));
            free(*(tokens + i));
        }
        printf("\n");
        free(tokens);
    }

    return 0;
}

Output:

$ ./main.exe
months=[JAN,FEB,MAR,APR,MAY,JUN,JUL,AUG,SEP,OCT,NOV,DEC]

month=[JAN]
month=[FEB]
month=[MAR]
month=[APR]
month=[MAY]
month=[JUN]
month=[JUL]
month=[AUG]
month=[SEP]
month=[OCT]
month=[NOV]
month=[DEC]

asp.net : is it possible to Split large ASP.NET pages into pieces?


<!--#include file="inc_footer.aspx"-->

