How to Split a Vector by Delimiter

Right way to split an std::string into a vector<string>

For space-separated strings, you can do this:

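#include <algorithm>
#include <iostream>
#include <iterator>
#include <sstream>
#include <string>
#include <vector>
std::string s = "What is the right way to split a string into a vector of strings";
std::stringstream ss(s);
std::istream_iterator<std::string> begin(ss);  // operator>> skips whitespace between words
std::istream_iterator<std::string> end;        // default-constructed = end of stream
std::vector<std::string> vstrings(begin, end);
std::copy(vstrings.begin(), vstrings.end(), std::ostream_iterator<std::string>(std::cout, "\n"));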

Output:

What
is
the
right
way
to
split
a
string
into
a
vector
of
strings


For a string that has both commas and spaces, you can make the stream treat ',' as whitespace with a custom ctype facet:

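#include <cstring>  // std::memcpy
#include <locale>   // std::ctype, std::locale

// A ctype facet whose classification table marks ',' (and ' ') as whitespace,
// so operator>> stops at commas as well as at spaces.
struct tokens : std::ctype<char>
{
    tokens() : std::ctype<char>(get_table()) {}

    static std::ctype_base::mask const* get_table()
    {
        typedef std::ctype<char> cctype;
        static const cctype::mask* const_rc = cctype::classic_table();

        // Start from a copy of the default "C" table, then override two entries.
        static cctype::mask rc[cctype::table_size];
        std::memcpy(rc, const_rc, cctype::table_size * sizeof(cctype::mask));

        rc[','] = std::ctype_base::space;
        rc[' '] = std::ctype_base::space;
        return &rc[0];
    }
};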

std::string s = "right way, wrong way, correct way";
std::stringstream ss(s);
ss.imbue(std::locale(std::locale(), new tokens()));
std::istream_iterator<std::string> begin(ss);
std::istream_iterator<std::string> end;
std::vector<std::string> vstrings(begin, end);
std::copy(vstrings.begin(), vstrings.end(), std::ostream_iterator<std::string>(std::cout, "\n"));

Output:

right
way
wrong
way
correct
way
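If a custom facet feels heavy, a simpler alternative (not part of the original answer) is to turn every comma into a space before tokenizing. This sketch assumes commas and whitespace are the only separators; `split_commas_and_spaces` is a hypothetical helper name.

```cpp
#include <algorithm>
#include <iterator>
#include <sstream>
#include <string>
#include <vector>

// Splits on commas and whitespace: every ',' is replaced by ' ' in a copy of
// the input, then operator>> performs the usual whitespace tokenization.
std::vector<std::string> split_commas_and_spaces(std::string s)  // by value: we modify a copy
{
    std::replace(s.begin(), s.end(), ',', ' ');
    std::istringstream ss(s);
    return std::vector<std::string>(std::istream_iterator<std::string>(ss),
                                    std::istream_iterator<std::string>());
}
```

Called on "right way, wrong way, correct way", it should yield the same six tokens as the facet version. The facet can be reused on any stream without touching the text; this version trades that for brevity.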

Splitting a vector of strings based on delimiters and storing the result in a structure

This is simple with substr() and find().

#include <iostream>
#include <vector>
#include <string>

using namespace std;

struct Person
{
    string m_id;
    string m_name;
    int m_age;
};

int main()
{
    vector<string> data;
    vector<Person> people;

    data.push_back("id1|Name1|25");
    data.push_back("id2|Name2|35");

    for (size_t i = 0; i < data.size(); ++i) {
        size_t idx = data[i].find('|');            // end of the id field
        size_t idx2 = data[i].find('|', idx + 1);  // end of the name field
        string id = data[i].substr(0, idx);
        // substr takes (position, length), so the name length is idx2 - idx - 1
        string name = data[i].substr(idx + 1, idx2 - idx - 1);
        string age = data[i].substr(data[i].find_last_of('|') + 1);

        Person p = {id, name, stoi(age)};
        people.push_back(p);
    }

    for (size_t i = 0; i < people.size(); ++i)
        cout << people[i].m_id << " " << people[i].m_name << " " << people[i].m_age << endl;

    return 0;
}

The output is:

id1 Name1 25
id2 Name2 35
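A variant that avoids the index arithmetic reads each field with std::getline, using '|' as the delimiter. This is a sketch assuming C++17 (for std::optional) and exactly three fields per record; `parse_person` is a hypothetical name, and returning std::nullopt for malformed lines is a design choice of this example.

```cpp
#include <optional>
#include <sstream>
#include <stdexcept>  // std::invalid_argument, std::out_of_range thrown by std::stoi
#include <string>

struct Person
{
    std::string m_id;
    std::string m_name;
    int m_age = 0;
};

// Parses one "id|name|age" record. Returns std::nullopt if a field is
// missing or the age cannot be converted to an int.
std::optional<Person> parse_person(const std::string& line)
{
    std::istringstream in(line);
    Person p;
    std::string age;
    if (!std::getline(in, p.m_id, '|') || !std::getline(in, p.m_name, '|') ||
        !std::getline(in, age))
        return std::nullopt;
    try {
        p.m_age = std::stoi(age);
    } catch (const std::exception&) {
        return std::nullopt;
    }
    return p;
}
```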

Split vector into chunks by delimiter

As rawr suggested in the comments, I am using the following solution:

foo <- function( x ){
idx <- 1 + cumsum( is.na( x ) )
not.na <- ! is.na( x )
result <- split( x[not.na], idx[not.na] )
return(result)
}

Reasons:

  • It was the first solution
  • It works
  • I understand it
  • It does not use any packages/libraries.

Thanks anyway for all the answers!

I will mark this as answered as soon as I can (in two days).
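For comparison, a rough C++ analogue of the same idea is sketched below, assuming the data is stored as std::vector<std::optional<double>> with std::nullopt standing in for NA (C++17). The name `split_at_missing` is hypothetical. Like the R version, it emits no empty chunks, because split() only creates groups for index values that actually occur.

```cpp
#include <optional>
#include <vector>

// Runs of values between missing entries become chunks; leading, trailing
// or consecutive missing entries do not produce empty chunks.
std::vector<std::vector<double>> split_at_missing(const std::vector<std::optional<double>>& x)
{
    std::vector<std::vector<double>> chunks;
    std::vector<double> current;
    for (const auto& value : x) {
        if (value) {
            current.push_back(*value);
        } else if (!current.empty()) {  // a delimiter closes the current chunk
            chunks.push_back(current);
            current.clear();
        }
    }
    if (!current.empty())  // last chunk has no delimiter after it
        chunks.push_back(current);
    return chunks;
}
```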

R: split string vector by delimiter and rearrange

This solution generates a boolean matrix with each vector as a row, and each possible character as a column.

possible_options = c('a', 'b', 'c')
result <- sapply(possible_options, function(x) apply(q, 1, function(y) x %in% y))
result
a b c
[1,] TRUE TRUE TRUE
[2,] TRUE FALSE TRUE
[3,] FALSE TRUE TRUE

This solution requires a list of all the options. If you don't have one, you can either use a list of every possible option (for example, all lowercase and uppercase letters) and then drop the columns that contain no TRUE values:

result <- sapply(c(letters, LETTERS), function(x) apply(q, 1, function(y) x %in% y))
result <- result[, colSums(result) > 0]
result
a b c
[1,] TRUE TRUE TRUE
[2,] TRUE FALSE TRUE
[3,] FALSE TRUE TRUE

Or extract the options from q itself:

opts <- as.character(unique(unlist(q)))
opts <- sort(opts[opts != ''])
result <- sapply(opts , function(x) apply(q, 1, function(y) x %in% y))
result
a b c
[1,] TRUE TRUE TRUE
[2,] TRUE FALSE TRUE
[3,] FALSE TRUE TRUE

Split string by delimiter by using vectors - how to split by newline?

You can use std::getline to read the file line by line. As its documentation puts it, getline:

Extracts characters from is and stores them into str until the delimitation character delim is found (or the newline character, '\n' ...) If the delimiter is found, it is extracted and discarded, i.e. it is not stored and the next input operation will begin after it.

Perhaps you are already reading the file through a function that removes line endings.
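As a minimal sketch of the getline approach, the function below splits an in-memory string on newlines by wrapping it in a std::istringstream and calling std::getline with its default '\n' delimiter. `split_lines` is a hypothetical name; note that Windows-style "\r\n" line endings would leave a trailing '\r' on each line.

```cpp
#include <sstream>
#include <string>
#include <vector>

// std::getline stops at '\n' and discards it, so the returned strings
// contain no newline characters. Blank lines become empty strings.
std::vector<std::string> split_lines(const std::string& text)
{
    std::vector<std::string> lines;
    std::istringstream in(text);
    std::string line;
    while (std::getline(in, line))
        lines.push_back(line);
    return lines;
}
```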

Splitting character object using vector of delimiters

A tidyverse solution:

library(tidyverse)
delims <- c("Name", "Age", "Address Please give full address")

df %>%
mutate(rawtext = str_remove_all(rawtext, ":")) %>%
separate(rawtext, c("x", delims), sep = paste(delims, collapse = "|"), convert = T) %>%
mutate(across(where(is.character), str_squish), x = NULL)

# # A tibble: 2 x 3
# Name Age `Address Please give full address`
# <chr> <dbl> <chr>
# 1 John Doe 50 22 Main Street, New York
# 2 Jane Bloggs 42 1 Lower Street, London

Note: convert = T in separate() converts Age from character to numeric, ignoring leading and trailing whitespace.
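A loose C++ analogue of splitting on several delimiter strings is sketched below. It assumes plain substring delimiters rather than regular expressions, and `split_any` is a hypothetical helper. When two delimiters match at the same position it prefers the longer one; that is a choice of this sketch, not necessarily how the regex alternation above behaves.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Splits s wherever any of the delimiter strings occurs. At each step the
// earliest match wins; on a tie, the longer delimiter wins.
std::vector<std::string> split_any(const std::string& s,
                                   const std::vector<std::string>& delims)
{
    std::vector<std::string> parts;
    std::size_t start = 0;
    while (true) {
        std::size_t best_pos = std::string::npos, best_len = 0;
        for (const auto& d : delims) {
            assert(!d.empty() && "an empty delimiter would never advance");
            std::size_t p = s.find(d, start);
            if (p < best_pos ||
                (p == best_pos && p != std::string::npos && d.size() > best_len)) {
                best_pos = p;
                best_len = d.size();
            }
        }
        if (best_pos == std::string::npos)
            break;
        parts.push_back(s.substr(start, best_pos - start));
        start = best_pos + best_len;
    }
    parts.push_back(s.substr(start));  // text after the last delimiter
    return parts;
}
```

As with the R version, text before the first delimiter becomes its own (possibly empty) first piece.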

Parse (split) a string in C++ using string delimiter (standard C++)

You can use the std::string::find() function to find the position of your string delimiter, then use std::string::substr() to get a token.

Example:

std::string s = "scott>=tiger";
std::string delimiter = ">=";
std::string token = s.substr(0, s.find(delimiter)); // token is "scott"
  • The find(const string& str, size_t pos = 0) function returns the position of the first occurrence of str in the string, or npos if the string is not found.

  • The substr(size_t pos = 0, size_t n = npos) function returns a substring of the object, starting at position pos and containing at most n characters (up to the end of the string when n is npos).


If the delimiter occurs several times, then after extracting one token you can remove it (delimiter included) and repeat the extraction on what is left (if you need to keep the original string, run this on a copy of it):

s.erase(0, s.find(delimiter) + delimiter.length());

This way you can easily loop to get each token.

Complete Example

std::string s = "scott>=tiger>=mushroom";
std::string delimiter = ">=";

size_t pos = 0;
std::string token;
while ((pos = s.find(delimiter)) != std::string::npos) {
    token = s.substr(0, pos);
    std::cout << token << std::endl;
    s.erase(0, pos + delimiter.length());
}
std::cout << s << std::endl;

Output:

scott
tiger
mushroom
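The loop above consumes s as it goes. A minimal sketch that leaves the input intact instead advances a start offset passed to find(); `split_by_string` is a hypothetical name, and the empty-delimiter assert is a guard added here.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Same find()/substr() technique, but the search resumes from an offset
// instead of erasing the consumed prefix, so s is not modified.
std::vector<std::string> split_by_string(const std::string& s, const std::string& delimiter)
{
    assert(!delimiter.empty());  // find("") always matches, which would loop forever
    std::vector<std::string> tokens;
    std::size_t start = 0, pos;
    while ((pos = s.find(delimiter, start)) != std::string::npos) {
        tokens.push_back(s.substr(start, pos - start));
        start = pos + delimiter.length();
    }
    tokens.push_back(s.substr(start));  // text after the last delimiter
    return tokens;
}
```

For example, split_by_string("scott>=tiger>=mushroom", ">=") returns {"scott", "tiger", "mushroom"}; adjacent delimiters produce empty tokens.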

How to split a character vector into data frame?

DF <- data.frame(do.call(rbind, strsplit(a, "-", fixed=TRUE)))
DF[,2] <- as.Date(DF[,2] , format="%Y%m%d")
DF[,3] <- as.integer(gsub(".tsv", "", DF[,3], fixed=TRUE))

# X1 X2 X3
#1 blablabla 1996-01-01 1
#2 blablabla 1996-01-01 2
#3 blablabla 1996-01-01 3

Split elements at a value delimiter in vector R

We can take cumsum of the logical vector v == "-" to number the groups, and then split the non-delimiter elements into a list of vectors.

lst <- split(v[v!='-'], cumsum(v=="-")[v!='-'])
names(lst) <- paste0("v", seq_along(lst))

If we need them as separate vector objects, use list2env (not recommended, though):

list2env(lst, envir = .GlobalEnv)

Alternatively, we can create the vector objects directly in the global environment:

i1 <- v=="-"
i2 <- v!= "-"
grp <- cumsum(i1)
v1 <- v[i2 & grp==0]
v2 <- v[i2 & grp == 1]
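The cumsum trick carries over to C++: std::partial_sum over a 0/1 delimiter mask produces the same running group numbers. This is a sketch for a vector of strings with a single delimiter value; `split_at_value` is a hypothetical name, and, like split() in R, it creates no empty groups for leading, trailing or consecutive delimiters.

```cpp
#include <cstddef>
#include <numeric>
#include <string>
#include <vector>

// Mirrors split(v[v != "-"], cumsum(v == "-")[v != "-"]): the running count
// of delimiters gives each element its group number; delimiters are skipped.
std::vector<std::vector<std::string>> split_at_value(const std::vector<std::string>& v,
                                                     const std::string& delim)
{
    std::vector<int> is_delim(v.size());
    for (std::size_t i = 0; i < v.size(); ++i)
        is_delim[i] = (v[i] == delim) ? 1 : 0;

    std::vector<int> group(v.size());
    std::partial_sum(is_delim.begin(), is_delim.end(), group.begin());  // cumsum

    std::vector<std::vector<std::string>> groups;
    int last_group = -1;
    for (std::size_t i = 0; i < v.size(); ++i) {
        if (is_delim[i])
            continue;
        if (group[i] != last_group) {  // group numbers never decrease, so a change starts a new group
            groups.emplace_back();
            last_group = group[i];
        }
        groups.back().push_back(v[i]);
    }
    return groups;
}
```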

