Right way to split an std::string into a vector<string>
For space-separated strings, you can do this:
std::string s = "What is the right way to split a string into a vector of strings";
std::stringstream ss(s);
std::istream_iterator<std::string> begin(ss);
std::istream_iterator<std::string> end;
std::vector<std::string> vstrings(begin, end);
std::copy(vstrings.begin(), vstrings.end(), std::ostream_iterator<std::string>(std::cout, "\n"));
Output:
What
is
the
right
way
to
split
a
string
into
a
vector
of
strings
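For a string that contains both commas and spaces, you can imbue the stream with a custom ctype facet that classifies both characters as whitespace: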
struct tokens: std::ctype<char>
{
tokens(): std::ctype<char>(get_table()) {}
static std::ctype_base::mask const* get_table()
{
typedef std::ctype<char> cctype;
static const cctype::mask *const_rc= cctype::classic_table();
static cctype::mask rc[cctype::table_size];
std::memcpy(rc, const_rc, cctype::table_size * sizeof(cctype::mask));
rc[','] = std::ctype_base::space;
rc[' '] = std::ctype_base::space;
return &rc[0];
}
};
std::string s = "right way, wrong way, correct way";
std::stringstream ss(s);
ss.imbue(std::locale(std::locale(), new tokens()));
std::istream_iterator<std::string> begin(ss);
std::istream_iterator<std::string> end;
std::vector<std::string> vstrings(begin, end);
std::copy(vstrings.begin(), vstrings.end(), std::ostream_iterator<std::string>(std::cout, "\n"));
Output:
right
way
wrong
way
correct
way
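If a custom locale facet feels heavy for this job, the same result can be sketched with find_first_of() and find_first_not_of(). The helper name split_any and its default delimiter set are hypothetical and not part of the answer above; this is a minimal sketch that assumes empty tokens should be skipped, as the stream-based version does.

```cpp
#include <string>
#include <vector>

// Hypothetical helper (not from the answer above): splits `text` on any
// character contained in `delims` and skips empty tokens.
std::vector<std::string> split_any(const std::string& text,
                                   const std::string& delims = ", ")
{
    std::vector<std::string> tokens;
    // Skip any leading delimiters to find the start of the first token.
    std::string::size_type start = text.find_first_not_of(delims);
    while (start != std::string::npos) {
        // The token ends at the next delimiter, or at the end of the string.
        std::string::size_type stop = text.find_first_of(delims, start);
        if (stop == std::string::npos) {
            tokens.push_back(text.substr(start));
            break;
        }
        tokens.push_back(text.substr(start, stop - start));
        // Skip the run of delimiters that follows the token.
        start = text.find_first_not_of(delims, stop);
    }
    return tokens;
}
```

Calling split_any("right way, wrong way, correct way") should yield the same six tokens as the output above, without touching the stream's locale.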
Splitting a vector of strings based on delimiters and storing the result in a structure
This is simple with substr() and find().
#include <iostream>
#include <vector>
#include <string>
using namespace std;
struct Person
{
string m_id;
string m_name;
int m_age;
};
int main()
{
vector<string> data;
vector<Person> people;
data.push_back("id1|Name1|25");
data.push_back("id2|Name2|35");
for(size_t i = 0; i < data.size(); ++i){
// Locate both '|' delimiters; the name lies between them.
size_t idx = data[i].find("|");
size_t idx2 = data[i].find("|", idx + 1);
string id = data[i].substr(0, idx);
string name = data[i].substr(idx + 1, idx2 - idx - 1);
string age = data[i].substr(idx2 + 1);
Person p = {id, name, stoi(age)};
people.push_back(p);
}
for(size_t i = 0; i < people.size(); ++i)
cout << people[i].m_id << " " << people[i].m_name << " " << people[i].m_age << endl;
return 0;
}
The output is:
id1 Name1 25
id2 Name2 35
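When a record has more than a handful of fields, tracking delimiter positions by hand gets error-prone. A possible alternative is to let std::getline() do the splitting with a custom delimiter. The helper name split_fields is hypothetical; this is a sketch, not the approach used in the answer above.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: splits one record such as "id1|Name1|25" into fields.
// Empty fields in the middle are kept; a single trailing delimiter does not
// produce an extra empty field.
std::vector<std::string> split_fields(const std::string& record, char delim = '|')
{
    std::vector<std::string> fields;
    std::istringstream in(record);
    std::string field;
    // getline reads up to (and discards) the next delimiter on each call.
    while (std::getline(in, field, delim)) {
        fields.push_back(field);
    }
    return fields;
}
```

With this helper, a Person could be filled from fields[0], fields[1] and stoi(fields[2]) after checking that exactly three fields came back.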
Split vector into chunks by delimiter
As rawr suggested in the comments, I am using the following solution:
foo <- function( x ){
idx <- 1 + cumsum( is.na( x ) )
not.na <- ! is.na( x )
result <- split( x[not.na], idx[not.na] )
return(result)
}
Reasons:
- It was the first solution
- It works
- I understand it
- It does not use any packages/libraries.
Still, thanks for all the answers!
I will mark this as answered as soon as I can (in two days).
R: split string vector by delimiter and rearrange
This solution generates a boolean matrix with each vector as a row, and each possible character as a column.
possible_options = c('a', 'b', 'c')
result <- sapply(possible_options, function(x) apply(q, 1, function(y) x %in% y))
result
a b c
[1,] TRUE TRUE TRUE
[2,] TRUE FALSE TRUE
[3,] FALSE TRUE TRUE
This solution requires a list of all the options. If you don't have one, you can either build a list of all possible options (for example, all letters) and then drop the columns that never match:
result <- sapply(c(letters, LETTERS), function(x) apply(q, 1, function(y) x %in% y))
result <- result[, colSums(result) > 0]
result
a b c
[1,] TRUE TRUE TRUE
[2,] TRUE FALSE TRUE
[3,] FALSE TRUE TRUE
Or extract them from q itself:
opts <- as.character(unique(unlist(q)))
opts <- sort(opts[opts != ''])
result <- sapply(opts , function(x) apply(q, 1, function(y) x %in% y))
result
a b c
[1,] TRUE TRUE TRUE
[2,] TRUE FALSE TRUE
[3,] FALSE TRUE TRUE
Split string by delimiter by using vectors - how to split by newline?
You can use getline to read the file line by line, which:
Extracts characters from is and stores them into str until the delimitation character delim is found (or the newline character, '\n' ...) If the delimiter is found, it is extracted and discarded, i.e. it is not stored and the next input operation will begin after it.
Perhaps you are already reading the file through a function that removes line endings.
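As a minimal sketch of that approach, the function below splits a string on newlines with std::getline(). The name split_lines is hypothetical, and the handling of a trailing '\r' is an assumption added for text with Windows line endings. The same loop works on a std::ifstream when reading from a file.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: splits `text` into lines. The '\n' is consumed and
// discarded by getline, so it never appears in the result.
std::vector<std::string> split_lines(const std::string& text)
{
    std::vector<std::string> lines;
    std::istringstream in(text);
    std::string line;
    while (std::getline(in, line)) {
        // Assumption: drop a trailing '\r' left over from "\r\n" endings.
        if (!line.empty() && line.back() == '\r') {
            line.pop_back();
        }
        lines.push_back(line);
    }
    return lines;
}
```

Empty lines are preserved as empty strings, so the number of elements matches the number of lines in the input.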
Splitting character object using vector of delimiters
A tidyverse solution:
library(tidyverse)
delims <- c("Name", "Age", "Address Please give full address")
df %>%
mutate(rawtext = str_remove_all(rawtext, ":")) %>%
separate(rawtext, c("x", delims), sep = paste(delims, collapse = "|"), convert = T) %>%
mutate(across(where(is.character), str_squish), x = NULL)
# # A tibble: 2 x 3
# Name Age `Address Please give full address`
# <chr> <dbl> <chr>
# 1 John Doe 50 22 Main Street, New York
# 2 Jane Bloggs 42 1 Lower Street, London
Note: convert = T in separate() converts Age from character to numeric, ignoring leading and trailing whitespace.
Parse (split) a string in C++ using string delimiter (standard C++)
You can use the std::string::find() function to find the position of your string delimiter, then use std::string::substr() to get a token.
Example:
std::string s = "scott>=tiger";
std::string delimiter = ">=";
std::string token = s.substr(0, s.find(delimiter)); // token is "scott"
The find(const string& str, size_t pos = 0) function returns the position of the first occurrence of str in the string, or npos if the string is not found.
The substr(size_t pos = 0, size_t n = npos) function returns a substring of the object, starting at position pos and of length n.
If the string contains the delimiter more than once, remove each token (delimiter included) after extracting it so that the next search starts at the following token. This modifies s, so work on a copy if you need the original string; s = s.substr(pos + delimiter.length()); has the same effect as the erase() call below:
s.erase(0, s.find(delimiter) + delimiter.length());
This way you can easily loop to get each token.
Complete Example
std::string s = "scott>=tiger>=mushroom";
std::string delimiter = ">=";
size_t pos = 0;
std::string token;
while ((pos = s.find(delimiter)) != std::string::npos) {
token = s.substr(0, pos);
std::cout << token << std::endl;
s.erase(0, pos + delimiter.length());
}
std::cout << s << std::endl;
Output:
scott
tiger
mushroom
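The loop above consumes s as it goes. If the original string has to stay intact, one option is to pass a start position to find() instead of erasing. The function name split_by is hypothetical, and returning the whole string for an empty delimiter is an assumption made to avoid an endless loop.

```cpp
#include <string>
#include <vector>

// Hypothetical helper: splits `s` on every occurrence of `delimiter`
// without modifying `s`. Empty tokens are kept.
std::vector<std::string> split_by(const std::string& s, const std::string& delimiter)
{
    std::vector<std::string> tokens;
    // Assumption: an empty delimiter yields the whole string as one token.
    if (delimiter.empty()) {
        tokens.push_back(s);
        return tokens;
    }
    std::string::size_type start = 0;
    std::string::size_type pos;
    // Search from `start` so earlier tokens are never rescanned.
    while ((pos = s.find(delimiter, start)) != std::string::npos) {
        tokens.push_back(s.substr(start, pos - start));
        start = pos + delimiter.length();
    }
    // Whatever follows the last delimiter is the final token.
    tokens.push_back(s.substr(start));
    return tokens;
}
```

For "scott>=tiger>=mushroom" with the delimiter ">=", this should return the same three tokens as the loop above while leaving the input unchanged.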
How to split a character vector into data frame?
DF <- data.frame(do.call(rbind, strsplit(a, "-", fixed=TRUE)))
DF[,2] <- as.Date(DF[,2] , format="%Y%m%d")
DF[,3] <- as.integer(gsub(".tsv", "", DF[,3], fixed=TRUE))
# X1 X2 X3
#1 blablabla 1996-01-01 1
#2 blablabla 1996-01-01 2
#3 blablabla 1996-01-01 3
Split elements at a value delimiter in vector R
We can use cumsum on the logical vector and then split the result into a list of vectors.
lst <- split(v[v!='-'], cumsum(v=="-")[v!='-'])
names(lst) <- paste0("v", seq_along(lst))
If we need them as separate vector objects, use list2env (not recommended, though):
list2env(lst, envir = .GlobalEnv)
Otherwise, we can create the vector objects directly in the global environment:
i1 <- v=="-"
i2 <- v!= "-"
grp <- cumsum(i1)
v1 <- v[i2 & grp==0]
v2 <- v[i2 & grp == 1]