Parsing into Several Vector Members

There are several ways :)

  1. Custom attribute traits
  2. The same using semantic actions
  3. Everything in semantic actions, at detail level

1. Custom attribute traits

The cleanest, IMO, would be to replace the Fusion Sequence Adaptation (BOOST_FUSION_ADAPT_STRUCT) with custom container attribute traits for Spirit:

namespace boost { namespace spirit { namespace traits {

template<>
struct is_container<ElemParseData, void> : mpl::true_ { };
template<>
struct container_value<ElemParseData, void> {
typedef boost::variant<float, unsigned int> type;
};
template <>
struct push_back_container<ElemParseData, std::vector<float>, void> {
static bool call(ElemParseData& c, std::vector<float> const& val) {
c.verts.insert(c.verts.end(), val.begin(), val.end());
return true;
}
};
template <>
struct push_back_container<ElemParseData, std::vector<unsigned int>, void> {
static bool call(ElemParseData& c, std::vector<unsigned int> const& val) {
c.idx.insert(c.idx.end(), val.begin(), val.end());
return true;
}
};
}}}

Without changes to the grammar, this will simply result in the same effect. However, now you can modify the parser to expect the desired grammar:

    vertex   = 'v' >> qi::double_ >> qi::double_ >> qi::double_;
    elements = 'f' >> qi::int_ >> qi::int_ >> qi::int_;

    start    = *(vertex | elements);

And because of the traits, Spirit will "just know" how to insert into ElemParseData. See it live on Coliru

2. The same using semantic actions

You can wire it up in semantic actions:

    start = *(
              vertex   [phx::bind(insert, _val, _1)]
            | elements [phx::bind(insert, _val, _1)]
            );

With insert a member of type inserter:

struct inserter {
template <typename,typename> struct result { typedef void type; };

template <typename Attr, typename Vec>
void operator()(Attr& attr, Vec const& v) const { dispatch(attr, v); }
private:
static void dispatch(ElemParseData& data, std::vector<float> vertices) {
data.verts.insert(data.verts.end(), vertices.begin(), vertices.end());
}
static void dispatch(ElemParseData& data, std::vector<unsigned int> indices) {
data.idx.insert(data.idx.end(), indices.begin(), indices.end());
}
};

This looks largely the same, and it does the same: live on Coliru

3. Everything in semantic actions, at detail level

This is the only solution that doesn't require any kind of plumbing, except perhaps inclusion of boost/spirit/include/phoenix.hpp:

struct objGram : qi::grammar<std::string::const_iterator, ElemParseData(), iso8859::space_type>
{
objGram() : objGram::base_type(start)
{
using namespace qi;

auto add_vertex = phx::push_back(phx::bind(&ElemParseData::verts, _r1), _1);
auto add_index = phx::push_back(phx::bind(&ElemParseData::idx, _r1), _1);
vertex = 'v' >> double_ [add_vertex] >> double_ [add_vertex] >> double_ [add_vertex];
elements = 'f' >> int_ [add_index] >> int_ [add_index] >> int_ [add_index] ;

start = *(vertex(_val) | elements(_val));
}

qi::rule<std::string::const_iterator, ElemParseData(), iso8859::space_type> start;
qi::rule<std::string::const_iterator, void(ElemParseData&), iso8859::space_type> vertex, elements;
} objGrammar;

Note:

  • One slight advantage here would be that there is less copying of values
  • A disadvantage is that you lose 'atomicity' (if a line fails to parse after, say, the second value, the first two values will have been pushed into the ElemParseData members irrevocably).
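
To make the atomicity point concrete, here is a minimal sketch in plain standard C++ (no Spirit involved). The struct is a hypothetical stand-in for ElemParseData, and parseLineAtomically is an illustrative name, not something from the original code. It collects a line's values in a temporary and commits them only once the whole line has parsed, which is what approaches 1 and 2 give you and approach 3 does not:

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical stand-in for the ElemParseData struct discussed above.
struct ElemParseData {
    std::vector<float>        verts;
    std::vector<unsigned int> idx;
};

// Parses one "v x y z" or "f a b c" line. Values are collected in a local
// buffer and appended to `data` only if the whole line parsed, so a failure
// halfway through a line leaves `data` untouched.
bool parseLineAtomically(const std::string& line, ElemParseData& data) {
    std::istringstream in(line);
    char tag = 0;
    if (!(in >> tag)) return false;

    if (tag == 'v') {
        std::vector<float> tmp(3);
        for (float& f : tmp)
            if (!(in >> f)) return false; // nothing has been committed yet
        data.verts.insert(data.verts.end(), tmp.begin(), tmp.end());
        return true;
    }
    if (tag == 'f') {
        std::vector<unsigned int> tmp(3);
        for (unsigned int& i : tmp)
            if (!(in >> i)) return false; // nothing has been committed yet
        data.idx.insert(data.idx.end(), tmp.begin(), tmp.end());
        return true;
    }
    return false; // unknown line type
}
```

Pushing each value straight into data.verts as it is read would be the non-atomic variant: a line such as "v 4 5 oops" would leave two stray floats behind.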

Side note


There is a bug in the read loop, prefer the simpler options:

std::filebuf fb;
if (fb.open("parsetest.txt", std::ios::in))
{
ss << &fb;
fb.close();
}

Or consider boost::spirit::istream_iterator

auto concatenation of parse results into vectors

You misplaced the grouping parentheses: expanding

    vertexList = *(vertex | comment);
    normalList = *(normal | comment);

by eliminating subrules leads to

    vertex     = *(('v'  >> qi::double_ >> qi::double_ >> qi::double_) | comment);
    normal     = *(("vn" >> qi::double_ >> qi::double_ >> qi::double_) | comment);

or, as I'd prefer:

Full working sample (please make your code samples SSCCE next time? https://meta.stackexchange.com/questions/22754/sscce-how-to-provide-examples-for-programming-questions):

#include <iostream>
#include <iterator>
#include <fstream>
#include <string>
#include <vector>
#include <boost/fusion/adapted.hpp>
#include <boost/spirit/include/qi.hpp>
#include <boost/spirit/include/karma.hpp>
#include <boost/spirit/include/phoenix.hpp>

namespace qi = boost::spirit::qi;
namespace karma = boost::spirit::karma;
namespace phx = boost::phoenix;

struct ObjParseData
{
ObjParseData() : verts(), norms() {}

std::vector<float> verts;
std::vector<float> norms;
};

BOOST_FUSION_ADAPT_STRUCT(ObjParseData, (std::vector<float>, verts)(std::vector<float>, norms))

template <typename It, typename Skipper = qi::space_type>
struct parser : qi::grammar<It, ObjParseData(), Skipper>
{
parser() : parser::base_type(start)
{
using namespace qi;

vertex = 'v' >> qi::double_ >> qi::double_ >> qi::double_;
normal = "vn" >> qi::double_ >> qi::double_ >> qi::double_;
comment = '#' >> qi::skip(qi::blank)[ *(qi::print) ];
#if 0
vertexList = *(vertex | comment);
normalList = *(normal | comment);
start = vertexList >> normalList;
#else
vertex = *(comment | ('v' >> qi::double_ >> qi::double_ >> qi::double_));
normal = *(comment | ("vn" >> qi::double_ >> qi::double_ >> qi::double_));
start = vertex >> normal;
#endif

BOOST_SPIRIT_DEBUG_NODE(start);
}

private:
// Use the grammar's own template parameters so the rules match base_type.
qi::rule<It, ObjParseData(), Skipper> start;
qi::rule<It, std::vector<float>(), Skipper> vertexList;
qi::rule<It, std::vector<float>(), Skipper> normalList;
qi::rule<It, std::vector<float>(), Skipper> vertex;
qi::rule<It, std::vector<float>(), Skipper> normal;
qi::rule<It, Skipper> comment;
};

bool doParse(const std::string& input)
{
typedef std::string::const_iterator It;
auto f(begin(input)), l(end(input));

parser<It, qi::space_type> p;
ObjParseData data;

try
{
bool ok = qi::phrase_parse(f,l,p,qi::space,data);
if (ok)
{
std::cout << "parse success\n";
std::cout << "data: " << karma::format_delimited(
"v: " << karma::auto_ << karma::eol <<
"n: " << karma::auto_ << karma::eol, ' ', data);
}
else std::cerr << "parse failed: '" << std::string(f,l) << "'\n";

if (f!=l) std::cerr << "trailing unparsed: '" << std::string(f,l) << "'\n";
return ok;
} catch(const qi::expectation_failure<It>& e)
{
std::string frag(e.first, e.last);
std::cerr << e.what() << "'" << frag << "'\n";
}

return false;
}

int main()
{
std::ifstream ifs("input.txt", std::ios::binary);
ifs.unsetf(std::ios::skipws);
std::istreambuf_iterator<char> f(ifs), l;

bool ok = doParse({ f, l });
}

Output:

parse success
data: v: -1.57 33.809 0.359 -24.012 0.005 21.744
n: 0.0 0.535 0.845 0.833 0.553 0.0

Right way to split an std::string into a vector<string>

For space-separated strings, you can do this:

std::string s = "What is the right way to split a string into a vector of strings";
std::stringstream ss(s);
std::istream_iterator<std::string> begin(ss);
std::istream_iterator<std::string> end;
std::vector<std::string> vstrings(begin, end);
std::copy(vstrings.begin(), vstrings.end(), std::ostream_iterator<std::string>(std::cout, "\n"));

Output:

What
is
the
right
way
to
split
a
string
into
a
vector
of
strings


For a string that has both commas and spaces:

struct tokens: std::ctype<char> 
{
tokens(): std::ctype<char>(get_table()) {}

static std::ctype_base::mask const* get_table()
{
typedef std::ctype<char> cctype;
static const cctype::mask *const_rc= cctype::classic_table();

static cctype::mask rc[cctype::table_size];
std::memcpy(rc, const_rc, cctype::table_size * sizeof(cctype::mask));

rc[','] = std::ctype_base::space;
rc[' '] = std::ctype_base::space;
return &rc[0];
}
};

std::string s = "right way, wrong way, correct way";
std::stringstream ss(s);
ss.imbue(std::locale(std::locale(), new tokens()));
std::istream_iterator<std::string> begin(ss);
std::istream_iterator<std::string> end;
std::vector<std::string> vstrings(begin, end);
std::copy(vstrings.begin(), vstrings.end(), std::ostream_iterator<std::string>(std::cout, "\n"));

Output:

right
way
wrong
way
correct
way
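
If imbuing a custom ctype facet feels heavy, a different technique gets the same tokens for this input: scan with std::string::find_first_of. The sketch below assumes the delimiter set is ", " by default, and splitOnAny is an illustrative name:

```cpp
#include <string>
#include <vector>

// Splits `s` on any character in `delims`, treating runs of delimiters as
// a single separator so no empty tokens are produced.
std::vector<std::string> splitOnAny(const std::string& s,
                                    const std::string& delims = ", ") {
    std::vector<std::string> tokens;
    std::string::size_type start = s.find_first_not_of(delims);
    while (start != std::string::npos) {
        // `stop` is npos for the last token; substr then takes the rest.
        std::string::size_type stop = s.find_first_of(delims, start);
        tokens.push_back(s.substr(start, stop - start));
        start = s.find_first_not_of(delims, stop);
    }
    return tokens;
}
```

Unlike the facet approach this does not plug into stream extraction, so it only fits when the whole string is already in memory.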

How to push all the arguments into result vector when parsing with Spirit::Qi?

Enabling your debugging shows: https://godbolt.org/z/o3nvjz9bG

Not clear enough for me. Let's add an argument rule:

struct Command {
using Arg = std::string;
using Args = std::vector<Arg>;
enum TYPE { NONE, CMD1, CMD2, FAIL };

TYPE type = NONE;
Args args;
};

qi::rule<It, Command::Arg()> arg;

And

none = omit[*blank] >> &(eol | eoi)
>> attr(Command::NONE)
/*>> attr(Command::Args{})*/;

arg = raw[double_] | +~char_(",)\r\n");

cmd1 = lit("CMD1") >> attr(Command::CMD1) //
>> '(' >> arg >> ')';

cmd2 = lit("CMD2") >> attr(Command::CMD2) //
>> '(' >> arg >> ',' >> arg >> ')';

fail = omit[*~char_("\r\n")] //
>> attr(Command::FAIL);

Now we can see https://godbolt.org/z/3Kqr3K41v

  <cmd2>
<try>CMD2(identity, 25.5)</try>
<arg>
<try>identity, 25.5)</try>
<success>, 25.5)</success>
<attributes>[[i, d, e, n, t, i, t, y]]</attributes>
</arg>
<arg>
<try>25.5)</try>
<success>)</success>
<attributes>[[2, 5, ., 5]]</attributes>
</arg>
<success></success>
<attributes>[[CMD2, [[i, d, e, n, t, i, t, y]]]]</attributes>
</cmd2>

Clearly, both arguments are parsed, but only one is assigned. The sad fact is that you're actively confusing the rule, by adapting a two-element struct and parsing a sequence of 3 elements.

You can get this to work, but you'd have to help it (e.g. with transform_attribute, attr_cast<> or a separate rule):

    arg  = raw[double_] | +~char_(",)\r\n");
    args = arg % ',';

    cmd1 = lit("CMD1") >> attr(Command::CMD1) //
        >> '(' >> arg >> ')';

    cmd2 = lit("CMD2") >> attr(Command::CMD2) //
        >> '(' >> args >> ')';

Now you get:

  <cmd2>
<try>CMD2(identity, 25.5)</try>
<args>
<try>identity, 25.5)</try>
<arg>
<try>identity, 25.5)</try>
<success>, 25.5)</success>
<attributes>[[i, d, e, n, t, i, t, y]]</attributes>
</arg>
<arg>
<try> 25.5)</try>
<success>)</success>
<attributes>[[ , 2, 5, ., 5]]</attributes>
</arg>
<success>)</success>
<attributes>[[[i, d, e, n, t, i, t, y], [ , 2, 5, ., 5]]]</attributes>
</args>
<success></success>
<attributes>[[CMD2, [[i, d, e, n, t, i, t, y], [ , 2, 5, ., 5]]]]</attributes>
</cmd2>

This hints at an obvious improvement: simplify the grammar:

    none  = omit[*blank] >> &(eol | eoi) >> attr(Command{Command::NONE, {}});
    fail  = omit[*~char_("\r\n")] >> attr(Command::FAIL);

    arg   = raw[double_] | +~char_(",)\r\n");
    args  = '(' >> arg % ',' >> ')';
    cmd   = no_case[type_] >> -args;

    start = skip(blank)[(cmd|fail) % eol] > eoi;

Then add validation to the commands after the fact.
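
What that after-the-fact validation could look like is sketched below in plain C++. The Command struct mirrors the one in this answer. The arities (one argument for CMD1, two for CMD2) are an assumption read off the earlier cmd1/cmd2 rules, and hasValidArity and validate are illustrative names:

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Mirrors the Command struct from the answer.
struct Command {
    using Arg  = std::string;
    using Args = std::vector<Arg>;
    enum Type { NONE, CMD1, CMD2, FAIL };

    Type type = NONE;
    Args args;
};

// Assumed arities, taken from the earlier cmd1/cmd2 rules:
// CMD1 takes one argument, CMD2 takes two, NONE and FAIL take none.
bool hasValidArity(const Command& cmd) {
    switch (cmd.type) {
    case Command::CMD1: return cmd.args.size() == 1;
    case Command::CMD2: return cmd.args.size() == 2;
    case Command::NONE:
    case Command::FAIL: return cmd.args.empty();
    }
    return false;
}

// Downgrades commands with the wrong number of arguments to FAIL.
void validate(std::vector<Command>& commands) {
    for (Command& cmd : commands) {
        if (!hasValidArity(cmd)) {
            cmd.type = Command::FAIL;
            cmd.args.clear();
        }
    }
}
```

Keeping the arity check out of the grammar is a design choice: the grammar stays generic, and adding a command only means touching the symbol table and this switch.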

Demo

Live On Compiler Explorer

//#define BOOST_SPIRIT_DEBUG
#include <boost/spirit/include/qi.hpp>
#include <iomanip>
#include <iostream>

namespace qi = boost::spirit::qi;

struct Command {
using Arg = std::string;
using Args = std::vector<Arg>;
enum Type { NONE, CMD1, CMD2, FAIL };

Type type = NONE;
Args args;

friend std::ostream& operator<<(std::ostream& os, Type type) {
switch(type) {
case NONE: return os << "NONE";
case CMD1: return os << "CMD1";
case CMD2: return os << "CMD2";
case FAIL: return os << "FAIL";
default: return os << "???";
}
}
friend std::ostream& operator<<(std::ostream& os, Command const& cmd) {
os << cmd.type << "(";
auto sep = "";
for (auto& arg : cmd.args)
os << std::exchange(sep, ", ") << std::quoted(arg);
return os << ")";
}
};
using Commands = std::vector<Command>;

BOOST_FUSION_ADAPT_STRUCT(Command, type, args)

template <typename It> struct Parser : qi::grammar<It, Commands()> {
Parser() : Parser::base_type(start) {
using namespace qi;

none = omit[*blank] >> &(eol | eoi) >> attr(Command{Command::NONE, {}});
fail = omit[*~char_("\r\n")] >> attr(Command::FAIL);

arg = raw[double_] | +~char_(",)\r\n");
args = '(' >> arg % ',' >> ')';
cmd = no_case[type] >> -args;

start = skip(blank)[(cmd|none|fail) % eol] > eoi;

BOOST_SPIRIT_DEBUG_NODES((start)(fail)(none)(cmd)(arg)(args))
}

private:
struct type_sym : qi::symbols<char, Command::Type> {
type_sym() { this->add//
("cmd1", Command::CMD1)
("cmd2", Command::CMD2);
}
} type;
qi::rule<It, Command::Arg()> arg;
qi::rule<It, Command::Args()> args;
qi::rule<It, Command(), qi::blank_type> cmd, none, fail;
qi::rule<It, Commands()> start;
};

Commands parse(std::string const& text)
{
using It = std::string::const_iterator;
static const Parser<It> parser;

Commands commands;
It first = text.begin(), last = text.end();

if (!qi::parse(first, last, parser, commands))
throw std::runtime_error("command parse error");

return commands;
}

int main()
{
try {
for (auto& cmd : parse(R"(
CMD1(some ad hoc text)
this is a bogus line
cmd2(identity, 25.5))"))
std::cout << cmd << "\n";
} catch (std::exception const& e) {
std::cout << e.what() << "\n";
}
}

Prints

NONE()
CMD1("some ad hoc text")
FAIL()
CMD2("identity", " 25.5")

Split vector to multiple array/vector C++

Edit: I removed a verbose transposing function.


I assume that you want to convert std::vector<std::string> to a 2D matrix std::vector<std::vector<int>>.
For instance, for your example, the desired result is assumed to be arr1 = {0,1,...}, arr2 = {14,2,...} and arr3 = {150,220,...}.

First,

  • We can use std::istream_iterator to extract integers from strings.

  • We can also apply the range constructor to create a std::vector<int> corresponding to each string.

So the following function should work for you, and it does not look like spaghetti code, at least to me.
First, it extracts the integer rows {0,14,150,...} and {1,2,220,...} from the passed string vector v into a matrix.
Since a default-constructed std::istream_iterator is an end-of-stream iterator, each range constructor reads its string until it fails to read the next value.
Finally, the transposed matrix is returned:

#include <vector>
#include <string>
#include <sstream>
#include <iterator>

template <typename T>
auto extractNumbers(const std::vector<std::string>& v)
{
std::vector<std::vector<T>> extracted;
extracted.reserve(v.size());

for(auto& s : v)
{
std::stringstream ss(s);
std::istream_iterator<T> begin(ss), end; //defaulted end-of-stream iterator.

extracted.emplace_back(begin, end);
}

// this also validates following access to extracted[0].
if(extracted.empty()){
return extracted;
}

decltype(extracted) transposed(extracted[0].size());
for(std::size_t i=0; i<transposed.size(); ++i){
for(std::size_t j=0; j<extracted.size(); ++j){
transposed.at(i).push_back(std::move(extracted.at(j).at(i)));
}
}

return transposed;
}

Then you can extract integers from a string vector as follows:

DEMO

std::vector<std::string> v(n);
v[0] = "0 14 150";
v[1] = "1 2 220";
...
v[n-1] = "...";

auto matrix = extractNumbers<int>(v);

where matrix[0] is arr1, matrix[1] is arr2, and so on.
We can also move a row out of the matrix without copying its elements: auto arr1 = std::move(matrix[0]);.
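
The range-constructor step on its own can be sketched as follows. parseRow is an illustrative helper, not part of the function above. It shows that reading stops quietly at the first token that is not a T, which is worth knowing because extractNumbers relies on every row having the same length:

```cpp
#include <iterator>
#include <sstream>
#include <string>
#include <vector>

// Builds a vector<T> from one whitespace-separated string using the
// iterator-range constructor. Extraction stops at the first token that
// cannot be read as a T, or at the end of the string.
template <typename T>
std::vector<T> parseRow(const std::string& s) {
    std::istringstream ss(s);
    std::istream_iterator<T> first(ss), last; // `last` is end-of-stream
    return std::vector<T>(first, last);
}
```

For example, parseRow<int>("1 2 x 3") stops at "x" and yields only two values, so a malformed row ends up shorter than the others and the .at() calls in the transposing loop would throw std::out_of_range.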

(R) Parse character vector and split into two separate columns

Use separate as shown below. Note that this requires tidyr 0.8.2 or later. Earlier versions did not support NA in the into argument.

library(dplyr)
library(tidyr)

table %>%
separate(var1, into = c("mean1", "sd1", NA), sep = "[ ()]+") %>%
separate(var2, into = c("mean2", "sd2", NA), sep = "[ ()]+")

giving:

# A tibble: 3 x 4
mean1 sd1 mean2 sd2
<chr> <chr> <chr> <chr>
1 27.0 3.1 171.4 9.0
2 27.0 3.2 176.8 7.2
3 27.1 3.0 165.0 6.2

Parse into a vector<vector<double>> with boost::spirit

The answer is yes.
It is actually quite trivial to parse into vector<vector<double> >.

The rule definition requires a function type (for example std::vector<double>() rather than std::vector<double>), not the type directly. This is simply explained here. A more thorough explanation can probably be found in the documentation of boost::phoenix.

The output of the program above is now showing nicely the parsed values:

parse success.
0, 5011, 10000, 15000, 20000, 25000,
-40, 0, 20, 40,
Base:
200, 175, 170, 165, 160, 150,
200, 175, 170, 165, 160, 150,
165, 165, 160, 155, 145, 145,
160, 155, 150, 145, 145, 140,

