Boost Spirit X3 Cannot Compile Repeat Directive with Variable Factor

From what I gather from reading the source and the mailing list, Phoenix is not integrated into X3 at all, the reason being that C++14 makes most of it obsolete.

I agree that this leaves a few spots where Qi used to have elegant solutions, e.g. eps(DEFERRED_CONDITION), lazy(*RULE_PTR) (the Nabialek trick), and indeed, this case.

Spirit X3 is still in development, so we might see this added.¹

For now, Spirit X3 has one generalized facility for stateful context. This essentially replaces locals<>, in some cases inherited arguments, and can be /made to/ validate the number of elements in this particular case as well:

  • x3::with²

Here's how you could use it:

with<_n>(std::ref(n)) 
[ omit[uint_[number] ] >>
*(eps [more] >> int_) >> eps [done] ]

Here, _n is a tag type that identifies the context element for retrieval with get<_n>(ctx).

Note: currently we have to use a reference wrapper to an lvalue n, because with<_n>(0u) would result in a constant element inside the context. I suppose this, too, is a QoI issue that may be lifted as X3 matures.

Now, for the semantic actions:

unsigned n;
struct _n{};

auto number = [](auto &ctx) { get<_n>(ctx).get() = _attr(ctx); };

This stores the parsed unsigned number into the context. (In fact, due to the ref(n) binding it's not actually part of the context for now, as mentioned)

auto more   = [](auto &ctx) { _pass(ctx) = get<_n>(ctx) >  _val(ctx).size(); };

Here we check that we're not "full" yet, i.e. more integers are allowed.

auto done   = [](auto &ctx) { _pass(ctx) = get<_n>(ctx) == _val(ctx).size(); };

Here we check that we're "full" - i.e. no more integers are allowed.

Putting it all together:

Live On Coliru

#include <algorithm>
#include <iomanip>
#include <iostream>
#include <iterator>
#include <string>
#include <vector>

#include <boost/spirit/home/x3.hpp>

int main() {
for (std::string const input : {
"3 1 2 3", // correct
"4 1 2 3", // too few
"2 1 2 3", // too many
//
" 3 1 2 3 ",
})
{
std::cout << "\nParsing " << std::left << std::setw(20) << ("'" + input + "':");

std::vector<int> v;

bool ok;
{
using namespace boost::spirit::x3;

unsigned n;
struct _n{};

auto number = [](auto &ctx) { get<_n>(ctx).get() = _attr(ctx); };
auto more = [](auto &ctx) { _pass(ctx) = get<_n>(ctx) > _val(ctx).size(); };
auto done = [](auto &ctx) { _pass(ctx) = get<_n>(ctx) == _val(ctx).size(); };

auto r = rule<struct _r, std::vector<int> > {}
%= with<_n>(std::ref(n))
[ omit[uint_[number] ] >> *(eps [more] >> int_) >> eps [done] ];

ok = phrase_parse(input.begin(), input.end(), r >> eoi, space, v);
}

if (ok) {
std::copy(v.begin(), v.end(), std::ostream_iterator<int>(std::cout << v.size() << " elements: ", " "));
} else {
std::cout << "Parse failed";
}
}
}

Which prints:

Parsing '3 1 2 3':          3 elements: 1 2 3 
Parsing '4 1 2 3':          Parse failed
Parsing '2 1 2 3':          Parse failed
Parsing ' 3 1 2 3 ':        3 elements: 1 2 3

¹ lend your support/voice at the [spirit-general] mailing list :)

² can't find a suitable documentation link, but it's used in some of the samples

Improvements in repeat directive with variable factor for X3

Usually it means that there was no PR for that feature (or there was one, but it had issues). The repeat directive also has design problems. For example, you can parse {10 20 30} with it, but not {10, 20, 30} (that requires a kind of list parser).

I cannot agree that Qi has an elegant way of doing it, because you have to use a rule with a local variable or pass a reference to an external value. The natural way seems to be repeat(len_parser)[item_parser], but it has additional design issues with skippers (or skippers have design issues that limit the flexibility of complex directives).

Fortunately, writing your own parser combinators is much simpler in Spirit X3.

#include <boost/spirit/home/x3.hpp>

namespace x3e {

namespace x3 = boost::spirit::x3;

template <typename LenParser, typename Subject>
struct vlrepeat_directive : x3::unary_parser<Subject, vlrepeat_directive<LenParser, Subject>>
{
using base_type = x3::unary_parser<Subject, vlrepeat_directive<LenParser, Subject>>;
static bool const handles_container = true;

vlrepeat_directive(LenParser const& lp_, Subject const& subject)
: base_type(subject), lp(lp_) {}

template<typename Iterator, typename Context, typename RContext, typename Attribute>
bool parse(Iterator& first, Iterator const& last
, Context const& context, RContext& rcontext, Attribute& attr) const
{
static_assert(x3::traits::has_attribute<LenParser, Context>::value, "must synthesize an attribute");

Iterator iter = first;
typename x3::traits::attribute_of<LenParser, Context>::type len;
if (!lp.parse(iter, last, context, rcontext, len))
return false;

for (; len; --len) {
if (!x3::detail::parse_into_container(
this->subject, iter, last, context, rcontext, attr))
return false;
}

first = iter;
return true;
}

LenParser lp;
};

template <typename LenParser>
struct vlrepeat_gen
{
template <typename Subject>
vlrepeat_directive<LenParser, typename x3::extension::as_parser<Subject>::value_type>
operator[](Subject const& p) const
{
return { lp, x3::as_parser(p) };
}

LenParser lp;
};

template <typename Parser>
vlrepeat_gen<Parser> vlrepeat(Parser const& p)
{
static_assert(x3::traits::is_parser<Parser>::value, "have to be a parser");
return { p };
}

}

template <typename LenParser, typename Subject, typename Context>
struct boost::spirit::x3::traits::attribute_of<x3e::vlrepeat_directive<LenParser, Subject>, Context>
: build_container<typename attribute_of<Subject, Context>::type> {};

And use it:

#include <cstring>
#include <iostream>
#include <vector>

int main()
{
namespace x3 = boost::spirit::x3;

auto s = "5: 1 2 3 4 5", e = s + std::strlen(s);
std::vector<int> v;
if (phrase_parse(s, e, x3e::vlrepeat(x3::uint_ >> ':')[x3::int_], x3::space, v)) {
std::cout << "Result:\n";
for (auto x : v)
std::cout << x << '\n';
}
else
std::cout << "Failed!\n";
}

Output:

Result:
1
2
3
4
5

https://wandbox.org/permlink/K572K0BMEqA8lMJm

(Note that it calls detail::parse_into_container, which is not a public API.)

Misunderstanding repeat directive - it should fail, but doesn't

If the literal is too long, the parser should fail

Where does it say that? It looks like the code does exactly what you ask: it parses at most 6 digits with the requisite underscores. The output even confirms that it does exactly that.

You can of course make it much more apparent by showing what was not parsed:

Live On Coliru

auto f = begin(s), l = end(s);
bool const ok = x3::parse(
f, l, x3::raw[cs >> x3::repeat(0, 5)[('_' >> cs) | cs]], attr);

fmt::print(
"{:21} -> {:5} {:13} remaining '{}'\n",
fmt::format("'{}'", s),
ok,
fmt::format("'{}'", attr),
std::string(f, l));

Prints

'0'                   -> true  '0'           remaining ''
'10'                  -> true  '10'          remaining ''
'1_0'                 -> true  '1_0'         remaining ''
'012345'              -> true  '012345'      remaining ''
'0123456'             -> true  '012345'      remaining '6'
'1_2_3_4_5_6_7_8_9_0' -> true  '1_2_3_4_5_6' remaining '_7_8_9_0'
'1_2_3_4_5_6_'        -> true  '1_2_3_4_5_6' remaining '_'
'_0123_456'           -> false ''            remaining '_0123_456'
''                    -> false ''            remaining ''

Fixes

To assert that a complete input be parsed, use either x3::eoi or check the iterators:

Live On Coliru

bool const ok = x3::parse(
f,
l,
x3::raw[cs >> x3::repeat(0, 5)[('_' >> cs) | cs]] >> x3::eoi,
attr);

Prints

'0'                   -> true  '0'           remaining ''
'10'                  -> true  '10'          remaining ''
'1_0'                 -> true  '1_0'         remaining ''
'012345'              -> true  '012345'      remaining ''
'0123456'             -> false '012345'      remaining '0123456'
'1_2_3_4_5_6_7_8_9_0' -> false '1_2_3_4_5_6' remaining '1_2_3_4_5_6_7_8_9_0'
'1_2_3_4_5_6_'        -> false '1_2_3_4_5_6' remaining '1_2_3_4_5_6_'
'_0123_456'           -> false ''            remaining '_0123_456'
''                    -> false ''            remaining ''

Distinct Lexemes

If instead you want to allow the input to continue, just not with certain characters, e.g. parsing many such "numbers":

auto const number = x3::lexeme[ //
x3::raw[cs >> x3::repeat(0, 5)[('_' >> cs) | cs]]
// within the lexeme, assert that no digit or _ follows
>> ! (cs | '_') //
];

Live On Coliru

//#define BOOST_SPIRIT_X3_DEBUG
#include <boost/spirit/home/x3.hpp>
#include <fmt/ranges.h>
using namespace std::string_view_literals;

namespace Parser {
namespace x3 = boost::spirit::x3;
auto const cs = x3::digit;
auto const number = x3::lexeme[ //
x3::raw[cs >> x3::repeat(0, 5)[('_' >> cs) | cs]]
// within the lexeme, assert that no digit or _ follows
>> ! (cs | '_') //
];
auto const ws_or_comment = x3::space | "//" >> *~x3::char_("\r\n");
auto const numbers = x3::skip(ws_or_comment)[number % ','];
} // namespace Parser

int main()
{
std::vector<std::string> attr;
std::string_view const s =
R"(0,
10,
1_0,
012345,
// too long
0123456,
1_2_3_4_5_6_7_8_9_0,
// absolutely invalid
1_2_3_4_5_6_,
_0123_456)"sv;

auto f = begin(s), l = end(s);
bool const ok = parse(f, l, Parser::numbers, attr);

fmt::print("{}: {}\nremaining '{}'\n", ok, attr, std::string(f, l));
}

Prints

true: ["0", "10", "1_0", "012345"]
remaining ',
// too long
0123456,
1_2_3_4_5_6_7_8_9_0,
// absolutely invalid
1_2_3_4_5_6_,
_0123_456'

Proving It

To drive home the point of checking inside the lexeme in the presence of otherwise insignificant whitespace:

auto const numbers = x3::skip(ws_or_comment)[*number];

With a slightly adjusted test input (removing the commas):

Live On Coliru

//#define BOOST_SPIRIT_X3_DEBUG
#include <boost/spirit/home/x3.hpp>
#include <fmt/ranges.h>
using namespace std::string_view_literals;

namespace Parser {
namespace x3 = boost::spirit::x3;
auto const cs = x3::digit;
auto const number = x3::lexeme[ //
x3::raw[cs >> x3::repeat(0, 5)[('_' >> cs) | cs]]
// within the lexeme, assert that no digit or _ follows
>> ! (cs | '_') //
];
auto const ws_or_comment = x3::space | "//" >> *~x3::char_("\r\n");
auto const numbers = x3::skip(ws_or_comment)[*number];
} // namespace Parser

int main()
{
std::vector<std::string> attr;
std::string_view const s =
R"(0
10
1_0
012345
// too long
0123456
1_2_3_4_5_6_7_8_9_0
// absolutely invalid
1_2_3_4_5_6_
_0123_456)"sv;

auto f = begin(s), l = end(s);
bool const ok = parse(f, l, Parser::numbers, attr);

fmt::print("{}: {}\nremaining '{}'\n", ok, attr, std::string(f, l));
}

Prints

true: ["0", "10", "1_0", "012345"]
remaining '0123456
1_2_3_4_5_6_7_8_9_0
// absolutely invalid
1_2_3_4_5_6_
_0123_456'

How future-safe is it to write a parser with Boost Spirit X3?

It's already released, so there's little chance of it just vanishing.

I liberally use X3 even in production code: After all, we do have tests for a reason.

That said, I know a number of hairy issues surround the linking of rules spread across separate translation units¹.

Here's a list of cases in which I would consider not using X3:

  • where Qi's attribute transformation logic is more enticing (makes for more readable rules). See e.g.
  • Phoenix integration is desired (see "Boost Spirit X3 cannot compile repeat directive with variable factor")
  • Sharing rules across TUs is desired

Slightly less pressing differences are when:

  • locals are involved ("X3 becomes a real tedium (if not completely unbearable) with stateful rules (by which I mean rules with "locals")"). A lot of it can be solved using with<> (see "Boost Spirit X3 cannot compile repeat directive with variable factor"), but I'm not convinced it's re-entrant
  • lazy rule invocation is required²
  • Lexer is desired (i.e. I wouldn't port a Qi/Lex grammar to X3, except by rewrite)

Note however, there are definite areas where X3 shines:

  • compilation time
  • ease of generating dynamic rules/custom directives (see boost::spirit::x3 attribute compatibility rules, intuition or code? or Recursive x3 parser with results passing around)
  • ease of creating custom parsers (e.g. Spirit-Qi: How can I write a nonterminal parser?)

¹ see the mailing list, and e.g. x3 linker error with separate TU and linking errors while separate parser using boost spirit x3

² In fact, it might be "easy" to create one by creating a custom parser, building on with<> and any_parser<>

Boost Spirit X3 local variables and getting the synthesized attribute

I had the same findings!

The trick with "locals" is to use the with<> directive.

Because you give no usage scenario, I don't think it's worth coming up with examples, though you can search my answers for them:

  • Boost Spirit X3 cannot compile repeat directive with variable factor
  • Boost Spirit X3 AST not working with semantic actions when using separate rule definition and instantiation
  • Using boost spirit for a stack based language

The trick with the second is to just use a semantic action (which can be a lambda) and assign _pass. The answer to "Boost Spirit X3 cannot compile repeat directive with variable factor" shows this too:

auto zerosum = [](auto &ctx) { 
auto& v = x3::_attr(ctx);
_pass(ctx) = std::accumulate(v.begin(), v.end(), 0) == 0;
};

Boost Spirit X3 AST not working with semantic actions when using separate rule definition and instantiation

I must admit actually reconstructing your sample was a bit too much work for me (call me lazy...).

However, I know the answer and a trick to make your life simpler.

The Answer

Semantic actions on a rule definition inhibit automatic attribute propagation. From the Qi docs (the same goes for X3, but I always lose the link to the docs):

r = p; Rule definition

This is equivalent to r %= p (see below) if there are no semantic actions attached anywhere in p.

r %= p; Auto-rule definition

The attribute of p should be compatible with the synthesized attribute of r. When p is successful, its attribute is automatically propagated to r's synthesized attribute.

The Trick

You can inject state (your n reference, in this case) using the x3::with<> directive. That way you don't have the namespace global (n) and can make the parser reentrant, threadsafe etc.

Here's my "simplest" take on things, in a single file:

namespace parsing {
x3::rule<struct parser, ast::ast_struct> parser {"parser"};

struct state_tag { };

auto record_number = [](auto &ctx) {
unsigned& n = x3::get<state_tag>(ctx);
n = x3::_attr(ctx);
};

auto parser_def = x3::rule<struct parser_def, ast::ast_struct> {}
%= x3::int_[record_number] >> +(x3::omit[+x3::blank] >> x3::int_);

BOOST_SPIRIT_DEFINE(parser)
}

Tip: run the demo with = instead of the %= to see the difference in behaviour!

Note that get<state_tag>(ctx) returns a reference_wrapper<unsigned> just because we use the parser as follows:

void parse(const std::string &data) {
using namespace std;

ast::ast_struct ast;
unsigned n;
auto parser = x3::with<parsing::state_tag>(ref(n)) [parsing::parser] >> x3::eoi;

if (x3::parse(data.begin(), data.end(), parser, ast)) {
cout << "n: " << n << ", ";
copy(ast.numbers.begin(), ast.numbers.end(), ostream_iterator<int>(cout << ast.numbers.size() << " elements: ", " "));
cout << "\n";
} else
cout << "Parse failed\n";
}

Live Demo

Live On Coliru

#include <boost/fusion/include/adapt_struct.hpp>
#include <boost/spirit/home/x3.hpp>
#include <algorithm>
#include <iostream>
#include <iterator>
#include <string>
#include <vector>

namespace ast {
struct ast_struct {
int number;
std::vector<int> numbers;
};
}

BOOST_FUSION_ADAPT_STRUCT(ast::ast_struct, number, numbers)

namespace x3 = boost::spirit::x3;

namespace parsing {
x3::rule<struct parser, ast::ast_struct> parser {"parser"};

struct state_tag { };

auto record_number = [](auto &ctx) {
unsigned& n = x3::get<state_tag>(ctx); // note: returns reference_wrapper<T>
n = x3::_attr(ctx);
};

auto parser_def = x3::rule<struct parser_def, ast::ast_struct> {}
%= x3::int_[record_number] >> +(x3::omit[+x3::blank] >> x3::int_);

BOOST_SPIRIT_DEFINE(parser)
}

void parse(const std::string &data) {
using namespace std;

ast::ast_struct ast;
unsigned n = 0;
auto parser = x3::with<parsing::state_tag>(ref(n)) [parsing::parser] >> x3::eoi;

if (x3::parse(data.begin(), data.end(), parser, ast)) {
cout << "n: " << n << ", ";
copy(ast.numbers.begin(), ast.numbers.end(), ostream_iterator<int>(cout << ast.numbers.size() << " elements: ", " "));
cout << "\n";
} else
cout << "Parse failed\n";
}

int main() {
parse("3 1 2 3");
parse("4 1 2 3 4");
}

Prints

n: 3, 3 elements: 1 2 3 
n: 4, 4 elements: 1 2 3 4

Boost spirit x3 - lazy parser

I thought I would try my hand here.

What is needed is some type-erasure around the iterator and attribute types. This is getting very close to the interface of a qi::rule in the old days.

To be complete, we could also erase or transform contexts (e.g. to propagate the skipper inside the lazy rule), but I opted for simplicity here.

In many cases the parsers to be lazily invoked might be lexemes anyway (as in the sample I will use).

In our use-case, let's parse these inputs:

integer_value: 42
quoted_string: "hello world"
bool_value: true
double_value: 3.1415926

We'll use a variant attribute type, and start with creating a lazy_rule parser that will allow us to erase the types:

using Value = boost::variant<int, bool, double, std::string>;
using It = std::string::const_iterator;
using Rule = x3::any_parser<It, Value>;

Passing The Lazy Subject Around

Now, where do we "get" the lazy subject from?

In Spirit Qi, we had the Nabialek Trick. This would use qi::locals<> or inherited attributes, which basically both boiled down to using Phoenix lazy actors (qi::_r1 or qi::_a etc) to evaluate to a value from parser context at runtime.

In X3 there is no Phoenix, and we will have to manipulate the context using semantic actions ourselves.

The basic building block for this is the x3::with<T>[] directive¹. Here's what we'll end up using as the parser:

x3::symbols<Rule> options;

Now we can add any parse expression to the options, by saying e.g. options.add("anything", x3::eps);.

auto const parser = x3::with<Rule>(Rule{}) [
set_context<Rule>[options] >> ':' >> lazy<Rule>
];

This adds a Rule value to the context, which can be set (set_context) and "executed" (lazy).

Like I said, we have to manipulate the context manually, so let's define some helpers that do this:

template <typename Tag>
struct set_context_type {
template <typename P>
auto operator[](P p) const {
auto action = [](auto& ctx) {
x3::get<Tag>(ctx) = x3::_attr(ctx);
};
return x3::omit [ p [ action ] ];
}
};

template <typename Tag>
struct lazy_type : x3::parser<lazy_type<Tag>> {
using attribute_type = typename Tag::attribute_type; // TODO FIXME?

template<typename It, typename Ctx, typename RCtx, typename Attr>
bool parse(It& first, It last, Ctx& ctx, RCtx& rctx, Attr& attr) const {
auto& subject = x3::get<Tag>(ctx);

It saved = first;
x3::skip_over(first, last, ctx);
if (x3::as_parser(subject).parse(first, last,
std::forward<Ctx>(ctx),
std::forward<RCtx>(rctx), attr)) {
return true;
} else {
first = saved;
return false;
}
}
};

template <typename T> static const set_context_type<T> set_context{};
template <typename T> static const lazy_type<T> lazy{};

That's really all there is to it.

Demo Time

In this demo, we run the above inputs (in function run_tests()) and it will use the parser as shown:

auto run_tests = [=] {
for (std::string const& input : {
"integer_value: 42",
"quoted_string: \"hello world\"",
"bool_value: true",
"double_value: 3.1415926",
})
{
Value attr;
std::cout << std::setw(36) << std::quoted(input);
if (phrase_parse(begin(input), end(input), parser, x3::space, attr)) {
std::cout << " -> success (" << attr << ")\n";
} else {
std::cout << " -> failed\n";
}
}
};

First we will run:

options.add("integer_value", x3::int_);
options.add("quoted_string", as<std::string> [
// lexeme is actually redundant because we don't use surrounding skipper yet
x3::lexeme [ '"' >> *('\\' >> x3::char_ | ~x3::char_('"')) >> '"' ]
]);
run_tests();

Which will print:

"integer_value: 42"                  -> success (42)
"quoted_string: \"hello world\""     -> success (hello world)
"bool_value: true"                   -> failed
"double_value: 3.1415926"            -> failed

Now, we can demonstrate the dynamic nature of that parser, by extending options:

options.add("double_value", x3::double_);
options.add("bool_value", x3::bool_);

run_tests();

And the output becomes:

"integer_value: 42"                  -> success (42)
"quoted_string: \"hello world\""     -> success (hello world)
"bool_value: true"                   -> success (true)
"double_value: 3.1415926"            -> success (3.14159)

Note: I threw in another helper, as<>, that makes it easier to coerce the attribute type to std::string there. It's an evolution of ideas from earlier answers.

Full Listing

Live On Coliru

#include <boost/spirit/home/x3.hpp>
#include <iostream>
#include <iomanip>

namespace x3 = boost::spirit::x3;

namespace {
template <typename T>
struct as_type {
template <typename...> struct Tag{};

template <typename P>
auto operator[](P p) const {
return x3::rule<Tag<T, P>, T> {"as"} = x3::as_parser(p);
}
};

template <typename Tag>
struct set_lazy_type {
template <typename P>
auto operator[](P p) const {
auto action = [](auto& ctx) {
x3::get<Tag>(ctx) = x3::_attr(ctx);
};
return x3::omit [ p [ action ] ];
}
};

template <typename Tag>
struct do_lazy_type : x3::parser<do_lazy_type<Tag>> {
using attribute_type = typename Tag::attribute_type; // TODO FIXME?

template <typename It, typename Ctx, typename RCtx, typename Attr>
bool parse(It& first, It last, Ctx& ctx, RCtx& rctx, Attr& attr) const {
auto& subject = x3::get<Tag>(ctx);

It saved = first;
x3::skip_over(first, last, ctx);
if (x3::as_parser(subject).parse(first, last,
std::forward<Ctx>(ctx),
std::forward<RCtx>(rctx), attr)) {
return true;
} else {
first = saved;
return false;
}
}
};

template <typename T> static const as_type<T> as{};
template <typename T> static const set_lazy_type<T> set_lazy{};
template <typename T> static const do_lazy_type<T> do_lazy{};
}

int main() {
std::cout << std::boolalpha << std::left;

using Value = boost::variant<int, bool, double, std::string>;
using It = std::string::const_iterator;
using Rule = x3::any_parser<It, Value>;

x3::symbols<Rule> options;

auto const parser = x3::with<Rule>(Rule{}) [
set_lazy<Rule>[options] >> ':' >> do_lazy<Rule>
];

auto run_tests = [=] {
for (std::string const input : {
"integer_value: 42",
"quoted_string: \"hello world\"",
"bool_value: true",
"double_value: 3.1415926",
})
{
Value attr;
std::cout << std::setw(36) << std::quoted(input);
if (phrase_parse(begin(input), end(input), parser, x3::space, attr)) {
std::cout << " -> success (" << attr << ")\n";
} else {
std::cout << " -> failed\n";
}
}
};

std::cout << "Supporting only integer_value and quoted_string:\n";
options.add("integer_value", x3::int_);
options.add("quoted_string", as<std::string> [
// lexeme is actually redundant because we don't use surrounding skipper yet
x3::lexeme [ '"' >> *('\\' >> x3::char_ | ~x3::char_('"')) >> '"' ]
]);
run_tests();

std::cout << "\nAdded support for double_value and bool_value:\n";
options.add("double_value", x3::double_);
options.add("bool_value", x3::bool_);

run_tests();
}

