Understanding the List Operator (%) in Boost.Spirit

Update: X3 version added.

First off, you've fallen into a deep trap here:

Qi rules don't work with auto. Use qi::copy or just use qi::rule<>. Your program has undefined behaviour, and indeed it crashed for me (valgrind pointed out where the dangling references originated).

So, first off:

const auto rule = qi::copy(qi::int_ >> ':' >> (qi::int_ % ',') >> qi::eoi);

Now, when you delete the redundancy in the program, you get:

Reproducing the problem

Live On Coliru

int main() {
    test(qi::copy(qi::int_ >> ':' >> (qi::int_ % ',')));
    test(qi::copy(qi::int_ >> ':' >> (qi::int_ >> *(',' >> qi::int_))));
}

Printing

1: 2, 3, 4, 
1: 2,

The cause and the fix

What happened to 3, 4, which were successfully parsed?

Well, the attribute propagation rules indicate that qi::int_ >> *(',' >> qi::int_) exposes a tuple<int, vector<int> >. In a bid to magically DoTheRightThing(TM), Spirit accidentally misfires and "assigns" the int into the attribute reference, ignoring the remaining vector<int>.

If you want to make container attributes parse as "an atomic group", use qi::as<>:

test(qi::copy(qi::int_ >> ':' >> qi::as<Record::values_t>() [ qi::int_ >> *(',' >> qi::int_)]));

Here as<> acts as a barrier for the attribute compatibility heuristics and the grammar knows what you meant:

Live On Coliru

#include <iostream>
#include <string>
#include <vector>

#include <boost/fusion/include/adapt_struct.hpp>
#include <boost/spirit/include/qi.hpp>

struct Record {
    int id;
    using values_t = std::vector<int>;
    values_t values;
};

BOOST_FUSION_ADAPT_STRUCT(Record, id, values)

namespace qi = boost::spirit::qi;

template <typename T>
void test(T const& rule) {
    const std::string str = "1: 2, 3, 4";

    Record record;

    if (qi::phrase_parse(str.begin(), str.end(), rule >> qi::eoi, qi::space, record)) {
        std::cout << record.id << ": ";
        for (const auto& value : record.values) { std::cout << value << ", "; }
        std::cout << '\n';
    } else {
        std::cerr << "syntax error\n";
    }
}

int main() {
    test(qi::copy(qi::int_ >> ':' >> (qi::int_ % ',')));
    test(qi::copy(qi::int_ >> ':' >> (qi::int_ >> *(',' >> qi::int_))));
    test(qi::copy(qi::int_ >> ':' >> qi::as<Record::values_t>() [ qi::int_ >> *(',' >> qi::int_)]));
}

Prints

1: 2, 3, 4, 
1: 2,
1: 2, 3, 4,

Attributes of sequence and list operator in boost.spirit qi?

And because a: A, b: vector<A> --> (a >> b): vector<A>, then I think that (qi::char_(L"{") >> *(char_-char_(L"}")) >> char_(L"}")) should be a vector. This contradicts the result.

Indeed, that's not what happens. Applying a modernized trick from "Detecting the parameter types in a Spirit semantic action":

struct sense_f {
    template <typename T> void operator()(T&&) const {
        std::cout << boost::core::demangle(typeid(T).name()) << "\n";
    }
};
static const boost::phoenix::function<sense_f> sense;

We can print the actual attribute type:

ru = (char_(L'{') >> *(char_ - char_(L'}')) >> char_(L'}')) [sense(qi::_0)] % qi::eol;

Which will print Live On Coliru:

boost::fusion::vector<wchar_t, std::vector<wchar_t, std::allocator<wchar_t> >, wchar_t>

Simple Solution

Assuming that you don't need to capture the {}, you can just make them literals instead of char_:

ru = (L'{' >> *(char_ - L'}') >> L'}') [sense(qi::_0)] % qi::eol;

Which will print Live On Coliru:

boost::fusion::vector<std::vector<wchar_t, std::allocator<wchar_t> >&>

Indeed, if you also make it propagate the attribute:

ru %= (L'{' >> *(char_ - L'}') >> L'}') [sense(qi::_0)] % qi::eol;

The program prints:

boost::fusion::vector<std::vector<wchar_t, std::allocator<wchar_t> >&>
boost::fusion::vector<std::vector<wchar_t, std::allocator<wchar_t> >&>
"\"id\":23,\"text\":\"sf
sf\""
"\"id\":23,\"text\":\"sfsf\""

Note that there is attribute compatibility between std::vector<wchar_t> and std::wstring which is why I used the latter.

Bonus

If you DO want to include {} and any intermediate whitespace, use qi::raw:

ru %= qi::raw [L'{' >> *(char_ - L'}') >> L'}'] [sense(qi::_0)] % qi::eol;

Now it prints:

boost::fusion::vector<boost::iterator_range<__gnu_cxx::__normal_iterator<wchar_t const*, std::__cxx11::basic_string<wchar_t, std::char_traits<wchar_t>, std::allocator<wchar_t> > > >&>
boost::fusion::vector<boost::iterator_range<__gnu_cxx::__normal_iterator<wchar_t const*, std::__cxx11::basic_string<wchar_t, std::char_traits<wchar_t>, std::allocator<wchar_t> > > >&>
"{\"id\":23,\"text\":\"sf
sf\"}"
"{\"id\":23,\"text\":\"sfsf\"}"

As you can see even iterator_range<It> has attribute compatibility with std::wstring because the input is also a sequence of wchar_t.

Of course, take the sense action off unless you want that output.

Full Listing

The final result using the qi::raw approach:

Live On Coliru

#define BOOST_SPIRIT_UNICODE

#include <boost/spirit/include/qi.hpp>
#include <iostream>
#include <iomanip>
#include <string>
#include <vector>

namespace sw = boost::spirit::standard_wide;
namespace qi = boost::spirit::qi;
using sw::char_;

int main() {
    std::wstring s = LR"({"id":23,"text":"sf
sf"}
{"id":23,"text":"sfsf"})";

    using Data = std::vector<std::wstring>;
    using It = std::wstring::const_iterator;

    // raw[] exposes the matched source range, braces included
    qi::rule<It, Data(), sw::blank_type> ru
        = qi::raw [L'{' >> *(char_ - L'}') >> L'}'] % qi::eol;

    Data result;
    It f = s.begin(), l = s.end();

    if (qi::phrase_parse(f, l, ru, sw::blank, result)) {
        for (auto& item : result) {
            std::wcout << std::quoted(item) << std::endl;
        }
    } else {
        std::wcout << L"Parse failed\n";
    }

    if (f != l) {
        std::wcout << L"Remaining unparsed: " << std::quoted(std::wstring(f,l)) << std::endl;
    }
}

How to use boost spirit list operator with mandatory minimum amount of elements?

Making the list operator accept a minimum number of elements would require creating a brand new parser introducing that behaviour because, unlike repeat, it is not configured to do so. I hope the following example can help you understand how you can use a >> +(omit[b] >> a) to achieve what you want.

Running on WandBox

#include <iostream>
#include <string>
#include <utility>
#include <vector>
#include <boost/spirit/include/qi.hpp>
#include <boost/fusion/include/std_pair.hpp>

namespace qi = boost::spirit::qi;

void print(const std::vector<std::string>& data)
{
    std::cout << "{ ";
    for (const auto& elem : data) {
        std::cout << elem << " ";
    }
    std::cout << "} ";
}

void print(const std::pair<std::string,double>& data)
{
    std::cout << "[ " << data.first << ", " << data.second << " ]";
}

template <typename Parser, typename... Attrs>
void parse(const std::string& str, const Parser& parser, Attrs&... attrs)
{
    std::string::const_iterator iter = std::begin(str), end = std::end(str);
    bool result = qi::phrase_parse(iter, end, parser, qi::space, attrs...);
    if (result && iter == end) {
        std::cout << "Success.";
        // expand the pack: print every attribute in order
        int ignore[] = {(print(attrs), 0)...};
        (void)ignore;
        std::cout << "\n";
    } else {
        std::cout << "Something failed. Unparsed: \"" << std::string(iter, end) << "\"\n";
    }
}

template <typename Parser>
void parse_with_nodes(const std::string& str, const Parser& parser)
{
    std::vector<std::string> nodes;
    parse(str, parser, nodes);
}

template <typename Parser>
void parse_with_nodes_and_attr(const std::string& str, const Parser& parser)
{
    std::vector<std::string> nodes;
    std::pair<std::string,double> attr_pair;
    parse(str, parser, nodes, attr_pair);
}

int main()
{
    qi::rule<std::string::const_iterator, std::string()> node = +qi::alnum;
    qi::rule<std::string::const_iterator, std::pair<std::string,double>(), qi::space_type> attr = +qi::alpha >> '=' >> qi::double_;

    parse_with_nodes("node1->node2", node % "->");

    parse_with_nodes_and_attr("node1->node2 arrowsize=1.0", node % "->" >> attr);

    parse_with_nodes("node1->node2", node >> +("->" >> node));

    //parse_with_nodes_and_attr("node1->node2 arrowsize=1.0", node >> +("->" >> node) >> attr);

    qi::rule<std::string::const_iterator, std::vector<std::string>(), qi::space_type> at_least_two_nodes = node >> +("->" >> node);
    parse_with_nodes_and_attr("node1->node2 arrowsize=1.0", at_least_two_nodes >> attr);
}

Parse only specific numbers with Boost.Spirit

One way is to attach to the qi::uint_ parser a semantic action that checks the parser's attribute and sets the semantic action's third parameter accordingly:

#include <iostream>
#include <string>
#include <vector>

#include <boost/spirit/include/qi.hpp>

namespace qi = boost::spirit::qi;

int main() {
    qi::rule<std::string::const_iterator, unsigned(), qi::ascii::space_type> rule;

    const auto not_greater_than_12345 = [](const unsigned& attr, auto&, bool& pass) {
        pass = !(attr > 12345U);
    };
    rule %= qi::uint_[not_greater_than_12345];

    std::vector<std::string> numbers{"0", "123", "1234", "12345", "12346", "123456"};
    for (const auto& number : numbers) {
        unsigned result;
        auto iter = number.cbegin();
        if (qi::phrase_parse(iter, number.cend(), rule, qi::ascii::space, result) &&
            iter == number.cend()) {
            std::cout << result << '\n'; // 0 123 1234 12345
        }
    }
}

Live on Wandbox

The semantic action can be written more concisely with the Phoenix placeholders _pass and _1:

#include <iostream>
#include <string>
#include <vector>

#include <boost/phoenix/phoenix.hpp>
#include <boost/spirit/include/qi.hpp>

namespace qi = boost::spirit::qi;

int main() {
    qi::rule<std::string::const_iterator, unsigned(), qi::ascii::space_type> rule;

    rule %= qi::uint_[qi::_pass = !(qi::_1 > 12345U)];

    std::vector<std::string> numbers{"0", "123", "1234", "12345", "12346", "123456"};
    for (const auto& number : numbers) {
        unsigned result;
        auto iter = number.cbegin();
        if (qi::phrase_parse(iter, number.cend(), rule, qi::ascii::space, result) &&
            iter == number.cend()) {
            std::cout << result << '\n'; // 0 123 1234 12345
        }
    }
}

Live on Wandbox


From Semantic Actions with Parsers

The possible signatures for functions to be used as semantic actions are:

...
template <typename Attrib, typename Context>
void fa(Attrib& attr, Context& context, bool& pass);

... Here Attrib is the attribute type of the parser attached to the semantic action. ... The third parameter, pass, can be used by the semantic action to force the associated parser to fail. If pass is set to false the action parser will immediately return false as well, while not invoking p and not generating any output.

Statefulness of Spirit V2 and X3

You make a number of unsubstantiated claims in your "question" article.

I recognize much of the sentiment that shines through your rant, but I find it hard to constructively respond when there is so much debatable in it.

New Possibilities

"X3 is expecting a user to define his every single rule in namespace scope, with auto-consted instance."

This is simply not true. X3 doesn't do that. It could be said that X3 promotes that pattern to enable key features like

  • recursive grammars
  • separation of parsers across translation units

On the flip side, there's not always a need for any of that.

The very value-orientedness of X3 enables new patterns to achieve things. I'm quite fond of being able to do things like:

Stateful Parser Factories

auto make_parser(char delim) {
    return lexeme [ delim >> *('\\' >> char_ | ~char_(delim)) >> delim ];
}

Indeed, you might "need" x3::rule to achieve attribute coercion (like Qi's transform_attribute):

auto make_parser(char delim) {
    return rule<struct _, std::string> {} = lexeme [ delim >> *('\\' >> char_ | ~char_(delim)) >> delim ];
}

In fact, I've used this pattern to make a quick-and-dirty as<T>[] directive: Understanding the List Operator (%) in Boost.Spirit.

auto make_parser(char delim) {
    return as<std::string> [ lexeme [ delim >> *('\\' >> char_ | ~char_(delim)) >> delim ] ];
}

Nothing prevents you from using a dynamic parser factory like that to use context from surrounding state.

Stateful Semantic Actions

Semantic actions are copied by value, but they can freely refer to external state. When using factory functions, they can, again, use surrounding state.

Stateful directives

The only way for directives to create state on the fly is to extend the actual context object. The x3::with<> directive supports this; see e.g. "Boost Spirit X3 cannot compile repeat directive with variable factor".

This can be used to pigeon-hole unlimited amounts of state, e.g. by just side-channel passing a (smart) pointer/reference to your parser state.

Custom Parsers

Custom parsers are a surprisingly simple way to get a lot of power in X3. See for an example:

Spirit-Qi: How can I write a nonterminal parser?

I personally think custom parsers are more elegant than anything like the BOOST_SPIRIT_DECLARE/_DEFINE/_INSTANTIATE dance. I admit I've never created anything requiring multi-TU parsers in pure X3 yet (I tend to use X3 for small, independent parsers), but I intuitively prefer building my own TU-separation logic on top of x3::parser_base over the "blessed" macros mentioned above. See also this discussion: Design/structure X3 parser more like Qi parser

Error/success handling

The compiler tutorials show how to trigger handlers for specific rules using a marker base-class for the rule tag type. I once figured out the mechanics, but sadly I don't remember all the details, and LiveCoding.tv seems to have lost my live-stream on the topic.

I encourage you to look at the compiler samples (they're in the source tree only).

Summarizing

I can see how you notice negative differences. It's important to realize that X3 is less mature, aims to be more light-weight, so some things are simply not implemented. Also note that X3 enables many things in more elegant ways than previously possible. The fact that most things interact more naturally with c++14 core language features is a big boon.

If you want to read more about the things that disappoint me about X3, see the introductory discussion in that linked answer and some discussions in chat (like this one).

I hope my counter-rant helps you on your journey learning X3. I tried to substantiate as many things as I could, though I freely admit I sometimes still prefer Qi.

boost::spirit::x3 attribute compatibility rules, intuition or code?

If you're not on the develop branch you don't have the fix for that single-element sequence adaptation bug, so yeah, it's probably that.

Due to the genericity of attribute transformation/propagation, there's a lot of wiggle room, but of course it's just documented and ultimately in the code. In other words: there's no magic.

In the Qi days I'd have "fixed" this by just spelling out the desired transform with qi::as<> or qi::attr_cast<>. X3 doesn't have it (yet), but you can use a rule to mimic it very easily:

Live On Coliru

#include <iostream>
#include <string>
#include <boost/fusion/adapted/struct.hpp>
#include <boost/spirit/home/x3.hpp>

namespace x3 = boost::spirit::x3;

struct Name {
    std::string value;
};

BOOST_FUSION_ADAPT_STRUCT(Name, value)

int main() {

    std::string const input = "Halleo123_1";
    Name out;

    // the anonymous rule pins the attribute type to std::string
    bool ok = x3::parse(input.begin(), input.end(),
            x3::rule<struct _, std::string>{} =
                x3::alpha >> *(x3::alnum | x3::char_('_')),
            out);

    if (ok)
        std::cout << "Parsed: " << out.value << "\n";
    else
        std::cout << "Parse failed\n";
}

Prints:

Parsed: Halleo123_1

Automate it

Because X3 works so nicely with c++14 core language features, it's not hard to reduce typing:

Understanding the List Operator (%) in Boost.Spirit

spirit x3 cannot propagate attributes of type optional<vector>

I don't think the normative claim "should be able [...] as Qi does" holds water. X3 is not an evolution of Qi, for very good reasons (such as this).

An oft recurring pattern is that type hints are required in more complicated propagation scenarios. The ugly verbose way could be like this:

    -(x3::rule<struct _, std::string> {} = +x3::alpha),

Live On Coliru

Or you can use the hack I described previously:

namespace {
    template <typename T>
    struct as_type {
        template <typename Expr>
        auto operator[](Expr&& expr) const {
            return x3::rule<struct _, T>{"as"} = x3::as_parser(std::forward<Expr>(expr));
        }
    };

    template <typename T> static const as_type<T> as = {};
}

Live On Coliru

Spirit.X3 using string_view and member named 'insert' compiler error

The problem here seems to be with the is_container trait:

template <typename T>
using is_container = mpl::bool_<
    detail::has_type_value_type<T>::value &&
    detail::has_type_iterator<T>::value &&
    detail::has_type_size_type<T>::value &&
    detail::has_type_reference<T>::value>;

In Qi, that would have been specializable:

template <> struct is_container<std::string_view> : std::false_type {};

However, in X3 it became an alias template, which cannot be specialized.

This is a tough issue, as it seems that there is simply no customization point to get X3 to do what we need here.

Workaround

I've tried to dig deeper, and I have not seen a "clean" way around this. The attribute coercion trick can help, though, if you use it to short-circuit the heuristic that causes the match:

  • the attribute is "like a" container of "char"
  • the parser could match such a container

In this situation we can coerce the parser's attribute to specifically be non-compatible, and things will start working.

Correctly Overriding move_to

This, too, is an area of contention. Simply adding the overload like:

template <typename It>
inline void move_to(It b, It e, std::string_view& v) {
    v = std::string_view(&*b, std::distance(b,e));
}

is not enough to make it the best overload.

The base template is

template <typename Iterator, typename Dest>
inline void move_to(Iterator first, Iterator last, Dest& dest);

To actually make it stick, we need to specialize. However, specialization and function templates are not a good match. In particular, we can't partially specialize, so we'll end up hard-coding the template arguments:

template <>
inline void move_to<Iterator, std::string_view>(Iterator b, Iterator e, std::string_view& v) {
    v = std::string_view(&*b, std::distance(b,e));
}

This makes me question whether move_to is "user-serviceable" at all. Much like is_container<> above, it just does not seem to be designed for extension.

I do realize I've applied it in the past myself, but I also learn as I go.

Coercing: Hacking the System

Instead of declaring the rule's attribute std::string_view (leaving X3's type magic room to "do the right thing"), let's etch in stone the intended outcome of raw[] (and leave X3 to do the rest of the magic using move_to):

namespace parser {
    namespace x3 = boost::spirit::x3;
    const auto str
        = x3::rule<struct _, boost::iterator_range<Iterator> >{"str"}
        = x3::raw[ +~x3::char_('_')] >> '_';
    const auto str_vec = *str;
}

This works. See it Live On Wandbox

Prints

hello
world

Alternative

That seems brittle. E.g. it'll break if you change Iterator to char const* (or, use std::string const input = "hello_world_", but not both).

Here's a better take (I think):

namespace boost { namespace spirit { namespace x3 {

    template <typename Char, typename CharT, typename Iterator>
    struct default_transform_attribute<std::basic_string_view<Char, CharT>, boost::iterator_range<Iterator>> {
        using type = boost::iterator_range<Iterator>;

        template <typename T> static type pre(T&&) { return {}; }

        static void post(std::basic_string_view<Char, CharT>& sv, boost::iterator_range<Iterator> const& r) {
            sv = std::basic_string_view<Char, CharT>(std::addressof(*r.begin()), r.size());
        }
    };

} } }

Now, the only hoop left to jump through is that the rule declaration mentions the iterator type. You can hide this too:

namespace parser {
    namespace x3 = boost::spirit::x3;

    template <typename It> const auto str_vec = [] {
        const auto str
            = x3::rule<struct _, boost::iterator_range<It> >{"str"}
            = x3::raw[ +~x3::char_('_')] >> '_';
        return *str;
    }();
}

auto parse(std::string_view input) {
    auto b = input.begin(), e = input.end();
    std::vector<std::string_view> data;
    parse(b, e, parser::str_vec<decltype(b)>, data);
    return data;
}

int main() {
    for(auto& x : parse("hello_world_"))
        std::cout << x << "\n";
}

This at once demonstrates that it works with non-pointer iterators.

Note: for completeness you'd want to statically assert the iterator models the ContiguousIterator concept (c++17)

Final Version Live

Live On Wandbox

#include <iostream>
#include <string>
#include <string_view>
#include <vector>
#include <boost/spirit/home/x3.hpp>

namespace boost { namespace spirit { namespace x3 {

    template <typename Char, typename CharT, typename Iterator>
    struct default_transform_attribute<std::basic_string_view<Char, CharT>, boost::iterator_range<Iterator>> {
        using type = boost::iterator_range<Iterator>;

        template <typename T> static type pre(T&&) { return {}; }

        static void post(std::basic_string_view<Char, CharT>& sv, boost::iterator_range<Iterator> const& r) {
            sv = std::basic_string_view<Char, CharT>(std::addressof(*r.begin()), r.size());
        }
    };

} } }

namespace parser {
    namespace x3 = boost::spirit::x3;

    template <typename It> const auto str_vec = [] {
        const auto str
            = x3::rule<struct _, boost::iterator_range<It> >{"str"}
            = x3::raw[ +~x3::char_('_')] >> '_';
        return *str;
    }();
}

auto parse(std::string_view input) {
    auto b = input.begin(), e = input.end();
    std::vector<std::string_view> data;
    parse(b, e, parser::str_vec<decltype(b)>, data); // finds x3::parse via ADL
    return data;
}

int main() {
    for(auto& x : parse("hello_world_"))
        std::cout << x << "\n";
}

