Boost Spirit: "Semantic Actions Are Evil"

Boost Spirit: Semantic actions are evil?

I'm sure Hartmut will answer in a second. Till then, this is my take:

No, that is not an official point.

Semantic actions have some drawbacks:

  • The simplest disadvantage of semantic actions is stylistic: separation of concerns. You want to express syntax in one place and semantics in another. This helps maintainability (especially with regard to the lengthy compile times of Spirit grammars).

  • They have more complicated implications if they have side effects (which is frequently the case). Imagine backtracking from a parsed node when the semantic action had a side effect: the parser state will be reverted, but the external effects won't.

    In a way, using attributes only is like using deterministic, pure functions in a functional program: it is easier to reason about the correctness of a program (or, in this case, the grammar state machine) when it is composed of pure functions only.

  • Semantic actions tend (though not necessarily) to introduce more copying by value; this, in combination with heavy backtracking, could reduce performance. Of course, if the semantic action is 'heavy', that in itself is going to hinder parsing performance.
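
The backtracking hazard can be shown without Spirit at all. The following is only a toy sketch in plain C++: ToyAlternative, lit and the grammar ("ab" >> "c") | "abd" are made up for illustration and are not Spirit API. The "action" on the first branch writes to an external log, then the branch fails and the input position is rewound, but the log entry stays.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Toy stand-in for a parser with a side-effecting semantic action.
// Nothing here is Spirit API; all names are hypothetical.
struct ToyAlternative {
    std::vector<std::string> log; // external state touched by the "action"

    // Matches the literal s at pos and advances pos on success.
    static bool lit(std::string const& in, std::size_t& pos, std::string const& s) {
        if (in.compare(pos, s.size(), s) != 0) return false;
        pos += s.size();
        return true;
    }

    // Toy grammar: ("ab"[log it] >> "c") | "abd"
    bool parse(std::string const& in) {
        std::size_t pos = 0;
        if (lit(in, pos, "ab")) {
            log.push_back("saw ab");           // side effect fires immediately
            if (lit(in, pos, "c")) return pos == in.size();
        }
        pos = 0;                               // backtrack: position is restored, the log is not
        return lit(in, pos, "abd") && pos == in.size();
    }
};
```

Parsing "abd" with this sketch succeeds through the second alternative, yet the log still holds the entry written by the abandoned first branch. An attribute-only grammar has no such leftover state to worry about.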


Semantic actions are good for various purposes. In fact, if you need to parse non-trivial grammars with context sensitivity you cannot escape them.

  1. Consider the use of qi::locals<> and inherited attributes (code from the Mini XML - ASTs! sample) - they involve semantic actions:

    xml =
            start_tag                   [at_c<0>(_val) = _1]
        >>  *node
        >>  end_tag(at_c<0>(_val))      // passing the name from the
                                        // ... start_tag as inherited attribute
        ;

    Or one using qi::locals:

    rule<char const*, locals<char> > rl;
    rl = alpha[_a = _1] >> char_(_a); // get two identical characters
    test_parser("aa", rl); // pass
    test_parser("ax", rl); // fail

    IMO, these semantic actions usually pose less of a problem, because when they get backtracked, the next time execution passes the (same) semantic action, the local will just get overwritten by the new, correct value.

  2. Also, some jobs are really 'quick-and-dirty' and don't warrant the use of utree or a hand-rolled AST type:

    qi::phrase_parse(first, last,        // imagine qi::istream_iterator...
        interesting_string_pattern       // we want to match certain patterns on the fly
            [ log_interesting_strings ], // and pass them to our logger
        noise_skipper                    // but we skip all noise
    );

    Here, the semantic action is the core of the parser's function. It works because no backtracking is involved at the level of nodes with semantic actions.

  3. The semantic actions are a mirror image of semantic actions in Spirit Karma, where they usually pose fewer problems than in Qi; so even if only for interface/API consistency, semantic actions are 'a good thing' and enhance the usability of Boost Spirit as a whole.

Boost::Spirit - Create class by semantic action (maybe by a C++ lambda?)

There's a lot that confuses me. You state "I want XYZ" without any reasoning why that would be preferable. Furthermore, you state the same for different approaches, so I don't know which one you prefer (and why).

The example code defines

using stringvec = std::string;

This is confusing, because stringvec as a name suggests vector<string>, not string? Right now it looks more like CVar is "like a string" and your attribute is vector<CVar>, i.e. like a vector of string, just not std::string.

All in all I can give you the following hints:

  • in general, avoid semantic actions. They're heavy on the compiler, are leaky abstractions, opt out of attribute compatibility¹, and create atomicity problems under backtracking (see Boost Spirit: "Semantic actions are evil"?)

  • secondly, if you use semantic actions, realize that the raw synthesized attribute of most parser expressions is a Fusion sequence or container.

    • In particular to get a std::string use qi::as_string[]. If you don't use semantic actions, indeed this kind of attribute compatibility/transformation is automatic¹.
    • in similar vein, to parse a single item into an explicit container, use repeat(1)[p]
  • the constructors work like you show with phx::construct<> except for all the downsides of relying on semantic actions

Side observation: did you notice I reduced parser/AST friction in this previous answer by replacing std::string with char?

Applied Answers:

Q. Actually, I would have liked to omit the vector surrounding the class, but I ran into problems http://coliru.stacked-crooked.com/a/3719cee9c7594254 I thought I managed to match a class as the root of the tree at some point. The error message looks, like the adapt struct really wants to call push_back at some point. And somehow, it makes sense, in case there is no class to store.

Simplified using as_string: Live On Coliru

#include <boost/fusion/include/adapt_struct.hpp>
#include <boost/spirit/include/phoenix.hpp>
#include <boost/spirit/include/qi.hpp>
#include <iomanip>
#include <iostream>
namespace qi = boost::spirit::qi;
using Iterator = std::string::const_iterator;

using stringval = std::string;
struct CVar { stringval sVariable; };
BOOST_FUSION_ADAPT_STRUCT(CVar, sVariable)

struct TestGrammar : qi::grammar<Iterator, CVar()> {
    TestGrammar() : TestGrammar::base_type(start) {
        start = qi::as_string[qi::char_("a-z")];
    }

  private:
    qi::rule<Iterator, CVar()> start;
};

void do_test(std::string const& input) {
    CVar output;

    static const TestGrammar p;

    auto f = input.begin(), l = input.end();
    qi::parse(f, l, p, output);

    std::cout << std::quoted(input) << " -> " << std::quoted(output.sVariable) << "\n";
}

Using repeat(1): Live On Coliru

#include <boost/fusion/include/adapt_struct.hpp>
#include <boost/spirit/include/phoenix.hpp>
#include <boost/spirit/include/qi.hpp>
#include <iomanip>
#include <iostream>
namespace qi = boost::spirit::qi;
using Iterator = std::string::const_iterator;

using stringval = std::string;
struct CVar { stringval sVariable; };
BOOST_FUSION_ADAPT_STRUCT(CVar, sVariable)

struct TestGrammar : qi::grammar<Iterator, CVar()> {
    TestGrammar() : TestGrammar::base_type(start) {
        cvar  = qi::repeat(1)[qi::char_("a-z")];
        start = cvar;
    }

  private:
    qi::rule<Iterator, CVar()> start;
    qi::rule<Iterator, std::string()> cvar;
};

void do_test(std::string const& input) {
    CVar output;

    static const TestGrammar p;

    auto f = input.begin(), l = input.end();
    qi::parse(f, l, p, output);

    std::cout << std::quoted(input) << " -> " << std::quoted(output.sVariable) << "\n";
}

Without using ADAPT_STRUCT (minimal change vs. the simplified version above):

#include <boost/spirit/include/qi.hpp>
#include <iomanip>
#include <iostream>
namespace qi = boost::spirit::qi;
using Iterator = std::string::const_iterator;

using stringval = std::string;
struct CVar {
    CVar(std::string v = {}) : sVariable(std::move(v)) {}
    stringval sVariable;
};

struct TestGrammar : qi::grammar<Iterator, CVar()> {
    TestGrammar() : TestGrammar::base_type(start) {
        start = qi::as_string[qi::lower];
    }

  private:
    qi::rule<Iterator, CVar()> start;
};

Using Semantic Actions (not recommended): 4 modes live

#include <boost/spirit/include/qi.hpp>
#include <boost/spirit/include/phoenix.hpp>
#include <iomanip>
#include <iostream>
namespace qi = boost::spirit::qi;
namespace px = boost::phoenix;
using Iterator = std::string::const_iterator;

using stringval = std::string;
struct CVar {
    CVar(std::string v = {}) : sVariable(std::move(v)) {}
    stringval sVariable;
};

enum mode {
    AS_STRING_CONSTRUCT = 1,
    DIRECT_ASSIGN = 2,
    USING_ACTOR = 3,
    TRANSPARENT_CXX14_LAMBDA = 4,
};

struct TestGrammar : qi::grammar<Iterator, CVar()> {
    TestGrammar(mode m) : TestGrammar::base_type(start) {
        switch (m) {
            case AS_STRING_CONSTRUCT: {
                using namespace qi::labels;
                start = qi::as_string[qi::lower][_val = px::construct<CVar>(_1)];
                break;
            }
            case DIRECT_ASSIGN: {
                // or directly
                using namespace qi::labels;
                start = qi::lower[_val = px::construct<std::string>(1ull, _1)];
                break;
            }
            case USING_ACTOR: {
                // or... indeed
                using namespace qi::labels;
                px::function as_cvar = [](char var) -> CVar { return {{var}}; };
                start = qi::lower[_val = as_cvar(_1)];
                break;
            }
            case TRANSPARENT_CXX14_LAMBDA: {
                // or even more bespoke: (this doesn't require qi::labels or phoenix.hpp)
                auto propagate = [](auto& attr, auto& ctx) {
                    // before C++20, ADL does not find at_c<0>(...) on its own
                    using boost::fusion::at_c;
                    at_c<0>(ctx.attributes) = {{attr}};
                };
                start = qi::lower[propagate];
                break;
            }
        }
    }

  private:
    qi::rule<Iterator, CVar()> start;
};

void do_test(std::string const& input, mode m) {
    CVar output;

    const TestGrammar p(m);

    auto f = input.begin(), l = input.end();
    qi::parse(f, l, p, output);

    std::cout << std::quoted(input) << " -> " << std::quoted(output.sVariable) << "\n";
}

int main() {
    for (mode m : {AS_STRING_CONSTRUCT, DIRECT_ASSIGN, USING_ACTOR,
                   TRANSPARENT_CXX14_LAMBDA}) {
        std::cout << " ==== mode #" << static_cast<int>(m) << " === \n";
        for (auto s : {"a", "d", "ac"})
            do_test(s, m);
    }
}

Just to demonstrate how the latter two approaches can both do without the constructor or even without any phoenix support.

As before, by that point I'd recommend going C++14 with Boost Spirit X3 anyways: http://coliru.stacked-crooked.com/a/dbd61823354ea8b6 or even 20 LoC: http://coliru.stacked-crooked.com/a/b26b3db6115c14d4


¹ there's a non-main "hack" that could help you there by defining BOOST_SPIRIT_ACTIONS_ALLOW_ATTR_COMPAT

How can I keep certain semantic actions out of the AST in boost::spirit::qi

What you are wanting to achieve is called error recovery.

Unfortunately, Spirit does not have a nice way of doing it (some internal design decisions also make it hard to implement externally). However, in your case it is simple to achieve by rewriting the grammar.

#include <cstdlib> // EXIT_SUCCESS
#include <iostream>
#include <string>
#include <vector>
#include <boost/foreach.hpp>
#include <boost/spirit/include/qi.hpp>
#include <boost/spirit/include/phoenix.hpp>
#include <boost/fusion/include/std_tuple.hpp>

namespace qi = boost::spirit::qi;
namespace ascii = boost::spirit::ascii;
namespace phx = boost::phoenix;

using V = std::tuple<std::string, double, double, double>;

namespace client {
    template <typename Iterator>
    struct VGrammar : qi::grammar<Iterator, std::vector<V>()> {
        VGrammar() : VGrammar::base_type(start) {
            using namespace qi;

            v     = skip(blank)[no_skip[string("v")] > double_ > double_ > double_];
            junk  = +(char_ - eol);
            start = (v || -junk) % eol;

            v.name("v");
            junk.name("junk");
            start.name("start");

            using phx::val;
            using phx::construct;

            on_error<fail>(
                start,
                std::cout
                    << val("Error! Expecting \n\n'")
                    << qi::_4
                    << val("'\n\n here: \n\n'")
                    << construct<std::string>(qi::_3, qi::_2)
                    << val("'")
                    << std::endl
            );

            //debug(v);
            //debug(junk);
            //debug(start);
        }

        qi::rule<Iterator> junk;
        //qi::rule<Iterator, qi::unused_type()> junk; // Doesn't work either
        //qi::rule<Iterator, qi::unused_type(), qi::unused_type()> junk; // Doesn't work either
        qi::rule<Iterator, V()> v;
        qi::rule<Iterator, std::vector<V>()> start;
    };
} // namespace client

int main(int argc, char* argv[]) {
    using iterator_type = std::string::const_iterator;

    std::string input = "";
    input += "v 1 2 3\r";         // keep v 1 2 3
    input += "o a b c\r";         // parse as junk
    input += "v 4 5 6 v 7 8 9\r"; // keep v 4 5 6, but parse v 7 8 9 as junk
    input += " v 10 11 12\r\r";   // parse as junk

    iterator_type iter = input.begin();
    const iterator_type end = input.end();
    std::vector<V> parsed_output;
    client::VGrammar<iterator_type> v_grammar;

    std::cout << "run" << std::endl;
    bool r = parse(iter, end, v_grammar, parsed_output);
    std::cout << "done ... r: " << (r ? "true" : "false") << ", iter==end: " << ((iter == end) ? "true" : "false") << std::endl;

    if (r && (iter == end)) {
        BOOST_FOREACH(V const& v_row, parsed_output) {
            std::cout << std::get<0>(v_row) << ", " << std::get<1>(v_row) << ", " << std::get<2>(v_row) << ", " << std::get<3>(v_row) << std::endl;
        }
    }

    return EXIT_SUCCESS;
}

avoid construct template in boost spirit semantic action

This is the classic case where I always link to Boost Spirit: "Semantic actions are evil"?: avoid semantic actions.

In this case I don't know what your AST really looks like (what is node_key, where does key_typename come from etc.) so I can't really show you much.

Usually I'd adapt the node types and declare rules for the concrete node types. If that doesn't work, I prefer phoenix::function<> wrappers:

Live On Coliru

#include <boost/spirit/include/qi.hpp>
#include <boost/spirit/include/phoenix.hpp>
#include <iostream> // std::cout
#include <string>
namespace qi = boost::spirit::qi;

struct SomeComplicatedType {
    enum Type { None, NewCall };
    struct Key{};
    SomeComplicatedType(Type = {}, Key = {}, std::string n = "") : name(std::move(n)) { }

    std::string name;
};

static SomeComplicatedType::Key const s_default_key;

template <typename It>
struct Grammar : qi::grammar<It, SomeComplicatedType()>
{
    Grammar() : Grammar::base_type(start) {
        using namespace qi;
        start  = skip(space) [new_];
        tyname = raw[(alpha|'_') >> +(alnum|'_')];

        new_   = no_case["new"] > tyname [_val = make_new(_1)];

        BOOST_SPIRIT_DEBUG_NODES((start)(new_)(tyname))
    }
  private:
    qi::rule<It, SomeComplicatedType()> start;
    qi::rule<It, SomeComplicatedType(), qi::space_type> new_;
    qi::rule<It, std::string()> tyname;

    struct make_complicated_t {
        SomeComplicatedType::Type _type;

        SomeComplicatedType operator()(std::string const& s) const {
            return SomeComplicatedType{_type, s_default_key, s};
        }
    };
    boost::phoenix::function<make_complicated_t> make_new { make_complicated_t{SomeComplicatedType::NewCall } };
};

int main() {
    std::string const input = "new Sandwich";

    SomeComplicatedType result;
    if (parse(input.begin(), input.end(), Grammar<std::string::const_iterator>{}, result))
        std::cout << "Parsed: " << result.name << "\n";

}

Prints

Parsed: Sandwich

How to capture the value parsed by a boost::spirit::x3 parser to be used within the body of a semantic action?

In X3, semantic actions are much simpler. They're unary callables that take just the context.

Then you use free functions to extract information from the context:

  • x3::_val(ctx) is like qi::_val
  • x3::_attr(ctx) is like qi::_0 (or qi::_1 for simple parsers)
  • x3::_pass(ctx) is like qi::_pass

So, to get your semantic action, you could do:

auto qstring
    = x3::rule<struct rule_type, std::string> {"qstring"}
    = x3::lexeme[quote > *("\\" >> x3::char_(quote) | ~x3::char_(quote)) > quote]
    ;

Now to make a very odd string rule that reverses the text (after de-escaping) and requires the number of characters to be even (that is what the `% 2 == 0` check below enforces):

auto odd_reverse = [](auto& ctx) {
    auto& attr = x3::_attr(ctx);
    auto& val  = x3::_val(ctx);
    x3::traits::move_to(attr, val);
    std::reverse(val.begin(), val.end());

    x3::_pass(ctx) = val.size() % 2 == 0;
};

auto odd_string
    = x3::rule<struct odd_type, std::string> {"odd_string"}
    = qstring [ odd_reverse ]
    ;

DEMO

Live On Coliru

#include <boost/spirit/home/x3.hpp>
#include <algorithm> // std::reverse
#include <iostream>
#include <iomanip>
#include <string>

int main() {
    namespace x3 = boost::spirit::x3;

    auto constexpr quote = '"';
    auto qstring
        = x3::rule<struct rule_type, std::string> {"qstring"}
        = x3::lexeme[quote > *("\\" >> x3::char_(quote) | ~x3::char_(quote)) > quote]
        ;

    auto odd_reverse = [](auto& ctx) {
        auto& attr = x3::_attr(ctx);
        auto& val  = x3::_val(ctx);
        x3::traits::move_to(attr, val);
        std::reverse(val.begin(), val.end());

        x3::_pass(ctx) = val.size() % 2 == 0;
    };

    auto odd_string
        = x3::rule<struct odd_type, std::string> {"odd_string"}
        = qstring [ odd_reverse ]
        ;

    for (std::string const input : {
            R"("test \"hello\" world")",
            R"("test \"hello\" world!")",
        }) {
        std::string output;
        auto f = begin(input), l = end(input);
        if (x3::phrase_parse(f, l, odd_string, x3::blank, output)) {
            std::cout << "[" << output << "]\n";
        } else {
            std::cout << "Failed\n";
        }
        if (f != l) {
            std::cout << "Remaining unparsed: " << std::quoted(std::string(f,l)) << "\n";
        }
    }
}

Printing

[dlrow "olleh" tset]
Failed
Remaining unparsed: "\"test \\\"hello\\\" world!\""

UPDATE

To the added question:

EDIT: it seems that whenever I attach any semantic action in general
to the parser, the value is nullified. I suppose the question now is
how could I access the value before that happens? I just need to be
able to manipulate the parsed string before it is given to the AST.

Yes, if you attach an action, automatic attribute propagation is inhibited. This is the same in Qi, where you could assign rules with %= instead of = to force automatic attribute propagation.

To get the same effect in X3, use the third template argument to x3::rule: x3::rule<X, T, true> to indicate you want automatic propagation.

Really, try not to fight the system. In practice, the automatic transformation system is way more sophisticated than I am willing to re-discover on my own, so I usually post-process the whole AST or at most apply some minor tweaks in an action. See also Boost Spirit: "Semantic actions are evil"?

attaching semantic actions to parser with boost spirit

In spirit, a parser is a function object, and for the most part, the operators which are overloaded in order to allow you to make new parsers, such as >> and so on, return different function objects, rather than modifying the original.

If you have ever used Java and encountered its immutable strings, you can think of it as a bit like that.

When you have an expression like

rule1 = lit("employee");
rule2 = (rule1 >> lit(",") >> rule1) [ &print ];

what is happening is that a new parser object is produced and assigned to variable rule2, and that parser object has the semantic action attached.

In fact, there is a new temporary parser object for each operator in the expression. That overhead is incurred only once, when the parser is constructed; it doesn't really matter at parse time.

When you have

start[&print];

this is like producing a temporary value that is immediately discarded. It does not have side effects for the value in the start variable. That's why print is never called.
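
The same behavior can be mimicked with a toy type in plain C++. This is only a sketch: ToyRule is a made-up stand-in, not how Spirit's rules are actually implemented. Its operator[] is const and returns a new object, so the original never gains the action.

```cpp
#include <functional>
#include <string>
#include <utility>

// Toy model (not Spirit's real types): attaching an action returns a NEW
// parser object and leaves the original untouched.
struct ToyRule {
    std::string literal;
    std::function<void()> action; // empty means "no action attached"

    // Like p[f] in Spirit: const, so it cannot modify *this.
    ToyRule operator[](std::function<void()> f) const {
        return ToyRule{literal, std::move(f)};
    }

    // Succeeds when the whole input equals the literal; runs the action if any.
    bool parse(std::string const& in) const {
        if (in != literal) return false;
        if (action) action();
        return true;
    }
};
```

With `ToyRule start{"employee", {}};`, writing `start[print];` as a statement builds a temporary and throws it away, so `start.parse("employee")` never calls print. Only the object stored from `ToyRule r = start[print];` carries the action.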

If it didn't work this way, then it would be a lot more complicated to make grammars, potentially.

When you define a grammar in Spirit Qi, the definition is usually done in the constructor of the grammar object. First the prototypes of the rules are given, specifying their types, skippers, etc. Then you construct the rules one by one. You have to make sure that you don't use a rule in the definition of another rule before it is initialized. But after it is initialized, it mostly won't change as far as the grammar is concerned. (You can modify things like debug info, though.)

If all the rules could potentially change after being initialized, then they would all have to update each other about the changes, and that would be more complicated.

You might imagine that this is avoided by having the rules store references to each other, rather than values. But that implies pointers and dynamic allocations afaik, and would be slower. Part of the point of Spirit is that it uses expression templates: all those "pointer dereferences" are supposed to get resolved at compile time, as I understand it.


