Boost Spirit Semantic Action Parameters

boost spirit semantic action parameters

This is a really good question (and also a can of worms) because it gets at the interface of qi and phoenix. I haven't seen an example either, so I'll extend the article a little in this direction.

As you say, functions for semantic actions can take up to three parameters:

  1. Matched attribute - covered in the article
  2. Context - contains the qi-phoenix interface
  3. Match flag - manipulate the match state

Match flag

As the article states, the second parameter is not meaningful unless the expression is part of a rule, so let's start with the third. A placeholder for the second parameter is still needed, though; use boost::fusion::unused_type for this. A modified version of the function from the article that uses the third parameter is:

#include <boost/spirit/include/qi.hpp>
#include <string>
#include <iostream>

void f(int attribute, const boost::fusion::unused_type& it, bool& mFlag){
    //output parameters
    std::cout << "matched integer: '" << attribute << "'" << std::endl
              << "match flag: " << mFlag << std::endl;

    //fiddle with match flag
    mFlag = false;
}

namespace qi = boost::spirit::qi;

int main(void){
    std::string input("1234 6543");
    std::string::const_iterator begin = input.begin(), end = input.end();

    bool returnVal = qi::phrase_parse(begin, end, qi::int_[f], qi::space);

    std::cout << "return: " << returnVal << std::endl;
    return 0;
}

which outputs:


matched integer: '1234'
match flag: 1
return: 0

All this example does is switch the match to a non-match, which is reflected in the parser output. According to hkaiser, in boost 1.44 and up, setting the match flag to false will cause the match to fail in the normal way. If alternatives are defined, the parser will backtrack and attempt to match them as one would expect. However, in boost<=1.43 a Spirit bug prevents backtracking, which causes strange behavior. To see this, add the phoenix include boost/spirit/include/phoenix.hpp and change the expression to

qi::int_[f] | qi::digit[std::cout << qi::_1 << "\n"]

You'd expect that, when the qi::int_ parser fails, the alternative qi::digit would match the beginning of the input at "1", but the output is:


matched integer: '1234'
match flag: 1
6
return: 1

The 6 is the first digit of the second int in the input, which indicates that the alternative is taken using the skipper and without backtracking. Notice also that the match is considered successful, based on the alternative.

Once boost 1.44 is out, the match flag will be useful for applying match criteria that might be otherwise difficult to express in a parser sequence. Note that the match flag can be manipulated in phoenix expressions using the _pass placeholder.

Context parameter

The more interesting parameter is the second one, which contains the qi-phoenix interface, or in qi parlance, the context of the semantic action. To illustrate this, first examine a rule:

rule<Iterator, Attribute(Arg1,Arg2,...), qi::locals<Loc1,Loc2,...>, Skipper>

The context parameter embodies the Attribute, Arg1, ... ArgN, and qi::locals template parameters, wrapped in a boost::spirit::context template type. This attribute differs from the function parameter: the function parameter attribute is the parsed value, while this attribute is the value of the rule itself. A semantic action must map the former to the latter. Here's an example of a possible context type (phoenix expression equivalents indicated):

using namespace boost;
spirit::context<              //context template
    fusion::cons<
        int&,                 //return int attribute (phoenix: _val)
        fusion::cons<
            char&,            //char argument1 (phoenix: _r1)
            fusion::cons<
                float&,       //float argument2 (phoenix: _r2)
                fusion::nil   //end of cons list
            >
        >
    >,
    fusion::vector2<          //locals container
        char,                 //char local (phoenix: _a)
        unsigned int          //unsigned int local (phoenix: _b)
    >
>

Note the return attribute and argument list take the form of a lisp-style list (a cons list). To access these variables within a function, access the attributes or locals members of the context struct template with fusion::at_c<>(). For example, for a context variable con:

//assign return attribute
fusion::at_c<0>(con.attributes) = 1;

//get the second rule argument
float arg2 = fusion::at_c<2>(con.attributes);

//assign the second local (the unsigned int; the first local is at index 0)
fusion::at_c<1>(con.locals) = 42;
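To make the cons-list layout less abstract, here is a minimal hand-rolled sketch of the idea in plain C++14. It is not Fusion's implementation: nil, cons and at_c below are hypothetical stand-ins that only mimic the shape of fusion::nil, fusion::cons and fusion::at_c.

```cpp
#include <cstddef>

// Hypothetical stand-ins for fusion::nil and fusion::cons, for illustration only.
struct nil {};

template <typename Car, typename Cdr = nil>
struct cons {
    Car car;  // head element (may be a reference type such as int&)
    Cdr cdr;  // rest of the list, itself a cons or nil
};

// at_c<N>(list): walk N links down the cdr chain, then return the head.
template <std::size_t N>
struct at_impl {
    template <typename List>
    static decltype(auto) get(List& list) { return at_impl<N - 1>::get(list.cdr); }
};

template <>
struct at_impl<0> {
    template <typename List>
    static decltype(auto) get(List& list) { return (list.car); }
};

template <std::size_t N, typename List>
decltype(auto) at_c(List& list) { return at_impl<N>::get(list); }
```

With a list declared as cons<int&, cons<char&, cons<float&> > >, at_c<0> reaches the int (the role _val plays), and at_c<2> walks two links to reach the float (the role _r2 plays). Because the elements are references, assigning through at_c writes to the original variables.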

To modify the article example to use the second argument, change the function definition and phrase_parse calls:

...
typedef
    boost::spirit::context<
        boost::fusion::cons<int&, boost::fusion::nil>,
        boost::fusion::vector0<>
    > f_context;
void f(int attribute, const f_context& con, bool& mFlag){
    std::cout << "matched integer: '" << attribute << "'" << std::endl
              << "match flag: " << mFlag << std::endl;

    //assign output attribute from parsed value
    boost::fusion::at_c<0>(con.attributes) = attribute;
}
...
int matchedInt;
qi::rule<std::string::const_iterator,int(void),ascii::space_type>
    intRule = qi::int_[f];
qi::phrase_parse(begin, end, intRule, ascii::space, matchedInt);
std::cout << "matched: " << matchedInt << std::endl;
...

This is a very simple example that just maps the parsed value to the output attribute value, but extensions should be fairly apparent. Just make the context struct template parameters match the rule output, input, and local types. Note that this type of a direct match between parsed type/value to output type/value can be done automatically using auto rules, with a %= instead of a = when defining the rule:

//declare the rule first; %= is an assignment operator, not an initializer
qi::rule<std::string::const_iterator,int(void),ascii::space_type> intRule;
intRule %= qi::int_;

IMHO, writing a function for each action would be rather tedious, compared to the brief and readable phoenix expression equivalents. I sympathize with the voodoo viewpoint, but once you work with phoenix for a little while, the semantics and syntax aren't terribly difficult.

Edit: Accessing rule context w/ Phoenix

The context variable is only defined when the parser is part of a rule. Think of a parser as being any expression that consumes input, where a rule translates the parser values (qi::_1) into a rule value (qi::_val). The difference is often non-trivial, for example when qi::_val has a Class type that needs to be constructed from POD parsed values. Below is a simple example.

Let's say part of our input is a sequence of three CSV integers (x1, x2, x3), and we only care about an arithmetic function of these three integers (f = x0 + (x1+x2)*x3), where x0 is a value obtained elsewhere. One option is to read in the integers and then calculate the function; the alternative is to use phoenix to do both.

For this example, use one rule with an output attribute (the function value), an input (x0), and a local (to pass information between individual parsers within the rule). Here's the full example.

#include <boost/spirit/include/qi.hpp>
#include <boost/spirit/include/phoenix.hpp>
#include <string>
#include <iostream>

namespace qi = boost::spirit::qi;
namespace ascii = boost::spirit::ascii;

int main(void){
    std::string input("1234, 6543, 42");
    std::string::const_iterator begin = input.begin(), end = input.end();

    qi::rule<
        std::string::const_iterator,
        int(int),                    //output (_val) and input (_r1)
        qi::locals<int>,             //local int (_a)
        ascii::space_type
    >
    intRule =
            qi::int_[qi::_a = qi::_1]             //local = x1
        >> ","
        >>  qi::int_[qi::_a += qi::_1]            //local = x1 + x2
        >> ","
        >>  qi::int_
            [
                qi::_val = qi::_a*qi::_1 + qi::_r1 //output = local*x3 + x0
            ];

    int ruleValue, x0 = 10;
    qi::phrase_parse(begin, end, intRule(x0), ascii::space, ruleValue);
    std::cout << "rule value: " << ruleValue << std::endl;
    return 0;
}
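The example above does not show its output. As a cross-check, the same arithmetic can be written out in plain C++ without Spirit; the function name expected_rule_value is made up for illustration. For the input 1234, 6543, 42 with x0 = 10 it gives (1234 + 6543) * 42 + 10 = 326644, which is the value the rule should print.

```cpp
// Plain C++ restatement of the three semantic actions, for checking by hand.
int expected_rule_value(int x0, int x1, int x2, int x3) {
    int local = x1;            // qi::_a = qi::_1
    local += x2;               // qi::_a += qi::_1
    return local * x3 + x0;    // qi::_val = qi::_a * qi::_1 + qi::_r1
}
```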

Alternatively, all the ints could be parsed as a vector, and the function evaluated with a single semantic action (the % below is the list operator and elements of the vector are accessed with phoenix::at):

namespace ph = boost::phoenix;
...
qi::rule<
    std::string::const_iterator,
    int(int),
    ascii::space_type
>
intRule =
    (qi::int_ % ",")
    [
        qi::_val = (ph::at(qi::_1,0) + ph::at(qi::_1,1))
                   * ph::at(qi::_1,2) + qi::_r1
    ];
...

For the above, if the input is incorrect (two ints instead of three), bad things could happen at run time, so it would be better to specify the number of parsed values explicitly, so that parsing will fail for bad input. The example below uses _1, _2, and _3 to reference the first, second, and third match values:

(qi::int_ >> "," >> qi::int_ >> "," >> qi::int_)
[
    qi::_val = (qi::_1 + qi::_2) * qi::_3 + qi::_r1
];

This is a contrived example, but should give you the idea. I've found phoenix semantic actions really helpful in constructing complex objects directly from input; this is possible because you can call constructors and member functions within semantic actions.

How to get a function result in a Boost.Spirit semantic action

Semantic actions are "deferred actors". That is, they are function objects that describe a function call; they are not invoked during rule definition.

So you can use

  • phoenix::bind
  • phoenix::function
  • write a semantic action function

Going with the phoenix bind, as it is closest to your code:

roll = (qi::int_ >> 'd' >> qi::int_)
    [ _val = px::bind(::roll, _1, _2) ] ;

  1. Note how I removed the use of local variables. They would have been UB because they don't exist after the constructor completes!
  2. Note also that I needed to disambiguate ::roll with a global namespace qualification, because the roll rule member shadows it.
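The "deferred" part can be seen without Spirit. In this sketch, std::bind from the standard library stands in for phoenix::bind (an analogy, not the Phoenix API), and the names roll_stub, call_count and make_deferred_roll are hypothetical. Building the bound object does not call the function; only invoking it later does, which is what the parser does when the rule matches.

```cpp
#include <functional>

static int call_count = 0;  // counts real invocations of roll_stub

// Hypothetical stand-in for the dice function; deterministic for the demo.
int roll_stub(int num, int faces) {
    ++call_count;
    return num * faces;
}

// Describes the call "roll_stub(a, b)" without performing it.
std::function<int(int, int)> make_deferred_roll() {
    using namespace std::placeholders;
    return std::bind(roll_stub, _1, _2);  // analogous to px::bind(::roll, _1, _2)
}
```

After make_deferred_roll() returns, call_count is still zero. It only increases once the returned object is called with actual arguments.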

Live Demo

//#define BOOST_SPIRIT_DEBUG
#include <boost/spirit/include/qi.hpp>
#include <boost/spirit/include/phoenix.hpp>

static int roll_dice(int num, int faces);

namespace Parser {
    namespace qi = boost::spirit::qi;
    namespace px = boost::phoenix;

    //calculator grammar
    template <typename Iterator>
    struct calculator : qi::grammar<Iterator, int()> {
        calculator() : calculator::base_type(start) {
            using namespace qi::labels;

            start = qi::skip(qi::space) [ expression ];

            roll = (qi::int_ >> 'd' >> qi::int_)
                [ _val = px::bind(::roll_dice, _1, _2) ] ;

            expression =
                term                  [_val = _1]
                >> *( ('+' >> term    [_val += _1])
                    | ('-' >> term    [_val -= _1])
                    )
                ;

            term =
                factor                [_val = _1]
                >> *( ('*' >> factor  [_val *= _1])
                    | ('/' >> factor  [_val /= _1])
                    )
                ;

            factor
                = roll                [_val = _1]
                | qi::uint_           [_val = _1]
                | '(' >> expression   [_val = _1] >> ')'
                | ('-' >> factor      [_val = -_1])
                | ('+' >> factor      [_val = _1])
                ;

            BOOST_SPIRIT_DEBUG_NODES((start)(roll)(expression)(term)(factor))
        }

      private:
        qi::rule<Iterator, int()> start;
        qi::rule<Iterator, int(), qi::space_type> roll, expression, term, factor;
    };
}

#include <random>
#include <iomanip>
#include <iostream>
#include <string>

static int roll_dice(int num, int faces) {
    static std::mt19937 gen{std::random_device{}()};
    int res=0;
    std::uniform_int_distribution<> dist{1, faces};
    for(int i=0; i<num; i++) {
        res+=dist(gen);
    }
    return res;
}

int main() {
    using It = std::string::const_iterator;
    Parser::calculator<It> const calc;

    for (std::string const& str : {
            "42",
            "2*(2d5+3d7)",
        })
    {
        auto f = str.begin(), l = str.end();

        int result;
        if (parse(f, l, calc, result)) {
            std::cout << "result = " << result << std::endl;
        } else {
            std::cout << "Parsing failed\n";
        }

        if (f != l) {
            std::cout << "Remaining input: " << std::quoted(std::string(f, l)) << "\n";
        }
    }
}

Prints, e.g.

result = 42
result = 38

BUGS!

Correctness first. You probably didn't realize it, but uniform_int_distribution<>(a,b) leads to UB¹ if b<a.

The same goes when someone types -7d5.

You need to add the checks:

static int roll_dice(int num, int faces) {
    if (num < 0)   throw std::range_error("num");
    if (faces < 1) throw std::range_error("faces");

    int res = 0;
    static std::mt19937 gen{ std::random_device{}() };
    std::uniform_int_distribution<> dist{ 1, faces };
    for (int i = 0; i < num; i++) {
        res += dist(gen);
    }
    std::cerr << "roll_dice(" << num << ", " << faces << ") -> " << res << "\n";
    return res;
}

Defensive Programming is a must in any domain/language. In C++ it protects against Nasal Demons.
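To see how the checks behave from the caller's side, here is a small sketch. It repeats the checked roll_dice (minus the logging) so that it is self-contained; try_roll is a hypothetical wrapper that is not part of the original code. In the parser, the same exception would be expected to propagate out of parse(), since the semantic action calls roll_dice directly.

```cpp
#include <random>
#include <stdexcept>

static int roll_dice(int num, int faces) {
    if (num < 0)   throw std::range_error("num");
    if (faces < 1) throw std::range_error("faces");

    static std::mt19937 gen{ std::random_device{}() };
    std::uniform_int_distribution<> dist{ 1, faces };
    int res = 0;
    for (int i = 0; i < num; i++) {
        res += dist(gen);
    }
    return res;
}

// Hypothetical wrapper: reports bad arguments as false instead of throwing.
bool try_roll(int num, int faces, int& result) {
    try {
        result = roll_dice(num, faces);
        return true;
    } catch (std::range_error const&) {
        return false;
    }
}
```

With these checks, try_roll(-7, 5, r) and try_roll(3, 0, r) both report failure rather than reaching the distribution with invalid bounds.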

Generalize!

That code has been simplified considerably, and I added the necessary plumbing to get debug output:

<start>
  <try>2*(2d5+3d7)</try>
  <expression>
    <try>2*(2d5+3d7)</try>
    <term>
      <try>2*(2d5+3d7)</try>
      <factor>
        <try>2*(2d5+3d7)</try>
        <roll>
          <try>2*(2d5+3d7)</try>
          <fail/>
        </roll>
        <success>*(2d5+3d7)</success>
        <attributes>[2]</attributes>
      </factor>
      <factor>
        <try>(2d5+3d7)</try>
        <roll>
          <try>(2d5+3d7)</try>
          <fail/>
        </roll>
        <expression>
          <try>2d5+3d7)</try>
          <term>
            <try>2d5+3d7)</try>
            <factor>
              <try>2d5+3d7)</try>
              <roll>
                <try>2d5+3d7)</try>
                <success>+3d7)</success>
                <attributes>[9]</attributes>
              </roll>
              <success>+3d7)</success>
              <attributes>[9]</attributes>
            </factor>
            <success>+3d7)</success>
            <attributes>[9]</attributes>
          </term>
          <term>
            <try>3d7)</try>
            <factor>
              <try>3d7)</try>
              <roll>
                <try>3d7)</try>
                <success>)</success>
                <attributes>[10]</attributes>
              </roll>
              <success>)</success>
              <attributes>[10]</attributes>
            </factor>
            <success>)</success>
            <attributes>[10]</attributes>
          </term>
          <success>)</success>
          <attributes>[19]</attributes>
        </expression>
        <success></success>
        <attributes>[19]</attributes>
      </factor>
      <success></success>
      <attributes>[38]</attributes>
    </term>
    <success></success>
    <attributes>[38]</attributes>
  </expression>
  <success></success>
  <attributes>[38]</attributes>
</start>
result = 38

Now, let's look at the grammar conceptually. Really, d is just a binary infix operator, like 3+7 or 3d7. So, if we assume that it has the same precedence as the unary plus/minus, we can simplify the rules while making the grammar much more general:

factor = (qi::uint_         [_val = _1]
    | '(' >> expression     [_val = _1] >> ')'
    | ('-' >> factor        [_val = -_1])
    | ('+' >> factor        [_val = _1])
  ) >> *(
    'd' >> factor           [_val = px::bind(::roll_dice, _val, _1)]
  )
  ;

Whoop! No more roll rule. Also, suddenly the following become valid inputs:

1*3d(5+2)
(3+9*3)d8
0*0d5
(3d5)d15
1d(15d3)
(1d1d1d1) * 42

Full Demo

Live On Coliru

//#define BOOST_SPIRIT_DEBUG
#include <boost/spirit/include/qi.hpp>
#include <boost/spirit/include/phoenix.hpp>

static int roll_dice(int num, int faces);

namespace Parser {
    namespace qi = boost::spirit::qi;
    namespace px = boost::phoenix;

    //calculator grammar
    template <typename Iterator>
    struct calculator : qi::grammar<Iterator, int()> {
        calculator() : calculator::base_type(start) {
            using namespace qi::labels;

            start = qi::skip(qi::space) [ expression ];

            expression =
                term                  [_val = _1]
                >> *( ('+' >> term    [_val += _1])
                    | ('-' >> term    [_val -= _1])
                    )
                ;

            term =
                factor                [_val = _1]
                >> *( ('*' >> factor  [_val *= _1])
                    | ('/' >> factor  [_val /= _1])
                    )
                ;

            factor = (qi::uint_       [_val = _1]
                | '(' >> expression   [_val = _1] >> ')'
                | ('-' >> factor      [_val = -_1])
                | ('+' >> factor      [_val = _1])
              ) >> *(
                'd' >> factor         [_val = px::bind(::roll_dice, _val, _1)]
              )
              ;

            BOOST_SPIRIT_DEBUG_NODES((start)(expression)(term)(factor))
        }

      private:
        qi::rule<Iterator, int()> start;
        qi::rule<Iterator, int(), qi::space_type> expression, term, factor;
    };
}

#include <random>
#include <iomanip>
#include <iostream>
#include <stdexcept>
#include <string>

static int roll_dice(int num, int faces) {
    if (num < 0)   throw std::range_error("num");
    if (faces < 1) throw std::range_error("faces");

    int res = 0;
    static std::mt19937 gen{ std::random_device{}() };
    std::uniform_int_distribution<> dist{ 1, faces };
    for (int i = 0; i < num; i++) {
        res += dist(gen);
    }
    std::cerr << "roll_dice(" << num << ", " << faces << ") -> " << res << "\n";
    return res;
}

int main() {
    using It = std::string::const_iterator;
    Parser::calculator<It> const calc;

    for (std::string const& input : {
            "42",
            "2*(2d5+3d7)",
            // generalized
            "1*3d(5+2)",
            "(3+9*3)d8",
            "0*0d5",
            "(3d5)d15",
            "1d(15d3)",
            "(1d1d1d1) * 42",
        })
    {
        std::cout << "\n==== Parsing " << std::quoted(input) << "\n";
        auto f = input.begin(), l = input.end();

        int result;
        if (parse(f, l, calc, result)) {
            std::cout << "Parse result = " << result << std::endl;
        } else {
            std::cout << "Parsing failed\n";
        }

        if (f != l) {
            std::cout << "Remaining input: " << std::quoted(std::string(f, l)) << "\n";
        }
    }
}

Prints

[program output: an image in the original post, not reproduced here]

¹ don't you love c++?

boost spirit reporting semantic error

I'd use a position-tracking iterator (line_pos_iterator in the demo below) and just throw an exception, so you have complete control over the reporting.

Let me see what I can come up with in the remaining 15 minutes I have

Ok, it took a little more time, but I think it's an instructive demo:

Live On Coliru

#include <boost/fusion/adapted.hpp>
#include <boost/fusion/include/io.hpp>
#include <boost/spirit/include/qi.hpp>
#include <boost/spirit/include/phoenix.hpp>
#include <boost/spirit/include/support_line_pos_iterator.hpp>
#include <boost/spirit/repository/include/qi_iter_pos.hpp>
#include <boost/lexical_cast.hpp>
#include <cstring>
#include <iostream>
#include <iterator>
#include <map>
#include <string>
#include <vector>

namespace qi = boost::spirit::qi;
namespace qr = boost::spirit::repository::qi;
namespace px = boost::phoenix;
namespace qi_coding = boost::spirit::ascii;
using It = boost::spirit::line_pos_iterator<std::string::const_iterator>;

namespace ast {
    enum actionid { f_unary, f_binary };
    enum param_type { int_param, string_param };

    static inline std::ostream& operator<<(std::ostream& os, actionid id) {
        switch(id) {
            case f_unary:  return os << "f_unary";
            case f_binary: return os << "f_binary";
            default:       return os << "(unknown)";
        }
    }
    static inline std::ostream& operator<<(std::ostream& os, param_type t) {
        switch(t) {
            case int_param:    return os << "integer";
            case string_param: return os << "string";
            default:           return os << "(unknown)";
        }
    }


    using param_value = boost::variant<int, std::string>;
    struct parameter {
        It position;
        param_value value;

        friend std::ostream& operator<<(std::ostream& os, parameter const& p) { return os << p.value; }
    };
    using parameters = std::vector<parameter>;

    struct action {
        /*
         *action() = default;
         *template <typename Sequence> action(Sequence const& seq) { boost::fusion::copy(seq, *this); }
         */
        actionid id;
        parameters params;
    };
}

namespace std {
    static inline std::ostream& operator<<(std::ostream& os, ast::parameters const& v) {
        std::copy(v.begin(), v.end(), std::ostream_iterator<ast::parameter>(os, " "));
        return os;
    }
}

BOOST_FUSION_ADAPT_STRUCT(ast::action, id, params)
BOOST_FUSION_ADAPT_STRUCT(ast::parameter, position, value)

struct BadAction : std::exception {
    It _where;
    std::string _what;
    BadAction(It it, std::string msg) : _where(it), _what(std::move(msg)) {}
    It where() const { return _where; }
    char const* what() const noexcept { return _what.c_str(); }
};

struct ValidateAction {
    std::map<ast::actionid, std::vector<ast::param_type> > const specs {
        { ast::f_unary,  { ast::int_param } },
        { ast::f_binary, { ast::int_param, ast::string_param } },
    };

    ast::action operator()(It source, ast::action parsed) const {
        auto check = [](ast::parameter const& p, ast::param_type expected_type) {
            if (p.value.which() != expected_type) {
                auto name = boost::lexical_cast<std::string>(expected_type);
                throw BadAction(p.position, "Type mismatch (expecting " + name + ")");
            }
        };

        //index of the parameter being checked; initialized so the catch below never reads garbage
        std::size_t i = 0;
        try {
            auto& formals = specs.at(parsed.id);
            auto& actuals = parsed.params;
            auto arity = formals.size();

            for (i=0; i<arity; ++i)
                check(actuals.at(i), formals.at(i));

            if (actuals.size() > arity)
                throw BadAction(actuals.at(arity).position, "Excess parameters");
        } catch(std::out_of_range const&) {
            throw BadAction(source, "Missing parameter #" + std::to_string(i+1));
        }
        return parsed;
    }
};

template <typename It, typename Skipper = qi::space_type>
struct Parser : qi::grammar<It, ast::action(), Skipper> {
    Parser() : Parser::base_type(start) {
        using namespace qi;
        parameter  = qr::iter_pos >> (int_ | lexeme['"' >> *~qi_coding::char_('"') >> '"']);
        parameters = -(parameter % ',');
        action     = actions_ >> '(' >> parameters >> ')';
        start      = (qr::iter_pos >> action) [ _val = validate_(_1, _2) ];

        BOOST_SPIRIT_DEBUG_NODES((parameter)(parameters)(action))
    }
  private:
    qi::rule<It, ast::action(), Skipper> start, action;
    qi::rule<It, ast::parameters(), Skipper> parameters;
    qi::rule<It, ast::parameter(), Skipper> parameter;
    px::function<ValidateAction> validate_;

    struct Actions : qi::symbols<char, ast::actionid> {
        Actions() { this->add("f_unary", ast::f_unary)("f_binary", ast::f_binary); }
    } actions_;

};

int main() {
    for (std::string const input : {
            // good
            "f_unary( 0 )",
            "f_binary ( 47, \"hello\")",
            // errors
            "f_binary ( 47, \"hello\") bogus",
            "f_unary ( 47, \"hello\") ",
            "f_binary ( 47, \r\n 7) ",
        })
    {
        std::cout << "-----------------------\n";
        Parser<It> p;
        It f(input.begin()), l(input.end());

        auto printErrorContext = [f,l](std::ostream& os, It where) {
            auto line = get_current_line(f, where, l);

            os << " line:" << get_line(where)
               << ", col:" << get_column(line.begin(), where) << "\n";
            while (!line.empty() && std::strchr("\r\n", *line.begin()))
                line.advance_begin(1);
            std::cerr << line << "\n";
            std::cerr << std::string(std::distance(line.begin(), where), ' ') << "^ --- here\n";
        };

        ast::action data;
        try {
            if (qi::phrase_parse(f, l, p > qi::eoi, qi::space, data)) {
                std::cout << "Parsed: " << boost::fusion::as_vector(data) << "\n";
            }
        } catch(qi::expectation_failure<It> const& e) {
            printErrorContext(std::cerr << "Expectation failed: " << e.what_, e.first);
        } catch(BadAction const& ba) {
            printErrorContext(std::cerr << "BadAction: " << ba.what(), ba.where());
        }

        if (f!=l) {
            std::cout << "Remaining unparsed: '" << std::string(f,l) << "'\n";
        }
    }
}

Printing:

-----------------------
Parsed: (f_unary 0 )
-----------------------
Parsed: (f_binary 47 hello )
-----------------------
Expectation failed: <eoi> line:1, col:25
f_binary ( 47, "hello") bogus
                        ^ --- here
Remaining unparsed: 'f_binary ( 47, "hello") bogus'
-----------------------
BadAction: Excess parameters line:1, col:15
f_unary ( 47, "hello")
              ^ --- here
Remaining unparsed: 'f_unary ( 47, "hello") '
-----------------------
BadAction: Type mismatch (expecting string) line:2, col:8
7)
^ --- here
Remaining unparsed: 'f_binary ( 47,
7) '

Using semantic action together with attribute propagation in spirit

There is BOOST_SPIRIT_ACTIONS_ALLOW_ATTR_COMPAT, which is supposed to allow attribute compatibility rules to work inside semantic actions the way they work during automatic attribute propagation.

However, the superior solution is to specify the conversions you wish, when you wish them.

The most obvious approaches are

  • wrap the intermediate into a qi::rule<..., T()>

    Incidentally, I already solved your particular issue that way in your previous question, "boost spirit reporting semantic error".

    Actually, I suppose you would like to have a stateful validator working on the fly, and you can use Attribute Traits to transform your intermediates to the desired AST (e.g. if you don't want to actually store the iterators in your AST).

  • wrap the sub-expression in a qi::transform_attribute<T>()[p] directive.

    Beware of a bug in some versions of Boost Spirit that requires you to explicitly deep-copy the subexpression in transform_attribute (use qi::copy(p)).

attaching semantic actions to parser with boost spirit

In spirit, a parser is a function object, and for the most part, the operators which are overloaded in order to allow you to make new parsers, such as >> and so on, return different function objects, rather than modifying the original.

If you have ever used Java and encountered Java's immutable strings, you can think of it as a bit like that.


