Constraining the Existing Boost.Spirit Real_Parser (With a Policy)

It seems I am so close, i.e. just a few changes to the double_ parser and I'd be done. This would probably be a lot more maintainable than adding a new grammar, since all the other parsing is done that way. – toting 7 hours ago

Even more maintainable would be to not write another parser at all.

You basically want to parse a floating point number (Spirit has got you covered) but apply some validations afterward. I'd do the validations in a semantic action:

raw [ double_ [_val = _1] ] [ _pass = !isnan_(_val) && px::size(_1)<=4 ]

That's it.

Explanations

Anatomy:

  • double_ [_val = _1] parses a double and assigns it to the exposed attribute as usual¹
  • raw [ parser ] matches the enclosed parser but exposes the raw source iterator range as an attribute
  • [ _pass = !isnan_(_val) && px::size(_1)<=4 ] - the business part!

    This semantic action attaches to the raw[] parser. Hence

    • _1 now refers to the raw iterator range that already parsed the double_
    • _val already contains the "cooked" value of a successful match of double_
    • _pass is a Spirit context flag that we can set to false to make parsing fail.

Now the only thing left is to tie it all together. Let's make a deferred version of ::isnan:

boost::phoenix::function<decltype(&::isnan)> isnan_(&::isnan);

We're good to go.

Test Program

Live On Coliru

#include <boost/spirit/include/qi.hpp>
#include <boost/spirit/include/phoenix.hpp>
#include <cmath>
#include <iostream>

int main ()
{
    using It = std::string::const_iterator;

    auto my_fpnumber = [] { // TODO encapsulate in a grammar struct
        using namespace boost::spirit::qi;
        using boost::phoenix::size;

        static boost::phoenix::function<decltype(&::isnan)> isnan_(&::isnan);

        return rule<It, double()> (
                raw [ double_ [_val = _1] ] [ _pass = !isnan_(_val) && size(_1)<=4 ]
            );
    }();

    for (std::string const s: { "1.23", ".123", "2.e6", "inf", "3.2323", "nan" })
    {
        It f = s.begin(), l = s.end();

        double result;
        if (parse(f, l, my_fpnumber, result))
            std::cout << "Parse success: '" << s << "' -> " << result << "\n";
        else
            std::cout << "Parse rejected: '" << s << "' at '" << std::string(f,l) << "'\n";
    }
}

Prints

Parse success: '1.23' -> 1.23
Parse success: '.123' -> 0.123
Parse success: '2.e6' -> 2e+06
Parse success: 'inf' -> inf
Parse rejected: '3.2323' at '3.2323'
Parse rejected: 'nan' at 'nan'

¹ The assignment has to be done explicitly here because we use semantic actions and they normally suppress automatic attribute propagation

Boost spirit: Invalidate parser from member function

There are two ways:

  • you can assign to qi::_pass using Phoenix actors
  • you can assign to the third parameter (bool&) inside a "raw" semantic action function

An example is here:

  • Constraining the existing Boost.Spirit real_parser (with a policy) (using _pass)

The anatomy of a semantic action function (with the third argument):

  • boost spirit semantic action parameters

In your case you have a member function with roughly the "raw semantic action function" signature. Of course, you'll have to bind for the this parameter (because it's a non-static member function).

Note that in this particular case, phoenix::bind is not the right bind to use, as Phoenix Actors will be considered to be "cooked" (not raw) semantic actions, and they will get executed in the Spirit context.

You could either

  1. use boost::bind (or even std::bind) to bind into a function that preserves the arity (!) of the member function:

    [boost::bind(&moduleCommandParser::isModule, this, ::_1, ::_2, ::_3)]

    This works: Live On Coliru

  2. instead use a "cooked" semantic action, directly assigning to the _pass context placeholder:

    [qi::_pass = phoenix::bind(&moduleAccessManager::getModule, m_acm, qi::_1)]

    This works too: Live On Coliru

The latter example, for future reference:

#define BOOST_SPIRIT_USE_PHOENIX_V3
#include <boost/spirit/include/qi.hpp>
#include <boost/spirit/include/phoenix.hpp>

#include <iostream>
#include <string>

namespace qi = boost::spirit::qi;
namespace phoenix = boost::phoenix;

class moduleAccessManager {
public:
bool getModule(const std::string name) {
return name == "cat" || name == "dog";
}
};

void globalIsModule(std::string moduleName, const boost::spirit::unused_type&, bool& mFlag)
{
moduleAccessManager acm; /* Dirty workaround for this example */
if(acm.getModule(moduleName))
std::cout << "[isModule] Info: Found module with name >" << moduleName << "<" << std::endl;
else
{
std::cout << "[isModule] Error: No module with name >" << moduleName << "<" << std::endl;
mFlag = false; // No valid module name
}
}

template <typename Iterator, typename Skipper>
class moduleCommandParser : public qi::grammar<Iterator, Skipper>
{
private:
moduleAccessManager* m_acm;

qi::rule<Iterator, Skipper> start, module;

public:
std::string m_moduleName;

moduleCommandParser(moduleAccessManager* acm)
: moduleCommandParser::base_type(start)
, m_acm(acm)
, m_moduleName("<empty>")
{
using namespace phoenix::arg_names;
module = qi::as_string[qi::lexeme[+(~qi::char_(' '))]]
[qi::_pass = phoenix::bind(&moduleAccessManager::getModule, m_acm, qi::_1)]
;
start = module >> qi::as_string[+(~qi::char_('\n'))];
};

};

int main()
{
moduleAccessManager acm;
moduleCommandParser<std::string::const_iterator, qi::space_type> commandGrammar(&acm);

std::string str;
std::string::const_iterator first;
std::string::const_iterator last;

str = "cat run";
first = str.begin();
last = str.end();
std::cout << str << std::boolalpha
<< qi::phrase_parse(first, last, commandGrammar, qi::space)
<< "\n";

str = "bird fly";
first = str.begin();
last = str.end();
std::cout << str << std::boolalpha
<< qi::phrase_parse(first, last, commandGrammar, qi::space)
<< "\n";
}

Can I test a parsed number as part of the rule. int_ = 120

This correctly validates for input like 23:59:59 and fails for input like 24:00:00.

bool validTime = qi::parse(f, l, uint2_p[ _pass = _1<24] >> ":" >> uint2_p[ _pass = _1<60] >> ":" >> uint2_p[ _pass = _1<60]);

Thanks for taking the time to look at my question.

Boost Spirit Qi validating input parser

You can indeed use semantic actions. You don't always need to attach them to an eps node, though. Here's what you'd get if you do:

port %= uint_parser<uint16_t, 10, 2, 5>() >> eps[ _pass = (_val>=10 && _val<=65535) ];
start = (port >> -('-' >> port)) >> eps(validate(_val));

Note that the one rule uses Simple Form eps with semantic action attached. This requires operator%= to still invoke automatic attribute propagation.

The second instance uses the Semantic Predicate form of eps. The validate function needs to be a Phoenix Actor, I defined it like:

struct validations {
    bool operator()(PortRange const& range) const {
        if (range.end)
            return range.start<*range.end;
        return true;
    }
};
boost::phoenix::function<validations> validate;

More Generic/Consistent

Note you can use the second rule style on both rules like so:

port %= uint_parser<Port, 10, 2, 5>() >> eps(validate(_val));
start = (port >> -('-' >> port)) >> eps(validate(_val));

if you simply add an overload to validate a single port:

struct validations {
    bool operator()(Port const& port) const {
        return port>=10 && port<=65535;
    }
    bool operator()(PortRange const& range) const {
        if (range.end)
            return range.start<*range.end;
        return true;
    }
};

First Tests

Let's define some nice edge cases and test them!

Live On Coliru

#include <boost/fusion/adapted/struct.hpp>
#include <boost/optional/optional_io.hpp>
#include <boost/spirit/include/qi.hpp>
#include <boost/spirit/include/phoenix.hpp>
namespace qi = boost::spirit::qi;

using Port = std::uint16_t;

struct PortRange {
Port start;
boost::optional<Port> end;
};

BOOST_FUSION_ADAPT_STRUCT(PortRange, start, end)

template <class It, typename Attr = PortRange> struct port_range_grammar : qi::grammar<It, Attr()> {

port_range_grammar() : port_range_grammar::base_type(start, "port_range") {
using namespace qi;

port %= uint_parser<Port, 10, 2, 5>() >> eps(validate(_val));
start = (port >> -('-' >> port)) >> eps(validate(_val));

port.name("valid port range: (10, 65535)");
}

private:
struct validations {
bool operator()(Port const& port) const {
return port>=10 && port<=65535;
}
bool operator()(PortRange const& range) const {
if (range.end)
return range.start<*range.end;
return true;
}
};
boost::phoenix::function<validations> validate;
qi::rule<It, Attr()> start;
qi::rule<It, Port()> port;
};

int main() {
using It = std::string::const_iterator;
port_range_grammar<It> const g;

std::string const valid[] = {"10", "6322", "6322-6325", "65535"};
std::string const invalid[] = {"9", "09", "065535", "65536", "-1", "6325-6322"};

std::cout << " -------- valid cases\n";
for (std::string const input : valid) {
It f=input.begin(), l = input.end();
PortRange range;
bool accepted = parse(f, l, g, range);
if (accepted)
std::cout << "Parsed '" << input << "' to " << boost::fusion::as_vector(range) << "\n";
else
std::cout << "TEST FAILED '" << input << "'\n";
}

std::cout << " -------- invalid cases\n";
for (std::string const input : invalid) {
It f=input.begin(), l = input.end();
PortRange range;
bool accepted = parse(f, l, g, range);
if (accepted)
std::cout << "TEST FAILED '" << input << "' (returned " << boost::fusion::as_vector(range) << ")\n";
}
}

Prints:

 -------- valid cases
Parsed '10' to (10 --)
Parsed '6322' to (6322 --)
Parsed '6322-6325' to (6322 6325)
Parsed '65535' to (65535 --)
 -------- invalid cases
TEST FAILED '065535' (returned (6553 --))

CONGRATULATIONS We found a broken edge case

Turns out that by limiting uint_parser to 5 positions, we may leave characters in the input, so that 065535 parses as 6553 (leaving '5' unparsed...). Fixing that is simple:

start = (port >> -('-' >> port)) >> eoi >> eps(validate(_val));

Or indeed:

start %= (port >> -('-' >> port)) >> eoi[ _pass = validate(_val) ];

Fixed version Live On Coliru

A Few Words About The Attribute Type

You will have noticed I revised your attribute type. Most of this is "good taste". Note, in practice you might want to represent your range as either single-port or range:

using Port = std::uint16_t;

struct PortRange {
Port start, end;
};

using PortOrRange = boost::variant<Port, PortRange>;

Which you would then parse like:

port %= uint_parser<Port, 10, 2, 5>() >> eps(validate(_val));
range = (port >> '-' >> port) >> eps(validate(_val));

start = (range | port) >> eoi;

Full demo Live On Coliru

You might think this will get unwieldy to use. I agree!

Simplify Instead

Let's do without variant or optional in the first place. Let's make a single port just a range which happens to have start==end:

using Port = std::uint16_t;

struct PortRange {
Port start, end;
};

Parse it like:

start = port >> -('-' >> port | attr(0)) >> eoi >> eps(validate(_val));

All we do in validate is to check whether end is 0:

bool operator()(PortRange& range) const {
    if (range.end == 0)
        range.end = range.start;
    return range.start <= range.end;
}

And now the output is: Live On Coliru

 -------- valid cases
Parsed '10' to (10-10)
Parsed '6322' to (6322-6322)
Parsed '6322-6325' to (6322-6325)
Parsed '65535' to (65535-65535)
 -------- invalid cases

Note how you can now always enumerate start..end without knowing whether there was a port or a port-range. This may be convenient (depending a bit on the logic you're implementing).
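As an illustration of that last point, here is a minimal sketch of consuming such a range without any special-casing. The enumerate helper is a hypothetical name; the loop counter is deliberately wider than Port so that a range ending at 65535 does not wrap around.

```cpp
#include <cstdint>
#include <vector>

using Port = std::uint16_t;

struct PortRange {
    Port start, end;
};

// Hypothetical helper: lists every port in [start, end], inclusive.
std::vector<Port> enumerate(PortRange const& range) {
    std::vector<Port> ports;
    // Use a wider counter: a Port counter would overflow at 65535.
    for (unsigned p = range.start; p <= range.end; ++p)
        ports.push_back(static_cast<Port>(p));
    return ports;
}
```

A single port such as (10-10) simply yields one element, and (6322-6325) yields four.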

Boost.Spirit.Qi - Bounds checking against primitive data types

You have missed the fact that raw[] exposes an iterator range. The other answer used that because the "extra" constraint was referring to the input length (in characters).

You don't need that, so you'd rather use something direct like:

Live On Coliru

#include <boost/spirit/include/qi.hpp>
#include <boost/spirit/include/phoenix.hpp>
#include <iostream>
namespace qi = boost::spirit::qi;

int main ()
{
using It = std::string::const_iterator;
qi::rule<It, double()> r
= qi::double_ [ qi::_pass = (qi::_1 < 256.0), qi::_val = qi::_1 ];

for (std::string const s: { "1.23", ".123", "2.e6", "inf", "-inf", "3.2323", "nan" })
{
It f = s.begin(), l = s.end();

double result;
if (parse(f, l, r, result))
std::cout << "accepted: '" << s << "' -> " << result;
else std::cout << "rejected: '" << s << "'";

if (f!=l)
std::cout << " (remaining: '" << std::string(f,l) << "')\n";
else std::cout << "\n";
}
}

Prints

accepted: '1.23' -> 1.23
accepted: '.123' -> 0.123
rejected: '2.e6' (remaining: '2.e6')
rejected: 'inf' (remaining: 'inf')
accepted: '-inf' -> -inf
accepted: '3.2323' -> 3.2323
rejected: 'nan' (remaining: 'nan')

Notes:

  1. the [action1, action2] is the Phoenix way of supplying multiple statements (in this case would be very similar to [action1][action2]).

  2. you can even do without the _val= assignment, because that's just what default attribute propagation is.

    In order to enable default attribute propagation on a rule that has semantic action(s), use operator%= to define it:

    r %= qi::double_ [ qi::_pass = (qi::_1 < 256.0) ];

    Live On Coliru

    That prints the same output.

How can I extend a boost spirit grammar

I'd not complicate matters by inheriting. Composition is often more than enough, and it won't confuse the qi parser interface.

I've drawn up a small sketch of how a versioning grammar could be done. Assume the old grammar:

template <typename It, typename Skipper>
struct OldGrammar : qi::grammar<It, Skipper, std::string()>
{
OldGrammar() : OldGrammar::base_type(mainrule)
{
using namespace qi;
rule1 = int_(1); // expect version 1
rule2 = *char_; // hopefully some interesting grammar
mainrule = omit [ "version" > rule1 ] >> rule2;
}
private:
qi::rule<It, Skipper, std::string()> mainrule;
qi::rule<It, Skipper, int()> rule1;
qi::rule<It, Skipper, std::string()> rule2;
};

As you can see, this was quite restrictive, requiring the version to be exactly 1. However, the future happened, and a new version of the grammar was invented. Now, I'd add

friend struct NewGrammar<It, Skipper>;

to the old grammar and go about implementing the new grammar, which graciously falls back to the old grammar if so required:

template <typename It, typename Skipper>
struct NewGrammar : qi::grammar<It, Skipper, std::string()>
{
NewGrammar() : NewGrammar::base_type(mainrule)
{
using namespace qi;
new_rule1 = int_(2); // support version 2 now
new_start = omit [ "version" >> new_rule1 ] >> old.rule2; // note, no expectation point

mainrule = new_start
| old.mainrule; // or fall back to version 1 grammar
}
private:
OldGrammar<It, Skipper> old;
qi::rule<It, Skipper, std::string()> new_start, mainrule;
qi::rule<It, Skipper, int()> new_rule1;
};

(I haven't tried to make it work with inheritance, though in all likelihood it should also work.)

Let's test this baby:

template <template <typename It,typename Skipper> class Grammar>
bool test(std::string const& input)
{
auto f(input.begin()), l(input.end());
static const Grammar<std::string::const_iterator, qi::space_type> p;
try {
return qi::phrase_parse(f,l,p,qi::space) && (f == l); // require full input consumed
}
catch(...) { return false; } // qi::expectation_failure<>
}

int main()
{
assert(true == test<OldGrammar>("version 1 woot"));
assert(false == test<OldGrammar>("version 2 nope"));

assert(true == test<NewGrammar>("version 1 woot"));
assert(true == test<NewGrammar>("version 2 woot as well"));
}

All tests pass, obviously: see it live on Coliru¹. Hope this helps!


¹ Well, darn. Coliru is too slow to compile this today. So here is the full test program:

#define BOOST_SPIRIT_DEBUG
#include <boost/spirit/include/qi.hpp>
#include <cassert>
#include <string>

namespace qi = boost::spirit::qi;

template <typename It, typename Skipper>
struct NewGrammar; // forward declare for friend declaration

template <typename It, typename Skipper>
struct OldGrammar : qi::grammar<It, Skipper, std::string()>
{
friend struct NewGrammar<It, Skipper>; // NOTE

OldGrammar() : OldGrammar::base_type(mainrule)
{
using namespace qi;
rule1 = int_(1); // expect version 1
rule2 = *char_; // hopefully some interesting grammar
mainrule = omit [ "version" > rule1 ] >> rule2;

BOOST_SPIRIT_DEBUG_NODE(mainrule);
BOOST_SPIRIT_DEBUG_NODE(rule1);
BOOST_SPIRIT_DEBUG_NODE(rule2);
}
private:
qi::rule<It, Skipper, std::string()> mainrule;
qi::rule<It, Skipper, int()> rule1;
qi::rule<It, Skipper, std::string()> rule2;
};

template <typename It, typename Skipper>
struct NewGrammar : qi::grammar<It, Skipper, std::string()>
{
NewGrammar() : NewGrammar::base_type(mainrule)
{
using namespace qi;
new_rule1 = int_(2); // support version 2 now
new_start = omit [ "version" >> new_rule1 ] >> old.rule2; // note, no expectation point

mainrule = new_start
| old.mainrule; // or fall back to version 1 grammar

BOOST_SPIRIT_DEBUG_NODE(new_start);
BOOST_SPIRIT_DEBUG_NODE(mainrule);
BOOST_SPIRIT_DEBUG_NODE(new_rule1);
}
private:
OldGrammar<It, Skipper> old;
qi::rule<It, Skipper, std::string()> new_start, mainrule;
qi::rule<It, Skipper, int()> new_rule1;
};

template <template <typename It,typename Skipper> class Grammar>
bool test(std::string const& input)
{
auto f(input.begin()), l(input.end());
static const Grammar<std::string::const_iterator, qi::space_type> p;
try {
return qi::phrase_parse(f,l,p,qi::space) && (f == l); // require full input consumed
}
catch(...) { return false; } // qi::expectation_failure<>
}

int main()
{
assert(true == test<OldGrammar>("version 1 woot"));
assert(false == test<OldGrammar>("version 2 nope"));

assert(true == test<NewGrammar>("version 1 woot"));
assert(true == test<NewGrammar>("version 2 woot as well"));
}

how do i find the location where a Spirit parser matched?

You can use qi::raw[] to get the source iterator pair spanning a match.

There's a convenient helper iter_pos in Qi Repository that you can use to directly get a source iterator without using qi::raw[].

Also, with some semantic action trickery you can get both:

raw[ identifier [ do_something_with_attribute(_1) ] ]
[do_something_with_iterators(_1)]

In fact,

raw[ identifier [ _val = _1 ] ] [do_something_with_iterators(_1)]

would be close to "natural behaviour".

Extra Mile

To get file name/line/column values you can either do some iterator arithmetics or use the line_pos_iterator adaptor:

#include <boost/spirit/include/support_line_pos_iterator.hpp>

This has some accessor functions that help with line number/column tracking. You can probably find a few answers of mine on here with examples.

How to write a boost::spirit::qi parser to parse an integer range from 0 to std::numeric_limits<int>::max()?

Simplest demo, attaching a semantic action to do the range check:

uint_ [ _pass = (_1>=0 && _1<=std::numeric_limits<int>::max()) ];

Live On Coliru

#include <boost/spirit/include/qi.hpp>
#include <boost/spirit/include/phoenix.hpp>
#include <iostream>
#include <limits>
#include <string>

template <typename It>
struct MyInt : boost::spirit::qi::grammar<It, int()> {
MyInt() : MyInt::base_type(start) {
using namespace boost::spirit::qi;
start %= uint_ [ _pass = (_1>=0 && _1<=std::numeric_limits<int>::max()) ];
}
private:
boost::spirit::qi::rule<It, int()> start;
};

template <typename Int>
void test(Int value, char const* logical) {
MyInt<std::string::const_iterator> p;

std::string const input = std::to_string(value);
std::cout << " ---------------- Testing '" << input << "' (" << logical << ")\n";

auto f = input.begin(), l = input.end();
int parsed;
if (parse(f, l, p, parsed)) {
std::cout << "Parse success: " << parsed << "\n";
} else {
std::cout << "Parse failed\n";
}

if (f!=l) {
std::cout << "Remaining unparsed: '" << std::string(f,l) << "'\n";
}
}

int main() {
unsigned maxint = std::numeric_limits<int>::max();

MyInt<std::string::const_iterator> p;

test(maxint , "maxint");
test(maxint-1, "maxint-1");
test(maxint+1, "maxint+1");
test(0 , "0");
test(-1 , "-1");
}

Prints

 ---------------- Testing '2147483647' (maxint)
Parse success: 2147483647
 ---------------- Testing '2147483646' (maxint-1)
Parse success: 2147483646
 ---------------- Testing '2147483648' (maxint+1)
Parse failed
Remaining unparsed: '2147483648'
 ---------------- Testing '0' (0)
Parse success: 0
 ---------------- Testing '-1' (-1)
Parse failed
Remaining unparsed: '-1'

Questions about Spirit.Qi sequence operator and semantic actions

First, blow-by-blow. See below for an out-of-the-box answer.

Question 1: Why do I have to add a semantic action to the rule sign above?
Isn't char convertible to std::string?

Erm, no: char is not convertible to string. See below for other options.

Question 2: Why does compilation fail when I try to merge the last two rules
like this:

rule<Iterator, std::string()> floating = -sign >> 
(mantissa >> -(exp | suffix) | +digit >> (exp | suffix));

This is due to the rules for atomic attribute assignment. The parser exposes something like

vector2<optional<string>, variant<
vector2<string, optional<string> >,
vector2<std::vector<char>, optional<string> > > >

or similar (see the documentation for the parsers, I typed this in the browser from memory). This is, obviously, not assignable to string. Use qi::as<> to coerce atomic assignment. For convenience, there is qi::as_string:

floating = qi::as_string [ -sign >> (mantissa >> -(exp | suffix) | 
+digit >> (exp | suffix)) ]

Question 3: Let's say I want to let the attribute of floating be double and
write a semantic action to do the conversion from string to double. How can I
refer to the entire string matched by the rule from inside the semantic
action?

You could use qi::as_string again, but the most appropriate would seem to be to use qi::raw:

floating = qi::raw [ -sign >> (mantissa >> -(exp | suffix) | 
+digit >> (exp | suffix)) ]
[ _val = parse_float(_1, _2) ];

This parser directive exposes a pair of source iterators, so you can use it to refer to the exact input sequence matched.

Question 4: In the rule floating of Question 2, what does the placeholder _2
refer to and what is its type?

In general, to detect attribute types - that is, when the documentation has you confused or you want to double check your understanding of it - see the answers here:

  • Detecting the parameter types in a Spirit semantic action

Out-of-the-box

Have you looked at using Qi's builtin real_parser<> template, which can be comprehensively customized? It sure looks like you'd want to use that instead of doing custom parsing in your semantic action.

The real_parser template with policies is both fast and very flexible and robust. See also the recent answer Is it possible to read infinity or NaN values using input streams?.

For models of RealPolicies the following expressions must be valid:

Expression                | Semantics
==========================+=============================================================================
RP::allow_leading_dot     | Allow leading dot.
RP::allow_trailing_dot    | Allow trailing dot.
RP::expect_dot            | Require a dot.
RP::parse_sign(f, l)      | Parse the prefix sign (e.g. '-'). Return true if successful, otherwise false.
RP::parse_n(f, l, n)      | Parse the integer at the left of the decimal point. Return true if successful, otherwise false. If successful, place the result into n.
RP::parse_dot(f, l)       | Parse the decimal point. Return true if successful, otherwise false.
RP::parse_frac_n(f, l, n) | Parse the fraction after the decimal point. Return true if successful, otherwise false. If successful, place the result into n.
RP::parse_exp(f, l)       | Parse the exponent prefix (e.g. 'e'). Return true if successful, otherwise false.
RP::parse_exp_n(f, l, n)  | Parse the actual exponent. Return true if successful, otherwise false. If successful, place the result into n.
RP::parse_nan(f, l, n)    | Parse a NaN. Return true if successful, otherwise false. If successful, place the result into n.
RP::parse_inf(f, l, n)    | Parse an Inf. Return true if successful, otherwise false. If successful, place the result into n.

See the example for a compelling idea of how you'd use it.


