Undefined Behaviour Somewhere in Boost::Spirit::Qi::Phrase_Parse

undefined behaviour somewhere in boost::spirit::qi::phrase_parse

You cannot use auto to store parser expressions¹

Either you need to evaluate from the temporary expression directly, or you need to assign to a rule/grammar:

const qi::rule<std::string::const_iterator, qi::space_type> doubles_parser_local = qi::double_ >> *(',' >> qi::double_);

You can have your cake and eat it too on most recent BOost versions (possibly the dev branch) there should be a BOOST_SPIRIT_AUTO macro

This is becoming a bit of a FAQ item:

  • Assigning parsers to auto variables
  • boost spirit V2 qi bug associated with optimization level

¹ I believe this is actually a limitation of the underlying Proto library. There's a Proto-0x lib version on github (by Eric Niebler) that promises to solve these issues by being completely redesigned to be aware of references. I think this required some c++11 features that Boost Proto currently cannot use.

Boost Spirit Qi crashes for memory violation

The problem is the use of auto expressions to capture rules, which deduces the types from the parser expressions. That type is a proto-expression tree, which captures any relations by reference, but that means many of _the intermediates are gone after the end of the enclosing full-expresion (see C++: Life span of temporary arguments?).

This is pretty well-known, as you can see here:

  • Assigning parsers to auto variables
  • boost spirit V2 qi bug associated with optimization level
  • undefined behaviour somewhere in boost::spirit::qi::phrase_parse
  • And some more

Here's the simplest fix:

auto versionParser = bsq::copy(
bsq::uint_
>> -('.' >> bsq::uint_)
>> -('.' >> bsq::uint_)
>> -('.' >> bsq::uint_));

If you also fix the missing intialization of the local variables it works correctly:

Live On Coliru

#include <boost/fusion/adapted/struct.hpp>
#include <boost/spirit/include/qi.hpp>

namespace bsq = boost::spirit::qi;

int main()
{
std::cout << "BOOST_VERSION: " << BOOST_VERSION << std::endl;

std::uint16_t major = 0, minor = 0, build = 0, revision = 0;

auto versionParser = bsq::copy(
bsq::uint_
>> -('.' >> bsq::uint_)
>> -('.' >> bsq::uint_)
>> -('.' >> bsq::uint_));

std::string version = "3.5.1";

auto start = version.begin();
if (!bsq::parse(start, version.end(), versionParser, major, minor, build, revision))
{
std::cout << "Error!\n";
}

std::cout << major << "-" << minor << "-" << build << "-" << revision << std::endl;
}

Prints

BOOST_VERSION: 106600
3-5-1-0

Additional Notes

  1. To avoid the whole "unitialized attribute" situation, let's make it so the parser assigns to all elements, even if unspecified in the input text:

        >> ('.' >> bsq::uint_ | bsq::attr(0))
    >> ('.' >> bsq::uint_ | bsq::attr(0))
    >> ('.' >> bsq::uint_ | bsq::attr(0))
  2. To diagnose errors where there is trailing "garbage" (like with "3.4bogus"), you could add a check that the full input is parsed:

    auto versionParser = bsq::copy(
    bsq::uint_
    >> ('.' >> bsq::uint_ | bsq::attr(0))
    >> ('.' >> bsq::uint_ | bsq::attr(0))
    >> ('.' >> bsq::uint_ | bsq::attr(0))
    >> bsq::eoi);
  3. Because a version is semantically a tuple, why not represent it as such?

    using Version = std::tuple<uint16_t, uint16_t, uint16_t, uint16_t>;
    Version parsed;

    if (!bsq::parse(version.begin(), version.end(), versionParser, parsed))
    std::cout << "Error!\n";

    That way you can even say:

    using boost::fusion::operator<<;

    auto obsolete = parsed < Version(3, 4, 0, 0);
    std::cout << "Version " << parsed << " " << (obsolete? "too old" : "supported") << "\n";

Combining those:

Live On Coliru

#include <boost/fusion/adapted/std_tuple.hpp>
#include <boost/spirit/include/qi.hpp>

namespace bsq = boost::spirit::qi;

int main() {
auto versionParser = bsq::copy(
bsq::uint_
>> ('.' >> bsq::uint_ | bsq::attr(0))
>> ('.' >> bsq::uint_ | bsq::attr(0))
>> ('.' >> bsq::uint_ | bsq::attr(0))
>> bsq::eoi);

std::string version = "3.5.1";

using Version = std::tuple<uint16_t, uint16_t, uint16_t, uint16_t>;
Version parsed;

if (!bsq::parse(version.begin(), version.end(), versionParser, parsed))
std::cout << "Error!\n";

using boost::fusion::operator<<;

auto obsolete = parsed < Version(3, 4, 0, 0);
std::cout << "Version " << parsed << " " << (obsolete? "too old" : "supported") << "\n";
}

Prints

Version (3 5 1 0) supported

std::tuple sucks?

I agree. So, equivalently write your own struct:

Live On Coliru

struct Version {
uint16_t major, minor, revision, build;

auto key() const { return std::tie(major, minor, revision, build); }
bool operator<(Version const& b) const { return key() < b.key(); }
};

BOOST_FUSION_ADAPT_STRUCT(Version, major, minor, revision, build)

Gettin' With The Times

Note that Spirit X3 (Getting into boost spirit; Qi or X3?) doesn't have the auto-issues that you ran into:

Live On Coliru

#include <boost/fusion/adapted/struct.hpp>
#include <boost/spirit/home/x3.hpp>

#include <boost/fusion/include/io.hpp>
#include <iostream>

namespace bsx = boost::spirit::x3;

struct Version {
uint16_t major, minor, revision, build;

auto key() const { return std::tie(major, minor, revision, build); }
bool operator<(Version const& b) const { return key() < b.key(); }
};

BOOST_FUSION_ADAPT_STRUCT(Version, major, minor, revision, build)

int main() {
auto versionParser = bsx::uint_
>> ('.' >> bsx::uint_ | bsx::attr(0))
>> ('.' >> bsx::uint_ | bsx::attr(0))
>> ('.' >> bsx::uint_ | bsx::attr(0))
>> bsx::eoi;

std::string version = "3.5.1";

Version parsed;

if (!parse(version.begin(), version.end(), versionParser, parsed))
std::cout << "Error!\n";

using boost::fusion::operator<<;

auto obsolete = parsed < Version{3, 4, 0, 0};
std::cout << "Version " << parsed << " " << (obsolete? "too old" : "supported") << "\n";
}

Also printing the same.

inconsistent behavior of boost spirit grammar

You're invoking Undefined Behaviour in your code.

Specifically where you use auto to store a parser expression. The Expression Template contains references to temporaries that become dangling at the end of the containing full-expression¹.

UB implies that anything can happen. Both compilers are right! And the best part is, you will probably see varying behaviour depending on the compiler flags used.

Fix it either by using:

  • qi::copy (or boost::proto::deep_copy before v.1.55 IIRC)
  • use BOOST_SPIRIT_AUTO instead of BOOST_AUTO (mostly helpful iff you also support C++03)
  • use qi::rule<> and qi::grammar<> (the non-terminals) to type-erase and the expressions. This has performance impact too, but also gives more features like

    • recursive rules
    • locals and inherited attributes
    • declared skippers (handy because rules can be implicitly lexeme[] (see here)
    • better code organization.

Note also that Spirit X3 promises to drop there restrictions on the use with auto. It's basically a whole lot more lightweight due to the use of c++14 features. Keep in mind that it's not stable yet.

  • Showing that GCC with -O2 shows undefined results; Live On Coliru

  • The fixed version:

Live On Coliru

//#pragma GCC diagnostic push
//#pragma GCC diagnostic ignored "-Wunused-local-typedefs"
//#pragma GCC diagnostic ignored "-Wmaybe-uninitialized"
//#pragma GCC diagnostic ignored "-Wunused-variable"
#include <boost/spirit/include/karma.hpp>
#include <boost/spirit/include/qi.hpp>
//#pragma GCC diagnostic pop // pops

#include <iostream>

int main() {
typedef unsigned long long ull;

std::string const curline = "1;2;;3,4;5";
std::cout << "parsing: '" << curline << "'\n";

namespace qi = boost::spirit::qi;

#if 0 // THIS IS UNDEFINED BEHAVIOUR:
auto ids = -qi::ulong_long % ','; // '-' allows for empty vecs.
auto grammar = ids % ';';
#else // THIS IS CORRECT:
auto ids = qi::copy(-qi::ulong_long % ','); // '-' allows for empty vecs.
auto grammar = qi::copy(ids % ';');
#endif

std::vector<std::vector<ull> > r;
qi::parse(curline.begin(), curline.end(), grammar, r);

std::cout << "got: ";
for (auto v: r){
for (auto i : v)
std::cout << i << ",";
std::cout << ";";
}
std::cout <<"\n";
}

Printing (also with GCC -O2!):

parsing: '1;2;;3,4;5'
got: 1,;2,;;3,4,;5,;

¹ (that's basically "at the next semicolon" here; but in standardese)

Boost.Spirit.Qi alternative ( | ) parser issue

By appending >> eoi you explicitly make the parse failed if it didn't reach the end of the input.

Live On Coliru

#include <string>
#include <iostream>
#include <iomanip>

#include <boost/spirit/include/qi.hpp>

namespace qi = boost::spirit::qi;

template <typename Expr>
void test(std::string name, Expr const& expr) {
std::string const test = "D-z!D-z@mib-A3A026FF.rev.sfr.net";

auto f = begin(test);
bool ok = qi::parse(f, end(test), expr);
std::cout << name << ": " << ok << "\n";
if (f != end(test))
std::cout << " -- remaining input: '" << std::string(f, end(test)) << "'\n";
}

int main() {
auto const hexdigit = qi::char_("0123456789ABCDEF");
auto const special = qi::char_("\x5b-\x60\x7b-\x7d");

auto const oneToThreeDigits = qi::repeat(1, 3)[qi::digit];
auto const ip4addr = oneToThreeDigits >> '.' >> oneToThreeDigits >> '.' >> oneToThreeDigits >> '.' >> oneToThreeDigits;
auto const ip6addr = +(hexdigit >> qi::repeat(7)[':' >> +hexdigit]) | ("0:0:0:0:0:" >> (qi::lit('0') | "FFFF") >> ':' >> ip4addr);
auto const hostaddr = ip4addr | ip6addr;

auto const nickname = (qi::alpha | special) >> qi::repeat(0, 8)[qi::alnum | special | '-'];
auto const user = +(~qi::char_("\x0d\x0a\x20\x40"));

auto const shortname = qi::alnum >> *(qi::alnum | '-');
auto const hostname = shortname >> *('.' >> shortname);
auto const host = hostname | hostaddr;

auto const nickUserHost = nickname >> -(-('!' >> user) >> '@' >> host);

auto const prefix = hostname | nickUserHost; // The problematic alternative

std::cout << std::boolalpha;
test("hostname", hostname);
test("nickUserHost", nickUserHost);
test("prefix", prefix);
}

Prints

hostname: true
-- remaining input: '!D-z@mib-A3A026FF.rev.sfr.net'
nickUserHost: true
prefix: true
-- remaining input: '!D-z@mib-A3A026FF.rev.sfr.net'

Boost Spirit eliminate left recursion from simple addition operator

There are a number of things about your example.

First off, you use auto with Spirit Qi expressions. That's not valid, and leads to UB:

  • Assigning parsers to auto variables
  • also boost spirit qi parser failed in release and pass in debug
  • undefined behaviour somewhere in boost::spirit::qi::phrase_parse
  • boost spirit V2 qi bug associated with optimization level

Next off, you chose to use polymorphic Ast nodes. That's possible but likely not efficient:

  • How can I use polymorphic attributes with boost::spirit::qi parsers?
  • Semantic actions runs multiple times in boost::spirit parsing
  • but also making a vector of shared pointers from Spirit Qi which you may have found

Finally, there's the left recursion, since your expression starts with itself, leading to infinite recursion. The only way to solve it is to split your productions up into "levels" of expressions. This also aids in generating the desired operator precedence:

 expression = term >> char_("-+") >> term;
term = factor >> char_("*/%") >> factor;
factor = simple >> char_("^") >> simple;

In your case, I'd suggest:

    simple
= qi::double_
[_val = make_shared_<DoubleNode>()(_1)];
;

expression
= (simple >> '+' >> expression)
[_val = make_shared_<AddNode>()(_1, _2)]
| simple
;

Of course, you can be simpler and slightly less inefficient here:

    expression 
= simple [_val = _1]
>> *(('+' >> expression)
[_val = make_shared_<AddNode>()(_val, _0)])
;

Full Demo

Live On Coliru

#include <boost/spirit/include/qi.hpp>
#include <boost/spirit/include/phoenix.hpp>

namespace { // https://stackoverflow.com/a/21565350/85371

template <typename T>
struct make_shared_f
{
template <typename... A> struct result
{ typedef std::shared_ptr<T> type; };

template <typename... A>
typename result<A...>::type operator()(A&&... a) const {
return std::make_shared<T>(std::forward<A>(a)...);
}
};
}

template <typename T>
using make_shared_ = boost::phoenix::function<make_shared_f<T> >;

struct AstNode {
virtual ~AstNode() = default;
};
using AstNodePtr = std::shared_ptr<AstNode>;
struct DoubleNode : AstNode {
DoubleNode(double) {}
};
struct AddNode : AstNode {
AddNode(AstNodePtr, AstNodePtr) {}
};

#include <iomanip>

namespace qi = boost::spirit::qi;

template<typename Iterator>
struct simple_grammar : qi::grammar<Iterator, AstNodePtr()> {

simple_grammar() : simple_grammar::base_type(start) {
using namespace qi::labels;

simple
= qi::double_
[_val = make_shared_<DoubleNode>()(_1)];
;

expression
= simple [_val = _1]
>> *(('+' >> expression)
[_val = make_shared_<AddNode>()(_val, _1)])
;

start = qi::skip(qi::space) [ expression ];
BOOST_SPIRIT_DEBUG_NODES((start)(expression)(simple))
}
private:
qi::rule<Iterator, AstNodePtr()> start;
qi::rule<Iterator, AstNodePtr(), qi::space_type> expression;
// implicit lexemes
qi::rule<Iterator, AstNodePtr()> simple;
};

int main() {
simple_grammar<std::string::const_iterator> g;

for (std::string const input : {
"1 + 2",
"3.14"
})
{
auto f = begin(input), l = end(input);
AstNodePtr ast;
if (qi::parse(f, l, g, ast)) {
std::cout << "Succeeded: " << boost::core::demangle(typeid(*ast).name()) << "\n";
} else {
std::cout << "Failed\n";
}

if (f!=l)
std::cout << "Remaining unparsed " << std::quoted(std::string(f,l)) << "\n";
}
}

Prints

Succeeded: AddNode
Succeeded: DoubleNode

Parsing nested lists with Boost.Spirit

You're using auto for proto expression templates in the Qi domain - that's undefined behaviour 99.9% of the time:

  • undefined behaviour somewhere in boost::spirit::qi::phrase_parse

Now, while you fix that, also make the list body optional:

Live On Coliru

#include <boost/spirit/include/qi.hpp>
namespace qi = boost::spirit::qi;

int main() {
using It = std::string::const_iterator;
using Skipper = qi::space_type;

for(std::string const input : { "[[\t1 , 2 ], [3, 4, 65.4]]", "[[\t1 , 2 ], [3, 4, 65.4], []]", "[]" })
{
std::cout << " ---- '" << input << "' ----\n";
auto it = input.begin();

//parse the string
using doubles = std::vector<double>;
using vectors = std::vector<doubles>;

qi::rule<It, doubles(), Skipper> doubles_ = "[" >> -(qi::double_ % ",") >> "]";
qi::rule<It, vectors(), Skipper> vectors_ = "[" >> -(doubles_ % ",") >> "]";

vectors data;
bool match = qi::phrase_parse(it, input.end(), vectors_, qi::space, data);

//check if we have a match
std::cout << "matched: " << std::boolalpha << match << '\n';
if (it != input.end())
std::cout << "unmatched part: " << std::string{it, input.end()} << '\n';

//print the result
std::cout << "result\n";
for (const auto& v : data) {
std::cout << "[";
for (double i : v)
std::cout << i << ",";
std::cout << "]\n";
}
}
}

Prints

 ---- '[[   1 , 2 ], [3, 4, 65.4]]' ----
matched: true
result
[1,2,]
[3,4,65.4,]
---- '[[ 1 , 2 ], [3, 4, 65.4], []]' ----
matched: true
result
[1,2,]
[3,4,65.4,]
[]
---- '[]' ----
matched: true
result

Using boost::spirit to parse named parameters in any order

Just for fun/completeness I reviewed the grammar and came up with the following test.

I have made a few improvement suggestions left and right (as the OP witnessed on the live stream), and the resulting code, test and output are here:

Live On Coliru

#include <boost/fusion/include/adapt_struct.hpp>
#include <boost/spirit/include/qi.hpp>
#include <fstream>
#include <iostream>

namespace blocks {
struct CalcBlock {
std::string calculationTitle;
float matchingRad;
float stepSize;
std::string problemType;
int maxPartialWaveJ;
float sMatrixConvergenceValue;
float partialWaveConvergenceValue;
float smallValueLimit;
std::string potentialRadType;
};
}

BOOST_FUSION_ADAPT_STRUCT(blocks::CalcBlock, // Boost 1.58+ style adapt-struct
calculationTitle, matchingRad, stepSize, problemType, maxPartialWaveJ,
sMatrixConvergenceValue, partialWaveConvergenceValue, smallValueLimit,
potentialRadType)

namespace blocks {

namespace qi = boost::spirit::qi;

template <typename Iterator>
struct CalcBlockParser : qi::grammar<Iterator, CalcBlock()> {

CalcBlockParser() : CalcBlockParser::base_type(start) {

using namespace qi;
auto eol_ = copy((',' >> *eol) | +eol); // http://stackoverflow.com/a/26411266/85371 (!)

quotedString = '"' >> +~char_("\"\n") >> '"';
plainString = +~char_(" ,\n");

start = skip(blank) [cbRule];

cbRule = lexeme["[CalculationBlock]"] >> eol
>> (
(lexeme["CalculationTitle"] >> '=' >> quotedString >> eol_)
^ (lexeme["MatchingRadius"] >> '=' >> float_ >> eol_)
^ (lexeme["StepSize"] >> '=' >> float_ >> eol_)
^ (lexeme["ProblemType"] >> '=' >> plainString >> eol_)
^ (lexeme["MaxPartialWaveJ"] >> '=' >> int_ >> eol_)
^ (lexeme["SMatConv"] >> '=' >> float_ >> eol_)
^ (lexeme["PartialWaveConv"] >> '=' >> float_ >> eol_)
^ (lexeme["SmallValueLimit"] >> '=' >> float_ >> eol_)
^ (lexeme["PotentialRadType"] >> '=' >> plainString >> eol_)
)
>> lexeme["[end]"]
>> *eol
>> eoi;
}

private:
qi::rule<Iterator, CalcBlock()> start;
qi::rule<Iterator, CalcBlock(), qi::blank_type> cbRule;
// lexemes:
qi::rule<Iterator, std::string()> quotedString, plainString;
};
}

using boost::fusion::as_vector;
typedef boost::spirit::istream_iterator It;

int main(int argc, char **argv) {
if (argc != 2) {
std::cout << "Usage:\n\t" << argv[0] << " InputFileName" << std::endl;
return 1;
}

std::string inputFileName(argv[1]);
std::cout << "Reading input from the file: " << inputFileName << std::endl;
std::ifstream input(inputFileName);
input.unsetf(std::ios::skipws);

It start(input), stop;

blocks::CalcBlock cb;
blocks::CalcBlockParser<It> cbParser;

bool success = parse(start, stop, cbParser, cb);

{
using namespace boost::fusion;
std::cout << tuple_open('[') << tuple_close(']') << tuple_delimiter(", ");
}

std::cout << "-------------------------\n";
std::cout << "Parsing " << (success?"succeeded":"failed") << "\n";
std::cout << "got: " << as_vector(cb) << "\n";
std::cout << "-------------------------\n";
}

Input:

[CalculationBlock]
CalculationTitle="Test Parser Input System"

SMatConv=10E-8,

PartialWaveConv= 10E-8, MaxPartialWaveJ=800, SmallValueLimit = 10E-8

PotentialRadType=HeavyIon , MatchingRadius=25.0, StepSize=0.01,ProblemType=RelSchroedingerEqn

[end]

Output:

Reading input from the file: input.txt
-------------------------
Parsing succeeded
got: [Test Parser Input System, 25, 0.01, RelSchroedingerEqn, 800, 1e-07, 1e-07, 1e-07, HeavyIon]
-------------------------

boost spirit V2 qi bug associated with optimization level

It's a bug in your code, nothing wrong with the compiler or the optimization levels.

The cinch is with expression templates (like the ones used by Boost Proto, and hence by Boost Spirit). They are only valid to the end of their enclosing full expression [1]

The canonical workaound is:

 BOOST_SPIRIT_AUTO(ana, *~qi::char_('*') > +qi::char_('*'));

You can find it here: http://svn.boost.org/svn/boost/trunk/libs/spirit/example/qi/typeof.cpp and it got first introduced in the comments at this blog post: http://boost-spirit.com/home/articles/qi-example/zero-to-60-mph-in-2-seconds/

The explicit fix I tested (which worked nicely on my box, no warnings remaining in valgrind):

auto const ana = boost::proto::deep_copy(
*~qi::char_('*') > +qi::char_('*'))
;

Spirit X3 promises to remove this wart. Slightly related, I think Protox11 also removes this issue by being aware of references at all times.


[1] Grep the standard for lifetime extension of temporaries. The expression templates keep references to the literals used (the rest has value semantics anyways), but the temporaries aren't bound to (const) references. So they go out of scope. Undefined Behaviour results



Related Topics



Leave a reply



Submit