undefined behaviour somewhere in boost::spirit::qi::phrase_parse
You cannot use auto
to store parser expressions¹
Either you need to evaluate from the temporary expression directly, or you need to assign to a rule/grammar:
const qi::rule<std::string::const_iterator, qi::space_type> doubles_parser_local = qi::double_ >> *(',' >> qi::double_);
You can have your cake and eat it too on most recent BOost versions (possibly the dev branch) there should be a BOOST_SPIRIT_AUTO macro
This is becoming a bit of a FAQ item:
- Assigning parsers to auto variables
- boost spirit V2 qi bug associated with optimization level
¹ I believe this is actually a limitation of the underlying Proto library. There's a Proto-0x lib version on github (by Eric Niebler) that promises to solve these issues by being completely redesigned to be aware of references. I think this required some c++11 features that Boost Proto currently cannot use.
Boost Spirit Qi crashes for memory violation
The problem is the use of auto
expressions to capture rules, which deduces the types from the parser expressions. That type is a proto-expression tree, which captures any relations by reference, but that means many of _the intermediates are gone after the end of the enclosing full-expresion (see C++: Life span of temporary arguments?).
This is pretty well-known, as you can see here:
- Assigning parsers to auto variables
- boost spirit V2 qi bug associated with optimization level
- undefined behaviour somewhere in boost::spirit::qi::phrase_parse
- And some more
Here's the simplest fix:
auto versionParser = bsq::copy(
bsq::uint_
>> -('.' >> bsq::uint_)
>> -('.' >> bsq::uint_)
>> -('.' >> bsq::uint_));
If you also fix the missing intialization of the local variables it works correctly:
Live On Coliru
#include <boost/fusion/adapted/struct.hpp>
#include <boost/spirit/include/qi.hpp>
namespace bsq = boost::spirit::qi;
int main()
{
std::cout << "BOOST_VERSION: " << BOOST_VERSION << std::endl;
std::uint16_t major = 0, minor = 0, build = 0, revision = 0;
auto versionParser = bsq::copy(
bsq::uint_
>> -('.' >> bsq::uint_)
>> -('.' >> bsq::uint_)
>> -('.' >> bsq::uint_));
std::string version = "3.5.1";
auto start = version.begin();
if (!bsq::parse(start, version.end(), versionParser, major, minor, build, revision))
{
std::cout << "Error!\n";
}
std::cout << major << "-" << minor << "-" << build << "-" << revision << std::endl;
}
Prints
BOOST_VERSION: 106600
3-5-1-0
Additional Notes
To avoid the whole "unitialized attribute" situation, let's make it so the parser assigns to all elements, even if unspecified in the input text:
>> ('.' >> bsq::uint_ | bsq::attr(0))
>> ('.' >> bsq::uint_ | bsq::attr(0))
>> ('.' >> bsq::uint_ | bsq::attr(0))To diagnose errors where there is trailing "garbage" (like with
"3.4bogus"
), you could add a check that the full input is parsed:auto versionParser = bsq::copy(
bsq::uint_
>> ('.' >> bsq::uint_ | bsq::attr(0))
>> ('.' >> bsq::uint_ | bsq::attr(0))
>> ('.' >> bsq::uint_ | bsq::attr(0))
>> bsq::eoi);Because a version is semantically a tuple, why not represent it as such?
using Version = std::tuple<uint16_t, uint16_t, uint16_t, uint16_t>;
Version parsed;
if (!bsq::parse(version.begin(), version.end(), versionParser, parsed))
std::cout << "Error!\n";That way you can even say:
using boost::fusion::operator<<;
auto obsolete = parsed < Version(3, 4, 0, 0);
std::cout << "Version " << parsed << " " << (obsolete? "too old" : "supported") << "\n";
Combining those:
Live On Coliru
#include <boost/fusion/adapted/std_tuple.hpp>
#include <boost/spirit/include/qi.hpp>
namespace bsq = boost::spirit::qi;
int main() {
auto versionParser = bsq::copy(
bsq::uint_
>> ('.' >> bsq::uint_ | bsq::attr(0))
>> ('.' >> bsq::uint_ | bsq::attr(0))
>> ('.' >> bsq::uint_ | bsq::attr(0))
>> bsq::eoi);
std::string version = "3.5.1";
using Version = std::tuple<uint16_t, uint16_t, uint16_t, uint16_t>;
Version parsed;
if (!bsq::parse(version.begin(), version.end(), versionParser, parsed))
std::cout << "Error!\n";
using boost::fusion::operator<<;
auto obsolete = parsed < Version(3, 4, 0, 0);
std::cout << "Version " << parsed << " " << (obsolete? "too old" : "supported") << "\n";
}
Prints
Version (3 5 1 0) supported
std::tuple
sucks?
I agree. So, equivalently write your own struct:
Live On Coliru
struct Version {
uint16_t major, minor, revision, build;
auto key() const { return std::tie(major, minor, revision, build); }
bool operator<(Version const& b) const { return key() < b.key(); }
};
BOOST_FUSION_ADAPT_STRUCT(Version, major, minor, revision, build)
Gettin' With The Times
Note that Spirit X3 (Getting into boost spirit; Qi or X3?) doesn't have the auto
-issues that you ran into:
Live On Coliru
#include <boost/fusion/adapted/struct.hpp>
#include <boost/spirit/home/x3.hpp>
#include <boost/fusion/include/io.hpp>
#include <iostream>
namespace bsx = boost::spirit::x3;
struct Version {
uint16_t major, minor, revision, build;
auto key() const { return std::tie(major, minor, revision, build); }
bool operator<(Version const& b) const { return key() < b.key(); }
};
BOOST_FUSION_ADAPT_STRUCT(Version, major, minor, revision, build)
int main() {
auto versionParser = bsx::uint_
>> ('.' >> bsx::uint_ | bsx::attr(0))
>> ('.' >> bsx::uint_ | bsx::attr(0))
>> ('.' >> bsx::uint_ | bsx::attr(0))
>> bsx::eoi;
std::string version = "3.5.1";
Version parsed;
if (!parse(version.begin(), version.end(), versionParser, parsed))
std::cout << "Error!\n";
using boost::fusion::operator<<;
auto obsolete = parsed < Version{3, 4, 0, 0};
std::cout << "Version " << parsed << " " << (obsolete? "too old" : "supported") << "\n";
}
Also printing the same.
inconsistent behavior of boost spirit grammar
You're invoking Undefined Behaviour in your code.
Specifically where you use auto
to store a parser expression. The Expression Template contains references to temporaries that become dangling at the end of the containing full-expression¹.
UB implies that anything can happen. Both compilers are right! And the best part is, you will probably see varying behaviour depending on the compiler flags used.
Fix it either by using:
qi::copy
(orboost::proto::deep_copy
before v.1.55 IIRC)- use
BOOST_SPIRIT_AUTO
instead ofBOOST_AUTO
(mostly helpful iff you also support C++03) use
qi::rule<>
andqi::grammar<>
(the non-terminals) to type-erase and the expressions. This has performance impact too, but also gives more features like- recursive rules
- locals and inherited attributes
- declared skippers (handy because rules can be implicitly
lexeme[]
(see here) - better code organization.
Note also that Spirit X3 promises to drop there restrictions on the use with auto. It's basically a whole lot more lightweight due to the use of c++14 features. Keep in mind that it's not stable yet.
Showing that GCC with -O2 shows undefined results; Live On Coliru
The fixed version:
Live On Coliru
//#pragma GCC diagnostic push
//#pragma GCC diagnostic ignored "-Wunused-local-typedefs"
//#pragma GCC diagnostic ignored "-Wmaybe-uninitialized"
//#pragma GCC diagnostic ignored "-Wunused-variable"
#include <boost/spirit/include/karma.hpp>
#include <boost/spirit/include/qi.hpp>
//#pragma GCC diagnostic pop // pops
#include <iostream>
int main() {
typedef unsigned long long ull;
std::string const curline = "1;2;;3,4;5";
std::cout << "parsing: '" << curline << "'\n";
namespace qi = boost::spirit::qi;
#if 0 // THIS IS UNDEFINED BEHAVIOUR:
auto ids = -qi::ulong_long % ','; // '-' allows for empty vecs.
auto grammar = ids % ';';
#else // THIS IS CORRECT:
auto ids = qi::copy(-qi::ulong_long % ','); // '-' allows for empty vecs.
auto grammar = qi::copy(ids % ';');
#endif
std::vector<std::vector<ull> > r;
qi::parse(curline.begin(), curline.end(), grammar, r);
std::cout << "got: ";
for (auto v: r){
for (auto i : v)
std::cout << i << ",";
std::cout << ";";
}
std::cout <<"\n";
}
Printing (also with GCC -O2!):
parsing: '1;2;;3,4;5'
got: 1,;2,;;3,4,;5,;
¹ (that's basically "at the next semicolon" here; but in standardese)
Boost.Spirit.Qi alternative ( | ) parser issue
By appending >> eoi
you explicitly make the parse failed if it didn't reach the end of the input.
Live On Coliru
#include <string>
#include <iostream>
#include <iomanip>
#include <boost/spirit/include/qi.hpp>
namespace qi = boost::spirit::qi;
template <typename Expr>
void test(std::string name, Expr const& expr) {
std::string const test = "D-z!D-z@mib-A3A026FF.rev.sfr.net";
auto f = begin(test);
bool ok = qi::parse(f, end(test), expr);
std::cout << name << ": " << ok << "\n";
if (f != end(test))
std::cout << " -- remaining input: '" << std::string(f, end(test)) << "'\n";
}
int main() {
auto const hexdigit = qi::char_("0123456789ABCDEF");
auto const special = qi::char_("\x5b-\x60\x7b-\x7d");
auto const oneToThreeDigits = qi::repeat(1, 3)[qi::digit];
auto const ip4addr = oneToThreeDigits >> '.' >> oneToThreeDigits >> '.' >> oneToThreeDigits >> '.' >> oneToThreeDigits;
auto const ip6addr = +(hexdigit >> qi::repeat(7)[':' >> +hexdigit]) | ("0:0:0:0:0:" >> (qi::lit('0') | "FFFF") >> ':' >> ip4addr);
auto const hostaddr = ip4addr | ip6addr;
auto const nickname = (qi::alpha | special) >> qi::repeat(0, 8)[qi::alnum | special | '-'];
auto const user = +(~qi::char_("\x0d\x0a\x20\x40"));
auto const shortname = qi::alnum >> *(qi::alnum | '-');
auto const hostname = shortname >> *('.' >> shortname);
auto const host = hostname | hostaddr;
auto const nickUserHost = nickname >> -(-('!' >> user) >> '@' >> host);
auto const prefix = hostname | nickUserHost; // The problematic alternative
std::cout << std::boolalpha;
test("hostname", hostname);
test("nickUserHost", nickUserHost);
test("prefix", prefix);
}
Prints
hostname: true
-- remaining input: '!D-z@mib-A3A026FF.rev.sfr.net'
nickUserHost: true
prefix: true
-- remaining input: '!D-z@mib-A3A026FF.rev.sfr.net'
Boost Spirit eliminate left recursion from simple addition operator
There are a number of things about your example.
First off, you use auto
with Spirit Qi expressions. That's not valid, and leads to UB:
- Assigning parsers to auto variables
- also boost spirit qi parser failed in release and pass in debug
- undefined behaviour somewhere in boost::spirit::qi::phrase_parse
- boost spirit V2 qi bug associated with optimization level
Next off, you chose to use polymorphic Ast nodes. That's possible but likely not efficient:
- How can I use polymorphic attributes with boost::spirit::qi parsers?
- Semantic actions runs multiple times in boost::spirit parsing
- but also making a vector of shared pointers from Spirit Qi which you may have found
Finally, there's the left recursion, since your expression starts with itself, leading to infinite recursion. The only way to solve it is to split your productions up into "levels" of expressions. This also aids in generating the desired operator precedence:
expression = term >> char_("-+") >> term;
term = factor >> char_("*/%") >> factor;
factor = simple >> char_("^") >> simple;
In your case, I'd suggest:
simple
= qi::double_
[_val = make_shared_<DoubleNode>()(_1)];
;
expression
= (simple >> '+' >> expression)
[_val = make_shared_<AddNode>()(_1, _2)]
| simple
;
Of course, you can be simpler and slightly less inefficient here:
expression
= simple [_val = _1]
>> *(('+' >> expression)
[_val = make_shared_<AddNode>()(_val, _0)])
;
Full Demo
Live On Coliru
#include <boost/spirit/include/qi.hpp>
#include <boost/spirit/include/phoenix.hpp>
namespace { // https://stackoverflow.com/a/21565350/85371
template <typename T>
struct make_shared_f
{
template <typename... A> struct result
{ typedef std::shared_ptr<T> type; };
template <typename... A>
typename result<A...>::type operator()(A&&... a) const {
return std::make_shared<T>(std::forward<A>(a)...);
}
};
}
template <typename T>
using make_shared_ = boost::phoenix::function<make_shared_f<T> >;
struct AstNode {
virtual ~AstNode() = default;
};
using AstNodePtr = std::shared_ptr<AstNode>;
struct DoubleNode : AstNode {
DoubleNode(double) {}
};
struct AddNode : AstNode {
AddNode(AstNodePtr, AstNodePtr) {}
};
#include <iomanip>
namespace qi = boost::spirit::qi;
template<typename Iterator>
struct simple_grammar : qi::grammar<Iterator, AstNodePtr()> {
simple_grammar() : simple_grammar::base_type(start) {
using namespace qi::labels;
simple
= qi::double_
[_val = make_shared_<DoubleNode>()(_1)];
;
expression
= simple [_val = _1]
>> *(('+' >> expression)
[_val = make_shared_<AddNode>()(_val, _1)])
;
start = qi::skip(qi::space) [ expression ];
BOOST_SPIRIT_DEBUG_NODES((start)(expression)(simple))
}
private:
qi::rule<Iterator, AstNodePtr()> start;
qi::rule<Iterator, AstNodePtr(), qi::space_type> expression;
// implicit lexemes
qi::rule<Iterator, AstNodePtr()> simple;
};
int main() {
simple_grammar<std::string::const_iterator> g;
for (std::string const input : {
"1 + 2",
"3.14"
})
{
auto f = begin(input), l = end(input);
AstNodePtr ast;
if (qi::parse(f, l, g, ast)) {
std::cout << "Succeeded: " << boost::core::demangle(typeid(*ast).name()) << "\n";
} else {
std::cout << "Failed\n";
}
if (f!=l)
std::cout << "Remaining unparsed " << std::quoted(std::string(f,l)) << "\n";
}
}
Prints
Succeeded: AddNode
Succeeded: DoubleNode
Parsing nested lists with Boost.Spirit
You're using auto
for proto expression templates in the Qi domain - that's undefined behaviour 99.9% of the time:
- undefined behaviour somewhere in boost::spirit::qi::phrase_parse
Now, while you fix that, also make the list body optional:
Live On Coliru
#include <boost/spirit/include/qi.hpp>
namespace qi = boost::spirit::qi;
int main() {
using It = std::string::const_iterator;
using Skipper = qi::space_type;
for(std::string const input : { "[[\t1 , 2 ], [3, 4, 65.4]]", "[[\t1 , 2 ], [3, 4, 65.4], []]", "[]" })
{
std::cout << " ---- '" << input << "' ----\n";
auto it = input.begin();
//parse the string
using doubles = std::vector<double>;
using vectors = std::vector<doubles>;
qi::rule<It, doubles(), Skipper> doubles_ = "[" >> -(qi::double_ % ",") >> "]";
qi::rule<It, vectors(), Skipper> vectors_ = "[" >> -(doubles_ % ",") >> "]";
vectors data;
bool match = qi::phrase_parse(it, input.end(), vectors_, qi::space, data);
//check if we have a match
std::cout << "matched: " << std::boolalpha << match << '\n';
if (it != input.end())
std::cout << "unmatched part: " << std::string{it, input.end()} << '\n';
//print the result
std::cout << "result\n";
for (const auto& v : data) {
std::cout << "[";
for (double i : v)
std::cout << i << ",";
std::cout << "]\n";
}
}
}
Prints
---- '[[ 1 , 2 ], [3, 4, 65.4]]' ----
matched: true
result
[1,2,]
[3,4,65.4,]
---- '[[ 1 , 2 ], [3, 4, 65.4], []]' ----
matched: true
result
[1,2,]
[3,4,65.4,]
[]
---- '[]' ----
matched: true
result
Using boost::spirit to parse named parameters in any order
Just for fun/completeness I reviewed the grammar and came up with the following test.
I have made a few improvement suggestions left and right (as the OP witnessed on the live stream), and the resulting code, test and output are here:
Live On Coliru
#include <boost/fusion/include/adapt_struct.hpp>
#include <boost/spirit/include/qi.hpp>
#include <fstream>
#include <iostream>
namespace blocks {
struct CalcBlock {
std::string calculationTitle;
float matchingRad;
float stepSize;
std::string problemType;
int maxPartialWaveJ;
float sMatrixConvergenceValue;
float partialWaveConvergenceValue;
float smallValueLimit;
std::string potentialRadType;
};
}
BOOST_FUSION_ADAPT_STRUCT(blocks::CalcBlock, // Boost 1.58+ style adapt-struct
calculationTitle, matchingRad, stepSize, problemType, maxPartialWaveJ,
sMatrixConvergenceValue, partialWaveConvergenceValue, smallValueLimit,
potentialRadType)
namespace blocks {
namespace qi = boost::spirit::qi;
template <typename Iterator>
struct CalcBlockParser : qi::grammar<Iterator, CalcBlock()> {
CalcBlockParser() : CalcBlockParser::base_type(start) {
using namespace qi;
auto eol_ = copy((',' >> *eol) | +eol); // http://stackoverflow.com/a/26411266/85371 (!)
quotedString = '"' >> +~char_("\"\n") >> '"';
plainString = +~char_(" ,\n");
start = skip(blank) [cbRule];
cbRule = lexeme["[CalculationBlock]"] >> eol
>> (
(lexeme["CalculationTitle"] >> '=' >> quotedString >> eol_)
^ (lexeme["MatchingRadius"] >> '=' >> float_ >> eol_)
^ (lexeme["StepSize"] >> '=' >> float_ >> eol_)
^ (lexeme["ProblemType"] >> '=' >> plainString >> eol_)
^ (lexeme["MaxPartialWaveJ"] >> '=' >> int_ >> eol_)
^ (lexeme["SMatConv"] >> '=' >> float_ >> eol_)
^ (lexeme["PartialWaveConv"] >> '=' >> float_ >> eol_)
^ (lexeme["SmallValueLimit"] >> '=' >> float_ >> eol_)
^ (lexeme["PotentialRadType"] >> '=' >> plainString >> eol_)
)
>> lexeme["[end]"]
>> *eol
>> eoi;
}
private:
qi::rule<Iterator, CalcBlock()> start;
qi::rule<Iterator, CalcBlock(), qi::blank_type> cbRule;
// lexemes:
qi::rule<Iterator, std::string()> quotedString, plainString;
};
}
using boost::fusion::as_vector;
typedef boost::spirit::istream_iterator It;
int main(int argc, char **argv) {
if (argc != 2) {
std::cout << "Usage:\n\t" << argv[0] << " InputFileName" << std::endl;
return 1;
}
std::string inputFileName(argv[1]);
std::cout << "Reading input from the file: " << inputFileName << std::endl;
std::ifstream input(inputFileName);
input.unsetf(std::ios::skipws);
It start(input), stop;
blocks::CalcBlock cb;
blocks::CalcBlockParser<It> cbParser;
bool success = parse(start, stop, cbParser, cb);
{
using namespace boost::fusion;
std::cout << tuple_open('[') << tuple_close(']') << tuple_delimiter(", ");
}
std::cout << "-------------------------\n";
std::cout << "Parsing " << (success?"succeeded":"failed") << "\n";
std::cout << "got: " << as_vector(cb) << "\n";
std::cout << "-------------------------\n";
}
Input:
[CalculationBlock]
CalculationTitle="Test Parser Input System"
SMatConv=10E-8,
PartialWaveConv= 10E-8, MaxPartialWaveJ=800, SmallValueLimit = 10E-8
PotentialRadType=HeavyIon , MatchingRadius=25.0, StepSize=0.01,ProblemType=RelSchroedingerEqn
[end]
Output:
Reading input from the file: input.txt
-------------------------
Parsing succeeded
got: [Test Parser Input System, 25, 0.01, RelSchroedingerEqn, 800, 1e-07, 1e-07, 1e-07, HeavyIon]
-------------------------
boost spirit V2 qi bug associated with optimization level
It's a bug in your code, nothing wrong with the compiler or the optimization levels.
The cinch is with expression templates (like the ones used by Boost Proto, and hence by Boost Spirit). They are only valid to the end of their enclosing full expression [1]
The canonical workaound is:
BOOST_SPIRIT_AUTO(ana, *~qi::char_('*') > +qi::char_('*'));
You can find it here: http://svn.boost.org/svn/boost/trunk/libs/spirit/example/qi/typeof.cpp and it got first introduced in the comments at this blog post: http://boost-spirit.com/home/articles/qi-example/zero-to-60-mph-in-2-seconds/
The explicit fix I tested (which worked nicely on my box, no warnings remaining in valgrind):
auto const ana = boost::proto::deep_copy(
*~qi::char_('*') > +qi::char_('*'))
;
Spirit X3 promises to remove this wart. Slightly related, I think Protox11 also removes this issue by being aware of references at all times.
[1] Grep the standard for lifetime extension of temporaries. The expression templates keep references to the literals used (the rest has value semantics anyways), but the temporaries aren't bound to (const) references. So they go out of scope. Undefined Behaviour results
Related Topics
How Does _Builtin_Clear_Cache Work
Using 3Rd Party Header Files with Rcpp
How to Find All the Functions Exposed by a Dll
How to Generate Assembly Code with Clang in Intel Syntax
Is Rvo (Return Value Optimization) Applicable for All Objects
Do C++ Templates Make Programs Slow
How to Automatically Register a Class on Creation
Should We Generally Use Float Literals for Floats Instead of the Simpler Double Literals
How to Call the Original "Operator New" If I Have Overloaded It
How to Get a Windows Symbol Server Set Up
C++ Error: '_Mm_Sin_Ps' Was Not Declared in This Scope
Return Value of Operator Overloading in C++
Calling the Base Class Constructor from the Derived Class Constructor
Are There Any Tricks to Use Std::Cin to Initialize a Const Variable