Parse Int or Double Using Boost Spirit (Longest_D)

First, switch to Spirit V2 (Qi), which superseded Classic Spirit years ago.

Second, you need to make sure an int gets preferred. By default, the double_ parser accepts any integer equally well, so use strict_real_policies instead, which only match input that contains a decimal point or an exponent:

real_parser<double, strict_real_policies<double>> strict_double;

Now you can simply state

number = strict_double | int_;

See the RealPolicies documentation.

Test program (Live on Coliru):

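#include <boost/spirit/include/qi.hpp>
#include <boost/variant.hpp>
#include <cassert>
#include <string>

using namespace boost::spirit::qi;

using A = boost::variant<int, double>;
static real_parser<double, strict_real_policies<double>> const strict_double;

A parse(std::string const& s)
{
    typedef std::string::const_iterator It;
    It f(begin(s)), l(end(s));
    // the strict double goes first; int_ is the fallback
    static rule<It, A()> const p = strict_double | int_;

    A a;
    // call the parser outside assert() so it is not compiled out under NDEBUG
    bool const ok = parse(f, l, p, a);
    assert(ok);
    (void)ok;

    return a;
}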

int main()
{
assert(0 == parse("42").which());
assert(0 == parse("-42").which());
assert(0 == parse("+42").which());

assert(1 == parse("42.").which());
assert(1 == parse("0.").which());
assert(1 == parse(".0").which());
assert(1 == parse("0.0").which());
assert(1 == parse("1e1").which());
assert(1 == parse("1e+1").which());
assert(1 == parse("1e-1").which());
assert(1 == parse("-1e1").which());
assert(1 == parse("-1e+1").which());
assert(1 == parse("-1e-1").which());
}
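
The decision rule behind strict_double | int_ can also be sketched without Spirit. What follows is only an illustration using the standard library, not the Qi implementation: classify_number and has_real_marker are hypothetical names, and std::strtod / std::strtol accept some inputs (leading whitespace, hex floats) that the Qi parsers treat differently. Special values such as "inf" and "nan" are not handled here.

```cpp
#include <cassert>
#include <cerrno>
#include <climits>
#include <cstdlib>
#include <optional>
#include <string>
#include <variant>

using Number = std::variant<int, double>;

// Hypothetical helper: the "strict" rule treats input as a real only if it
// contains a decimal point or an exponent marker.
inline bool has_real_marker(std::string const& s) {
    return s.find_first_of(".eE") != std::string::npos;
}

// Hypothetical helper: returns an int or a double for a complete numeric
// token, or std::nullopt if the token is not consumed entirely.
inline std::optional<Number> classify_number(std::string const& s) {
    if (s.empty()) return std::nullopt;
    char const* const begin = s.c_str();
    char const* const end   = begin + s.size();
    char* stop = nullptr;

    if (has_real_marker(s)) {            // "strict double" branch comes first
        double const d = std::strtod(begin, &stop);
        if (stop != end) return std::nullopt;
        return Number{d};
    }

    errno = 0;                           // integer branch is the fallback
    long const v = std::strtol(begin, &stop, 10);
    if (stop != end || errno == ERANGE || v < INT_MIN || v > INT_MAX)
        return std::nullopt;
    return Number{static_cast<int>(v)};
}
```

With this sketch, classify_number("42") holds an int, while classify_number("42.") and classify_number("1e1") hold a double, which mirrors the assertions in the test program above.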

boost::spirit (qi) decision between float and double

Lexing can help. Ultimately you decide, not the parser. Ordering your branches should help. See also

  • Parse int or double using boost spirit (longest_d)

  • No match with qi::repeat and optional parser

For similar parsers with Boost Spirit.

If you want to decide between float and double, there is no real criterion in the input itself. I'd suggest always parsing into double. However, you could, of course, use semantic actions to force a float for values of a certain size.

For example, here is what a C++-style grammar does:

floatrule  = lexeme [ float_ >> 'f' ];
doublerule = double_;

float_or_double = floatrule | doublerule;
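
The same suffix-based decision can be sketched with the standard library alone. This is a rough illustration under stated assumptions, not Spirit code: parse_real_literal is a hypothetical name, only a lowercase 'f' suffix is recognised (as in the rule above), and std::strtof / std::strtod accept more input forms than float_ and double_ do.

```cpp
#include <cassert>
#include <cstdlib>
#include <optional>
#include <string>
#include <variant>

using Real = std::variant<float, double>;

// Hypothetical helper: a trailing 'f' selects float, anything else double.
// Returns std::nullopt unless the whole token is a valid number.
inline std::optional<Real> parse_real_literal(std::string const& s) {
    if (s.empty()) return std::nullopt;

    bool const wants_float = (s.back() == 'f');
    std::string const digits = wants_float ? s.substr(0, s.size() - 1) : s;
    if (digits.empty()) return std::nullopt;

    char const* const begin = digits.c_str();
    char* stop = nullptr;
    if (wants_float) {                   // corresponds to floatrule
        float const v = std::strtof(begin, &stop);
        if (stop != begin + digits.size()) return std::nullopt;
        return Real{v};
    }
    double const v = std::strtod(begin, &stop);   // corresponds to doublerule
    if (stop != begin + digits.size()) return std::nullopt;
    return Real{v};
}
```

The suffix is checked before any conversion happens, which plays the same role as putting floatrule ahead of doublerule in the alternative.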

Using boost::spirit to parse multiple types of single value

This is a fun exercise.

Of course, everything depends on the input grammar, which you conveniently fail to specify.

However, for the sake of demonstration, let's assume a literals grammar (very) loosely based on C++ literals. We could then come up with the following to parse decimal (signed) integral values, floating point values, bool literals and simplistic string literals:

typedef boost::variant<
double, unsigned int,
long, unsigned long, int,
bool, std::string> attr_t;

// ...

start =
(
// number formats with mandatory suffixes first
ulong_rule | uint_rule | long_rule |
// then those (optionally) without suffix
double_rule | int_rule |
// and the simple, unambiguous cases
bool_rule | string_rule
);

double_rule =
(&int_ >> (double_ >> 'f')) // if it could be an int, the suffix is required
| (!int_ >> double_ >> -lit('f')) // otherwise, optional
;
int_rule = int_;
uint_rule = uint_ >> 'u' ;
long_rule = long_ >> 'l' ;
ulong_rule = ulong_ >> "ul" ;
bool_rule = bool_;
string_rule = '"' >> *~char_('"') >> '"';

See the linked live demonstration for the output of the test cases: http://liveworkspace.org/code/goPNP

Note: only one test input ("invalid") is supposed to fail. The rest should parse into a literal, optionally leaving unparsed remaining input.

Full Demonstration With Tests

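#include <boost/spirit/include/qi.hpp>
#include <boost/spirit/include/karma.hpp>
#include <boost/variant.hpp>
#include <iostream>
#include <string>
#include <typeinfo>
#include <vector>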

namespace qi = boost::spirit::qi;
namespace karma = boost::spirit::karma;

typedef boost::variant<double, unsigned int, long, unsigned long, int, bool, std::string> attr_t;

template <typename It, typename Skipper = qi::space_type>
struct parser : qi::grammar<It, attr_t(), Skipper>
{
parser() : parser::base_type(start)
{
using namespace qi;

start =
(
// number formats with mandatory suffixes first
ulong_rule | uint_rule | long_rule |
// then those (optionally) without suffix
double_rule | int_rule |
// and the simple, unambiguous cases
bool_rule | string_rule
);

double_rule =
(&int_ >> (double_ >> 'f')) // if it could be an int, the suffix is required
| (!int_ >> double_ >> -lit('f')) // otherwise, optional
;
int_rule = int_;
uint_rule = uint_ >> 'u' ;
long_rule = long_ >> 'l' ;
ulong_rule = ulong_ >> "ul" ;
bool_rule = bool_;
string_rule = '"' >> *~char_('"') >> '"';

BOOST_SPIRIT_DEBUG_NODE(start);
BOOST_SPIRIT_DEBUG_NODE(double_rule);
BOOST_SPIRIT_DEBUG_NODE(ulong_rule);
BOOST_SPIRIT_DEBUG_NODE(long_rule);
BOOST_SPIRIT_DEBUG_NODE(uint_rule);
BOOST_SPIRIT_DEBUG_NODE(int_rule);
BOOST_SPIRIT_DEBUG_NODE(bool_rule);
BOOST_SPIRIT_DEBUG_NODE(string_rule);
}

private:
qi::rule<It, attr_t(), Skipper> start;
// no skippers in here (important):
qi::rule<It, double()> double_rule;
qi::rule<It, int()> int_rule;
qi::rule<It, unsigned int()> uint_rule;
qi::rule<It, long()> long_rule;
qi::rule<It, unsigned long()> ulong_rule;
qi::rule<It, bool()> bool_rule;
qi::rule<It, std::string()> string_rule;
};

struct effective_type : boost::static_visitor<std::string> {
template <typename T>
std::string operator()(T const& v) const {
return typeid(v).name();
}
};

bool testcase(const std::string& input)
{
typedef std::string::const_iterator It;
auto f(begin(input)), l(end(input));

parser<It, qi::space_type> p;
attr_t data;

try
{
std::cout << "parsing '" << input << "': ";
bool ok = qi::phrase_parse(f,l,p,qi::space,data);
if (ok)
{
std::cout << "success\n";
std::cout << "parsed data: " << karma::format_delimited(karma::auto_, ' ', data) << "\n";
std::cout << "effective typeid: " << boost::apply_visitor(effective_type(), data) << "\n";
}
else std::cout << "failed at '" << std::string(f,l) << "'\n";

if (f!=l) std::cout << "trailing unparsed: '" << std::string(f,l) << "'\n";
std::cout << "------\n\n";
return ok;
} catch(const qi::expectation_failure<It>& e)
{
std::string frag(e.first, e.last);
std::cout << e.what() << "'" << frag << "'\n";
}

return false;
}

int main()
{
for (auto const& s : std::vector<std::string> {
"1.3f",
"0.f",
"0.",
"0f",
"0", // int will be preferred
"1u",
"1ul",
"1l",
"1",
"false",
"true",
"\"hello world\"",
// interesting cases
"invalid",
"4.5e+7f",
"-inf",
"-nan",
"42 is the answer", // 'is the answer' is simply left unparsed, it's up to the surrounding grammar/caller
" 0\n ", // whitespace is fine
"42\n.0", // but not considered as part of a literal
})
{
testcase(s);
}
}

Parse arbitrary precision numbers with Boost spirit

I think the staple of parser generators is indeed parsing into arbitrary integer types.

What you are after is more: you want to parse into a type that represents arbitrary types of integers with added semantic information, based on decisions in your grammar.

These decisions cannot be baked into the parser generator, because that would tie it to a particular type of grammars.

Of course, you can do that, too. Let me walk through step by step.

1. The Staple

As you have noted, Spirit does that. Let's demonstrate the basics.

Loosely after http://www.nongnu.org/hcb/#integer-literal

_suffix += "u", "l", "ll", "ul", "lu", "ull", "llu";

_start = qi::no_case[ // case insensitive
("0x" >> qi::uint_parser<Integer, 16>{} |
"0b" >> qi::uint_parser<Integer, 2>{} |
&qi::lit('0') >> qi::uint_parser<Integer, 8>{} |
qi::uint_parser<Integer, 10>{})
// ignored for now:
>> -_suffix];

As you can see it parses hex, binary, octal and decimal unsigned numbers with an optional suffix. We're ignoring the suffix for now, so that we can demonstrate that it parses into generalized integral types.

See a demo Live On Compiler Explorer

template <typename Integer> void test() {
std::cout << " ---- " << __PRETTY_FUNCTION__ << "\n";
using It = std::string::const_iterator;
IntLiteral<It, Integer> const parser {};

for (std::string const input : {
"1234",
"1234u",
"0x12f34ULL",
"033ULL",
"0b101011l",
"33lu"
}) {
Integer value;
if (parse(input.begin(), input.end(), parser >> qi::eoi, value)) {
std::cout << "Parsed " << std::quoted(input) << " -> " << value << "\n";
} else {
std::cout << "Failed to parse " << std::quoted(input) << "\n";
}
}
}

int main() {
test<std::uintmax_t>();
test<boost::multiprecision::checked_int1024_t>();
}

Prints

 ---- void test() [with Integer = long unsigned int]
Parsed "1234" -> 1234
Parsed "1234u" -> 1234
Parsed "0x12f34ULL" -> 77620
Parsed "033ULL" -> 27
Parsed "0b101011l" -> 43
Parsed "33lu" -> 33
 ---- void test() [with Integer = boost::multiprecision::number<boost::multiprecision::backends::cpp_int_backend<1024, 1024, boost::multiprecision::signed_magnitude, boost::multiprecision::checked, void> >]
Parsed "1234" -> 1234
Parsed "1234u" -> 1234
Parsed "0x12f34ULL" -> 77620
Parsed "033ULL" -> 27
Parsed "0b101011l" -> 43
Parsed "33lu" -> 33
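
The prefix dispatch and optional suffix from the grammar above can be approximated with std::from_chars. This is a sketch under assumptions, not the Qi grammar: parse_unsigned_literal is a hypothetical name, the result type is fixed to std::uint64_t, and the suffix is validated but otherwise ignored, as in this first step.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cctype>
#include <charconv>
#include <cstdint>
#include <optional>
#include <string>
#include <string_view>

// Hypothetical helper: parses hex (0x), binary (0b), octal (leading 0) or
// decimal digits followed by an optional integer suffix, case-insensitively.
inline std::optional<std::uint64_t> parse_unsigned_literal(std::string_view text) {
    std::string s(text);
    std::transform(s.begin(), s.end(), s.begin(),
                   [](unsigned char c) { return static_cast<char>(std::tolower(c)); });

    // Split off the trailing run of 'u'/'l' and accept only known suffixes.
    std::size_t n = s.size();
    while (n > 0 && (s[n - 1] == 'u' || s[n - 1] == 'l')) --n;
    static constexpr std::array<std::string_view, 8> suffixes{
        "", "u", "l", "ll", "ul", "lu", "ull", "llu"};
    std::string_view const suffix(s.data() + n, s.size() - n);
    if (std::find(suffixes.begin(), suffixes.end(), suffix) == suffixes.end())
        return std::nullopt;

    // Choose the radix from the prefix, in the same order as the grammar.
    std::string_view digits(s.data(), n);
    int base = 10;
    if (digits.size() > 2 && digits.substr(0, 2) == "0x") {
        base = 16;
        digits.remove_prefix(2);
    } else if (digits.size() > 2 && digits.substr(0, 2) == "0b") {
        base = 2;
        digits.remove_prefix(2);
    } else if (!digits.empty() && digits.front() == '0') {
        base = 8;
    }
    if (digits.empty()) return std::nullopt;

    // from_chars reports overflow via ec and stops at the first invalid digit.
    std::uint64_t value = 0;
    auto const [ptr, ec] =
        std::from_chars(digits.data(), digits.data() + digits.size(), value, base);
    if (ec != std::errc{} || ptr != digits.data() + digits.size())
        return std::nullopt;
    return value;
}
```

For the six inputs of the demo, this sketch yields the same values as the output listed above.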

2. Variant Type

Now, you actually want the result to reflect the literal's type.

You can do that without LLVM, e.g. by parsing into uintmax_t first and then coercing to the appropriate type based on the suffix.

Let's parse into

using CxxInteger = boost::variant<
signed, unsigned,
signed long, unsigned long,
signed long long, unsigned long long>;

Then parsing with:

using Raw = std::uintmax_t;

_start = no_case [ // case insensitive
("0x" >> uint_parser<Raw, 16>{} |
"0b" >> uint_parser<Raw, 2>{} |
&lit('0') >> uint_parser<Raw, 8>{} |
uint_parser<Raw, 10>{})
// the suffix now selects the result type:
>> _optsuffix
] [ _val = coerce_type(_1, _2) ];

_optsuffix = no_case[_suffix] | attr(Suffix::signed_);

Now, we have to write the semantic rules that apply to our grammar:

struct converter_f {
CxxInteger operator()(uintmax_t raw, Suffix sfx) const {
switch (sfx) {
case Suffix::signed_: return static_cast<signed>(raw);
case Suffix::unsigned_: return static_cast<unsigned>(raw);
case Suffix::long_: return static_cast<long>(raw);
case Suffix::longlong_: return static_cast<long long>(raw);
case Suffix::ul_: return static_cast<unsigned long>(raw);
case Suffix::ull_: return static_cast<unsigned long long>(raw);
}
throw std::invalid_argument("sfx");
}
};
boost::phoenix::function<converter_f> coerce_type;

That's it. We can now parse the same test cases Live On Compiler Explorer

std::cout << "Parsed " << std::quoted(input) << " -> " << value
<< " (type #" << value.which() << " "
<< boost::core::demangle(value.type().name()) << ")\n";

Prints

 ---- void test()
Parsed "1234" -> 1234 (type #0 int)
Parsed "1234u" -> 1234 (type #1 unsigned int)
Parsed "0x12f34ULL" -> 77620 (type #5 unsigned long long)
Parsed "033ULL" -> 27 (type #5 unsigned long long)
Parsed "0b101011l" -> 43 (type #2 long)
Parsed "33lu" -> 33 (type #3 unsigned long)
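
For readers without Boost at hand, the coercion step can be sketched with std::variant. Everything named here is an assumption made for illustration: the Suffix enumerators differ from the bit-flag enum used in the full demo, lookup_suffix expects an already lowercased suffix, and the casts do not check for overflow.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <stdexcept>
#include <string_view>
#include <variant>

using CxxInteger = std::variant<int, unsigned, long, unsigned long,
                                long long, unsigned long long>;

// Hypothetical suffix classification (not the enum from the Spirit grammar).
enum class Suffix { none, u, l, ul, ll, ull };

// Maps a lowercased suffix string to its classification.
inline std::optional<Suffix> lookup_suffix(std::string_view s) {
    if (s.empty()) return Suffix::none;
    if (s == "u") return Suffix::u;
    if (s == "l") return Suffix::l;
    if (s == "ul" || s == "lu") return Suffix::ul;
    if (s == "ll") return Suffix::ll;
    if (s == "ull" || s == "llu") return Suffix::ull;
    return std::nullopt;
}

// Picks the variant alternative that corresponds to the suffix.
inline CxxInteger coerce_type(std::uint64_t raw, Suffix sfx) {
    switch (sfx) {
        case Suffix::none: return static_cast<int>(raw);
        case Suffix::u:    return static_cast<unsigned>(raw);
        case Suffix::l:    return static_cast<long>(raw);
        case Suffix::ul:   return static_cast<unsigned long>(raw);
        case Suffix::ll:   return static_cast<long long>(raw);
        case Suffix::ull:  return static_cast<unsigned long long>(raw);
    }
    throw std::invalid_argument("sfx");
}
```

The index of the resulting alternative corresponds to the "type #" column printed above, for example 5 for an "ull" suffix.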

3. Applying To LLVM APInt

The mechanics are the same:

struct converter_f {
template <typename T> static auto as(uint64_t raw) {
return llvm::APInt(CHAR_BIT * sizeof(T), raw, std::is_signed_v<T>); // bit width first, then the value
}
llvm::APInt operator()(uintmax_t raw, Suffix sfx) const {
switch (sfx) {
case Suffix::signed_: return as<signed>(raw);
case Suffix::unsigned_: return as<unsigned>(raw);
case Suffix::long_: return as<long>(raw);
case Suffix::longlong_: return as<long long>(raw);
case Suffix::ul_: return as<unsigned long>(raw);
case Suffix::ull_: return as<unsigned long long>(raw);
}
throw std::invalid_argument("sfx");
}
};

Full Demo

"Live" On Compiler Explorer

(compiler explorer doesn't support linking to LLVM)

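#include <boost/spirit/include/qi.hpp>
#include <boost/spirit/include/phoenix.hpp>
#include <climits>
#include <cstdint>
#include <iomanip>
#include <iostream>
#include <stdexcept>
#include <string>
#include <type_traits>
#include <llvm/ADT/APInt.h>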
namespace qi = boost::spirit::qi;

template <typename It>
struct IntLiteral : qi::grammar<It, llvm::APInt()> {
IntLiteral() : IntLiteral::base_type(_start) {
using namespace qi;
using Raw = std::uint64_t;

_start = no_case [ // case insensitive
("0x" >> uint_parser<Raw, 16>{} |
"0b" >> uint_parser<Raw, 2>{} |
&lit('0') >> uint_parser<Raw, 8>{} |
uint_parser<Raw, 10>{})
// the suffix now selects the bit width:
>> _optsuffix
] [ _val = coerce_type(_1, _2) ];

_optsuffix = no_case[_suffix] | attr(Suffix::signed_);
}

private:
enum class Suffix {
signed_ = 0,
unsigned_ = 1,
long_ = 2,
longlong_ = 4,

l_ = long_,
ll_ = longlong_,
ul_ = unsigned_ | l_,
ull_ = unsigned_ | ll_,
};

struct suffix_sym : qi::symbols<char, Suffix> {
suffix_sym() {
this->add
("u", Suffix::unsigned_)
("l", Suffix::l_)
("ll", Suffix::ll_)
("ul", Suffix::ul_) ("lu", Suffix::ul_)
("ull", Suffix::ull_) ("llu", Suffix::ull_)
;
}
} _suffix;

struct converter_f {
template <typename T> static auto as(uint64_t raw) {
return llvm::APInt(CHAR_BIT * sizeof(T), raw, std::is_signed_v<T>);
}
llvm::APInt operator()(uint64_t raw, Suffix sfx) const {
switch (sfx) {
case Suffix::signed_: return as<signed>(raw);
case Suffix::unsigned_: return as<unsigned>(raw);
case Suffix::long_: return as<long>(raw);
case Suffix::longlong_: return as<long long>(raw);
case Suffix::ul_: return as<unsigned long>(raw);
case Suffix::ull_: return as<unsigned long long>(raw);
}
throw std::invalid_argument("sfx");
}
};
boost::phoenix::function<converter_f> coerce_type;

qi::rule<It, llvm::APInt()> _start;
qi::rule<It, Suffix()> _optsuffix;
};

void test() {
std::cout << " ---- " << __PRETTY_FUNCTION__ << "\n";
using It = std::string::const_iterator;
IntLiteral<It> const parser {};

for (std::string const input : {
"1234",
"1234u",
"0x12f34ULL",
"033ULL",
"0b101011l",
"33lu"
}) {
llvm::APInt value;
if (parse(input.begin(), input.end(), parser >> qi::eoi, value)) {
std::cout << "Parsed " << std::quoted(input) << " -> "
<< value.toString(10, false) // TODO signed?
<< " bits:" << value.getBitWidth() << "\n";
} else {
std::cout << "Failed to parse " << std::quoted(input) << "\n";
}
}
}

int main() {
test();
}

Prints

 ---- void test()
Parsed "1234" -> 1234 bits:32
Parsed "1234u" -> 1234 bits:32
Parsed "0x12f34ULL" -> 77620 bits:64
Parsed "033ULL" -> 27 bits:64
Parsed "0b101011l" -> 43 bits:64
Parsed "33lu" -> 33 bits:64

Remaining Loose Ends

  • Of course, with semantic actions you can in fact parse the string representation using the fromString factory method

  • I don't know how to accurately ask APInt whether it is signed. I suspect I should have been parsing into a variant<APInt, APSInt> to retain that information

  • I didn't put work into detecting overflows. The first example should have that out-of-the-box (thanks to Qi)

  • I also didn't put work into supporting C++14 digit separators because it wasn't specified, and it didn't seem part of any "staple" feature anyway.
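
As for the overflow point, one possible approach is a range check before each cast in the converter. The following is a minimal sketch, assuming the raw value was parsed as an unsigned 64-bit integer; checked_narrow is a hypothetical helper, not part of Spirit or LLVM.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>
#include <optional>
#include <type_traits>

// Hypothetical helper: converts raw to T only if the value is representable,
// otherwise returns std::nullopt instead of silently truncating.
template <typename T>
std::optional<T> checked_narrow(std::uint64_t raw) {
    static_assert(std::is_integral_v<T>, "T must be an integer type");
    if (raw > static_cast<std::uint64_t>(std::numeric_limits<T>::max()))
        return std::nullopt;
    return static_cast<T>(raw);
}
```

A converter could call this instead of static_cast and report a failed parse when the result is empty.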

boost spirit x3 int32 | double_ fails to parse double

Your problem is very similar to this question.

When the integer parser occurs first in your grammar, it is preferred. For the input "12.9" the parser will parse the integer part of "12.9", which is 12, and will stop at the ".": live example

You have to reverse the order so the double parser is preferred over the integer one:

const auto double_or_int =  boost::spirit::x3::double_ | boost::spirit::x3::int32;

This will now work for "12.9": live example

However, since a double parser also parses an integer, you will always get a double, even if the input is "12": live example

In order to prevent this, you need a strict double parser:

boost::spirit::x3::real_parser<double, boost::spirit::x3::strict_real_policies<double> > const double_ = {};

live example
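
Why the order of the alternatives matters can be seen by measuring how much of the input each kind of parser consumes. In this sketch std::strtol and std::strtod merely stand in for x3::int32 and x3::double_; that substitution, and the helper names, are assumptions made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <string>

// Number of leading characters an integer parser would consume.
inline std::size_t int_prefix_length(std::string const& s) {
    char* stop = nullptr;
    (void)std::strtol(s.c_str(), &stop, 10);
    return static_cast<std::size_t>(stop - s.c_str());
}

// Number of leading characters a floating point parser would consume.
inline std::size_t double_prefix_length(std::string const& s) {
    char* stop = nullptr;
    (void)std::strtod(s.c_str(), &stop);
    return static_cast<std::size_t>(stop - s.c_str());
}
```

For "12.9" the integer stand-in stops after two characters while the floating point stand-in consumes all four. For "12" both consume everything, which is why a non-strict double parser placed first always wins.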

boost::spirit::qi::double_ and boost::spirit::qi::int_

In addition to the pragmatic approach [1] given by interjay, have a look at real_parser_policies:

real_parser<double,strict_real_policies<double>>() | int_

would be equally good.


[1] Which I sometimes use myself (you should be able to find an answer doing this on SO). Note, however, that there are problems when the input is e.g. 123e-5 (which would parse an int, leaving e-5 unparsed).

What should qi::uint_parser<int>() parse exactly?

The problem is solved now on boost 1.68.0. qi::uint_parser<int>() parses integers from 0 to std::numeric_limits<int>::max(). spirit x3 is also fixed.

https://github.com/boostorg/spirit/pull/297

Parse into a vector<vector<double>> with boost::spirit

The answer is yes.
It is actually quite trivial to parse into vector<vector<double> >

The rule definition requires a function type, not the type directly. This is simply explained here. A more thorough explanation is probably found in the documentation of boost::phoenix

The output of the program now shows the parsed values nicely:

parse success.
0, 5011, 10000, 15000, 20000, 25000,
-40, 0, 20, 40,
Base:
200, 175, 170, 165, 160, 150,
200, 175, 170, 165, 160, 150,
165, 165, 160, 155, 145, 145,
160, 155, 150, 145, 145, 140,

Broken std::cout output when using combined immediate = string|float|int rule using qi::double_ and qi::uint_

Wrap the integer and floating point parsers in qi::raw so that the numbers are kept as their matched source text: qi::raw[qi::uint_] and qi::raw[qi::double_].

The order of the alternatives also matters. If the uint_ parser comes before double_, as here:

immediate = double_quoted_string | qi::raw[qi::uint_] | qi::raw[qi::double_];
BOOST_SPIRIT_DEBUG_NODES((immediate)); // for debug output

then the uint_ parser will partially consume the double floating point number and then the whole parsing will fail:

<immediate>
<try>34.35</try>
<success>.35</success> //<----- this is what is left after uint_ parsed
<attributes>[[3, 4]]</attributes> // <---- what uint_ parser successfully parsed
</immediate>
"34.35" Failed
Remaining unparsed: "34.35"

After swapping the order of uint_ and double_:

immediate = double_quoted_string | qi::raw[qi::double_] | qi::raw[qi::uint_];

The result:

"\"hello\"" OK: 'hello'
----
" \" hello \" " OK: ' hello '
----
" \" hello \"\"stranger\"\" \" " OK: ' hello "stranger" '
----
"1" OK: '1'
----
"64" OK: '64'
----
"456" OK: '456'
----
"3.3" OK: '3.3'
----
"34.35" OK: '34.35'
----

