Troubles with boost::spirit::lex & whitespace

You have created a second lexer state, but never invoked it.
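
The lexer presumably looked something like this (a reconstruction; the question's code isn't reproduced here):

this->self       = identifier;    // INITIAL state: no whitespace token here
this->self("WS") = white_space;   // defined, but the "WS" state is never entered

So lexing stops at the first space, because nothing in the INITIAL state matches it.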

Simplify and profit:

For most cases, the easiest way to achieve the desired effect is single-state lexing with a pass_ignore flag on the skippable tokens:

    this->self += identifier
                | white_space [ lex::_pass = lex::pass_flags::pass_ignore ];

Note that this requires an actor_lexer, because the plain lexertl::lexer does not support attaching semantic actions:

typedef lex::lexertl::actor_lexer<token_type> lexer_type;

Full sample:

#include <boost/spirit/include/lex_lexertl.hpp>
#include <iostream>
#include <string>

namespace lex = boost::spirit::lex;

template <typename Lexer>
struct lexer_identifier : lex::lexer<Lexer>
{
    lexer_identifier()
        : identifier("[a-zA-Z_][a-zA-Z0-9_]*")
        , white_space("[ \\t\\n]+")
    {
        this->self += identifier
                    | white_space [ lex::_pass = lex::pass_flags::pass_ignore ];
    }
    lex::token_def<> identifier;
    lex::token_def<> white_space;
};

int main()
{
    typedef lex::lexertl::token<char const*, lex::omit, boost::mpl::false_> token_type;
    typedef lex::lexertl::actor_lexer<token_type> lexer_type;

    lexer_identifier<lexer_type> my_lexer;

    std::string test("adedvied das934adf dfklj_03245");

    char const* first = test.c_str();
    char const* last  = &first[test.size()];

    lexer_type::iterator_type iter = my_lexer.begin(first, last);
    lexer_type::iterator_type end  = my_lexer.end();

    while (iter != end && token_is_valid(*iter))
    {
        ++iter;
    }

    bool r = (iter == end);
    std::cout << std::boolalpha << r << "\n";
}

Prints

true

"WS" as a Skipper state


It is also possible you came across a sample that uses the second parser state for the skipper (lex::tokenize_and_phrase_parse). Let me take a minute or 10 to create a working sample for that.

Update Took me a bit more than 10 minutes (waaaah) :) Here's a comparative test, showing how the lexer states interact, and how to use Spirit Skipper parsing to invoke the second parser state:

#include <boost/spirit/include/qi.hpp>
#include <boost/spirit/include/lex_lexertl.hpp>
#include <iostream>
#include <string>

namespace lex = boost::spirit::lex;
namespace qi  = boost::spirit::qi;

template <typename Lexer>
struct lexer_identifier : lex::lexer<Lexer>
{
    lexer_identifier()
        : identifier("[a-zA-Z_][a-zA-Z0-9_]*")
        , white_space("[ \\t\\n]+")
    {
        this->self       = identifier;
        this->self("WS") = white_space;
    }
    lex::token_def<> identifier;
    lex::token_def<lex::omit> white_space;
};

int main()
{
    typedef lex::lexertl::token<char const*, lex::omit, boost::mpl::true_> token_type;
    typedef lex::lexertl::lexer<token_type> lexer_type;

    lexer_identifier<lexer_type> my_lexer;

    std::string test("adedvied das934adf dfklj_03245");

    {
        char const* first = test.c_str();
        char const* last  = &first[test.size()];

        // cannot lex in just the "WS" state:
        bool ok = lex::tokenize(first, last, my_lexer, "WS");
        std::cout << "Starting state WS:\t" << std::boolalpha << ok << "\n";
    }

    {
        char const* first = test.c_str();
        char const* last  = &first[test.size()];

        // cannot lex in just the default state either:
        bool ok = lex::tokenize(first, last, my_lexer, "INITIAL");
        std::cout << "Starting state INITIAL:\t" << std::boolalpha << ok << "\n";
    }

    {
        char const* first = test.c_str();
        char const* last  = &first[test.size()];

        bool ok = lex::tokenize_and_phrase_parse(first, last, my_lexer,
                      *my_lexer.self, qi::in_state("WS")[my_lexer.self]);
        ok = ok && (first == last); // verify full input consumed
        std::cout << std::boolalpha << ok << "\n";
    }
}

The output is

Starting state WS:  false
Starting state INITIAL: false
true
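
Note what the first two runs demonstrate: neither state can lex the input on its own. In the INITIAL state there is no token matching the first space, and in the "WS" state there is no token matching the leading identifier. Only the third run, where the skipper switches the lexer into "WS" and back, consumes the whole input.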

Whitespace skipper when using Boost.Spirit Qi and Lex

Strangely, only now did I find a different question, Boost.Spirit SQL grammar/lexer failure, which provides another solution to whitespace skipping. A better one!

So below is the example code reworked along the lines suggested there:

#include <boost/spirit/include/lex_lexertl.hpp>
#include <boost/spirit/include/qi.hpp>
#include <algorithm>
#include <iostream>
#include <string>
#include <utility>
#include <vector>

namespace lex = boost::spirit::lex;
namespace qi = boost::spirit::qi;

template<typename Lexer>
class expression_lexer
    : public lex::lexer<Lexer>
{
public:
    typedef lex::token_def<> operator_token_type;
    typedef lex::token_def<> value_token_type;
    typedef lex::token_def<> variable_token_type;
    typedef lex::token_def<lex::omit> parenthesis_token_type;
    typedef std::pair<parenthesis_token_type, parenthesis_token_type> parenthesis_token_pair_type;
    typedef lex::token_def<lex::omit> whitespace_token_type;

    expression_lexer()
        : operator_add('+'),
          operator_sub('-'),
          operator_mul("[x*]"),
          operator_div("[:/]"),
          value("\\d+(\\.\\d+)?"),
          variable("%(\\w+)"),
          parenthesis({
              std::make_pair(parenthesis_token_type('('), parenthesis_token_type(')')),
              std::make_pair(parenthesis_token_type('['), parenthesis_token_type(']'))
          }),
          whitespace("[ \\t]+")
    {
        this->self
            += operator_add
             | operator_sub
             | operator_mul
             | operator_div
             | value
             | variable
             | whitespace [lex::_pass = lex::pass_flags::pass_ignore]
            ;

        std::for_each(parenthesis.cbegin(), parenthesis.cend(),
            [&](parenthesis_token_pair_type const& token_pair)
            {
                this->self += token_pair.first | token_pair.second;
            }
        );
    }

    operator_token_type operator_add;
    operator_token_type operator_sub;
    operator_token_type operator_mul;
    operator_token_type operator_div;

    value_token_type value;
    variable_token_type variable;

    std::vector<parenthesis_token_pair_type> parenthesis;

    whitespace_token_type whitespace;
};

template<typename Iterator>
class expression_grammar
    : public qi::grammar<Iterator>
{
public:
    template<typename Tokens>
    explicit expression_grammar(Tokens const& tokens)
        : expression_grammar::base_type(start)
    {
        start %= expression >> qi::eoi;

        expression %= sum_operand >> -(sum_operator >> expression);
        sum_operator %= tokens.operator_add | tokens.operator_sub;
        sum_operand %= fac_operand >> -(fac_operator >> sum_operand);
        fac_operator %= tokens.operator_mul | tokens.operator_div;

        if(!tokens.parenthesis.empty())
            fac_operand %= parenthesised | terminal;
        else
            fac_operand %= terminal;

        terminal %= tokens.value | tokens.variable;

        if(!tokens.parenthesis.empty())
        {
            parenthesised %= tokens.parenthesis.front().first >> expression >> tokens.parenthesis.front().second;
            std::for_each(tokens.parenthesis.cbegin() + 1, tokens.parenthesis.cend(),
                [&](typename Tokens::parenthesis_token_pair_type const& token_pair)
                {
                    parenthesised %= parenthesised.copy() | (token_pair.first >> expression >> token_pair.second);
                }
            );
        }
    }

private:
    qi::rule<Iterator> start;
    qi::rule<Iterator> expression;
    qi::rule<Iterator> sum_operand;
    qi::rule<Iterator> sum_operator;
    qi::rule<Iterator> fac_operand;
    qi::rule<Iterator> fac_operator;
    qi::rule<Iterator> terminal;
    qi::rule<Iterator> parenthesised;
};

int main()
{
    typedef lex::lexertl::token<std::string::const_iterator> token_type;
    typedef expression_lexer<lex::lexertl::actor_lexer<token_type>> expression_lexer_type;
    typedef expression_lexer_type::iterator_type expression_lexer_iterator_type;
    typedef expression_grammar<expression_lexer_iterator_type> expression_grammar_type;

    expression_lexer_type lexer;
    expression_grammar_type grammar(lexer);

    while(std::cin)
    {
        std::string line;
        std::getline(std::cin, line);

        std::string::const_iterator first = line.begin();
        std::string::const_iterator const last = line.end();

        bool const result = lex::tokenize_and_parse(first, last, lexer, grammar);
        if(!result)
            std::cout << "Parsing failed! Remainder: >" << std::string(first, last) << "<" << std::endl;
        else
        {
            if(first != last)
                std::cout << "Parsing succeeded! Remainder: >" << std::string(first, last) << "<" << std::endl;
            else
                std::cout << "Parsing succeeded!" << std::endl;
        }
    }
}

The differences are as follows:

  1. The whitespace token is added to the lexer's self just like all the other tokens.
  2. However, an action is associated with it. The action makes the lexer ignore the token, which is exactly what we want.
  3. My expression_grammar no longer takes a Skipper template argument, so it is also removed from the rules.
  4. lex::lexertl::actor_lexer is used instead of lex::lexertl::lexer, since an action is now associated with a token.
  5. I'm calling tokenize_and_parse instead of tokenize_and_phrase_parse, as I don't need to pass a skipper anymore.
  6. I also changed the first assignment to this->self in the lexer from = to +=, as it seems more flexible (resistant to order changes), but it doesn't affect the solution here.

I'm happy with this. It suits my needs (or rather, my taste) perfectly. However, I wonder whether there are any other consequences of this change, and whether either approach is preferred in some situations. That I don't know.

Boost.Spirit SQL grammar/lexer failure

Regarding the whitespace skipping I can only conclude that pre-skipping is not being done initially (perhaps the state is not switched correctly).

Of course, you could try to remedy this using the lex::tokenize_and_parse API (passing the initial state as "WS"). But I misremembered the API: you can only pass an initial state when tokenizing manually, which precludes the state switching by Qi in the first place.

However, what I tend to do is make skipping the responsibility of the lexer:

ws = "[ \\t\\n]+";
comment = "--[^\\n]*\\n"; // Single line comments with --
cstyle_comment = "\\/\\*[^*]*\\*+([^/*][^*]*\\*+)*\\/"; // C-style comments

this->self += ws             [ lex::_pass = lex::pass_flags::pass_ignore ]
            | comment        [ lex::_pass = lex::pass_flags::pass_ignore ]
            | cstyle_comment [ lex::_pass = lex::pass_flags::pass_ignore ]
            ;

Now there is no need to use a skipper at all, and this succeeds in parsing the first problem (starting with a comment).

Full code: Live On Coliru

Look for #ifdef STATE_WS

//#define BOOST_SPIRIT_QI_DEBUG
//#define STATE_WS

#include <boost/spirit/include/qi.hpp>
#include <boost/spirit/include/lex_lexertl.hpp>
#include <boost/spirit/include/phoenix.hpp>
#include <boost/spirit/include/karma.hpp>
#include <boost/fusion/include/adapt_struct.hpp>
#include <boost/fusion/include/std_pair.hpp>

#include <boost/algorithm/string.hpp>
#include <boost/shared_ptr.hpp>
#include <boost/make_shared.hpp>
#include <boost/lexical_cast.hpp>

#include <iostream>
#include <fstream>
#include <string>
#include <set>
#include <utility>

namespace bs = boost::spirit;
namespace lex = boost::spirit::lex;
namespace qi = boost::spirit::qi;
namespace phx = boost::phoenix;

// Token definition base, defines all tokens for the base grammar below
template <typename Lexer>
struct sql_tokens : lex::lexer<Lexer>
{
public:
    // Tokens with no attributes.
    lex::token_def<lex::omit> type_smallint;
    lex::token_def<lex::omit> type_int;
    lex::token_def<lex::omit> type_varchar;
    lex::token_def<lex::omit> type_text;
    lex::token_def<lex::omit> type_date;
    lex::token_def<lex::omit> kw_not_null;
    lex::token_def<lex::omit> kw_auto_increment;
    lex::token_def<lex::omit> kw_unique;
    lex::token_def<lex::omit> kw_default;
    lex::token_def<lex::omit> kw_create;
    lex::token_def<lex::omit> kw_table;
    lex::token_def<lex::omit> kw_constraint;
    lex::token_def<lex::omit> kw_primary_key;

    // Attributed tokens. (If you add a new type, don't forget to add it
    // to the lex::lexertl::token definition too.)
    lex::token_def<int> signed_digit;
    lex::token_def<std::size_t> unsigned_digit;
    lex::token_def<std::string> identifier;
    lex::token_def<std::string> quoted_string;

    lex::token_def<lex::omit> ws, comment, cstyle_comment;

    sql_tokens()
    {
        // Column data types.
        type_smallint = "(?i:smallint)";
        type_int      = "(?i:int)";
        type_varchar  = "(?i:varchar)";
        type_text     = "(?i:text)";
        type_date     = "(?i:date)";

        // Keywords.
        kw_not_null       = "(?i:not +null)";
        kw_auto_increment = "(?i:auto_increment)";
        kw_unique         = "(?i:unique)";
        kw_default        = "(?i:default)";
        kw_create         = "(?i:create)";
        kw_table          = "(?i:table)";
        kw_constraint     = "(?i:constraint)";
        kw_primary_key    = "(?i:primary +key)";

        // Values.
        signed_digit   = "[+-]?[0-9]+";
        unsigned_digit = "[0-9]+";
        quoted_string  = "\\\"(\\\\.|[^\\\"])*\\\""; // \"(\\.|[^\"])*\"

        // Identifier.
        identifier = "[a-zA-Z][a-zA-Z0-9_]*";

        // The tokens must be added in priority order.
        this->self += lex::token_def<>('(') | ')' | ',' | ';';
        this->self += type_smallint | type_int | type_varchar | type_text |
                      type_date;
        this->self += kw_not_null | kw_auto_increment | kw_unique | kw_default |
                      kw_create | kw_table | kw_constraint | kw_primary_key;
        this->self += identifier | unsigned_digit | signed_digit | quoted_string;

        // The whitespace to ignore (needed in both configurations).
        ws             = "[ \\t\\n]+";
        comment        = "--[^\\n]*\\n"; // Single-line comments with --
        cstyle_comment = "\\/\\*[^*]*\\*+([^/*][^*]*\\*+)*\\/"; // C-style comments

#ifdef STATE_WS
        // Skippables live in a separate lexer state.
        this->self("WS")
            = ws
            | comment
            | cstyle_comment
            ;
#else
        this->self += ws             [ lex::_pass = lex::pass_flags::pass_ignore ]
                    | comment        [ lex::_pass = lex::pass_flags::pass_ignore ]
                    | cstyle_comment [ lex::_pass = lex::pass_flags::pass_ignore ]
                    ;
#endif
    }
};

// Grammar definition, defining a small part of the SQL language.
template <typename Iterator, typename Lexer>
struct sql_grammar
#ifdef STATE_WS
    : qi::grammar<Iterator, qi::in_state_skipper<Lexer> >
#else
    : qi::grammar<Iterator>
#endif
{
    template <typename TokenDef>
    sql_grammar(TokenDef const& tok)
        : sql_grammar::base_type(program, "program")
    {
        program
            = (statement % ';') >> *qi::lit(';')
            ;

        statement
            = create_statement.alias()
            ;

        create_statement
            = tok.kw_create >> create_table
            ;

        create_table
            = tok.kw_table >> tok.identifier >> '(' >> create_table_columns >> -(',' >> table_constraints) >> ')'
            ;

        table_constraints
            = constraint_definition % ','
            ;

        constraint_definition
            = tok.kw_constraint >> tok.identifier >> primary_key_constraint
            ;

        primary_key_constraint
            = tok.kw_primary_key >> '(' >> (tok.identifier % ',') >> ')'
            ;

        create_table_columns
            = column_definition % ','
            ;

        column_definition
            = tok.identifier >> column_type >> *type_constraint
            ;

        type_constraint
            = tok.kw_not_null
            | tok.kw_auto_increment
            | tok.kw_unique
            | default_value
            ;

        default_value
            = tok.kw_default > tok.quoted_string
            ;

        column_type
            = tok.type_smallint
            | tok.type_int
            | (tok.type_varchar > '(' > tok.unsigned_digit > ')')
            | tok.type_text
            | tok.type_date
            ;

        program.name("program");
        statement.name("statement");
        create_statement.name("create statement");
        create_table.name("create table");
        create_table_columns.name("create table columns");
        column_definition.name("column definition");
        column_type.name("column type");
        default_value.name("default value");
        type_constraint.name("type constraint");
        table_constraints.name("table constraints");
        constraint_definition.name("constraint definition");
        primary_key_constraint.name("primary key constraint");

        BOOST_SPIRIT_DEBUG_NODE(program);
        BOOST_SPIRIT_DEBUG_NODE(statement);
        BOOST_SPIRIT_DEBUG_NODE(create_statement);
        BOOST_SPIRIT_DEBUG_NODE(create_table);
        BOOST_SPIRIT_DEBUG_NODE(create_table_columns);
        BOOST_SPIRIT_DEBUG_NODE(column_definition);
        BOOST_SPIRIT_DEBUG_NODE(column_type);
        BOOST_SPIRIT_DEBUG_NODE(default_value);
        BOOST_SPIRIT_DEBUG_NODE(type_constraint);
        BOOST_SPIRIT_DEBUG_NODE(table_constraints);
        BOOST_SPIRIT_DEBUG_NODE(constraint_definition);
        BOOST_SPIRIT_DEBUG_NODE(primary_key_constraint);

        using namespace qi::labels;
        qi::on_error<qi::fail>
        (
            program,
            std::cout
                << phx::val("Error! Expecting ")
                << bs::_4                                      // what failed?
                << phx::val(" here: \"")
                << phx::construct<std::string>(bs::_3, bs::_2) // iterators to error-pos, end
                << phx::val("\"")
                << std::endl
        );
    }

private:
#ifdef STATE_WS
    typedef qi::in_state_skipper<Lexer> skipper_type;
#else
    typedef qi::unused_type skipper_type;
#endif
    typedef qi::rule<Iterator, skipper_type> simple_rule;

    simple_rule program, statement, create_statement, create_table, table_constraints, constraint_definition;
    simple_rule primary_key_constraint, create_table_columns, column_definition, type_constraint, default_value, column_type;
};

std::string cin2string()
{
    std::istreambuf_iterator<char> f(std::cin), l;
    std::string result;
    std::copy(f, l, std::back_inserter(result));
    return result;
}

int main()
{
    // iterator type used to expose the underlying input stream
    typedef std::string::const_iterator base_iterator_type;

    // This is the lexer token type to use.
    typedef lex::lexertl::token<
        base_iterator_type, boost::mpl::vector<int, std::size_t, std::string>
    > token_type;

#ifdef STATE_WS
    typedef lex::lexertl::lexer<token_type> lexer_type;
#else
    typedef lex::lexertl::actor_lexer<token_type> lexer_type;
#endif

    // This is the token definition type (derived from the given lexer type).
    typedef sql_tokens<lexer_type> sql_tokens;

    // this is the iterator type exposed by the lexer
    typedef sql_tokens::iterator_type iterator_type;

    // this is the type of the grammar to parse
    typedef sql_grammar<iterator_type, sql_tokens::lexer_def> sql_grammar;

    // now we use the types defined above to create the lexer and grammar
    // object instances needed to invoke the parsing process
    sql_tokens tokens;       // Our lexer
    sql_grammar sql(tokens); // Our parser

    const std::string str = cin2string();

    // At this point we generate the iterator pair used to expose the
    // tokenized input stream.
    base_iterator_type it = str.begin();
    iterator_type iter = tokens.begin(it, str.end());
    iterator_type end = tokens.end();

    // Parsing is done based on the token stream, not the character
    // stream read from the input.
    // Note how we use the lexer defined above as the skip parser. It must
    // be explicitly wrapped inside a state directive, switching the lexer
    // state for the duration of skipping whitespace.
#ifdef STATE_WS
    std::string ws("WS");
    bool r = qi::phrase_parse(iter, end, sql, qi::in_state(ws)[tokens.self]);
#else
    bool r = qi::parse(iter, end, sql);
#endif

    if (r && iter == end)
    {
        std::cout << "-------------------------\n";
        std::cout << "Parsing succeeded\n";
        std::cout << "-------------------------\n";
    }
    else
    {
        std::cout << "-------------------------\n";
        std::cout << "Parsing failed\n";
        std::cout << "-------------------------\n";
    }
    return 0;
}

How to combine boost::spirit::lex & boost::spirit::qi?

Your parser isn't failing, but it isn't 'silently' skipping the whitespace either (it only ever parses a single non-whitespace token anyway).

In fact, a property of the *phrase_parse family of Spirit APIs is that they may not match the full input. This is why they take the first iterator by reference: after parsing, the iterator indicates where parsing stopped.
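
As a minimal illustration of that property with plain Qi on characters (not this question's Lex setup; qi::int_ and qi::space are just stand-ins for any parser and skipper):

#include <boost/spirit/include/qi.hpp>
#include <iostream>
#include <string>

int main()
{
    namespace qi = boost::spirit::qi;

    std::string input("123 abc");
    std::string::const_iterator first = input.begin(), last = input.end();

    // Match a single integer, skipping blanks. phrase_parse returns true
    // even though "abc" is left over; `first` is moved to where it stopped.
    bool ok = qi::phrase_parse(first, last, qi::int_, qi::space);

    std::cout << std::boolalpha << ok << "\n";                         // true
    std::cout << "stopped at: '" << std::string(first, last) << "'\n"; // 'abc'
}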

I have changed a few bits around so that you can easily access the source iterator, using lex::tokenize_and_phrase_parse instead of qi::phrase_parse on lexer_tokens:

Iterator first = test.c_str();
Iterator last = &first[test.size()];

bool r = lex::tokenize_and_phrase_parse(first,last,my_lexer,my_grammar,qi::in_state( "WS" )[ my_lexer.self ]);

std::cout << std::boolalpha << r << "\n";
std::cout << "Remaining unparsed: '" << std::string(first,last) << "'\n";

The output is:

Remaining unparsed: '56'
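
(Which is as expected: the grammar ges = qi::token(ID_INTEGER) | qi::token(ID_FLOAT); matches a single numeric token, so 1234 is consumed, the whitespace is skipped via the "WS" state, and 56 is left over.)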

Here is a full working example (note I also changed the second parameter of the grammar class to be the Skipper directly, which is more typical for Spirit grammars):

#include <boost/spirit/include/qi.hpp>
#include <boost/spirit/include/lex_lexertl.hpp>
#include <iostream>
#include <string>

namespace qi = boost::spirit::qi;
namespace lex = boost::spirit::lex;

enum LexerIDs { ID_IDENTIFIER, ID_WHITESPACE, ID_INTEGER, ID_FLOAT, ID_PUNCTUATOR };

template <typename Lexer>
struct custom_lexer : lex::lexer<Lexer>
{
    custom_lexer()
        : identifier    ("[a-zA-Z_][a-zA-Z0-9_]*")
        , white_space   ("[ \\t\\n]+")
        , integer_value ("[1-9][0-9]*")
        , hex_value     ("0[xX][0-9a-fA-F]+")
        , float_value   ("[0-9]*\\.[0-9]+([eE][+-]?[0-9]+)?")
        , float_value2  ("[0-9]+\\.([eE][+-]?[0-9]+)?")
        , punctuator    ("\\[|\\]|\\(|\\)|\\.|&>|\\*\\*|\\*|\\+|-|~|!|\\/|%|<<|>>|<|>|<=|>=|==|!=|\\^|&|\\||\\^\\^|&&|\\|\\||\\?|:|,")
          // [ ] ( ) . &> ** * + - ~ ! / % << >> < > <= >= == != ^ & | ^^ && || ? : ,
    {
        this->self.add
            (identifier   , ID_IDENTIFIER)
            /*(white_space, ID_WHITESPACE)*/
            (integer_value, ID_INTEGER)
            (hex_value    , ID_INTEGER)
            (float_value  , ID_FLOAT)
            (float_value2 , ID_FLOAT)
            (punctuator   , ID_PUNCTUATOR);

        this->self("WS") = white_space;
    }
    lex::token_def<std::string> identifier;
    lex::token_def<lex::omit>   white_space;
    lex::token_def<int>         integer_value;
    lex::token_def<int>         hex_value;
    lex::token_def<double>      float_value;
    lex::token_def<double>      float_value2;
    lex::token_def<>            punctuator;
};

template <typename Iterator, typename Skipper>
struct custom_grammar : qi::grammar<Iterator, Skipper>
{
    template <typename TokenDef>
    custom_grammar(TokenDef const& tok) : custom_grammar::base_type(ges)
    {
        ges = qi::token(ID_INTEGER) | qi::token(ID_FLOAT);
        BOOST_SPIRIT_DEBUG_NODE(ges);
    }
    qi::rule<Iterator, Skipper> ges;
};

int main()
{
    std::string test("1234 56");

    typedef char const* Iterator;
    typedef lex::lexertl::token<Iterator, lex::omit, boost::mpl::true_> token_type;
    typedef lex::lexertl::lexer<token_type> lexer_type;
    typedef qi::in_state_skipper<custom_lexer<lexer_type>::lexer_def> skipper_type;

    typedef custom_lexer<lexer_type>::iterator_type iterator_type;

    custom_lexer<lexer_type> my_lexer;
    custom_grammar<iterator_type, skipper_type> my_grammar(my_lexer);

    Iterator first = test.c_str();
    Iterator last  = &first[test.size()];

    bool r = lex::tokenize_and_phrase_parse(first, last, my_lexer, my_grammar,
                 qi::in_state("WS")[my_lexer.self]);

    std::cout << std::boolalpha << r << "\n";
    std::cout << "Remaining unparsed: '" << std::string(first, last) << "'\n";
}

Trouble with boost::spirit::lex - punctuation characters

You need an extra escaping level:

my_lexer() : punctuator("\\[|\\]|\\(|\\)|\\.|&>|\\*\\*|\\*|\\+|-|~|!|/|%|<<|>>|<|>|<=|>=|==|!=")

"\\" is a string literal containing one backslash, which the lexer constructor then parses.

Spirit Fails to Parse After only Appearing to get First symbol From the Lexer

Which I assume to mean twenty productions are attempted in the grammar, from the twenty empty [], is that a correct assumption?

No. The [] indicate input tokens.

Also why are the [] empty?

There's likely no useful way to print them so they show up as empty.

If that is the case, is there a way to get the debug statement to print helpful output when using the token enum token as opposed to adding expressions using macros?

I'd think so. But I never use Lex, so it could take me a while to figure it out.

First thing that catches my attention is this:

typedef lex::lexertl::token< char const*, lex::omit, boost::mpl::true_ > token_type;

Your AttributeTypes says omit. Indeed, changing to

typedef lex::lexertl::token< char const*, boost::mpl::vector<lex::omit, std::string>, boost::mpl::true_ > token_type;

does show signs of life. For the input x=y (no whitespace!) it prints:

Live On Coliru

<start>
<try>[y][=][x][][][][][][][][][][][][][][][][][]</try>
<fail/>
</start>
Remaining

R is false

Now, for the input def print_it(x, y): print 3*x + y return fed, the output still is:

<start>
<try>[def][][][][][][][][][][][][][][][][][][][]</try>
<fail/>
</start>
Remaining print_it(x, y): print 3*x + y return fed

R is false

Slightly more informative. Interestingly, it also seems to fail on the first whitespace. The whiteSpace regex looks okay, so I searched for lex in_state in my answers to remember what I once learned about skipping and Lex.

I played around with some suggestions. Following the second post led me to this comment:

Nice! You can add to your answer that having a separate state inside the lexer for skipping white spaces has another considerable drawback: lack of debugging support. If you use a separate state for skipping and then use BOOST_SPIRIT_DEBUG_NODE, you won't get the full output of the token stream. This is because the default debug handler advances the lexer iterator to get tokens, the lexer is of course in the INITIAL state. As soon as it meets a white space, the iteration stops, because the lexer cannot find a match. The token stream will be cut at the first white space in the rule's trace. – noxmetus Sep 20 '17 at 18:28


