Learning Treetop

Sadly, Treetop's documentation sucks. A lot. And the examples on the website aren't helpful. I found that DZone has a pretty large collection of Treetop grammars:

Treetop grammars

Custom Methods for Treetop Syntax Nodes

This is a significant weakness in the design of Treetop.

I (as maintainer) didn't want to slow it down further by
passing yet another argument to every SyntaxNode,
and break any custom SyntaxNode classes folk have
written. These constructors get the "input" object, a Range
that selects part of that input, and optionally an array
of child SyntaxNodes. They should have received the
Parser itself instead of the input as a member.

So instead, for my own use (some years back), I made
a custom proxy for the "input" and attached my Context
to it. You might get away with doing something similar:

https://github.com/cjheath/activefacts-cql/blob/master/lib/activefacts/cql/parser.rb#L203-L249

Ruby Treetop how to include everything that does not match the grammar

It's a common idiom in PEG grammars to repeatedly match any character (.) as long as a given rule does not match at that position (!body). Something like this:

rule bodies
  ((!body .)* body)+ (!body .)*
end

Simplest treetop grammar is returning a parse error, just learning

AFAIK, Treetop starts parsing with the first rule in your grammar (the rule word, in your case). Now, if your input is 'John Smith' (i.e. word, s, word), it stops parsing after matching the rule word for the first time, and produces an error when it encounters the first s, since word does not match s.

You need to add a rule to the top of your grammar that describes an entire name: that is a word, followed by a space followed by a word, etc.

grammar FullName

  rule name
    word (s word)* {
      def value
        text_value
      end
    }
  end

  rule word
    [^\s]+ {
      def value
        text_value
      end
    }
  end

  rule s
    [\s]+ {
      def value
        text_value
      end
    }
  end

end

A quick test with the script:

#!/usr/bin/env ruby

require 'rubygems'
require 'treetop'
require 'polyglot'
require 'FullName'

parser = FullNameParser.new
name = parser.parse('John Smith').value
print name

will print:

John Smith

Treetop Grammar does not recognize /

I found this issue: https://github.com/nathansobo/treetop/issues/25, and it appears to have answered my question.

My grammar did not contain a top level rule that would allow an opening or closing tag, therefore the second possibility was not even considered:

grammar BBCode
  rule document
    (open_tag / close_tag)
  end

  rule open_tag
    ("[" tag_name "]")
  end

  rule tag_name
    [a-zA-Z\*]+
  end

  rule close_tag
    ("[/" tag_name "]")
  end
end

Does rule order matter in Treetop?

I think I just figured out what was wrong. There should be a top rule that includes the other rules, and it must be placed first in the grammar:

grammar Fortran
    rule statement
        ( id / integer )* {
            def content
                elements.map { |e| e.content }
            end
        }
    end

    rule id
        [a-zA-Z] [a-zA-Z0-9]* {
            def content
                [:id, text_value]
            end
        }
    end

    rule integer
        [1-9] [0-9]* {
            def content
                [:integer, text_value]
            end
        }
    end
end

parser = FortranParser.new
ast = parser.parse('1')

Then ast.content returns:

[[:integer, "1"]]

Writing Treetop rule to parse input in any order

Here's an example of parsing in any order. The only trouble is you would have to handle duplicates by hand since Treetop doesn't have a rule for unordered-non-repeating elements.

rule top
 ((gender / age_under) ' '?)*
end

rule gender
 'women' / 'men'
end

rule age_under
 'under ' age
end

rule age
 [0-9]+
end
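
Handling duplicates by hand can be done after the parse. The sketch below is plain Ruby and assumes you have already collected each matched element as a [kind, text] pair; how you collect them depends on the methods you define on your nodes. The helper name duplicate_kinds and the sample data are made up for the example.

```ruby
# Returns the kinds that occur more than once in a list of [kind, text] pairs.
def duplicate_kinds(pairs)
  pairs.group_by(&:first)
       .select { |_kind, group| group.size > 1 }
       .keys
end

# Hypothetical matches, as they might be collected from 'women under 30 men'.
matches = [[:gender, 'women'], [:age_under, 'under 30'], [:gender, 'men']]

dups = duplicate_kinds(matches)
if dups.empty?
  puts 'no duplicates'
else
  puts "duplicate elements: #{dups.join(', ')}"
end
```

Keeping this check outside the grammar leaves the rules simple and lets you decide whether a duplicate is an error or whether the last value wins.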

Treetop ignore grammar rules

Like Jörg mentioned, you need to use your comma and space rules in the grammar. I built a simple example of what I think you're trying to accomplish below. It should match "100", "1,000", "1,000,000", etc.

If you look at the numeric rule, first I test for an optional minus sign '-'?, then for one to three digits, then for zero or more groups of a comma followed by exactly three digits.

require 'treetop'
Treetop.load_from_string DATA.read

parser = PovParser.new

p parser.parse('1,000,000')

__END__
grammar Pov
   rule numeric
      '-'? digit 1..3 (comma space* (digit 3..3))*
   end

   rule digit
      [0-9]
   end

   rule comma
      ','
   end

   rule space
      [\s]
   end
end
