Learning Treetop

Sadly, Treetop's documentation sucks. A lot. And the examples on the website aren't helpful. I found that DZone has a pretty large collection of Treetop grammars:

Treetop grammars

Custom Methods for Treetop Syntax Nodes

This is a significant weakness in the design of Treetop.

I (as maintainer) didn't want to slow it down further by
passing yet another argument to every SyntaxNode,
and break any custom SyntaxNode classes folk have
written. These constructors get the "input" object, a Range
that selects part of that input, and optionally an array
of child SyntaxNodes. They should have received the
Parser itself instead of the input as a member.

So instead, for my own use (some years back), I made
a custom proxy for the "input" and attached my Context
to it. You might get away with doing something similar:

https://github.com/cjheath/activefacts-cql/blob/master/lib/activefacts/cql/parser.rb#L203-L249

Ruby Treetop how to include everything that does not match the grammar

It's a common idiom in PEG grammars to repeatedly match any character (.) that does not start a match of some rule (!body). Something like this:

rule bodies
  ((!body .)* body)+ (!body .)*
end

Simplest treetop grammar is returning a parse error, just learning

AFAIK, Treetop starts parsing with the first rule in your grammar (the rule word, in your case!). Now, if your input is 'John Smith' (i.e. word, s, word), it stops parsing after matching the rule word for the first time, and produces an error when it encounters the first s, since word does not match s.

You need to add a rule to the top of your grammar that describes an entire name: that is, a word, followed by a space, followed by a word, and so on.

grammar FullName

  rule name
    word (s word)* {
      def value
        text_value
      end
    }
  end

  rule word
    [^\s]+ {
      def value
        text_value
      end
    }
  end

  rule s
    [\s]+ {
      def value
        text_value
      end
    }
  end

end

A quick test with the script:

#!/usr/bin/env ruby

require 'rubygems'
require 'treetop'
require 'polyglot'   # lets require load .treetop grammar files
require 'FullName'   # via polyglot, loads the grammar file FullName.treetop

parser = FullNameParser.new
name = parser.parse('John Smith').value
print name

will print:

John Smith

Treetop Grammar does not recognize /

I found this issue: https://github.com/nathansobo/treetop/issues/25, and it appears to have answered my question.

My grammar did not contain a top-level rule that would allow either an opening or a closing tag, so the second possibility was never even considered:

grammar BBCode
  rule document
    (open_tag / close_tag)
  end

  rule open_tag
    ("[" tag_name "]")
  end

  rule tag_name
    [a-zA-Z\*]+
  end

  rule close_tag
    ("[/" tag_name "]")
  end
end

Rule's order does matter in TreeTop?

I think I just figured out what was wrong! There should be a top-level rule that includes the other rules, and it must be placed first:

grammar Fortran
  rule statement
    ( id / integer )* {
      def content
        elements.map { |e| e.content }
      end
    }
  end

  rule id
    [a-zA-Z] [a-zA-Z0-9]* {
      def content
        [:id, text_value]
      end
    }
  end

  rule integer
    [1-9] [0-9]* {
      def content
        [:integer, text_value]
      end
    }
  end
end

parser = FortranParser.new
ast = parser.parse('1')

Then ast.content returns:

[[:integer, "1"]]

Writing Treetop rule to parse input in any order

Here's an example of parsing in any order. The only trouble is you would have to handle duplicates by hand since Treetop doesn't have a rule for unordered-non-repeating elements.

rule top
  ((gender / age_under) ' '?)*
end

rule gender
  'women' / 'men'
end

rule age_under
  'under ' age
end

rule age
  [0-9]+
end

Treetop ignore grammar rules

As Jörg mentioned, you need to use your comma and space rules in the grammar. I built a simple example of what I think you're trying to accomplish below. It should match "100", "1,000", "1,000,000", etc.

If you look at the numeric rule, first I test for an optional minus sign ('-'?), then for one to three digits, then for zero or more groups of a comma followed by exactly three digits.

require 'treetop'
Treetop.load_from_string DATA.read

parser = PovParser.new

p parser.parse('1,000,000')

__END__
grammar Pov
  rule numeric
    '-'? digit 1..3 (comma space* (digit 3..3))*
  end

  rule digit
    [0-9]
  end

  rule comma
    ','
  end

  rule space
    [\s]
  end
end

