Indentation sensitive parser using Parslet in Ruby?
There are a few approaches.
1. Parse the document by recognising each line as a collection of indents and an identifier, then apply a transformation afterwards to reconstruct the hierarchy based on the number of indents.
2. Use captures to store the current indent and expect the next node to include that indent plus more to match as a child. (I didn't dig into this approach much, as the next one occurred to me.)
3. Rules are just methods, so you can define 'node' as a method, which means you can pass parameters! This lets you define node(depth) in terms of node(depth+1). The problem with this approach, however, is that the node method doesn't match a string; it generates a parser. So building node(depth) calls node(depth+1), which calls node(depth+2), and so on: the recursive call never finishes, before any input is even read.
This is why dynamic exists. It returns a parser that isn't resolved until the point it tries to match it, allowing you to recurse without problems.
See the following code:
require 'parslet'
class IndentationSensitiveParser < Parslet::Parser
  def indent(depth)
    str(' ' * depth)
  end
  rule(:newline) { str("\n") }
  rule(:identifier) { match['A-Za-z0-9'].repeat(1).as(:identifier) }
  def node(depth)
    indent(depth) >>
      identifier >>
      newline.maybe >>
      (dynamic { |s, c| node(depth + 1).repeat(0) }).as(:children)
  end
  rule(:document) { node(0).repeat }
  root :document
end
This is my favoured solution.
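To see why the eager recursion never finishes and why deferring construction fixes it, here is a minimal sketch in plain Ruby. It does not use Parslet: the tiny combinators (indent, pattern, opt, seq, many, lazy) are hypothetical stand-ins, and lazy only imitates the idea behind Parslet's dynamic by building the child parser when it is applied rather than when node is called.

```ruby
# Hypothetical mini combinators (not Parslet's API). A parser is a lambda
# taking (input, pos) and returning [value, new_pos], or nil on failure.

def indent(depth)
  lambda do |input, pos|
    input[pos, depth] == " " * depth ? [depth, pos + depth] : nil
  end
end

def pattern(regex)
  lambda do |input, pos|
    m = regex.match(input, pos)
    m && m.begin(0) == pos ? [m[0], m.end(0)] : nil
  end
end

def opt(parser)
  lambda { |input, pos| parser.call(input, pos) || [nil, pos] }
end

def seq(*parsers)
  lambda do |input, pos|
    values = []
    index = 0
    while index < parsers.length
      result = parsers[index].call(input, pos)
      return nil if result.nil?
      values << result[0]
      pos = result[1]
      index += 1
    end
    [values, pos]
  end
end

# Zero or more matches; stops if an iteration fails or consumes nothing.
def many(parser)
  lambda do |input, pos|
    values = []
    while (result = parser.call(input, pos)) && result[1] > pos
      values << result[0]
      pos = result[1]
    end
    [values, pos]
  end
end

# Defers building the inner parser until it is applied (the dynamic idea).
def lazy(&build)
  lambda { |input, pos| build.call.call(input, pos) }
end

def node(depth)
  body = seq(indent(depth), pattern(/[A-Za-z0-9]+/), opt(pattern(/\n/)),
             many(lazy { node(depth + 1) }))
  lambda do |input, pos|
    result = body.call(input, pos)
    result && [{ name: result[0][1], children: result[0][3] }, result[1]]
  end
end

# Eager version: node(depth + 1) is built immediately, so construction
# recurses forever before any input is read.
def eager_node(depth)
  seq(indent(depth), pattern(/[A-Za-z0-9]+/), opt(pattern(/\n/)),
      many(eager_node(depth + 1)))
end

tree, _pos = many(node(0)).call("a\n b\n  c\nd\n", 0)
p tree
begin
  eager_node(0)
rescue SystemStackError
  puts "eager_node(0) overflowed the stack"
end
```

Rebuilding the child parser on every use keeps the sketch short; the point is only that nothing recursive happens until there is input to match.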
Create a sepBy parser combinator sensitive to the indentation of the first parser
I think there are two problems here:
- You're requiring the first "Example" to be indented beyond itself, which is impossible. You should instead let the first parser succeed regardless of the current position.
- greater is not atomic, so when it fails, your parser is left in an invalid state. This might or might not be considered a bug in the library. In any case, you can make it atomic via attempt.
With that in mind, I think the following parser does roughly what you want:
let indentSepBy p sep =
    parse {
        let! pos = getPosition
        let! head = p
        let! tail =
            let p' = attempt (greater pos p)
            let sep' = attempt (greater pos sep)
            many (sep' >>. p')
        return head :: tail
    }
You can test this as follows:
let test =
    indentSepBy (pstring "Example") (pchar '.')
let run text =
    printfn "***"
    runParser (test .>> eof) () text
    |> printfn "%A"
[<EntryPoint>]
let main argv =
    run "Example.Example"    // success
    run "Example\n.Example"  // failure
    run "Example\n .Example" // success
    0
Note that I've forced the test parser to consume the entire input via eof. Otherwise, it will falsely report success when it can't in fact parse the full string.
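The same rules can be sketched in Ruby with StringScanner, outside FParsec, to check the three cases in the test program. This is an illustration, not a port: the original greater helper isn't shown, so the sketch assumes whitespace before a separator is skipped and that "greater" means a column strictly greater than the first item's column. The saved position plays the role of attempt, and the final eos? check plays the role of eof.

```ruby
require "strscan"

# 1-based column of character offset `pos` in `text`.
def column_of(text, pos)
  newline = pos.zero? ? nil : text.rindex("\n", pos - 1)
  newline ? pos - newline : pos + 1
end

# Hypothetical Ruby sketch of indentSepBy (not FParsec). The first item may
# start anywhere; each later separator must start in a column strictly
# greater than the first item's column. Assumption: whitespace before a
# separator is skipped (the original `greater` helper is not shown).
def indent_sep_by(text, item, sep)
  s = StringScanner.new(text)
  head_column = column_of(text, s.pos)
  head = s.scan(item) or return nil
  items = [head]
  loop do
    start = s.pos # like `attempt`: remember where this step began
    s.skip(/\s*/)
    if column_of(text, s.pos) > head_column && s.scan(sep) && (next_item = s.scan(item))
      items << next_item
    else
      s.pos = start # undo the partial step instead of leaving a broken state
      break
    end
  end
  s.eos? ? items : nil # like `.>> eof`: the whole input must be consumed
end

["Example.Example", "Example\n.Example", "Example\n .Example"].each do |text|
  p indent_sep_by(text, /Example/, /\./)
end
```

The three calls print a two-element array, nil, and a two-element array, mirroring the success, failure, success results of the F# runs.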
Parsing text structured as tree with fixed width columns using parslet in ruby
I was going to say the same thing as "the Tin Man". There has to be another format you can generate the data in.
If you want to parse this, however... Parslet works like a map/reduce algorithm.
Your first pass (parsing) is not intended to give you your final output, just to capture all the information you need from your source document.
Once you have that stored in a tree, you can then transform it to get the output you want.
So... I would write a parser that records each whitespace character as a node, as well as matching the text and percentages you need. I would group the whitespace nodes in an "indentation" node.
I would then use a transform to replace the whitespace nodes with a count of nodes to calculate the indentations.
Remember: Parslet generates a standard Ruby hash. You can then write whatever code you like to make sense of this tree.
The parser is just converting the text file into a data structure you can manipulate.
Just to reiterate, though: I think "the Tin Man" has the right answer. Generate the data in a machine-readable way instead.
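A plain-Ruby sketch of this two-pass idea, using a regex for the first pass instead of a Parslet grammar. The input format, the two-space indentation width, and the names first_pass, to_depths and build_tree are all hypothetical, since the question's actual column layout isn't reproduced here. The first pass keeps each indentation space as its own element; the second replaces that list with a depth count, and the hierarchy is then rebuilt with a stack.

```ruby
# Hypothetical input: an indented name followed by a percentage.
INPUT = <<~TEXT
  Total 100%
    Disk 60%
      Logs 10%
    Memory 40%
TEXT

INDENT_WIDTH = 2 # assumption: each level is indented by two spaces

# Pass 1 ("parsing"): capture everything, keeping each indentation
# space as its own element, like one whitespace node per space.
def first_pass(text)
  text.each_line.map do |line|
    m = line.match(/\A( *)(\S+)\s+(\d+)%/) or raise ArgumentError, "cannot parse #{line.inspect}"
    { indentation: m[1].chars, text: m[2], percent: m[3] }
  end
end

# Pass 2 ("transform"): replace the whitespace nodes with a depth count.
def to_depths(records)
  records.map do |r|
    { depth: r[:indentation].size / INDENT_WIDTH, text: r[:text], percent: r[:percent].to_i }
  end
end

# Rebuild the hierarchy from depths using a stack of open parents.
def build_tree(rows)
  root = { children: [] }
  stack = [[-1, root]] # [depth, node] pairs; the last one is the current parent
  rows.each do |row|
    node = { text: row[:text], percent: row[:percent], children: [] }
    stack.pop while stack.last[0] >= row[:depth]
    stack.last[1][:children] << node
    stack.push([row[:depth], node])
  end
  root[:children]
end

p build_tree(to_depths(first_pass(INPUT)))
```

Keeping the passes separate means the parser stays dumb and the indentation logic lives in ordinary Ruby, which is the point of the answer.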
Update:
For an alternative approach you can check out: Indentation sensitive parser using Parslet in Ruby?
Parse markdown indented code block
In this case newline can match eof, so newline.repeat(2) keeps matching eof over and over without consuming anything. You want "repeat(2,2)". You can make these bugs easy to find :) ... just use my fork.
You can detect how it's looping by using my fork of parslet.
It catches loops and tells you what's happening.
It's slower than the usual parslet, so switch back for production parsing.
Use this Gemfile:
source "https://rubygems.org"
gem "parslet", :git => "https://github.com/NigelThorne/parslet.git"
gem 'rspec'
And you get these results:
9:23:40.20 > bundle exec rspec parser.rb
FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF
Failures:
1) RecurringGroupParser should parse a
Failure/Error: is_expected.to parse "a"
RuntimeError:
Grammar contains an infinite loop applying 'NEWLINE{2, }' at char position 1
...a<-- here
# ./parser.rb:33:in `block (2 levels) in <top (required)>'
2) RecurringGroupParser should parse aa
Failure/Error: is_expected.to parse "a\na"
RuntimeError:
Grammar contains an infinite loop applying 'NEWLINE{2, }' at char position 3
...a
a<-- here
# ./parser.rb:37:in `block (2 levels) in <top (required)>'
3) RecurringGroupParser should parse aaa
Failure/Error: is_expected.to parse "a\na\na"
RuntimeError:
Grammar contains an infinite loop applying 'NEWLINE{2, }' at char position 5
...a
a
a<-- here
# ./parser.rb:41:in `block (2 levels) in <top (required)>'
4) RecurringGroupParser should parse a a
Failure/Error: is_expected.to parse "a\n\na"
expected BLOCK to be able to parse "a\n\na"
# ./parser.rb:45:in `block (2 levels) in <top (required)>'
5) RecurringGroupParser should parse aa a
Failure/Error: is_expected.to parse "a\na\n\na"
expected BLOCK to be able to parse "a\na\n\na"
# ./parser.rb:49:in `block (2 levels) in <top (required)>'
6) RecurringGroupParser should parse aaa a
Failure/Error: is_expected.to parse "a\naa\n\na"
expected BLOCK to be able to parse "a\naa\n\na"
# ./parser.rb:53:in `block (2 levels) in <top (required)>'
7) RecurringGroupParser should parse a aa
Failure/Error: is_expected.to parse "a\n\na\na"
expected BLOCK to be able to parse "a\n\na\na"
# ./parser.rb:57:in `block (2 levels) in <top (required)>'
8) RecurringGroupParser should parse a aaa
Failure/Error: is_expected.to parse "a\n\na\na\na"
expected BLOCK to be able to parse "a\n\na\na\na"
# ./parser.rb:61:in `block (2 levels) in <top (required)>'
9) RecurringGroupParser should parse aa a
Failure/Error: is_expected.to parse "a\na\n\na"
expected BLOCK to be able to parse "a\na\n\na"
# ./parser.rb:65:in `block (2 levels) in <top (required)>'
10) RecurringGroupParser should parse aa aa
Failure/Error: is_expected.to parse "a\na\n\na\na"
expected BLOCK to be able to parse "a\na\n\na\na"
# ./parser.rb:69:in `block (2 levels) in <top (required)>'
11) RecurringGroupParser should parse aa aaa
Failure/Error: is_expected.to parse "a\na\n\na\na\na"
expected BLOCK to be able to parse "a\na\n\na\na\na"
# ./parser.rb:73:in `block (2 levels) in <top (required)>'
12) RecurringGroupParser should parse aaa aa
Failure/Error: is_expected.to parse "a\naa\n\na\na"
expected BLOCK to be able to parse "a\naa\n\na\na"
# ./parser.rb:77:in `block (2 levels) in <top (required)>'
13) RecurringGroupParser should parse aaa aaa
Failure/Error: is_expected.to parse "a\naa\n\na\na\na"
expected BLOCK to be able to parse "a\naa\n\na\na\na"
# ./parser.rb:81:in `block (2 levels) in <top (required)>'
14) RecurringGroupParser should parse a a a
Failure/Error: is_expected.to parse "a\n\na\n\na"
expected BLOCK to be able to parse "a\n\na\n\na"
# ./parser.rb:85:in `block (2 levels) in <top (required)>'
15) RecurringGroupParser should parse aa a a
Failure/Error: is_expected.to parse "a\na\n\na\n\na"
expected BLOCK to be able to parse "a\na\n\na\n\na"
# ./parser.rb:89:in `block (2 levels) in <top (required)>'
16) RecurringGroupParser should parse aaa a a
Failure/Error: is_expected.to parse "a\naa\n\na\n\na"
expected BLOCK to be able to parse "a\naa\n\na\n\na"
# ./parser.rb:93:in `block (2 levels) in <top (required)>'
17) RecurringGroupParser should parse a aa a
Failure/Error: is_expected.to parse "a\n\na\na\n\na"
expected BLOCK to be able to parse "a\n\na\na\n\na"
# ./parser.rb:97:in `block (2 levels) in <top (required)>'
18) RecurringGroupParser should parse aa aa a
Failure/Error: is_expected.to parse "a\na\n\na\na\n\na"
expected BLOCK to be able to parse "a\na\n\na\na\n\na"
# ./parser.rb:101:in `block (2 levels) in <top (required)>'
19) RecurringGroupParser should parse aaa aa a
Failure/Error: is_expected.to parse "a\naa\n\na\na\n\na"
expected BLOCK to be able to parse "a\naa\n\na\na\n\na"
# ./parser.rb:105:in `block (2 levels) in <top (required)>'
20) RecurringGroupParser should parse a aaa a
Failure/Error: is_expected.to parse "a\n\na\naa\n\na"
expected BLOCK to be able to parse "a\n\na\naa\n\na"
# ./parser.rb:109:in `block (2 levels) in <top (required)>'
21) RecurringGroupParser should parse aa aaa a
Failure/Error: is_expected.to parse "a\na\n\na\naa\n\na"
expected BLOCK to be able to parse "a\na\n\na\naa\n\na"
# ./parser.rb:113:in `block (2 levels) in <top (required)>'
22) RecurringGroupParser should parse aaa aaa a
Failure/Error: is_expected.to parse "a\naa\n\na\naa\n\na"
expected BLOCK to be able to parse "a\naa\n\na\naa\n\na"
# ./parser.rb:117:in `block (2 levels) in <top (required)>'
23) RecurringGroupParser should parse a a aa
Failure/Error: is_expected.to parse "a\n\na\n\na\na"
expected BLOCK to be able to parse "a\n\na\n\na\na"
# ./parser.rb:121:in `block (2 levels) in <top (required)>'
24) RecurringGroupParser should parse aa a aa
Failure/Error: is_expected.to parse "a\na\n\na\n\na\na"
expected BLOCK to be able to parse "a\na\n\na\n\na\na"
# ./parser.rb:125:in `block (2 levels) in <top (required)>'
25) RecurringGroupParser should parse aaa a aa
Failure/Error: is_expected.to parse "a\naa\n\na\n\na\na"
expected BLOCK to be able to parse "a\naa\n\na\n\na\na"
# ./parser.rb:129:in `block (2 levels) in <top (required)>'
26) RecurringGroupParser should parse a aa aa
Failure/Error: is_expected.to parse "a\n\na\na\n\na\na"
expected BLOCK to be able to parse "a\n\na\na\n\na\na"
# ./parser.rb:133:in `block (2 levels) in <top (required)>'
27) RecurringGroupParser should parse aa aa aa
Failure/Error: is_expected.to parse "a\na\n\na\na\n\na\na"
expected BLOCK to be able to parse "a\na\n\na\na\n\na\na"
# ./parser.rb:137:in `block (2 levels) in <top (required)>'
28) RecurringGroupParser should parse aaa aa aa
Failure/Error: is_expected.to parse "a\naa\n\na\na\n\na\na"
expected BLOCK to be able to parse "a\naa\n\na\na\n\na\na"
# ./parser.rb:141:in `block (2 levels) in <top (required)>'
29) RecurringGroupParser should parse a aaa aa
Failure/Error: is_expected.to parse "a\n\na\naa\n\na\na"
expected BLOCK to be able to parse "a\n\na\naa\n\na\na"
# ./parser.rb:145:in `block (2 levels) in <top (required)>'
30) RecurringGroupParser should parse aa aaa aa
Failure/Error: is_expected.to parse "a\na\n\na\naa\n\na\na"
expected BLOCK to be able to parse "a\na\n\na\naa\n\na\na"
# ./parser.rb:149:in `block (2 levels) in <top (required)>'
31) RecurringGroupParser should parse aaa aaa aa
Failure/Error: is_expected.to parse "a\naa\n\na\naa\n\na\na"
expected BLOCK to be able to parse "a\naa\n\na\naa\n\na\na"
# ./parser.rb:153:in `block (2 levels) in <top (required)>'
32) RecurringGroupParser should parse a a aaa
Failure/Error: is_expected.to parse "a\n\na\n\na\na\na"
expected BLOCK to be able to parse "a\n\na\n\na\na\na"
# ./parser.rb:157:in `block (2 levels) in <top (required)>'
33) RecurringGroupParser should parse aa a aaa
Failure/Error: is_expected.to parse "a\na\n\na\n\na\na\na"
expected BLOCK to be able to parse "a\na\n\na\n\na\na\na"
# ./parser.rb:161:in `block (2 levels) in <top (required)>'
34) RecurringGroupParser should parse aaa a aaa
Failure/Error: is_expected.to parse "a\naa\n\na\n\na\na\na"
expected BLOCK to be able to parse "a\naa\n\na\n\na\na\na"
# ./parser.rb:165:in `block (2 levels) in <top (required)>'
35) RecurringGroupParser should parse a aa aaa
Failure/Error: is_expected.to parse "a\n\na\na\n\na\na\na"
expected BLOCK to be able to parse "a\n\na\na\n\na\na\na"
# ./parser.rb:169:in `block (2 levels) in <top (required)>'
36) RecurringGroupParser should parse aa aa aaa
Failure/Error: is_expected.to parse "a\na\n\na\na\n\na\na\na"
expected BLOCK to be able to parse "a\na\n\na\na\n\na\na\na"
# ./parser.rb:173:in `block (2 levels) in <top (required)>'
37) RecurringGroupParser should parse aaa aa aaa
Failure/Error: is_expected.to parse "a\naa\n\na\na\n\na\na\na"
expected BLOCK to be able to parse "a\naa\n\na\na\n\na\na\na"
# ./parser.rb:177:in `block (2 levels) in <top (required)>'
38) RecurringGroupParser should parse a aaa aaa
Failure/Error: is_expected.to parse "a\n\na\naa\n\na\na\na"
expected BLOCK to be able to parse "a\n\na\naa\n\na\na\na"
# ./parser.rb:181:in `block (2 levels) in <top (required)>'
39) RecurringGroupParser should parse aa aaa aaa
Failure/Error: is_expected.to parse "a\na\n\na\naa\n\na\na\na"
expected BLOCK to be able to parse "a\na\n\na\naa\n\na\na\na"
# ./parser.rb:185:in `block (2 levels) in <top (required)>'
40) RecurringGroupParser should parse aaa aaa aaa
Failure/Error: is_expected.to parse "a\naa\n\na\naa\n\na\na\na"
expected BLOCK to be able to parse "a\naa\n\na\naa\n\na\na\na"
# ./parser.rb:189:in `block (2 levels) in <top (required)>'
Finished in 0.01702 seconds (files took 0.26725 seconds to load)
40 examples, 40 failures
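The fork's loop detection can be illustrated with a hypothetical repeat combinator (plain Ruby, not Parslet's or the fork's implementation): if an unbounded repetition's child succeeds without consuming input, it raises instead of spinning forever. With a bounded repeat(2, 2), the zero-width eof match is allowed because the loop stops after two matches.

```ruby
# Hypothetical combinators (not Parslet's internals): a parser is a lambda
# (input, pos) -> new position, or nil on failure.
NEWLINE_CHAR = ->(input, pos) { input[pos] == "\n" ? pos + 1 : nil }
EOF = ->(input, pos) { pos == input.length ? pos : nil }
# As in the question's grammar, a newline may also be the end of input.
NEWLINE = ->(input, pos) { NEWLINE_CHAR.call(input, pos) || EOF.call(input, pos) }

# Repeat `parser` at least `min` times and at most `max` times (nil = unbounded).
def repeat(parser, min, max = nil)
  lambda do |input, pos|
    count = 0
    while max.nil? || count < max
      next_pos = parser.call(input, pos)
      break if next_pos.nil?
      # An unbounded repetition whose child matched zero characters would
      # match again at the same spot forever, so report it instead.
      if next_pos == pos && max.nil?
        raise "infinite loop applying repeat at char position #{pos}"
      end
      pos = next_pos
      count += 1
    end
    count >= min ? pos : nil
  end
end

p repeat(NEWLINE, 2, 2).call("a\n", 1) # prints 2
begin
  repeat(NEWLINE, 2).call("a\n", 1)
rescue RuntimeError => e
  puts e.message
end
```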
See this question on parsing indentation with Parslet.
Parslet : exclusion clause
You can do something like this:
rule(:word) { match['^")(\\s'].repeat(1) } # normal word
rule(:op) { str('AND') | str('OR') | str('NOT') }
rule(:keyword) { str('all:') | str('any:') }
rule(:searchterm) { keyword.absent? >> op.absent? >> word }
In this case, absent? does a lookahead to make sure the next token is not a keyword; if it isn't, it checks that it's not an operator; if that also holds, it finally checks whether it's a valid word.
An equivalent rule would be:
rule(:searchterm) { (keyword | op).absent? >> word }
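absent? is a negative lookahead: it succeeds, without consuming anything, only if its argument would fail at the current position. Ruby's regex (?!...) construct behaves the same way, so the rule can be mirrored as a regex for quick experiments (an analogy, not Parslet). One thing the experiment shows: because op.absent? only looks at a prefix, a word such as ANDROID is rejected too.

```ruby
# Regex analogue of the Parslet rules (not Parslet itself).
# (?!...) is a negative lookahead, the regex counterpart of absent?.

# searchterm = keyword.absent? >> op.absent? >> word
SEARCHTERM = /\A(?!all:|any:)(?!AND|OR|NOT)[^")(\s]+\z/

# Stricter variant: only reject an operator that is NOT followed by
# another word character, so a word such as ANDROID is accepted.
STRICT_SEARCHTERM = /\A(?!all:|any:)(?!(?:AND|OR|NOT)(?![^")(\s]))[^")(\s]+\z/

%w[hello AND any: ANDROID].each do |token|
  puts format("%-8s naive=%-5s strict=%s", token,
              SEARCHTERM.match?(token), STRICT_SEARCHTERM.match?(token))
end
```

If that prefix behaviour matters, an untested Parslet counterpart of the stricter pattern would be something like (op >> match['^")(\\s'].absent?).absent? >> word.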
Ruby:parslet for a system verilog interface parser
Ok... this parses the file you mentioned. I don't understand the desired format so I can't say it will work for all your files, but hopefully this will get you started.
require 'parslet'
class MyParse < Parslet::Parser
  rule(:lparen) { space? >> str('(') }
  rule(:rparen) { space? >> str(')') }
  rule(:lbox) { space? >> str('[') }
  rule(:rbox) { space? >> str(']') }
  rule(:lcurly) { space? >> str('{') }
  rule(:rcurly) { space? >> str('}') }
  rule(:comma) { space? >> str(',') }
  rule(:semicolon) { space? >> str(';') }
  rule(:eof) { any.absent? }
  rule(:space) { match["\t\s"] }
  rule(:whitespace) { space.repeat(1) }
  rule(:space?) { space.repeat(0) }
  rule(:blank_line) { space? >> newline.repeat(1) }
  rule(:newline) { str("\n") }
  # Things
  rule(:integer) { space? >> match('[0-9]').repeat(1).as(:int) >> space? }
  rule(:identifier) { match['a-z'].repeat(1) }
  # A statement: optional spaces, the expression, ";" and a newline.
  def line(expression)
    space? >>
      expression >>
      space? >>
      str(';') >>
      space? >>
      str("\n")
  end
  rule(:expression?) { interface.repeat(0) }
  rule(:interface) { intf_start >> interface_body.repeat(0) >> intf_end }
  # A body line is anything that isn't the end line.
  rule(:interface_body) {
    intf_end.absent? >>
      interface_bodyline >>
      blank_line.repeat(0)
  }
  rule(:intf_start) {
    line(
      str('interface') >>
        space? >>
        (match['a-zA-Z_'].repeat(1, 1) >>
          match['[:alnum:]_'].repeat(0)).as(:intf_name)
    )
  }
  rule(:interface_bodyline) {
    line(protocol | transmit)
  }
  rule(:protocol) {
    str('protocol') >> whitespace >>
      (str('validonly').maybe).as(:protocol)
  }
  rule(:transmit) {
    str('transmit') >> whitespace >>
      (bool | transmit_width) >> whitespace >>
      name.as(:transmit_name)
  }
  rule(:name) {
    match('[a-zA-Z_]') >>
      (match['[:alnum:]'] | str("_")).repeat(0)
  }
  rule(:bool) { lbox >> str('Bool').as(:bool) >> rbox }
  rule(:transmit_width) {
    lbox >>
      space? >>
      match('[0-9]').repeat(1).as(:msb) >>
      space? >>
      str(':') >>
      space? >>
      match('[0-9]').repeat(1).as(:lsb) >>
      space? >>
      rbox
  }
  rule(:intf_end) { str('endinterface') }
  root :expression?
end
require 'rspec'
require 'parslet/rig/rspec'
RSpec.describe MyParse do
  let(:parser) { MyParse.new }
  context "simple_rule" do
    it "should consume protocol line" do
      expect(parser.interface_bodyline).to parse(" protocol validonly;\n")
    end
    it 'name' do
      expect(parser.name).to parse('valid')
    end
    it "bool" do
      expect(parser.bool).to parse('[Bool]')
    end
    it "transmit line" do
      expect(parser.transmit).to parse('transmit [Bool] valid')
    end
    it "transmit as bodyline" do
      expect(parser.interface_bodyline).to parse(" transmit [Bool] valid;\n")
    end
  end
end
RSpec::Core::Runner.run(['--format', 'documentation'])
begin
  doc = File.read("test.txt")
  MyParse.new.parse(doc)
rescue Parslet::ParseFailed => error
  # Current Parslet releases expose the error tree as parse_failure_cause.
  puts error.parse_failure_cause.ascii_tree
end
The main changes...
- Don't consume whitespace on both sides of your tokens. You had expressions that parsed "[Bool] valid" as LBOX BOOL RBOX SPACE? and then expected another WHITESPACE, but couldn't find one (as the previous rule had consumed it).
- When an expression can validly parse as zero length (e.g. something with repeat(0)) and there is a problem with how it's written, you get an odd error: the rule passes and matches nothing, and then the next rule will typically fail. I explicitly matched 'body lines' as 'not the end line' so that a bad line fails with an error.
- 'repeat' defaults to (0), which I would love to change. I see mistakes around this all the time.
- x.repeat(1,1) means make one match. That's the same as having x. :)
- There were more whitespace problems.
so....
Write your parser from the top down. Write tests from the bottom up.
When your tests get to the top you are done! :)
Good luck.
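The whitespace point is easy to reproduce outside Parslet. In this StringScanner sketch (plain Ruby; token, greedy_token and transmit_tail are hypothetical names), tokens skip whitespace on their left only, as lbox = space? >> str('[') does. The greedy variant also eats trailing whitespace, so the explicit whitespace that transmit requires after ']' is already gone and the match fails, just like the "[Bool] valid" case.

```ruby
require "strscan"

# Convention: a token skips whitespace on its LEFT only,
# like lbox = space? >> str('[').
def token(scanner, pattern)
  scanner.skip(/[ \t]*/)
  scanner.scan(pattern)
end

# Problem variant: the token also swallows whitespace on its right.
def greedy_token(scanner, pattern)
  matched = token(scanner, pattern)
  scanner.skip(/[ \t]*/) if matched
  matched
end

# transmit-like tail: "[" "Bool" "]" whitespace name, where the
# whitespace after "]" is required (like the whitespace rule).
def transmit_tail(text, tok)
  s = StringScanner.new(text)
  tok.call(s, /\[/) && tok.call(s, /Bool/) && tok.call(s, /\]/) &&
    s.skip(/[ \t]+/) && s.scan(/[A-Za-z_]\w*/)
end

p transmit_tail("[Bool] valid", method(:token))        # prints "valid"
p transmit_tail("[Bool] valid", method(:greedy_token)) # prints nil
```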
SystemStackError: when parsing SCIM 2.0 filter query using Parslet
Always consume something before you recurse.
For example:
Don't define a list of numbers as
NumList = NumList >> "," >> NumList | Number
Define it as
NumList = Number >> ("," >> NumList).maybe
or even
NumList = Number >> ("," >> Number).repeat(0)
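Here is the same point as plain Ruby recursive descent (a hypothetical sketch with StringScanner, not Parslet). The left-recursive version calls itself before consuming any input and overflows the stack, which is the SystemStackError from the question; the other two consume a Number first.

```ruby
require "strscan"

# NumList = NumList "," NumList | Number   (left-recursive: never terminates)
def num_list_left(s)
  start = s.pos
  first = num_list_left(s) # recurses before consuming anything
  if first && s.scan(/,/) && (rest = num_list_left(s))
    return first + rest
  end
  s.pos = start
  (n = s.scan(/\d+/)) ? [n.to_i] : nil
end

# NumList = Number ("," NumList)?
def num_list_right(s)
  n = s.scan(/\d+/) or return nil # consume a Number first
  start = s.pos
  if s.scan(/,/) && (rest = num_list_right(s))
    [n.to_i] + rest
  else
    s.pos = start # the optional part failed: undo the ","
    [n.to_i]
  end
end

# NumList = Number ("," Number)*
def num_list_repeat(s)
  n = s.scan(/\d+/) or return nil
  numbers = [n.to_i]
  loop do
    start = s.pos
    unless s.scan(/,/) && (m = s.scan(/\d+/))
      s.pos = start
      break
    end
    numbers << m.to_i
  end
  numbers
end

p num_list_right(StringScanner.new("1,2,3"))  # prints [1, 2, 3]
p num_list_repeat(StringScanner.new("1,2,3")) # prints [1, 2, 3]
begin
  num_list_left(StringScanner.new("1,2,3"))
rescue SystemStackError
  puts "num_list_left: SystemStackError"
end
```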
So for logical_expression...
# logExp = FILTER SP ("and" / "or") SP FILTER
rule(:logical_expression) do
  filter >> space >> (and_op | or_op) >> space >> filter
end
You need the first filter to be something that can't be a logical_expression.
# FILTER = attrExp / logExp / valuePath / *1"not" "(" FILTER ")"
rule(:filter) do
  logical_expression | filter_atom
end
rule(:filter_atom) do
  (not_op? >> lparen >> filter >> rparen) | attribute_expression | value_path
end
# logExp = FILTER SP ("and" / "or") SP FILTER
rule(:logical_expression) do
  filter_atom >> space >> (and_op | or_op) >> space >> filter
end
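A plain-Ruby recursive-descent sketch of the corrected rule structure. It is hypothetical and heavily simplified: attribute expressions are limited to attr op "string" with a few operators, and the backtrack helper imitates PEG ordered choice. The key property is that logical_expression consumes a filter_atom before it recurses into filter.

```ruby
require "strscan"

# Hypothetical, simplified SCIM-like filter grammar (not Parslet):
#   filter             = logical_expression | filter_atom
#   logical_expression = filter_atom SP ("and" / "or") SP filter
#   filter_atom        = ["not"] "(" filter ")" | attribute_expression
class FilterParser
  def initialize(text)
    @s = StringScanner.new(text)
  end

  # Returns a nested array AST, or nil unless the whole input parses.
  def parse
    result = filter
    result if result && @s.eos?
  end

  private

  def filter
    backtrack { logical_expression } || filter_atom
  end

  # Consumes a filter_atom BEFORE recursing into filter, so no infinite recursion.
  def logical_expression
    left = filter_atom or return nil
    op = @s.scan(/ (and|or) /) or return nil
    right = filter or return nil
    [op.strip.to_sym, left, right]
  end

  def filter_atom
    backtrack { grouped } || attribute_expression
  end

  def grouped
    negated = !@s.scan(/not\s*/).nil?
    @s.scan(/\(/) or return nil
    inner = filter or return nil
    @s.scan(/\)/) or return nil
    negated ? [:not, inner] : inner
  end

  # Simplified attribute expression: attr op "string".
  def attribute_expression
    @s.scan(/([A-Za-z][\w.]*) (eq|ne|co|sw) "([^"]*)"/) or return nil
    [@s[2].to_sym, @s[1], @s[3]]
  end

  # Ordered-choice helper: rewinds the scanner if the block fails.
  def backtrack
    start = @s.pos
    result = yield
    @s.pos = start unless result
    result
  end
end

p FilterParser.new('title eq "x" and userType eq "Employee"').parse
# prints [:and, [:eq, "title", "x"], [:eq, "userType", "Employee"]]
```

Like the rules above, this treats and/or as right-associative with equal precedence; RFC 7644 gives "and" higher precedence than "or", so a full parser would need an extra level for that.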
Fast and reliable way to find out if a source code file implements an interface
I would say that parsing a source code file is both fast and reliable. But "fast" is such a vague notion that it really depends, I guess. I wouldn't expect too much overhead compared to scanning the source file for occurrences of the word "implements", though, so if the latter is fine for you, I'd assume the former should be acceptable too.
The javax.tools.* API would be a good entrance point to get started; however, there are also a number of (open source) source code parsers for Java out there.
Also, here is an introductory blog post on Oracle's website.
RESTful web services and HTTP verbs
Yes, you can live without PUT and DELETE.
This article tells you why:
http://www.artima.com/lejava/articles/why_put_and_delete.html
While to true RESTafarians this may be heresy, in the real world you do what you can with what you have. Be as rational as you can and as consistent with your own conventions as you can, but you can definitely build a good RESTful system without PUT and DELETE.
rp