Lex/Yacc for C#

Lex/Yacc for C#?

I'm not sure Lex/Yacc will be of any help. You'll just need a basic tokenizer and an interpreter, which are faster to write by hand. If you still want to go the parsing route, see Irony.

As a side note: have you considered PowerShell and its cmdlets?
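To give a feel for how small a hand-written tokenizer plus interpreter can be, here is a rough sketch in C. The command language (`add` and `mul` followed by two integers) and all names in it are made up for illustration; they are not from the question.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Token kinds for a tiny, hypothetical command language: "<verb> <int> <int>". */
typedef enum { TOK_WORD, TOK_NUMBER, TOK_END, TOK_ERROR } TokenKind;

typedef struct {
    TokenKind kind;
    char text[32];   /* lexeme, truncated to fit the buffer */
    long value;      /* only meaningful for TOK_NUMBER */
} Token;

/* Read one token starting at *cursor and advance the cursor past it. */
Token next_token(const char **cursor)
{
    Token tok = { TOK_END, "", 0 };
    const char *p = *cursor;
    size_t n = 0;

    while (isspace((unsigned char)*p)) p++;            /* skip whitespace */

    if (*p == '\0') {
        tok.kind = TOK_END;
    } else if (isdigit((unsigned char)*p)) {
        tok.kind = TOK_NUMBER;
        while (isdigit((unsigned char)*p)) {
            if (n < sizeof tok.text - 1) tok.text[n++] = *p;
            p++;
        }
        tok.text[n] = '\0';
        tok.value = strtol(tok.text, NULL, 10);
    } else if (isalpha((unsigned char)*p)) {
        tok.kind = TOK_WORD;
        while (isalpha((unsigned char)*p)) {
            if (n < sizeof tok.text - 1) tok.text[n++] = *p;
            p++;
        }
        tok.text[n] = '\0';
    } else {
        tok.kind = TOK_ERROR;                          /* unexpected character */
        tok.text[0] = *p++;
        tok.text[1] = '\0';
    }
    *cursor = p;
    return tok;
}

/* Interpret one command line. Returns 1 and stores the result on success,
   0 if the line does not match "<verb> <int> <int>" or the verb is unknown. */
int run_command(const char *line, long *result)
{
    const char *cursor = line;
    Token verb = next_token(&cursor);
    Token a = next_token(&cursor);
    Token b = next_token(&cursor);
    Token end = next_token(&cursor);

    if (verb.kind != TOK_WORD || a.kind != TOK_NUMBER ||
        b.kind != TOK_NUMBER || end.kind != TOK_END)
        return 0;
    if (strcmp(verb.text, "add") == 0) { *result = a.value + b.value; return 1; }
    if (strcmp(verb.text, "mul") == 0) { *result = a.value * b.value; return 1; }
    return 0;
}
```

Calling `run_command("add 2 3", &r)` stores 5 in `r`. The same structure carries over to C# almost line for line.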

Does C# have a (direct) flex/yacc port? Or what lexers/parsers do people use for C#?

I think your best bet is going to be GPLEX/GPPG. It's the closest thing to Lex/Yacc for C# that I know of, and you will need to port your actions to C# regardless.

I have also used Coco/R, ANTLR (of course), and have more recently played with Irony.net, fslex/fsyacc (F#), and FParsec (F#).

Here are some links

FParsec

Coco/R

Irony.net

Gardens Point Parser Generator

Gardens Point Lex

I don't have a technical reason for using one versus another: I play around with these mostly for fun. I did create some DSLs for work projects a good number of years ago, but I hand-rolled the scanners/parsers for those. Back then I was working mostly in Pascal, TP Lex/Yacc did not suit my tastes, and the DSLs were simple enough. I have found that FParsec and Irony suit my tastes best, as I find the others somewhat "messy" (lacking in elegance).

Parser builders for C#/.NET

If you really want to stay in C#, I would recommend using the Irony toolkit - it allows you to specify grammars in C# code.

Lex Yacc, should I tokenize character literals?

You need a lex rule to return the punctuation tokens 'as-is' so that the yacc grammar can recognize them. Something like:

[()]        { return *yytext; }

added to your second example should do the trick.
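To see why returning `*yytext` works: yacc lets the grammar refer to a single-character token as a literal such as '(' and uses that character's code as the token number, while named tokens get numbers above the character range. Below is a hand-written C stand-in for a scanner that follows the same convention. The name `scan_token`, the value 258 for `NUMBER`, and the -1 error code are assumptions made for this sketch, not something lex generates.

```c
#include <ctype.h>

/* Named tokens get codes above the 8-bit character range, as in
   yacc-generated headers; the exact value 258 is assumed for illustration. */
enum { NUMBER = 258 };

/* Hand-written stand-in for yylex(): scans from *cursor, returns a token
   code, and returns 0 at end of input (the yacc end-of-input convention). */
int scan_token(const char **cursor)
{
    const char *p = *cursor;
    int code;

    while (*p == ' ' || *p == '\t') p++;       /* skip blanks */

    if (*p == '\0') {
        code = 0;
    } else if (isdigit((unsigned char)*p)) {
        while (isdigit((unsigned char)*p)) p++;
        code = NUMBER;
    } else if (*p == '(' || *p == ')') {
        code = (unsigned char)*p;              /* like: [()] { return *yytext; } */
        p++;
    } else {
        code = -1;                             /* sketch-only "unknown character" code */
        p++;
    }
    *cursor = p;
    return code;
}
```

With this convention a grammar rule can be written as `'(' expr ')'` without declaring separate tokens for the parentheses.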

Reasons for using lex/yacc alternatives?

You should also consider that the various parser generators produce quite different parsers. Yacc/bison produces bottom-up parsers, which are often hard to understand, hard to debug, and give weird error messages. ANTLR, for instance, produces a recursive-descent top-down parser, which is much easier to understand and which you can actually debug easily. You can also invoke just a subrule for a parse operation (e.g. parse only expressions instead of the full language).
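To illustrate the top-down style (this is a hand-written sketch, not ANTLR output), here is a tiny recursive-descent evaluator in C with one function per grammar rule. The grammar and all names are invented for the example. Because each rule is an ordinary function, you can set a breakpoint in it or call a subrule such as `parse_term` directly.

```c
#include <ctype.h>

/* Parser state: a cursor into the input plus an error flag. */
typedef struct {
    const char *pos;
    int failed;
} Parser;

static void skip_spaces(Parser *ps)
{
    while (isspace((unsigned char)*ps->pos)) ps->pos++;
}

long parse_expr(Parser *ps);   /* forward declaration: factor refers back to expr */

/* factor : NUMBER | '(' expr ')' */
long parse_factor(Parser *ps)
{
    long value = 0;
    skip_spaces(ps);
    if (isdigit((unsigned char)*ps->pos)) {
        while (isdigit((unsigned char)*ps->pos)) {
            value = value * 10 + (*ps->pos - '0');
            ps->pos++;
        }
        return value;
    }
    if (*ps->pos == '(') {
        ps->pos++;
        value = parse_expr(ps);
        skip_spaces(ps);
        if (*ps->pos == ')') ps->pos++;
        else ps->failed = 1;               /* missing closing parenthesis */
        return value;
    }
    ps->failed = 1;                        /* neither a number nor '(' */
    return 0;
}

/* term : factor ('*' factor)* */
long parse_term(Parser *ps)
{
    long value = parse_factor(ps);
    for (;;) {
        skip_spaces(ps);
        if (*ps->pos != '*') return value;
        ps->pos++;
        value *= parse_factor(ps);
    }
}

/* expr : term (('+' | '-') term)* */
long parse_expr(Parser *ps)
{
    long value = parse_term(ps);
    for (;;) {
        skip_spaces(ps);
        if (*ps->pos == '+') { ps->pos++; value += parse_term(ps); }
        else if (*ps->pos == '-') { ps->pos++; value -= parse_term(ps); }
        else return value;
    }
}
```

Starting from `Parser p = { "2 * 3 + 1", 0 };`, calling `parse_term(&p)` consumes only `2 * 3` and leaves the cursor at the `+`, which is the "parse just a subrule" idea in miniature.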

Additionally, ANTLR's error recovery is far better and produces much cleaner error messages. There are various IDEs/plugins/extensions that make working with ANTLR grammars pretty easy (ANTLRWorks, the IntelliJ plugin, the Visual Studio Code extension etc.). You can also generate parsers in different languages (C, C++, C#, Java and more) from the same grammar, unless you have language-specific actions in your grammar, as you already mentioned in your question. And while we are speaking of actions: due to the evaluation principle in a bottom-up parser (shift a token, shift a token, reduce them to a new symbol and shift that, etc.), actions can easily cause trouble there, e.g. executing more than once. Not so with parsers generated by ANTLR.

I have also tried various parser generators over the years, and even wrote my own, but I would recommend ANTLR as the tool of choice any time.


