Which Parsers Are Available For Parsing C# Code

Works on source code:

  • CSParser:
    From C# 1.0 to 2.0, open-source
  • Metaspec C# Parser:
    From C# 1.0 to 3.0, commercial product (about $5,000)
  • #recognize!:
    From C# 1.0 to 3.0, commercial product (about 900€) (answer by SharpRecognize)
  • SharpDevelop Parser (answer by Akselsson)
  • NRefactory:
    From C# 1.0 to 4.0 (+async), open-source, parser used in SharpDevelop. Includes semantic analysis.
  • C# Parser and CodeDOM:
    A complete C# 4.0 parser that already supports the C# 5.0 async feature. Commercial product ($49 to $299) (answer by Ken Beckett)
  • Microsoft Roslyn CTP:
    Compiler as a service.

Works on assembly:

  • System.Reflection
  • Microsoft Common Compiler Infrastructure:
    From C# 1.0 to 3.0, Microsoft Public License. Used by FxCop and Spec#
  • Mono.Cecil:
    From C# 1.0 to 3.0, open-source

The problem with assembly "parsing" is that we have less information about lines and files (this information comes from the .pdb file, and the PDB contains line information only for methods).

I personally recommend Mono.Cecil and NRefactory.

Looking for a C# code parser

While .NET's CodeDom namespace provides the basic API for code language parsers, the parsers themselves are not implemented. Visual Studio does this through its own language services, which are not available in the redistributable framework.

You could either...

  1. Compile the code then use reflection on the resulting assembly
  2. Look at something like the Mono C# compiler which creates these syntax trees. It won't be a high-level API like CodeDom but maybe you can work with it.

There may be something on CodePlex or a similar site.

UPDATE

See this related post: Parser for C#

Code parsing C#

There are two basic approaches:

1) Parse the entire solution and everything it references so you understand all the types involved in the code

2) Parse locally and do your best to guess what types etc are.

The trouble with (2) is that you have to guess, and in some circumstances you just can't tell from a code snippet exactly what everything is. But if you're happy with the sort of syntax highlighting shown on (e.g.) Stack Overflow, then this approach is easy and quite effective.
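
As a rough illustration of approach (2), here is a minimal sketch (in Python for brevity; the same structure ports to C#) of a purely local classifier for a C# snippet. Everything in it is an assumption made for the example: the names `KEYWORDS`, `TOKEN_RE` and `classify` are hypothetical, the keyword set is a small incomplete subset, and the rule "a capitalized word is probably a type" is exactly the kind of guess described above.

```python
import re

# A small, deliberately incomplete subset of C# keywords (assumption for this sketch).
KEYWORDS = {"class", "public", "private", "static", "void", "int", "string",
            "return", "new", "if", "else", "using", "namespace", "var"}

# Order matters: comments and strings are tried before words and symbols,
# so that "//" inside a string literal is not mistaken for a comment.
TOKEN_RE = re.compile(r"""
    (?P<comment>//[^\n]*|/\*.*?\*/)
  | (?P<string>"(?:\\.|[^"\\\n])*")
  | (?P<number>\d+(?:\.\d+)?)
  | (?P<word>[A-Za-z_]\w*)
  | (?P<space>\s+)
  | (?P<symbol>.)
""", re.VERBOSE | re.DOTALL)


def classify(source):
    """Yield (kind, text) pairs for a C# snippet, using no type information."""
    for match in TOKEN_RE.finditer(source):
        kind, text = match.lastgroup, match.group()
        if kind == "space":
            continue
        if kind == "word":
            if text in KEYWORDS:
                kind = "keyword"
            elif text[0].isupper():
                # Pure guess: without resolving references we cannot know
                # whether this is a type, a property, a namespace, ...
                kind = "type_guess"
            else:
                kind = "identifier"
        yield kind, text


if __name__ == "__main__":
    for kind, text in classify('var s = new Foo("a // b"); // done'):
        print(f"{kind:12} {text}")
```

A highlighter would map each kind to a color. The `type_guess` label is where this approach can go wrong, for instance on a capitalized property or namespace name.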

To do (1), you need to do one of the following (in decreasing order of difficulty):

  • Parse all the source code. Not possible if you reference 3rd party assemblies.
  • Use reflection on the compiled code to garner type information you can use when parsing the source.
  • Use the host IDE's code element interfaces (if available, so not applicable in your case!) to provide the information you need.

Library for parsing/modifying C# source code (and compiling after that)

Duplicate of question Parser for C#.

Your best bet may be one of these:

http://csparser.codeplex.com/

http://wiki.sharpdevelop.net/NRefactory.ashx

How to write a Parser in C#?

I have implemented several parsers in C# - hand-written and tool generated.

A very good introductory tutorial on parsing in general is Let's Build a Compiler. It demonstrates how to build a recursive descent parser, and any competent developer can easily translate the concepts from the author's language (I think it was Pascal) to C#. This will teach you how a recursive descent parser works, but it is completely impractical to write a full programming language parser by hand.
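
To make the idea concrete, here is a minimal sketch of a recursive descent parser for arithmetic expressions, written in Python for brevity (it is not taken from the tutorial, and the names `Parser` and `evaluate` are made up for this example). It assumes a tiny grammar with one method per rule: `expr := term (('+'|'-') term)*`, `term := factor (('*'|'/') factor)*`, `factor := NUMBER | '(' expr ')' | '-' factor`. For simplicity it evaluates while parsing instead of building a tree, and the tokenizer silently skips characters it does not know.

```python
import re


class Parser:
    """One method per grammar rule; each method consumes the tokens of its rule."""

    def __init__(self, text):
        # Simplistic tokenizer: integers and single-character operators only.
        self.tokens = re.findall(r"\d+|[-+*/()]", text)
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def next(self):
        token = self.peek()
        self.pos += 1
        return token

    def expr(self):
        # expr := term (('+'|'-') term)*
        value = self.term()
        while self.peek() in ("+", "-"):
            if self.next() == "+":
                value += self.term()
            else:
                value -= self.term()
        return value

    def term(self):
        # term := factor (('*'|'/') factor)*
        value = self.factor()
        while self.peek() in ("*", "/"):
            if self.next() == "*":
                value *= self.factor()
            else:
                value /= self.factor()
        return value

    def factor(self):
        # factor := NUMBER | '(' expr ')' | '-' factor
        token = self.next()
        if token == "(":
            value = self.expr()
            if self.next() != ")":
                raise SyntaxError("expected ')'")
            return value
        if token == "-":
            return -self.factor()
        if token is not None and token.isdigit():
            return int(token)
        raise SyntaxError(f"unexpected token {token!r}")


def evaluate(text):
    parser = Parser(text)
    value = parser.expr()
    if parser.peek() is not None:
        raise SyntaxError(f"unexpected token {parser.peek()!r}")
    return value
```

Operator precedence falls out of the call structure: `expr` calls `term`, which calls `factor`, so multiplication binds tighter than addition without any explicit precedence table.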

You should look into some tools to generate the code for you if you are determined to write a classical recursive descent parser (TinyPG, Coco/R, Irony). Keep in mind that there are now other ways to write parsers that usually perform better and have easier definitions (e.g. TDOP parsing or monadic parsing).
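
For comparison, here is a minimal sketch of TDOP (top-down operator precedence, also known as Pratt) parsing, again in Python for brevity. The binding-power numbers, the tuple-based tree and the names `BINDING_POWER`, `RIGHT_ASSOCIATIVE` and `parse` are all assumptions chosen for this example, not part of any library. The point is that precedence and associativity live in a small table rather than in one function per grammar level.

```python
import re

# Higher binding power binds tighter (values are arbitrary, only their order matters).
BINDING_POWER = {"+": 10, "-": 10, "*": 20, "/": 20, "^": 30}
RIGHT_ASSOCIATIVE = {"^"}
UNARY_MINUS_POWER = 25  # tighter than '*', looser than '^'


def parse(text):
    """Parse an arithmetic expression into nested tuples such as ('+', 1, 2)."""
    tokens = re.findall(r"\d+|[-+*/^()]", text)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        token = peek()
        pos += 1
        return token

    def expression(min_bp=0):
        token = advance()
        # Prefix position: what the token means at the start of an expression.
        if token is None:
            raise SyntaxError("unexpected end of input")
        if token.isdigit():
            left = int(token)
        elif token == "(":
            left = expression(0)
            if advance() != ")":
                raise SyntaxError("expected ')'")
        elif token == "-":
            left = ("neg", expression(UNARY_MINUS_POWER))
        else:
            raise SyntaxError(f"unexpected token {token!r}")
        # Infix position: keep absorbing operators that bind tighter than min_bp.
        while True:
            op = peek()
            bp = BINDING_POWER.get(op, 0)
            if bp <= min_bp:
                break
            advance()
            # Right-associative operators let an equal-power operator nest on the right.
            right_bp = bp - 1 if op in RIGHT_ASSOCIATIVE else bp
            left = (op, left, expression(right_bp))
        return left

    tree = expression()
    if peek() is not None:
        raise SyntaxError(f"unexpected token {peek()!r}")
    return tree
```

Adding an operator means adding one table entry, whereas a classical recursive descent parser typically needs a new grammar level and a new function.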

On the topic of whether C# is up for the task: C# has some of the best text libraries out there. A lot of the parsers today (in other languages) have an obscene amount of code to deal with Unicode etc. I won't comment too much on JITted code because it can get quite religious; however, you should be just fine. IronJS is a good example of a parser/runtime on the CLR (even though it's written in F#), and its performance is just shy of Google V8.

Side Note: Markup parsers are completely different beasts when compared to language parsers - they are, in the majority of the cases, written by hand - and at the scanner/parser level very simple; they are not usually recursive descent - and especially in the case of XML it is better if you don't write a recursive descent parser (to avoid stack overflows, and because a 'flat' parser can be used in SAX/push mode).
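
A minimal sketch of such a 'flat' push-mode scanner, in Python for brevity, might look like the following. It is only an illustration under heavy assumptions: it handles bare element tags and text, with no attributes, entities, comments or CDATA, and the names `TAG_RE` and `scan` are hypothetical. Nesting is tracked with an explicit list instead of recursion, and events are pushed to a callback as they are found.

```python
import re

# Matches <name>, </name> and <name/> only (no attributes in this sketch).
TAG_RE = re.compile(r"<(/?)([A-Za-z][\w.-]*)\s*(/?)>")


def scan(markup, handler):
    """Push ('start' | 'end' | 'text', value) events to handler in document order."""
    open_tags = []  # explicit stack, so nesting depth never touches the call stack
    pos = 0
    for match in TAG_RE.finditer(markup):
        text = markup[pos:match.start()].strip()
        if text:
            handler("text", text)
        closing, name, self_closing = match.groups()
        if closing:
            if not open_tags or open_tags.pop() != name:
                raise ValueError(f"unexpected </{name}>")
            handler("end", name)
        else:
            handler("start", name)
            if self_closing:
                handler("end", name)
            else:
                open_tags.append(name)
        pos = match.end()
    trailing = markup[pos:].strip()
    if trailing:
        handler("text", trailing)
    if open_tags:
        raise ValueError(f"unclosed <{open_tags[-1]}>")


if __name__ == "__main__":
    scan("<a><b>hi</b><c/></a>", lambda kind, value: print(kind, value))
```

Because the loop is flat, a document nested tens of thousands of levels deep only grows the `open_tags` list, where a naive recursive descent parser would recurse once per level.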

Implementing a Top Down Parser in C#

After the book, you may also find it interesting to read about a compiler generator such as ANTLR, which can help you write the compiler (also in C#) and even browse the AST visually.

C Code Parser for .NET

ANTLR can do what you'd like. It has a C preprocessor and ANSI C grammar.

https://github.com/antlr/grammars-v4

Parser for C# code to evaluate testability?

If you can use prerelease code, you might want to check out Roslyn, i.e. "compiler as a service":

Traditionally, compilers are black boxes – source code goes in one end and object files or assemblies come out the other end. The Roslyn [project] changes that model by opening up the Visual Basic and C# compilers as APIs. These APIs allow tools and end-users to share in the wealth of information the compilers have about code.

Mind you, however, that interpreting what you get (a syntax tree) might still be a lot of work.


