JavaScript Parser in JavaScript

JavaScript parser in JavaScript

Crescent Fresh answered this question in the comments:

JSLint contains a JavaScript parser written in JavaScript. See JSLint by Douglas Crockford; the parser begins around line 2712. JSLint is also written to handle HTML, so you'd have to gloss over those parts.

How does a JavaScript parser work?

Parsers probably work in all sorts of ways, but fundamentally they first go through a stage of tokenisation, then give the result to the compiler, which turns it into a program if it can. For example, given:

function foo(a) {
  alert(a);
}

the parser will skip any leading whitespace up to the first character, the letter "f". It then collects characters until it reaches one that doesn't belong to the token, here the whitespace, which marks the end of the token. It starts again with the "f" of "foo" and continues until it gets to the "(", so it now has the tokens "function" and "foo". It knows "(" is a token on its own, so that's 3 tokens. It then gets the "a" followed by ")", which are two more tokens to make 5, and so on.

The only need for whitespace is between tokens that are otherwise ambiguous (e.g. there must be either whitespace or another token between "function" and "foo").
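The tokenisation step described above can be sketched in a few lines. This is only a rough illustration, not how any real engine does it: the function name tokenize is made up for the example, and it only handles identifiers, integers and single-character punctuators (no strings, comments, regex literals or multi-character operators).

```javascript
// Minimal tokenizer sketch (hypothetical helper, not a full ECMAScript lexer).
function tokenize(source) {
  var tokens = [];
  var i = 0;
  while (i < source.length) {
    var ch = source[i];
    if (/\s/.test(ch)) {            // whitespace only separates tokens
      i++;
      continue;
    }
    if (/[A-Za-z_$]/.test(ch)) {    // identifier or keyword: collect until a char that doesn't belong
      var start = i;
      while (i < source.length && /[A-Za-z0-9_$]/.test(source[i])) i++;
      tokens.push(source.slice(start, i));
      continue;
    }
    if (/[0-9]/.test(ch)) {         // integer literal
      var numStart = i;
      while (i < source.length && /[0-9]/.test(source[i])) i++;
      tokens.push(source.slice(numStart, i));
      continue;
    }
    tokens.push(ch);                // anything else is a token on its own
    i++;
  }
  return tokens;
}

var tokens = tokenize('function foo(a) {\n  alert(a);\n}');
console.log(tokens);
```

The first five tokens are "function", "foo", "(", "a" and ")", matching the count in the walkthrough above.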

Once tokenisation is complete, it goes to the compiler, which sees "function" as an identifier, and interprets it as the keyword "function". It then gets "foo", an identifier that the language grammar tells it is the function name. Then the "(" indicates an opening grouping operator and hence the start of a formal parameter list, and so on.

Compilers may deal with tokens one at a time, or may grab them in chunks, or do all sorts of weird things to make them run faster.
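To make the second stage a bit more concrete, here is a small sketch that consumes tokens one at a time and checks them against the grammar of a single construct, a function declaration. The names parseFunctionDeclaration, expect and identifier are invented for this example, the body is kept as raw tokens rather than parsed, and real parsers handle far more than this.

```javascript
// Sketch of grammar checking for one construct only:
//   function <name> ( <param>, ... ) { ... }
// Input is an array of token strings; all names here are hypothetical.
function parseFunctionDeclaration(tokens) {
  var pos = 0;

  function expect(value) {
    if (tokens[pos] !== value) {
      throw new SyntaxError('Expected "' + value + '" but found "' + tokens[pos] + '"');
    }
    pos++;
  }

  function identifier() {
    var tok = tokens[pos];
    if (typeof tok !== 'string' || !/^[A-Za-z_$][A-Za-z0-9_$]*$/.test(tok)) {
      throw new SyntaxError('Expected an identifier but found "' + tok + '"');
    }
    pos++;
    return tok;
  }

  expect('function');               // the keyword
  var name = identifier();          // the grammar says the next identifier is the function name
  expect('(');                      // start of the formal parameter list
  var params = [];
  if (tokens[pos] !== ')') {
    params.push(identifier());
    while (tokens[pos] === ',') {
      pos++;
      params.push(identifier());
    }
  }
  expect(')');
  expect('{');

  var body = [];                    // body tokens are collected, not parsed
  var depth = 1;
  while (pos < tokens.length) {
    var t = tokens[pos++];
    if (t === '{') depth++;
    if (t === '}' && --depth === 0) break;
    body.push(t);
  }
  if (depth !== 0) throw new SyntaxError('Unterminated function body');

  return { type: 'FunctionDeclaration', name: name, params: params, body: body };
}

var ast = parseFunctionDeclaration(
  ['function', 'foo', '(', 'a', ')', '{', 'alert', '(', 'a', ')', ';', '}']
);
console.log(ast.name, ast.params);
```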

You can also read How do C/C++ parsers work?, which gives a few more clues. Or just use Google.

lightweight javascript to javascript parser

If you want something with a simple interface, you could try node-burrito: https://github.com/substack/node-burrito

It generates an AST using the uglify-js parser and then recursively walks the nodes. All you have to do is give a single callback which tests each node. You can alter the ones you need to change, and it outputs the resulting code.
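The "walk the tree and hand every node to one callback" idea can be illustrated without any library. The sketch below is not burrito's API and does not use UglifyJS's node format; the node shape ({ type, value, children }) and the functions walk and generate are assumptions made up for the example.

```javascript
// Hypothetical AST shape: { type: string, value: string, children: Node[] }.
// This is NOT the node format used by burrito or UglifyJS.
function walk(node, visit) {
  visit(node);                                   // the single callback sees every node
  (node.children || []).forEach(function (child) {
    walk(child, visit);
  });
}

// Turn the (possibly modified) tree back into source text.
function generate(node) {
  if (node.type === 'call') {
    return node.value + '(' + (node.children || []).map(generate).join(', ') + ')';
  }
  return node.value;
}

var tree = {
  type: 'call',
  value: 'g',
  children: [
    { type: 'call', value: 'h', children: [] },
    { type: 'name', value: 'x' }
  ]
};

// Alter only the nodes we care about: rename every called function.
walk(tree, function (node) {
  if (node.type === 'call') node.value = 'traced_' + node.value;
});

var output = generate(tree);
console.log(output); // traced_g(traced_h(), x)
```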

Javascript parser for Java

From https://github.com/google/caja/blob/master/src/com/google/caja/parser/js/Parser.java

The grammar below is a context-free representation of the grammar this
parser parses. It disagrees with EcmaScript 262 Edition 3 (ES3) where
implementations disagree with ES3. The rules for semicolon insertion and
the possible backtracking in expressions needed to properly handle
backtracking are commented thoroughly in code, since semicolon insertion
requires information from both the lexer and parser and is not determinable
with finite lookahead.

Noteworthy features

Reports warnings on a queue where an error doesn't prevent any further errors, so that we can report multiple errors in a single compile pass instead of forcing developers to play whack-a-mole.

Does not parse Firefox style catch (<Identifier> if <Expression>) since those don't work on IE and many other interpreters.

Recognizes const since many interpreters do (not IE) but warns.

Allows, but warns, on trailing commas in Array and Object constructors.

Allows keywords as identifier names but warns since different interpreters have different keyword sets. This allows us to use an expansive keyword set.

To parse strict code, pass in a PedanticWarningMessageQueue that
converts MessageLevel#WARNING and above to MessageLevel#FATAL_ERROR.
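The queue idea (collect warnings instead of stopping, and let a stricter queue promote them to fatal errors) can be sketched roughly as follows. This is written in JavaScript purely for illustration and does not mirror Caja's Java classes: LEVELS, MessageQueue, PedanticQueue and checkTrailingCommas are all hypothetical names.

```javascript
// Hypothetical sketch; not Caja's actual MessageQueue implementation.
var LEVELS = { LINT: 0, WARNING: 1, ERROR: 2, FATAL_ERROR: 3 };

function MessageQueue() {
  this.messages = [];
}
MessageQueue.prototype.add = function (level, text) {
  this.messages.push({ level: level, text: text });
};
MessageQueue.prototype.hasFatal = function () {
  return this.messages.some(function (m) {
    return m.level === LEVELS.FATAL_ERROR;
  });
};

// Strict variant: WARNING and above become FATAL_ERROR.
function PedanticQueue() {
  MessageQueue.call(this);
}
PedanticQueue.prototype = Object.create(MessageQueue.prototype);
PedanticQueue.prototype.constructor = PedanticQueue;
PedanticQueue.prototype.add = function (level, text) {
  var promoted = level >= LEVELS.WARNING ? LEVELS.FATAL_ERROR : level;
  MessageQueue.prototype.add.call(this, promoted, text);
};

// Toy check: warn on a trailing comma before "]" and keep going,
// so several problems are reported in one pass.
function checkTrailingCommas(tokens, queue) {
  for (var i = 1; i < tokens.length; i++) {
    if (tokens[i] === ']' && tokens[i - 1] === ',') {
      queue.add(LEVELS.WARNING, 'Trailing comma before "]" at token ' + i);
    }
  }
}

var sample = ['[', '1', ',', ']', '[', '2', ',', ']'];
var lenient = new MessageQueue();
var strict = new PedanticQueue();
checkTrailingCommas(sample, lenient);
checkTrailingCommas(sample, strict);
console.log(lenient.messages.length, lenient.hasFatal()); // 2 false
console.log(strict.messages.length, strict.hasFatal());   // 2 true
```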

CajaTestCase.java shows how to set up a parser, and fromResource and fromString in the same class show how to get an input of the right kind.

Parse an HTML string with JS

Create a dummy DOM element and add the string to it. Then, you can manipulate it like any DOM element.

var el = document.createElement( 'html' );
el.innerHTML = "<html><head><title>titleTest</title></head><body><a href='test0'>test01</a><a href='test1'>test02</a><a href='test2'>test03</a></body></html>";

el.getElementsByTagName( 'a' ); // Live HTMLCollection of your anchor elements

Edit: adding a jQuery answer to please the fans!

var el = $( '<div></div>' );
el.html("<html><head><title>titleTest</title></head><body><a href='test0'>test01</a><a href='test1'>test02</a><a href='test2'>test03</a></body></html>");

$('a', el) // All the anchor elements

How to parse Javascript code in Javascript to get variable names not in current scope?

I suggest using a parser like UglifyJS or esprima to parse the code, extract the variables, and then test them in the current environment. I don't think that's very difficult to do; I've built quite a few JavaScript code analysis and compilation/evaluation tools that way.

But if you need an easier and faster solution, you can also try JSLint. You can use it as a library: disable the other rules and leave only the "undefined variable" check. You can then test each reported name in the current environment to see whether it's provided or not.
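The last step, testing the names against the current environment, might look something like this. It assumes the identifier names have already been extracted by a parser or by JSLint (that part is not shown), and findUndefinedGlobals is a made-up helper name. Note that it only sees properties of the global object, so top-level let/const bindings would be reported as missing.

```javascript
// Assumes `names` was produced by a parser such as UglifyJS or esprima;
// the extraction step itself is not shown here.
function findUndefinedGlobals(names) {
  return names.filter(function (name) {
    return !(name in globalThis);   // true when the environment does not provide it
  });
}

var missing = findUndefinedGlobals(['Math', 'JSON', 'someUndeclaredName123']);
console.log(missing);
```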

A JavaScript parser for DOM

You can leverage the current document without appending any nodes to it.

Try something like this:

function toNode(html) {
    var doc = document.createElement('html');
    doc.innerHTML = html;
    return doc;
}

var node = toNode('<html><head><title> This is the old title. </title></head></html>');

console.log(node);

http://jsfiddle.net/6SvqA/3/

Parsing variable types from Strings in javaScript

Your code throws an error for the values that come out as null. JSON.parse() only parses valid JSON strings; the values it was able to match are all valid JSON types.

https://tc39.es/ecma262/#sec-json.parse

https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/JSON/parse

That said, properly quoted JSON strings will work:

JSON.parse('"document"') 
// variable string
JSON.parse("\""+ c + "\"")
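One way to cope with inputs that are not valid JSON is to catch the error and fall back to the raw string. This is just a sketch; parseValue is a hypothetical helper, not something from the question.

```javascript
// Hypothetical helper: try JSON first, otherwise keep the raw string.
function parseValue(text) {
  try {
    return JSON.parse(text);
  } catch (err) {
    return text;   // not valid JSON (e.g. document, undefined), so keep it as a string
  }
}

var samples = ['42', 'true', 'null', '"hello"', '[1, 2]', 'document', 'undefined'];
samples.forEach(function (s) {
  var v = parseValue(s);
  console.log(JSON.stringify(s), '->', v === null ? 'null' : typeof v);
});
```

If you do need to wrap an arbitrary string in quotes before parsing, JSON.stringify(c) is safer than concatenating quote characters by hand, because it also escapes any quotes or backslashes inside c.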


src: https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/eval

var valueB2 =  Function("return " + c)();
