JavaScript parser for Java
From https://github.com/google/caja/blob/master/src/com/google/caja/parser/js/Parser.java
The grammar below is a context-free representation of the grammar this
parser parses. It disagrees with EcmaScript 262 Edition 3 (ES3) where
implementations disagree with ES3. The rules for semicolon insertion and
the possible backtracking in expressions needed to properly handle
backtracking are commented thoroughly in code, since semicolon insertion
requires information from both the lexer and parser and is not determinable
with finite lookahead.
Noteworthy features
- Reports warnings on a queue where an error doesn't prevent any further errors, so that we can report multiple errors in a single compile pass instead of forcing developers to play whack-a-mole.
- Does not parse Firefox style catch (<Identifier> if <Expression>) since those don't work on IE and many other interpreters.
- Recognizes const since many interpreters do (not IE) but warns.
- Allows, but warns, on trailing commas in Array and Object constructors.
- Allows keywords as identifier names but warns since different interpreters have different keyword sets. This allows us to use an expansive keyword set.
To parse strict code, pass in a PedanticWarningMessageQueue that converts MessageLevel#WARNING and above to MessageLevel#FATAL_ERROR.
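To make the idea of a "pedantic" queue concrete, here is a minimal standalone sketch. It does not use Caja's classes: Level, Message, Queue and PedanticQueue are hypothetical stand-ins, and the level list is simplified. It only shows the behaviour described above, where messages are collected rather than aborting on the first one, and where the strict variant escalates anything at WARNING or above to FATAL_ERROR.

```java
import java.util.ArrayList;
import java.util.List;

/** Stand-in types for illustration only; these are NOT Caja's classes. */
final class StrictQueueSketch {
    /** Simplified severity levels, ordered from least to most severe. */
    enum Level { LINT, WARNING, ERROR, FATAL_ERROR }

    static final class Message {
        final Level level;
        final String text;
        Message(Level level, String text) { this.level = level; this.text = text; }
    }

    /** Collects every message so several problems can be reported in one pass. */
    static class Queue {
        private final List<Message> messages = new ArrayList<>();

        void add(Level level, String text) { messages.add(new Message(level, text)); }

        List<Message> getMessages() { return messages; }

        boolean hasFatal() {
            for (Message m : messages) {
                if (m.level == Level.FATAL_ERROR) return true;
            }
            return false;
        }
    }

    /** Escalates WARNING and above to FATAL_ERROR; lower levels pass through unchanged. */
    static final class PedanticQueue extends Queue {
        @Override
        void add(Level level, String text) {
            Level effective = level.compareTo(Level.WARNING) >= 0 ? Level.FATAL_ERROR : level;
            super.add(effective, text);
        }
    }
}
```

With the plain Queue a trailing-comma warning stays a warning; with PedanticQueue the same call leaves hasFatal() returning true, so a caller that stops on fatal errors effectively rejects non-strict code.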
CajaTestCase.js shows how to set up a parser, and fromResource and fromString in the same class show how to get an input of the right kind.
Parsing HTML page containing JS in Java
Selenium's WebDriver is fantastic: http://docs.seleniumhq.org/docs/03_webdriver.jsp
See this answer for an example of what you are trying to do:
Using Selenium Web Driver to retrieve value of a HTML input
JavaScript: Parse Java source code, extract method
The AST is just another JSON object. Try jsonpath.
npm install jsonpath
To extract all methods, just filter on the condition node=="MethodDeclaration":
var jp = require('jsonpath');
var methods = jp.query(ast, '$.types..[?(@.node=="MethodDeclaration")]');
console.log(methods);
See here for more JSON path syntax.
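If the AST ends up on the Java side (for example after being deserialized into nested Map and List objects), roughly the same filter can be written as a plain recursive walk. This is only a sketch under that assumption: AstFilter and findByNodeType are hypothetical names, and the only thing taken from the answer above is the "node" property holding the node type.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Hypothetical helper: walks a JSON-like tree made of Map and List values. */
final class AstFilter {

    /** Returns every object in the tree whose "node" property equals nodeType. */
    static List<Map<?, ?>> findByNodeType(Object root, String nodeType) {
        List<Map<?, ?>> found = new ArrayList<>();
        collect(root, nodeType, found);
        return found;
    }

    private static void collect(Object value, String nodeType, List<Map<?, ?>> found) {
        if (value instanceof Map) {
            Map<?, ?> map = (Map<?, ?>) value;
            if (nodeType.equals(map.get("node"))) {
                found.add(map);
            }
            // Keep descending: a matching node may contain further matches.
            for (Object child : map.values()) {
                collect(child, nodeType, found);
            }
        } else if (value instanceof List) {
            for (Object child : (List<?>) value) {
                collect(child, nodeType, found);
            }
        }
        // Strings, numbers, booleans and null are leaves and are ignored.
    }
}
```

Calling findByNodeType(ast.get("types"), "MethodDeclaration") plays the role of the jsonpath query above; the walk visits nodes depth-first, so results come back in document order.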
Java parser written in JavaScript
Have a look at ANTLR, which can generate parsers with JavaScript as a target, with the Java 1.5 grammar at http://www.antlr.org/grammar/1152141644268/Java.g
Edit: link stopped working - try https://github.com/antlr/grammars-v4/blob/master/java/Java.g4 :)
JavaScript parser in Java
Since you are already parsing your HTML using JSoup, your next step is to traverse each element and check whether it contains JavaScript. Something like this code will check each element:
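import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

// Returns false as soon as any element looks like it carries JavaScript.
boolean validateHtml(String html) {
    Document doc = Jsoup.parse(html);
    for (Element e : doc.getAllElements()) {
        if (detectJavascript(e)) {
            return false;
        }
    }
    return true;
}

private boolean detectJavascript(Element e) {
    // Only the script-element check is filled in here;
    // the other checks for this method still have to be added.
    return "script".equals(e.normalName());
}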
Then, there are several checks you should perform inside the detectJavascript function:
- Of course, reject script elements: "script".equals(e.normalName())
- Reject elements with a value in any on* attribute (onload, onclick, etc). You have the complete list here but it's probably enough to get all attributes with e.attributes() and reject if any of them starts with "on".
- Every attribute that accepts a URL (href, src, etc.) can contain a "javascript:" value that executes JavaScript. You should check all those too. For a complete (?) list of these attributes, check this other SO question.
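The three checks above can be sketched as a small helper. To keep it runnable on its own it works on a plain tag name and an attribute map instead of JSoup types; JavascriptChecks, looksLikeJavascript and isJavascriptUrl are hypothetical names, not part of JSoup. It is deliberately conservative: it checks every attribute value for a "javascript:" prefix rather than only the URL-accepting ones.

```java
import java.util.Locale;
import java.util.Map;

/** Hypothetical helper; the names here are illustrative and not part of JSoup. */
final class JavascriptChecks {

    /** True if the tag name or any attribute suggests the element carries JavaScript. */
    static boolean looksLikeJavascript(String tagName, Map<String, String> attributes) {
        if ("script".equalsIgnoreCase(tagName)) {
            return true;
        }
        for (Map.Entry<String, String> attr : attributes.entrySet()) {
            String name = attr.getKey().toLowerCase(Locale.ROOT);
            if (name.startsWith("on")) {
                return true; // onload, onclick, ...
            }
            if (isJavascriptUrl(attr.getValue())) {
                return true; // href="javascript:...", src="javascript:...", ...
            }
        }
        return false;
    }

    /** True if the value, once whitespace and control characters are dropped, starts with "javascript:". */
    static boolean isJavascriptUrl(String value) {
        if (value == null) {
            return false;
        }
        // Conservative normalization: URL parsers strip tabs and newlines,
        // so "java\tscript:" must be treated like "javascript:".
        String compact = value.replaceAll("[\\s\\p{Cntrl}]", "").toLowerCase(Locale.ROOT);
        return compact.startsWith("javascript:");
    }
}
```

Inside detectJavascript you would pass e.normalName() and copy each attribute's getKey() and getValue() from e.attributes() into the map. A blocklist like this is easy to get subtly wrong, so treat it as a starting point rather than a complete filter.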
Finally, I advise against storing the original HTML in the database, even if it passes your validation. Instead, convert the document parsed by JSoup back to HTML and store that. This way you make sure you have a well-formed document free of any "dangerous" elements.
Parse JavaScript with Java
import java.util.regex.*;
Pattern p1 = Pattern.compile("X-MOON-EXPIRED', \"([^\"]*)\"");
Pattern p2 = Pattern.compile("X-MOON-TOKEN', \"([^\"]*)\"");
String html = "<script type=\"text/javascript\"> $(function() { $.ajaxSetup({ beforeSend: function(xhr) { xhr.setRequestHeader('X-MOON-EXPIRED', \"1445350653\"); xhr.setRequestHeader('X-MOON-TOKEN', \"10dafe974cc156d2d3b7fd9bb1e4e3ed\"); } }); }); </script>";
Matcher m1 = p1.matcher(html);
Matcher m2 = p2.matcher(html);
if (!m1.find() || !m2.find()) {
throw new Exception("Didn't match");
}
System.out.println(String.format("X-MOON-EXPIRED=%s, X-MOON-TOKEN=%s", m1.group(1), m2.group(1)));
Prints:
X-MOON-EXPIRED=1445350653, X-MOON-TOKEN=10dafe974cc156d2d3b7fd9bb1e4e3ed
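If there are more headers than these two, the same regex approach can be generalized so that one pattern picks up every setRequestHeader call. The sketch below is an assumption-laden variant, not a JavaScript parser: RequestHeaderExtractor is a hypothetical name, and it only handles calls whose two arguments are simple string literals without escaped quotes.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper: collects name/value pairs from setRequestHeader(...) calls. */
final class RequestHeaderExtractor {

    // Matches setRequestHeader('NAME', "VALUE"), accepting either quote style for
    // each argument. Groups 2 and 4 hold the header name and the header value.
    private static final Pattern SET_HEADER = Pattern.compile(
            "setRequestHeader\\(\\s*(['\"])(.*?)\\1\\s*,\\s*(['\"])(.*?)\\3\\s*\\)");

    /** Returns the headers in the order they appear in the source text. */
    static Map<String, String> extract(String source) {
        Map<String, String> headers = new LinkedHashMap<>();
        Matcher m = SET_HEADER.matcher(source);
        while (m.find()) {
            headers.put(m.group(2), m.group(4));
        }
        return headers;
    }

    public static void main(String[] args) {
        String js = "xhr.setRequestHeader('X-MOON-EXPIRED', \"1445350653\"); "
                + "xhr.setRequestHeader('X-MOON-TOKEN', \"10dafe974cc156d2d3b7fd9bb1e4e3ed\");";
        System.out.println(extract(js));
    }
}
```

Header values built by concatenation or read from variables will not be found this way; for those you need a real parser or a browser-driven approach.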
Using Nashorn to parse JavaScript into a syntax tree
tree.getSourceElements() gives you a list of elements of type Tree, which has the method getKind() that gives you the Tree.Kind of the element:
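import java.io.InputStreamReader;
import jdk.nashorn.api.tree.*;

Parser parser = Parser.create();
CompilationUnitTree tree = parser.parse(file, new InputStreamReader(stream), null);
// Iterate over the top-level elements of the script.
for (Tree element : tree.getSourceElements()) {
    System.out.println(element.getKind());
    switch (element.getKind()) {
        case FUNCTION:
            [...]
    }
}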
If you want to run down the AST you can then implement the interface TreeVisitor<R,D> to visit the nodes:
Parser parser = Parser.create();
CompilationUnitTree tree = parser.parse(file, new InputStreamReader(stream), null);
if (tree != null) {
    tree.accept(new BasicTreeVisitor<Void, Void>() {
        public Void visitFunctionCall(FunctionCallTree functionCallTree, Void v) {
            System.out.println("Found a functionCall: " + functionCallTree.getFunctionSelect().getKind());
            return null;
        }
    }, null);
}