JavaScript parser in Python
ANTLR, ANother Tool for Language Recognition, is a language tool that provides a framework for constructing recognizers, interpreters, compilers, and translators from grammatical descriptions containing actions in a variety of target languages.
The ANTLR site provides many grammars, including one for JavaScript.
As it happens, ANTLR has a Python target, so you can call the lexer and parser generated from the grammar directly from Python (good luck).
Parsing Javascript In Python
You can extract JSON from arbitrary text with the jsonfinder library:
from jsonfinder import jsonfinder
import requests
scrape_url = "https://swishanalytics.com/optimus/nba/daily-fantasy-projections?date=2016-12-15"
content = requests.get(scrape_url).text
for _, __, obj in jsonfinder(content, json_only=True):
    if (obj and
            isinstance(obj, list) and
            isinstance(obj[0], dict) and
            {'player_id', 'event_id', 'name'}.issubset(obj[0])
            ):
        break
else:
    raise ValueError('data not found')
# Now you can use obj
print(len(obj))
print(obj[0])
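If installing jsonfinder is not an option, a similar scan can be sketched with the standard library alone. This is not jsonfinder's implementation, just a minimal illustration built on json.JSONDecoder.raw_decode; the helper name iter_json_values and the sample text are made up for the example.

```python
import json

def iter_json_values(text):
    """Yield every JSON object or array that can be decoded somewhere in text."""
    decoder = json.JSONDecoder()
    pos = 0
    while pos < len(text):
        # Jump to the next character that could open an object or an array.
        starts = [i for i in (text.find('{', pos), text.find('[', pos)) if i != -1]
        if not starts:
            return
        start = min(starts)
        try:
            value, end = decoder.raw_decode(text, start)
        except json.JSONDecodeError:
            pos = start + 1  # not valid JSON at this position, keep scanning
            continue
        yield value
        pos = end  # continue after the decoded value

# Made-up text standing in for a downloaded page.
sample = 'var x = 1; var players = [{"player_id": 7, "event_id": 3, "name": "A"}]; // {oops'
found = list(iter_json_values(sample))
print(found)
```

The same filtering as above (list of dicts with the expected keys) can then be applied to each yielded value.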
Python Parsing Javascript with beautifulsoup
from bs4 import BeautifulSoup as bs
import requests
import re
url = 'https://login.live.com/login.srf?wa=wsignin1.0&rpsnv=13&rver=6.7.6643.0&wp=MBI_SSL&wreply=https:%2f%2faccount.xbox.com%2fen-us%2faccountcreation%3freturnUrl%3dhttps:%252f%252fwww.xbox.com:443%252fen-US%252f%26pcexp%3dtrue%26uictx%3dme%26rtc%3d1&lc=1033&id=292543&aadredir=1'
page = requests.get(url)
html = bs(page.text, 'lxml')
# The index 5 is hard-coded for this page's layout; adjust it if the page changes.
script_text = html.find_all('script', type="text/javascript")[5].prettify()
# Capture only the text between value=" and "/ instead of stripping it afterwards.
match = re.search(r'value="(.+)"/', script_text)
value = match.group(1) if match else None
print(value)
Output:
DVSXQahhtomXS2Y4k2itS5MPP52mJgUkC7LH!W*1DmjHiWk*npajBfgXK5yp3*!bu3Wuvvs7xavleUV3nIbjLZHckj73QMe8wipwXhCqpXuUZQ2wnJvNYAVNCg9XxKPuIovp7!sLbumrufuYefyzM6UQLkMb5c7MuImDofVhLlKxpI7Pohe8sO2x8r63TtFCTDphWzqXKJE3B8DRK*AhMbFsmdP0sj2CXMZ7dyTfLJSr1zWBlaHTqJPLvhgzLSiaEg$$
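The live login page changes over time, so the snippet above is hard to reproduce. The same idea (collect the text of each script tag, then pull the value out with a regex) can be tried offline with the standard library's html.parser in place of BeautifulSoup. The class name ScriptCollector and the sample HTML are made up for this sketch.

```python
import re
from html.parser import HTMLParser

class ScriptCollector(HTMLParser):
    """Collect the text content of every <script> element."""

    def __init__(self):
        super().__init__()
        self.scripts = []
        self._in_script = False

    def handle_starttag(self, tag, attrs):
        if tag == 'script':
            self._in_script = True
            self.scripts.append('')

    def handle_endtag(self, tag):
        if tag == 'script':
            self._in_script = False

    def handle_data(self, data):
        # Script bodies may arrive in several chunks, so append to the last entry.
        if self._in_script:
            self.scripts[-1] += data

# Made-up page standing in for the real login page.
sample_html = '''
<html><body>
<script type="text/javascript">var a = 1;</script>
<script type="text/javascript">var form = '<input name="token" value="abc123"/>';</script>
</body></html>
'''

collector = ScriptCollector()
collector.feed(sample_html)
# A capture group returns just the value, with no string stripping afterwards.
tokens = [m.group(1) for s in collector.scripts
          for m in re.finditer(r'value="([^"]+)"/', s)]
print(tokens)
```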
Python library for parsing code of any language into an AST?
In general, when you need to parse code written in a given language, it's almost always better to do the parsing in that language itself.
For parsing JavaScript from Python, you may want to check out this module, which can be installed using pip and should work well enough.
Web parser in Javascript like BeautifulSoup in Python
In a browser context, you can use DOMParser:
const html = "<h1>title</h1>";
const parser = new DOMParser();
const parsed = parser.parseFromString(html, "text/html");
console.log(parsed.firstChild.innerText); // "title"
In Node.js, you can use node-html-parser:
import { parse } from 'node-html-parser';
const html = "<h1>title</h1>";
const parsed = parse(html);
console.log(parsed.firstChild.innerText); // "title"
Parsing Javascript with Python
I think these methods are essentially the same in terms of elegance and performance (using {.*} may be slightly better because .* is greedy, i.e. there will be almost no backtracking, and because it seems to me more "forgiving" for different JS code formatting nuances). What you may be more interested in is this: https://docs.python.org/3.6/library/json.html.
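A minimal sketch of the greedy {.*} approach combined with the json module, assuming the object literal in the script is valid JSON and sits on a single line. The page source and the variable name config are made up for the example.

```python
import json
import re

# Made-up page source; the variable name `config` is illustrative only.
js_source = '''
var other = 1;
var config = {"user": {"id": 7, "tags": ["a", "b"]}, "active": true};
init(config);
'''

# Greedy .* runs to the end of the line and backs off to the last "};",
# so the nested braces stay inside the match.
match = re.search(r'var config = (\{.*\});', js_source)
if match is None:
    raise ValueError('object literal not found')

data = json.loads(match.group(1))
print(data['user']['tags'])
```

If the literal spans several lines, passing re.DOTALL lets . match newlines, at the cost of possibly swallowing later braces in the script.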
Parsing javascript data structure in python
demjson.decode()
import demjson
# from
js_obj = '{x:1, y:2, z:3}'
# to
py_obj = demjson.decode(js_obj)
jsonnet.evaluate_snippet()
import json, _jsonnet
# from
js_obj = '{x:1, y:2, z:3}'
# to
py_obj = json.loads(_jsonnet.evaluate_snippet('snippet', js_obj))
ast.literal_eval()
import ast
# from
js_obj = "{'x':1, 'y':2, 'z':3}"
# to
py_obj = ast.literal_eval(js_obj)
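Note that ast.literal_eval() only works when the JavaScript literal also happens to be a valid Python literal, and json.loads() only when it is strict JSON. The sketch below tries both with the standard library and shows where each stops; the helper name parse_literal is made up for the example.

```python
import ast
import json

def parse_literal(text):
    """Try strict JSON first, then a Python literal; raise ValueError otherwise."""
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        pass
    try:
        return ast.literal_eval(text)
    except (ValueError, SyntaxError) as exc:
        raise ValueError(f'not JSON or a Python literal: {text!r}') from exc

print(parse_literal('{"x": 1, "ok": true}'))  # strict JSON, handled by json.loads
print(parse_literal("{'x': 1, 'y': 2}"))      # single quotes, handled by ast.literal_eval
try:
    parse_literal('{x:1, y:2, z:3}')          # unquoted keys: neither parser accepts it
except ValueError as err:
    print('failed:', err)
```

Unquoted keys are the case where a lenient parser such as demjson or jsonnet, as shown above, is needed.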