JavaScript Parser in Python

ANTLR, ANother Tool for Language Recognition, is a language tool that provides a framework for constructing recognizers, interpreters, compilers, and translators from grammatical descriptions containing actions in a variety of target languages.

The ANTLR site provides many grammars, including one for JavaScript.

As it happens, there is a Python API available, so you can call the lexer and parser (the recognizer) generated from the grammar directly from Python, though expect some setup effort.

Parsing Javascript In Python

You can extract JSON from arbitrary text with the jsonfinder library:

from jsonfinder import jsonfinder
import requests

scrape_url = "https://swishanalytics.com/optimus/nba/daily-fantasy-projections?date=2016-12-15"
content = requests.get(scrape_url).text
for _, __, obj in jsonfinder(content, json_only=True):
    if (obj and
            isinstance(obj, list) and
            isinstance(obj[0], dict) and
            {'player_id', 'event_id', 'name'}.issubset(obj[0])
            ):
        break
else:
    raise ValueError('data not found')

# Now you can use obj
print(len(obj))
print(obj[0])
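
If installing a third-party package is not an option, a similar scan can be approximated with the standard library. This is only a minimal sketch, not how jsonfinder itself works: the helper name iter_json_values and the sample string are made up for illustration, and it relies on json.JSONDecoder.raw_decode to try decoding at each opening brace or bracket.

```python
import json


def iter_json_values(text):
    """Yield every JSON object or array that can be decoded from text."""
    decoder = json.JSONDecoder()
    pos = 0
    while pos < len(text):
        # Jump to the next character that could open an object or array.
        starts = [i for i in (text.find('{', pos), text.find('[', pos)) if i != -1]
        if not starts:
            return
        start = min(starts)
        try:
            value, end = decoder.raw_decode(text, start)
        except json.JSONDecodeError:
            pos = start + 1  # not valid JSON here; try the next candidate
        else:
            yield value
            pos = end  # resume scanning after the decoded value


# Hypothetical page fragment: one valid JSON array and one non-JSON brace group.
sample = 'var rows = [{"player_id": 7, "event_id": 3, "name": "A"}]; var x = {oops};'
found = list(iter_json_values(sample))
print(found)
```

The same filtering loop as above (checking for a list of dicts with the expected keys) can then be applied to the values this generator yields.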

Python Parsing Javascript with beautifulsoup

from bs4 import BeautifulSoup as bs
import requests
import re
url = 'https://login.live.com/login.srf?wa=wsignin1.0&rpsnv=13&rver=6.7.6643.0&wp=MBI_SSL&wreply=https:%2f%2faccount.xbox.com%2fen-us%2faccountcreation%3freturnUrl%3dhttps:%252f%252fwww.xbox.com:443%252fen-US%252f%26pcexp%3dtrue%26uictx%3dme%26rtc%3d1&lc=1033&id=292543&aadredir=1'
page = requests.get(url)
html = bs(page.text, 'lxml')
# Take the sixth inline script (index 5); this position is specific to this page.
script = html.find_all('script', type="text/javascript")[5].prettify()
# Capture the text between value=" and "/ directly instead of stripping it afterwards.
match = re.search(r'value="(.+)"/', script)
value = match.group(1) if match else None
print(value)
Output:
DVSXQahhtomXS2Y4k2itS5MPP52mJgUkC7LH!W*1DmjHiWk*npajBfgXK5yp3*!bu3Wuvvs7xavleUV3nIbjLZHckj73QMe8wipwXhCqpXuUZQ2wnJvNYAVNCg9XxKPuIovp7!sLbumrufuYefyzM6UQLkMb5c7MuImDofVhLlKxpI7Pohe8sO2x8r63TtFCTDphWzqXKJE3B8DRK*AhMbFsmdP0sj2CXMZ7dyTfLJSr1zWBlaHTqJPLvhgzLSiaEg$$
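
The snippet above matches with a greedy .+, which is fine when only one value="..." appears on the line. A capture group with a negated character class stops at the first closing quote instead. Here is a small sketch on a made-up script string; the sample text and the field name token are assumptions, not the real page content.

```python
import re

# Hypothetical inline-script text; the real page content will differ.
script_text = """html += '<input type="hidden" name="token" value="abc123$$"/>';"""

# [^"]+ cannot run past the closing quote, so no post-processing is needed.
match = re.search(r'value="([^"]+)"', script_text)
token = match.group(1) if match else None
print(token)
```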

Python library for parsing code of any language into an AST?

In general, when you need to parse code written in a language, it’s almost always better to use that language instead.

For parsing JavaScript from Python, you may want to check out this module, which can be installed using pip and should work well enough.

Web parser in Javascript like BeautifulSoup in Python

In a browser context, you can use DOMParser:

const html = "<h1>title</h1>";
const parser = new DOMParser();
const parsed = parser.parseFromString(html, "text/html");
console.log(parsed.firstChild.innerText); // "title"

and in Node.js you can use node-html-parser:

import { parse } from 'node-html-parser';

const html = "<h1>title</h1>";
const parsed = parse(html);
console.log(parsed.firstChild.innerText); // "title"

Parsing Javascript with Python

I think these methods are essentially the same in terms of elegance and performance. Using {.*} may be slightly better: .* is greedy, so it runs to the end of the line and backs up only as far as the last }, which means almost no backtracking, and it seems to me more "forgiving" of differences in JS code formatting. What you may be more interested in is the json module, which can parse the extracted text: https://docs.python.org/3.6/library/json.html.
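
As a rough illustration of the {.*} approach combined with the json module, here is a minimal sketch. The page source, the variable name settings, and the data are all made up, and it assumes the object literal is strict JSON and sits on a single line (pass re.DOTALL if it spans several lines).

```python
import json
import re

# Hypothetical page source; the variable name and data are invented.
js_source = '''
<script>
var settings = {"user": "alice", "tags": ["a", "b"], "nested": {"n": 1}};
</script>
'''

# Greedy .* runs to the end of the line, then backs up to the last "}".
match = re.search(r'\{.*\}', js_source)
data = json.loads(match.group(0)) if match else None
print(data)
```

If the literal uses unquoted keys or single quotes, json.loads will reject it and one of the more lenient parsers is needed.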

Parsing javascript data structure in python

demjson.decode()

import demjson

# from
js_obj = '{x:1, y:2, z:3}'

# to
py_obj = demjson.decode(js_obj)

_jsonnet.evaluate_snippet()

import json, _jsonnet

# from
js_obj = '{x:1, y:2, z:3}'

# to
py_obj = json.loads(_jsonnet.evaluate_snippet('snippet', js_obj))

ast.literal_eval()

import ast

# from (keys are quoted here because literal_eval only accepts Python literal syntax)
js_obj = "{'x':1, 'y':2, 'z':3}"

# to
py_obj = ast.literal_eval(js_obj)
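
To see why a lenient parser is needed for the unquoted-key form, it may help to compare how the standard-library parsers react to it. This is a small sketch using only json and ast; the helper try_parse is a made-up name for illustration.

```python
import ast
import json


def try_parse(parser, text):
    """Return the parsed value, or the exception class name on failure."""
    try:
        return parser(text)
    except (ValueError, SyntaxError) as exc:
        return type(exc).__name__


js_obj = '{x:1, y:2, z:3}'

# Strict JSON requires double-quoted keys, so this fails.
json_result = try_parse(json.loads, js_obj)
# literal_eval treats x, y, z as names rather than literals, so this fails too.
literal_result = try_parse(ast.literal_eval, js_obj)
# With quoted keys the same data is valid JSON.
quoted_result = try_parse(json.loads, '{"x":1, "y":2, "z":3}')

print(json_result, literal_result, quoted_result)
```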

