Parsing HTML to Get Script Variable Value

node.js parsing html text to get a value to a javascript variable

So you're not re-inventing the wheel, I feel like using JSDOM (and its execution capabilities) would be the best bet. To mock what you have:

const express = require('express');
const jsdom = require("jsdom");
const { JSDOM } = jsdom; // it exports a JSDOM class

// Mock a remote resource
const remote = express()
  .use('/', (req, res) => {
    res.send('<!DOCTYPE html><html lang="en-US"><head><title>Test document</title><script>var x1 = { "p": { "foo": "bar" } };</script></head><body></body></html>');
  })
  .listen(3001);

// Create "your" server
const local = express()
  .use('/', (req, res) => {
    // fetch the remote resource and load it into JSDOM. No need for
    // requestify, but you can use the JSDOM ctor and pass it a string
    // if you're doing something more complex than hitting an endpoint
    // (like passing auth, creds, etc.)
    JSDOM.fromURL('http://localhost:3001/', {
      runScripts: "dangerously" // allow <script> to run
    }).then((dom) => {
      // pass back the result of "x1" from the context of the
      // loaded dom page.
      res.send(dom.window.x1);
    });
  })
  .listen(3000);

I then receive back:

{"p":{"foo":"bar"}}

Parsing HTML to get script variable value

A very simple example of how this could be done, using HtmlAgilityPack to grab the script and the Jurassic library to evaluate it:

var html = @"<html>
// Some HTML
<script>
var spect = [['temper', 'init', []],
['fw\/lib', 'init', [{staticRoot: '//site.com/js/'}]],
[""cap"",""dm"",[{""tackmod"":""profile"",""xMod"":""timed""}]]];
</script>
// More HTML
</html>";

// Grab the content of the first script element
HtmlAgilityPack.HtmlDocument doc = new HtmlAgilityPack.HtmlDocument();
doc.LoadHtml(html);
var script = doc.DocumentNode.Descendants()
    .Where(n => n.Name == "script")
    .First().InnerText;

// Return the data of spect and stringify it into a proper JSON object
var engine = new Jurassic.ScriptEngine();
var result = engine.Evaluate("(function() { " + script + " return spect; })()");
var json = JSONObject.Stringify(engine, result);

Console.WriteLine(json);
Console.ReadKey();

Output:

[["temper","init",[]],["fw/lib","init",[{"staticRoot":"//site.com/js/"}]],["cap","dm",[{"tackmod":"profile","xMod":"timed"}]]]

Note: I am not accounting for errors or anything else; this merely serves as an example of how to grab the script and evaluate it for the value of spect.

There are a few other libraries for executing/evaluating JavaScript as well.

PHP/Querypath get value of Javascript variable

You can use a regular expression. This code will return the room id in your example.

<?php

$html = '
<script type="text/javascript">
var base_url = "http://www.exampleurl.com/";
var room_id = "357"; //I want to get the value of room_id
var selected_room_button = "";
</script>';

// Non-greedy match so the capture stops at the first closing quote
$pattern = '/var room_id = "(.*?)";/';
preg_match($pattern, $html, $matches);
$room_id = $matches[1];

But there is no general solution as a variable may have been defined twice, or have been defined in different scopes.

If you don't need to extract other content besides the room_id, I see no reason to use an HTML parser. It would just slow things down. Also, don't expect an HTML parser to be a JavaScript parser! The HTML parser would only be used to extract the unparsed content between the <script> </script> tags, as a string. You would still need a regex to extract the room_id.
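For comparison, here is a minimal Python sketch of the same regular-expression approach. The helper name `find_string_var` is hypothetical, and it assumes the variable is declared with `var` and assigned a double-quoted string with no escaped quotes inside it. Like the PHP version, it returns only the first match and knows nothing about scopes or redefinitions.

```python
import re

html = '''
<script type="text/javascript">
var base_url = "http://www.exampleurl.com/";
var room_id = "357"; //I want to get the value of room_id
var selected_room_button = "";
</script>'''


def find_string_var(source, name):
    """Return the value of the first `var <name> = "<value>";` in source, or None."""
    # [^"]* stops at the closing quote, so the match cannot run past the value.
    pattern = r'var\s+' + re.escape(name) + r'\s*=\s*"([^"]*)"\s*;'
    match = re.search(pattern, source)
    return match.group(1) if match else None


room_id = find_string_var(html, "room_id")
print(room_id)  # prints 357
```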

How to parse html stored as variable inside javascript

You could wrap it in a container, map over its contents, and join the result. That would give:

var parse= 'Hello<i class="emoji emoji_smile" title=":smile:"></i><i class="emoji emoji_angry" title=":angry:"></i>World';
var parsed = $('<div/>', {html:parse}).contents().map(function(){ return this.title || this.nodeValue;}).get().join('');
$('body').append(parsed);
<script src="https://ajax.googleapis.com/ajax/libs/jquery/1.11.1/jquery.min.js"></script>
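The same idea (keep text nodes, replace each element with its title) can be sketched outside the browser. This Python version uses the standard library's `html.parser`; the class name `TitleOrText` and the function `flatten` are hypothetical. Unlike the jQuery `contents()` call, it walks nested elements as well, so treat it as an approximation rather than an exact port.

```python
from html.parser import HTMLParser


class TitleOrText(HTMLParser):
    """Collect text nodes, and the title attribute of any element that has one."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_starttag(self, tag, attrs):
        title = dict(attrs).get("title")
        if title:
            self.parts.append(title)

    def handle_data(self, data):
        self.parts.append(data)


def flatten(fragment):
    parser = TitleOrText()
    parser.feed(fragment)
    parser.close()
    return "".join(parser.parts)


parse = ('Hello<i class="emoji emoji_smile" title=":smile:"></i>'
         '<i class="emoji emoji_angry" title=":angry:"></i>World')
print(flatten(parse))  # prints Hello:smile::angry:World
```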

Parsing variable data out of a javascript tag using python

If you use BeautifulSoup to get the contents of the <script> tag, the json module can do the rest with a bit of string magic:

jsonValue = '{%s}' % (textValue.partition('{')[2].rpartition('}')[0],)
value = json.loads(jsonValue)

The .partition() and .rpartition() combo above splits the text on the first { and on the last } in the JavaScript text block, which should be your object definition. By adding the braces back to the text we can feed it to json.loads() and get a Python structure from it.

This works because JSON is basically the JavaScript literal syntax for objects, arrays, numbers, booleans and nulls.

Demonstration:

>>> import json
>>> text = '''
... var page_data = {
... "default_sku" : "SKU12345",
... "get_together" : {
... "imageLargeURL" : "http://null.null/pictures/large.jpg",
... "URL" : "http://null.null/index.tmpl",
... "name" : "Paints",
... "description" : "Here is a description and it works pretty well",
... "canFavorite" : 1,
... "id" : 1234,
... "type" : 2,
... "category" : "faded",
... "imageThumbnailURL" : "http://null.null/small9.jpg"
... }
... };
... '''
>>> json_text = '{%s}' % (text.partition('{')[2].rpartition('}')[0],)
>>> value = json.loads(json_text)
>>> value
{'default_sku': 'SKU12345', 'get_together': {'imageLargeURL': 'http://null.null/pictures/large.jpg', 'URL': 'http://null.null/index.tmpl', 'name': 'Paints', 'description': 'Here is a description and it works pretty well', 'canFavorite': 1, 'id': 1234, 'type': 2, 'category': 'faded', 'imageThumbnailURL': 'http://null.null/small9.jpg'}}
>>> import pprint
>>> pprint.pprint(value)
{'default_sku': 'SKU12345',
 'get_together': {'URL': 'http://null.null/index.tmpl',
                  'canFavorite': 1,
                  'category': 'faded',
                  'description': 'Here is a description and it works pretty '
                                 'well',
                  'id': 1234,
                  'imageLargeURL': 'http://null.null/pictures/large.jpg',
                  'imageThumbnailURL': 'http://null.null/small9.jpg',
                  'name': 'Paints',
                  'type': 2}}
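The first-{ to last-} split assumes the script block holds a single object. If it may hold more than one, a possible alternative is `json.JSONDecoder().raw_decode()`, which decodes one JSON value from the start of a string and ignores whatever follows. This is a different technique from the one above, shown as a sketch only; the helper name `extract_object` and the trimmed-down sample data are assumptions.

```python
import json
import re


def extract_object(js_text, var_name):
    """Decode the JSON value assigned to `var <var_name> = ...` in a JS snippet."""
    match = re.search(r'\bvar\s+' + re.escape(var_name) + r'\s*=\s*', js_text)
    if match is None:
        raise ValueError('no assignment to %r found' % var_name)
    # raw_decode parses one JSON value from the start of the string and
    # returns (value, end_index); trailing JavaScript is simply not read.
    value, _end = json.JSONDecoder().raw_decode(js_text[match.end():])
    return value


text = '''
var page_data = {
   "default_sku" : "SKU12345",
   "get_together" : { "name" : "Paints", "id" : 1234 }
};
var other = { "unrelated" : true };
'''
data = extract_object(text, "page_data")
print(data["get_together"]["id"])  # prints 1234
```

As with the original approach, this only works when the literal is also valid JSON (double-quoted keys and strings, no trailing commas).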

How to Get Script Tag Variables From a Website using Python

Without having the full code to get that output, I'm guessing a bit here. But if you can grab the text, then with json you should be able to get that data.

So I'll use an example from one of your previous questions, which has essentially the same format:

There's really nothing different, except we're going to extract the part of the string that json.loads() can parse. Then you have a nice structure of dictionaries and lists from which you can extract the product IDs:

import requests
import bs4
import json

url = 'https://packershoes.com/products/copy-of-adidas-predator-accelerator-trainer'
r = requests.get(url)

bs = bs4.BeautifulSoup(r.text, "html.parser")
scripts = bs.find_all('script')
jsonObj = None

for s in scripts:
    if 'var meta' in s.text:
        script = s.text
        script = script.split('var meta = ')[1]
        script = script.split(';\nfor (var attr in meta)')[0]

        jsonStr = script
        jsonObj = json.loads(jsonStr)

for value in jsonObj['product']['variants']:
    print('ID: ' + str(value['id']))

Output:

ID: 14189113049177
ID: 14189122912345
ID: 14139452129369
ID: 14139452194905
ID: 14139452227673
ID: 14139452293209
ID: 14139452325977
ID: 14139452391513
ID: 14139452424281
ID: 14189321715801
ID: 14139452457049
ID: 14139909505113
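To try the split-then-json.loads() step without hitting the network, here is a self-contained sketch. The page markup and the IDs in `PAGE` are invented for illustration, and the standard library's `html.parser` stands in for BeautifulSoup so that nothing needs installing. `ScriptCollector` and `variant_ids` are hypothetical names.

```python
import json
from html.parser import HTMLParser

# Hypothetical page: the markup and IDs are invented for illustration.
PAGE = '''<html><head>
<script>var unrelated = 1;</script>
<script>
var meta = {"product": {"variants": [{"id": 101}, {"id": 102}]}};
for (var attr in meta) { console.log(attr); }
</script>
</head><body></body></html>'''


class ScriptCollector(HTMLParser):
    """Gather the text content of every <script> element."""

    def __init__(self):
        super().__init__()
        self.scripts = []
        self._in_script = False

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            self._in_script = True
            self.scripts.append("")

    def handle_endtag(self, tag):
        if tag == "script":
            self._in_script = False

    def handle_data(self, data):
        if self._in_script:
            self.scripts[-1] += data


def variant_ids(page):
    collector = ScriptCollector()
    collector.feed(page)
    collector.close()
    for script in collector.scripts:
        if "var meta = " not in script:
            continue
        # Keep the text after the assignment, up to the ";" that ends the line.
        payload = script.split("var meta = ", 1)[1].split(";\n", 1)[0]
        meta = json.loads(payload)
        return [variant["id"] for variant in meta["product"]["variants"]]
    return []  # no script assigns to meta


print(variant_ids(PAGE))  # prints [101, 102]
```

Returning an empty list when no script matches avoids the TypeError that looping over a None result would raise.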

Parsing variable types from Strings in javaScript

Your code throws an error on the types that are null. JSON.parse() only parses valid json strings. Those that it was able to match are all valid types in json.

https://tc39.es/ecma262/#sec-json.parse

https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/JSON/parse

That said, these JSON strings will work, because the value is wrapped in double quotes:

JSON.parse('"document"') 
// variable string
JSON.parse("\""+ c + "\"")

src: https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/eval

var valueB2 =  Function("return " + c)();
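The rule about what counts as valid JSON is the same in any language, so it can be checked quickly with Python's `json` module as a stand-in for JSON.parse(). This is only an illustration of the grammar, not of the JavaScript API; `try_parse` is a hypothetical helper. `json.dumps()` plays the role that JSON.stringify() would in JavaScript: it quotes and escapes the string, which is safer than concatenating quote characters by hand.

```python
import json


def try_parse(text):
    """Return (True, value) for valid JSON text, (False, None) otherwise."""
    try:
        return True, json.loads(text)
    except json.JSONDecodeError:
        return False, None


c = "document"
print(try_parse(c))              # bare word is not JSON: prints (False, None)
print(try_parse(json.dumps(c)))  # quoted string: prints (True, 'document')
print(try_parse("42"))           # prints (True, 42)
print(try_parse("null"))         # prints (True, None)
```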

Parse value of php variable to HTML input field

echo '<id="element_5" would need to be echo '<input id="element_5". But that will duplicate the box you've already got lower down, and also it would be outside the <html> tag, making it invalid (because it's not part of the HTML document), so the browser is not obliged to show it.

Try this instead:

<?php
$val = null;

if(isset($_POST['submit']))
{
$val = $_POST['element_1'];
}
?>

and on the element further down the page:

<input id="element_5" name="element_5" class="element text medium" type="text" maxlength="255" value="<?php echo ($val != null ? htmlspecialchars($val) : ""); ?>" readonly/>

Note the inline PHP there to echo $val if it's not null.

How to get javascript variable value from HTML using CsQuery

CsQuery only parses HTML - not javascript. So you could easily get ahold of the contents of the script block like this:

CQ dom = @"<script type='text/javascript'>
dealerdata = 'HelloWorld'
</script>";

var script = dom["script"].Text();
// script == "dealerdata = 'HelloWorld'" (plus surrounding whitespace)

... but then you're on your own, it's JavaScript. In your example it would be trivial:

string[] parts = script.Split('=');
string value = parts[1].Trim();

.. but this is only because you know exactly what the input looks like. For typical use cases where you're not sure exactly what context your target could be in, that won't help you much.
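To make that fragility concrete, here is a small Python sketch of the same string-splitting idea (the helper name `read_assignment` is hypothetical). It splits on the first = only, so a value that itself contains = survives, and it strips one pair of surrounding quotes. It still assumes the script is a single simple assignment and is not a JavaScript parser.

```python
def read_assignment(script):
    """Split `name = 'value'` into (name, value), dropping quotes and a trailing ';'."""
    name, sep, value = script.partition("=")
    if not sep:
        raise ValueError("no assignment found")
    value = value.strip().rstrip(";").strip()
    # Drop one pair of matching quotes, if present.
    if len(value) >= 2 and value[0] == value[-1] and value[0] in "'\"":
        value = value[1:-1]
    return name.strip(), value


script = "\ndealerdata = 'HelloWorld'\n"
print(read_assignment(script))  # prints ('dealerdata', 'HelloWorld')
```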

If you need to parse Javascript in .NET I'd recommend the Jurassic project, an awesome JavaScript compiler. If speed is of utmost importance, look at javascript.net. This wraps Google's V8 engine and will be a lot faster than Jurassic, but will have non-.NET dependencies.


