Shell Tool Which Renders Web Site Including JavaScript

Get browser-rendered HTML + JavaScript

Try PhantomJS from www.phantomjs.org; you can easily modify the included rasterize.js to export the rendered HTML. It's based on WebKit and fully evaluates your target site's JavaScript, and it lets you adjust timeouts or execute your own code first if you wish. I personally use it to save hard-copy HTML versions of fully rendered knockout.js templates.

It executes JavaScript, so I just did something like this and saved the console output to a file:

var markup = page.evaluate(function(){return document.documentElement.innerHTML;});
console.log(markup);
phantom.exit();

Command line based HTTP POST to retrieve data from javascript-rich webpage

You can use WebDriver to do just that, but you need to have a web browser installed. There are other solutions as well, such as Selenium and HtmlUnit (which works without a browser but might behave differently).

You can find an example of a Selenium project here.

WebDriver

WebDriver is a tool for writing automated tests of websites. It aims
to mimic the behaviour of a real user, and as such interacts with the
HTML of the application.

Selenium

Selenium automates browsers. That's it. What you do with that power is
entirely up to you. Primarily it is for automating web applications
for testing purposes, but is certainly not limited to just that.
Boring web-based administration tasks can (and should!) also be
automated as well.

HtmlUnit

HtmlUnit is a "GUI-Less browser for Java programs". It models HTML
documents and provides an API that allows you to invoke pages, fill
out forms, click links, etc... just like you do in your "normal"
browser.

I would recommend WebDriver because it does not require a standalone server the way Selenium does. HtmlUnit might be suitable if you don't want to install a browser or worry about Xvfb in a headless environment.

Is there a tool to analyze which javascript file added a certain portion of html / code?

No, there is not a tool to do such a thing. Understanding the code yourself or searching for specific key phrases in the HTML you're trying to source (such as a class name or tag name or piece of text) is the typical method.

It could work to grep for the common ways that the DOM is modified (the .innerHTML property, .appendChild(), .insertBefore(), etc. if it's plain JavaScript) or for similar methods in whatever library is being used.
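As a rough sketch of that grep approach, the following plain JavaScript scans a script's source text for a handful of DOM-modifying calls and reports the matching line numbers. The function name findDomMutations, the pattern list, and the sample input are all made up for illustration; the list is an assumption and certainly not exhaustive, and library-specific calls (jQuery's .html(), for example) would need their own patterns.

```javascript
// Hypothetical helper: report lines of a script that use common DOM-mutation APIs.
// The patterns below are an illustrative assumption, not a complete list.
var DOM_MUTATION_PATTERNS = [
  /\.innerHTML\s*\+?=(?!=)/,      // assignment to innerHTML (not a == comparison)
  /\.outerHTML\s*\+?=(?!=)/,
  /\.appendChild\s*\(/,
  /\.insertBefore\s*\(/,
  /\.insertAdjacentHTML\s*\(/,
  /document\.write\s*\(/
];

function findDomMutations(source) {
  var hits = [];
  source.split('\n').forEach(function (text, index) {
    var matched = DOM_MUTATION_PATTERNS.some(function (re) {
      return re.test(text);
    });
    if (matched) {
      // Line numbers are 1-based, like grep -n.
      hits.push({ line: index + 1, text: text.trim() });
    }
  });
  return hits;
}

// Sample input, invented for this example.
var sample = [
  'var el = document.getElementById("1");',
  'el.innerHTML = "42 is the answer.";',
  'document.body.appendChild(el);'
].join('\n');

findDomMutations(sample).forEach(function (hit) {
  console.log(hit.line + ': ' + hit.text);
});
```

This only narrows down candidates. Minified or dynamically generated code can still hide the call that produced a given piece of markup.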

Get javascript rendered html source using phantomjs

Unfortunately, that is not possible using just the PhantomJS command line. You have to use a JavaScript file to actually accomplish anything with PhantomJS.

Here is a very simple version of the script you can use:

Code mostly copied from https://stackoverflow.com/a/12469284/4499924

printSource.js

var system = require('system');
var page = require('webpage').create();
// system.args[0] is the filename, so system.args[1] is the first real argument
var url = system.args[1];
// render the page, and run the callback function
page.open(url, function () {
    // page.content is the source
    console.log(page.content);
    // need to call phantom.exit() to prevent from hanging
    phantom.exit();
});

To print the page source to standard output:

phantomjs printSource.js http://todomvc.com/examples/emberjs/

To save the page source in a file:

phantomjs printSource.js http://todomvc.com/examples/emberjs/ > ember.html

How to download a website where javascript code lookup results are included?

Phantom.js?

http://phantomjs.org/quick-start.html

I think this will do what you like!

The best thing to do is install from here:

http://phantomjs.org/

Basically, you run it by writing a JavaScript script and passing it as a command-line argument, e.g.

phantomjs.exe someScript.js

There are loads of examples. You can render a website as an image,
for example:

phantomjs.exe github.js

Where github.js looks like

var page = require('webpage').create();
page.open('http://github.com/', function() {
    page.render('github.png');
    phantom.exit();
});

This demo is at
http://phantomjs.org/screen-capture.html

You can also show the webpage content as text.

For example, let's take a simple webpage, demo_page.html:

<html>
<head>
<script>
function setParagraphText() {
    document.getElementById("1").innerHTML = "42 is the answer.";
}
</script>
</head>
<body onload="setParagraphText();">
<p id="1">Static content</p>
</body>
</html>

And then create a test script, test.js:

var page = require('webpage').create();

page.open("demo_page.html", function(status) {
    console.log("Status: " + status);
    if(status === "success") {
        // plainText is the page's text after its scripts have run
        console.log('Page text: ' + page.plainText);
        console.log('All done');
    }
    phantom.exit();
});

Then in the console write:

> phantomjs.exe test.js
Status: success
Page text: 42 is the answer.
All done

You can also inspect the page DOM and even update it:

var page = require('webpage').create();

page.open("demo_page.html", function(status) {
    console.log("Status: " + status);
    if(status === "success") {
        page.evaluate(function(){
            document.getElementById("1").innerHTML = "I updated the value myself";
        });

        console.log('Page text: ' + page.plainText);
        console.log('All done');
    }
    phantom.exit();
});

Saving each .html page in a webpage by searching recursively with either a script or a tool?

I guess this is your solution: https://www.httrack.com/page/1/en/index.html

with a tutorial : http://www.wikihow.com/Copy-a-Website

Get HTML page with javascript elements in bash script

Thank you for your response. Unfortunately, nothing worked for me. I'm on a Raspberry Pi without a GUI and tried it with Chromium and Firefox. It seems that Firefox does not even have something like a DOM-dump function, and Chromium keeps crashing, or at least does nothing that helps. For example:

$ chromium-browser --headless --dump-dom --disable-gpu --print-to-pdf "http://192.168.8.1/html/statistic.html"
[0110/214637.782914:ERROR:browser_main_loop.cc(596)] Failed to put Xlib into threaded mode.
[0110/214640.321288:FATAL:gpu_data_manager_impl_private.cc(897)] The display compositor is frequently crashing. Goodbye.
Trace/breakpoint trap

or

$ DISPLAY=:0 chromium-browser --headless --dump-dom --disable-gpu "http://192.168.8.1/html/statistic.html"
X Error: BadDrawable
Request Major code 55 ()
ResourceID 0x0
Error Serial #144
Current Serial #146
X Error: BadDrawable
Request Major code 55 ()
ResourceID 0x0
Error Serial #144
Current Serial #146
X Error: BadDrawable
Request Major code 55 ()
ResourceID 0x0
Error Serial #144
Current Serial #146
[0110/214716.189128:FATAL:gpu_data_manager_impl_private.cc(897)] The display compositor is frequently crashing. Goodbye.
[0110/214716.198871:ERROR:broker_posix.cc(40)] Recvmsg error: Connection reset by peer (104)
Trace/breakpoint trap

I also read about browsh and tried that, but it is also extremely unstable: it keeps crashing during startup and loaded my page only once. Since I could not find a way to get its output in a machine-readable format, I looked around a bit more.

And I actually found something that looks really nice for my case. I'm trying to read some traffic statistics from an LTE stick (Huawei E3531). I found that a lot of values are accessible through an API in the form of XML files. These can be found at the following URLs (192.168.8.1 is the IP of the network interface provided by the LTE stick):

http://192.168.8.1/api/monitoring/month_statistics
http://192.168.8.1/api/monitoring/traffic-statistics
http://192.168.8.1/api/monitoring/status
http://192.168.8.1/api/device/basic_information
http://192.168.8.1/api/online-update/configuration
http://192.168.8.1/api/monitoring/converged-status
http://192.168.8.1/api/pin/status
http://192.168.8.1/api/monitoring/start_date

The month_statistics-page looks like this:

<?xml version="1.0" encoding="UTF-8"?>
<response>
<CurrentMonthDownload>166033238</CurrentMonthDownload>
<CurrentMonthUpload>9679896</CurrentMonthUpload>
<MonthDuration>26391</MonthDuration>
<MonthLastClearTime>2019-12-30</MonthLastClearTime>
</response>

So I would assume that the CurrentMonthDownload-value is the used volume of this month. On the actual website it shows 167.57 MB. I'm still not 100% sure how this is calculated, but it should be accurate enough for me.
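To make the unit question concrete, here is a minimal sketch in plain JavaScript that pulls CurrentMonthDownload out of the XML shown above and converts it under two assumptions: that the value is a byte count, and that the web page divides by either 10^6 or 1024^2. Both assumptions are guesses, and the helper name readNumber is made up for this example. A regular expression is enough here only because the response is flat; a real XML parser would be the safer choice for anything nested.

```javascript
// The month_statistics response from above, embedded as a string for the example.
var xml = [
  '<?xml version="1.0" encoding="UTF-8"?>',
  '<response>',
  '<CurrentMonthDownload>166033238</CurrentMonthDownload>',
  '<CurrentMonthUpload>9679896</CurrentMonthUpload>',
  '<MonthDuration>26391</MonthDuration>',
  '<MonthLastClearTime>2019-12-30</MonthLastClearTime>',
  '</response>'
].join('\n');

// Hypothetical helper: read one numeric element from a flat XML response.
// Returns null when the tag is missing or not purely numeric.
function readNumber(xmlText, tag) {
  var match = new RegExp('<' + tag + '>(\\d+)</' + tag + '>').exec(xmlText);
  return match ? parseInt(match[1], 10) : null;
}

var bytes = readNumber(xml, 'CurrentMonthDownload');

// Assumption: the counter is in bytes.
console.log('decimal MB: ' + (bytes / 1e6).toFixed(2));           // 166.03
console.log('binary MiB: ' + (bytes / (1024 * 1024)).toFixed(2)); // 158.34
```

Neither figure is exactly the 167.57 MB shown on the page. One possible explanation is simply that the counter had advanced between the two readings, but that is not confirmed.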


