Wget + JavaScript

wget + JavaScript?

You could probably make that happen with something like PhantomJS.

You can write a PhantomJS script that loads the page as a browser would, and then either takes screenshots or uses JavaScript to inspect the page and pull out data.

Can I wget the result from a JavaScript-generated web page?

No. wget and curl only download what the server sends; they do not execute JavaScript, so they cannot capture content that the page generates dynamically. For that you need a browser automation tool such as Selenium WebDriver, or Chrome in headless mode.
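As a rough illustration of the headless Chrome route, the sketch below shells out to Chrome with its --headless and --dump-dom flags and captures the printed DOM. The executable names in CHROME_CANDIDATES and the helper names are assumptions for illustration; the actual binary name depends on how Chrome or Chromium is installed on your system.

```python
import shutil
import subprocess

# Candidate executable names (assumption: which one exists varies by system).
CHROME_CANDIDATES = ("google-chrome", "chromium", "chromium-browser", "chrome")


def build_dump_dom_command(url, chrome_binary="google-chrome"):
    """Return the argument list that asks headless Chrome to print the DOM."""
    return [chrome_binary, "--headless", "--dump-dom", url]


def find_chrome():
    """Return the first Chrome-like executable found on PATH, or None."""
    for name in CHROME_CANDIDATES:
        path = shutil.which(name)
        if path:
            return path
    return None


def fetch_rendered_html(url, timeout=60):
    """Run headless Chrome and return the DOM it prints after loading the page."""
    chrome = find_chrome()
    if chrome is None:
        raise RuntimeError("no Chrome or Chromium executable found on PATH")
    result = subprocess.run(
        build_dump_dom_command(url, chrome),
        capture_output=True,
        text=True,
        timeout=timeout,
        check=True,
    )
    return result.stdout
```

Calling fetch_rendered_html("https://www.example.com/") should return the page markup as Chrome sees it after loading, which is the part wget and curl cannot produce.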

But for that particular page (and more specifically for that particular text result), you can use curl to get the link directly:

curl -X POST -d '{"channelno":"099","deviceId":"0000anonymous_user","format":"HLS"}' https://api.viu.now.com/p8/1/getLiveURL | jq '.asset.hls.adaptive[0]'

Note: The POST data and the link are taken from the page's source. jq is a small command-line utility for processing JSON data.
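For comparison, here is a minimal Python sketch of the same request using only the standard library. The URL and POST data are copied from the curl command above; the function names are made up for illustration, and the response shape is inferred solely from the jq filter '.asset.hls.adaptive[0]', so treat it as an assumption. Whether the endpoint still answers this way is not something this sketch can guarantee.

```python
import json
import urllib.request

API_URL = "https://api.viu.now.com/p8/1/getLiveURL"
POST_DATA = {"channelno": "099", "deviceId": "0000anonymous_user", "format": "HLS"}


def build_request(url=API_URL, data=POST_DATA):
    """Build the POST request with the JSON document as its body."""
    body = json.dumps(data).encode("utf-8")
    return urllib.request.Request(url, data=body, method="POST")


def first_adaptive_url(payload):
    """Python equivalent of the jq filter '.asset.hls.adaptive[0]'."""
    return payload["asset"]["hls"]["adaptive"][0]


def get_live_url():
    """Send the request and return the first adaptive HLS entry."""
    with urllib.request.urlopen(build_request(), timeout=30) as response:
        return first_adaptive_url(json.load(response))
```

Only get_live_url() touches the network; the other two helpers can be checked offline.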

How to enable 'wget' to download the whole content of HTML with Javascript

You need to put the link inside quotes:

wget -O downloadedtext.txt 'http://www.ncbi.nlm.nih.gov/IEB/Research/Acembly/av.cgi?db=mouse&c=gene&a=fiche&l=2610008E11Rik'

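This is because an unquoted & has a special meaning in the shell: it ends the command and runs it in the background. Without the quotes, the line is split into several commands at each &, and wget only receives the part of the URL before the first &.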
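If you launch wget from a script rather than typing the command, the same problem can be avoided in two ways, sketched below in Python. Passing an argument list to subprocess.run() bypasses the shell entirely, and shlex.quote() produces a string that is safe to paste into a shell. The helper names and the output filename are arbitrary choices for illustration.

```python
import shlex

URL = ("http://www.ncbi.nlm.nih.gov/IEB/Research/Acembly/av.cgi"
       "?db=mouse&c=gene&a=fiche&l=2610008E11Rik")


def wget_argv(url, output="downloadedtext.txt"):
    """Argument list for subprocess.run(); no shell parses it, so & is literal."""
    return ["wget", "-O", output, url]


def wget_shell_line(url, output="downloadedtext.txt"):
    """The same command as one string, with each argument quoted where needed."""
    return " ".join(shlex.quote(arg) for arg in wget_argv(url, output))


# Prints the command with the URL wrapped in single quotes.
print(wget_shell_line(URL))
```

To actually run the download, subprocess.run(wget_argv(URL), check=True) would do it, assuming wget is installed.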

What is the equivalent of wget in javascript to download a file from a given url?

After more than a month of exploring, and with a friend's help, we found the following.

The website hosting the file does not allow us to download it using window.location = url; or window.open(url);

In the end we had to use the data-downloadurl attribute together with the HTML5 download attribute, as follows:

<a href="<url-goes-here>" data-downloadurl="audio/mpeg:<filename-goes-here>:<url-goes-here>" download="<filename-goes-here>">Click here to download the file</a>

We embed this HTML in the host page, and clicking the link triggers the download.

Download a working local copy of a webpage

wget is capable of doing what you are asking. Just try the following:

wget -p -k http://www.example.com/

The -p will get you all the elements required to view the site correctly (CSS, images, etc.).
The -k will change all links (including those for CSS and images) so that you can view the page offline as it appeared online.

From the Wget docs:

‘-k’
‘--convert-links’
After the download is complete, convert the links in the document to make them
suitable for local viewing. This affects not only the visible hyperlinks, but
any part of the document that links to external content, such as embedded images,
links to style sheets, hyperlinks to non-html content, etc.

Each link will be changed in one of the two ways:

The links to files that have been downloaded by Wget will be changed to refer
to the file they point to as a relative link.

Example: if the downloaded file /foo/doc.html links to /bar/img.gif, also
downloaded, then the link in doc.html will be modified to point to
‘../bar/img.gif’. This kind of transformation works reliably for arbitrary
combinations of directories.

The links to files that have not been downloaded by Wget will be changed to
include host name and absolute path of the location they point to.

Example: if the downloaded file /foo/doc.html links to /bar/img.gif (or to
../bar/img.gif), then the link in doc.html will be modified to point to
http://hostname/bar/img.gif.

Because of this, local browsing works reliably: if a linked file was downloaded,
the link will refer to its local name; if it was not downloaded, the link will
refer to its full Internet address rather than presenting a broken link. The fact
that the former links are converted to relative links ensures that you can move
the downloaded hierarchy to another directory.

Note that only at the end of the download can Wget know which links have been
downloaded. Because of that, the work done by ‘-k’ will be performed at the end
of all the downloads.
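To make the two conversion rules quoted above concrete, here is a small Python sketch that applies them to a single link. It is not Wget's implementation, only an illustration of the documented behaviour; the function name convert_link, its parameters, and the placeholder host "hostname" are assumptions, and it ignores details such as query strings.

```python
import posixpath
from urllib.parse import urljoin


def convert_link(page_path, link, downloaded, host="hostname"):
    """Rewrite one link found in page_path following the two quoted rules.

    page_path  -- absolute path of the downloaded page, e.g. "/foo/doc.html"
    link       -- the link as written in the page (absolute or relative path)
    downloaded -- set of absolute paths that were also downloaded
    """
    page_dir = posixpath.dirname(page_path)
    # Resolve the link against the page's own directory to get an absolute path.
    target = posixpath.normpath(posixpath.join(page_dir, link))
    if target in downloaded:
        # Rule 1: a downloaded file is referenced relative to the page.
        return posixpath.relpath(target, start=page_dir)
    # Rule 2: anything else becomes a full URL on the original host.
    return urljoin(f"http://{host}{page_path}", link)


# The two examples from the documentation:
print(convert_link("/foo/doc.html", "/bar/img.gif", {"/bar/img.gif"}))
print(convert_link("/foo/doc.html", "../bar/img.gif", set()))
```

The first call yields the relative link ../bar/img.gif and the second yields http://hostname/bar/img.gif, matching the documentation's examples.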

