Using Wget via Python

Python equivalent of a given wget command

urllib.request should work.
Just set it up in a while (not done) loop: check whether the local file already exists and, if it does, send a GET with a Range header specifying how many bytes of the local file you have already downloaded.
Be sure to use read() to append to the local file until the download finishes or an error occurs.
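
The loop described above could be sketched roughly as follows. This is a minimal sketch, not a definitive implementation: build_resume_request and download_with_resume are hypothetical names, the retry count, timeout and chunk size are arbitrary, and it assumes the server answers a Range request with 206 Partial Content and reports a Content-Length. A file that is already complete is not special-cased (the server will typically reject such a Range request, so the function would just give up).

```python
import http.client
import os
import urllib.request


def build_resume_request(url, local_path):
    """Return a GET request that asks only for the bytes not yet on disk."""
    request = urllib.request.Request(url)
    if os.path.exists(local_path):
        offset = os.path.getsize(local_path)
        if offset:
            # Ask the server to start at the first byte we do not have yet.
            request.add_header("Range", f"bytes={offset}-")
    return request


def download_with_resume(url, local_path, max_attempts=5, chunk_size=64 * 1024):
    """Retry the download, appending to local_path; return True on success."""
    for _ in range(max_attempts):
        request = build_resume_request(url, local_path)
        try:
            with urllib.request.urlopen(request, timeout=30) as response:
                # 206 means the Range header was honoured; on a plain 200 the
                # whole body is sent again, so the file is started over.
                mode = "ab" if response.status == 206 else "wb"
                expected = response.headers.get("Content-Length")
                received = 0
                with open(local_path, mode) as out:
                    while True:
                        chunk = response.read(chunk_size)
                        if not chunk:
                            break
                        out.write(chunk)
                        received += len(chunk)
                # Assumption: a body shorter than Content-Length was cut off.
                if expected is None or received >= int(expected):
                    return True
        except (OSError, http.client.HTTPException):
            # Network error or HTTP error status: try again from the new offset.
            continue
    return False
```

It would be called as download_with_resume(url, "localfile"), where url is whatever your wget command was fetching.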

This is also potentially a duplicate of "Python urllib2 resume download doesn't work when network reconnects".

Download in batch with wget and modify files with a python script immediately after download

You can run a for loop over the input file and, for each URL, run wget -O $new_file_name $url.

Try something like this:

bash

for url in $(cat envidatS3paths.txt); do wget -O "$(echo "$url" | sed "s/\//_/g").out" "$url"; done

python

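import subprocess
for url in opened_file:
    url = url.strip()  # drop the trailing newline
    subprocess.run(['wget', '-O', url.rsplit('/', 1)[1], url])  # file name = last URL path component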
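
Since the goal is to modify each file right after it is downloaded, one option is to wait for each wget call to finish and then hand the file to your own function. A rough sketch, assuming wget is on the PATH; local_name, download_and_process and the process callback are hypothetical names, and the naming scheme simply mirrors the sed expression from the bash version:

```python
import subprocess


def local_name(url):
    """Mirror the bash version: replace every '/' with '_' and append '.out'."""
    return url.replace("/", "_") + ".out"


def download_and_process(list_path, process):
    """Download each URL listed in list_path, then call process(path) on it."""
    with open(list_path) as urls:
        for line in urls:
            url = line.strip()
            if not url:
                continue  # skip blank lines
            target = local_name(url)
            # run() blocks until wget exits, so the file is complete
            # (or check=True has raised) before process() sees it.
            subprocess.run(["wget", "-O", target, url], check=True)
            process(target)
```

For example, download_and_process("envidatS3paths.txt", my_modify_function), where my_modify_function stands for whatever your script does to each file.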

Download and run a WGET file

These are WGET files

No, they are .sh files, which are plain text files. If you open one in a text editor, you will see that the first line is

#!/bin/bash

meaning that the file is supposed to be run with bash. Moreover, the following comment might be found further down

# first be sure it's bash... anything out of bash or sh will break

which implies that you need a working bash in order to make any use of the file.
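
If you want to drive this from Python, a small sketch along these lines could check the shebang and hand the script to bash. It assumes bash is installed and on the PATH; is_bash_script and run_with_bash are made-up helper names, not part of any library:

```python
import subprocess


def is_bash_script(path):
    """Return True if the first line of the file is a shebang naming bash."""
    with open(path, "r", errors="replace") as script:
        first_line = script.readline().strip()
    return first_line.startswith("#!") and "bash" in first_line


def run_with_bash(path, *args):
    """Run the script with bash and return its exit code."""
    if not is_bash_script(path):
        raise ValueError(f"{path} does not start with a bash shebang")
    # Calling bash explicitly means the file does not need the executable bit.
    return subprocess.run(["bash", path, *args]).returncode
```

On a machine without bash (for example a plain Windows install), run_with_bash will fail with FileNotFoundError, which is the same limitation described above.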

Python wget saves a file. how to get data in variable

You don't need to use wget to download the HTML to a file and then read it back in; you can just get the HTML directly. This uses requests (way better than Python's urllib modules, in my opinion):

import requests
from bs4 import BeautifulSoup
url = "https://www.facebook.com/hellomeets/events"

html = requests.get(url).text
print(html)

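This is an example using Python's built-in urllib.request (called urllib2 in Python 2):

import urllib.request
from bs4 import BeautifulSoup
url = "https://www.facebook.com/hellomeets/events"

# read() returns bytes, so decode them to get a str (assuming UTF-8)
html = urllib.request.urlopen(url).read().decode('utf-8')
print(html)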

Edit

I now see what you mean about the difference between the HTML fetched directly from the website and the HTML obtained through the wget module. Here is how you would do it using the wget module:

import wget
from bs4 import BeautifulSoup
url = "https://www.facebook.com/hellomeets/events"

down = wget.download(url)

# wget.download() returns the name of the saved file; read it back in
with open(down, 'r') as f:
    htmlText = f.read()
print(htmlText)

