How to download a file over HTTP?
Use urllib.request.urlopen():
import urllib.request
with urllib.request.urlopen('http://www.example.com/') as f:
    html = f.read().decode('utf-8')
This is the most basic way to use the library, minus any error handling. You can also do more complex stuff such as changing headers.
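As a rough sketch of the "changing headers" and error-handling points, the snippet below wraps the same call in a function. The names build_request and fetch_text and the User-Agent string are made up for illustration; only urllib.request.Request, urlopen and the urllib.error exceptions come from the standard library.

```python
import urllib.error
import urllib.request


def build_request(url, user_agent="example-downloader/0.1"):
    """Return a Request that carries a custom User-Agent header."""
    # The user agent string is a placeholder, not a required value.
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def fetch_text(url, timeout=10):
    """Fetch `url` and return the decoded body, or None if the request fails."""
    try:
        with urllib.request.urlopen(build_request(url), timeout=timeout) as f:
            # Use the charset the server declares, falling back to UTF-8.
            charset = f.headers.get_content_charset() or "utf-8"
            return f.read().decode(charset)
    except urllib.error.HTTPError as err:  # server answered with 4xx/5xx
        print("HTTP error:", err.code, err.reason)
    except urllib.error.URLError as err:  # DNS failure, refused connection, ...
        print("Connection problem:", err.reason)
    return None
```

Calling fetch_text('http://www.example.com/') would then return the page text, or None after printing the reason for the failure.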
On Python 2, the function is in urllib2:
import urllib2
response = urllib2.urlopen('http://www.example.com/')
html = response.read()
How browsers download files (via HTTP or FTP)
It depends on the URL:
ftp://www.example.com/bla/bla/bla01.zip
will be fetched via FTP, and
http://www.example.com/bla/bla/bla01.zip
will be fetched via HTTP.
Of course, we cannot simply replace http:// with ftp://, because HTTP needs an HTTP server and FTP needs an FTP server.
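To make the point concrete, here is a small sketch that reads the scheme out of a URL, which is the part a client looks at to pick the protocol. The helper name transfer_protocol is made up for this example.

```python
from urllib.parse import urlsplit


def transfer_protocol(url):
    """Return the URL scheme, which tells the client which protocol to speak."""
    return urlsplit(url).scheme


for url in ("ftp://www.example.com/bla/bla/bla01.zip",
            "http://www.example.com/bla/bla/bla01.zip"):
    print(url, "->", transfer_protocol(url))
```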
How to download a file from http using C?
This is an update to the previously posted code. HTTP is far too large a protocol to implement fully in a small example, so this handles only the basics.
Reformatting the code or suggesting modifications is more than welcome.
#include <sys/socket.h>
#include <sys/types.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <netdb.h>
#include <stdio.h>
#include <string.h>
#include <strings.h>
#include <stdlib.h>
#include <unistd.h>
#include <errno.h>
//reads the status line one byte at a time; returns the status code, or 0 on failure
int ReadHttpStatus(int sock){
    char buff[1024]="",*ptr=buff+1;
    int bytes_received, status=0;
    printf("Begin Response ..\n");
    while((bytes_received = recv(sock, ptr, 1, 0))){
        if(bytes_received==-1){
            perror("ReadHttpStatus");
            exit(1);
        }
        if((ptr[-1]=='\r') && (*ptr=='\n' )) break;
        //stop before overflowing buff; leave room for the terminator
        if(ptr >= buff+sizeof(buff)-2) break;
        ptr++;
    }
    *ptr=0;
    ptr=buff+1;
    sscanf(ptr,"%*s %d ", &status);
    printf("%s\n",ptr);
    printf("status=%d\n",status);
    printf("End Response ..\n");
    return (bytes_received>0)?status:0;
}
//the only field that is parsed is 'Content-Length' (matched case-sensitively)
int ParseHeader(int sock){
    char buff[1024]="",*ptr=buff+4;
    int bytes_received;
    printf("Begin HEADER ..\n");
    while((bytes_received = recv(sock, ptr, 1, 0))){
        if(bytes_received==-1){
            perror("Parse Header");
            exit(1);
        }
        //a blank line (\r\n\r\n) marks the end of the headers
        if(
            (ptr[-3]=='\r') && (ptr[-2]=='\n' ) &&
            (ptr[-1]=='\r') && (*ptr=='\n' )
        ) break;
        //stop before overflowing buff; leave room for the terminator
        if(ptr >= buff+sizeof(buff)-2) break;
        ptr++;
    }
    *ptr=0;
    ptr=buff+4;
    //printf("%s",ptr);
    if(bytes_received){
        ptr=strstr(ptr,"Content-Length:");
        if(ptr){
            sscanf(ptr,"%*s %d",&bytes_received);
        }else
            bytes_received=-1; //unknown size
        printf("Content-Length: %d\n",bytes_received);
    }
    printf("End HEADER ..\n");
    return bytes_received ;
}
int main(void){
    char domain[] = "sstatic.net", path[]="stackexchange/img/logos/so/so-logo-med.png";
    int sock, bytes_received;
    char send_data[1024],recv_data[1024];
    struct sockaddr_in server_addr;
    struct hostent *he;
    he = gethostbyname(domain);
    if (he == NULL){
        herror("gethostbyname");
        exit(1);
    }
    if ((sock = socket(AF_INET, SOCK_STREAM, 0))== -1){
        perror("Socket");
        exit(1);
    }
    memset(&server_addr, 0, sizeof(server_addr));
    server_addr.sin_family = AF_INET;
    server_addr.sin_port = htons(80);
    server_addr.sin_addr = *((struct in_addr *)he->h_addr_list[0]);
    printf("Connecting ...\n");
    if (connect(sock, (struct sockaddr *)&server_addr,sizeof(server_addr)) == -1){
        perror("Connect");
        exit(1);
    }
    printf("Sending data ...\n");
    //"Connection: close" makes the server close the socket after the body,
    //so the receive loop also ends when Content-Length is unknown
    snprintf(send_data, sizeof(send_data), "GET /%s HTTP/1.1\r\nHost: %s\r\nConnection: close\r\n\r\n", path, domain);
    if(send(sock, send_data, strlen(send_data), 0)==-1){
        perror("send");
        exit(2);
    }
    printf("Data sent.\n");
    printf("Receiving data...\n\n");
    int contentlength;
    if(ReadHttpStatus(sock) && (contentlength=ParseHeader(sock))){
        int bytes=0;
        FILE* fd=fopen("test.png","wb");
        if(fd==NULL){
            perror("fopen");
            exit(4);
        }
        printf("Saving data...\n\n");
        while((bytes_received = recv(sock, recv_data, sizeof(recv_data), 0))){
            if(bytes_received==-1){
                perror("receive");
                exit(3);
            }
            fwrite(recv_data,1,bytes_received,fd);
            bytes+=bytes_received;
            printf("Bytes received: %d from %d\n",bytes,contentlength);
            if(bytes==contentlength)
                break;
        }
        fclose(fd);
    }
    close(sock);
    printf("\n\nDone.\n\n");
    return 0;
}
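For comparison, the parsing that the C program above does by hand (status line first, then the Content-Length header) can be sketched in a few lines of Python. This is only an illustration operating on bytes already read from a socket; the function name parse_response_head is made up, and unlike the strstr() call above it matches the header name case-insensitively.

```python
def parse_response_head(head):
    """Parse raw HTTP response head bytes (status line plus headers).

    Returns (status_code, content_length); content_length is -1 when the
    header is absent, mirroring the "unknown size" case in the C code.
    """
    lines = head.decode("iso-8859-1").split("\r\n")
    # The status line looks like: HTTP/1.1 200 OK
    status = int(lines[0].split(" ", 2)[1])
    content_length = -1
    for line in lines[1:]:
        if not line:
            break  # blank line: end of headers
        name, _, value = line.partition(":")
        if name.strip().lower() == "content-length":
            content_length = int(value.strip())
    return status, content_length
```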
How to download a file from http url?
This is what I did (the URL is quoted because it contains &, which the shell would otherwise interpret):
wget -O file.tar "http://www.ncbi.nlm.nih.gov/geo/download/?acc=GSE46130&format=file"
How do you download files over HTTP with Vala?
Yes, this can be done more easily with gio-2.0. Open the remote file by URI and the destination file by local path, then copy the first to the second. The following example downloads the HTML source of this page.
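void main () {
    var file_from_http = File.new_for_uri ("https://stackoverflow.com/questions/61021171/how-do-you-download-files-over-http-with-vala");
    File local_file = File.new_for_path ("./stackoverflow.html");
    try {
        // copy() throws an Error on failure (for example, a network problem)
        file_from_http.copy (local_file, FileCopyFlags.OVERWRITE);
    } catch (Error e) {
        stderr.printf ("Download failed: %s\n", e.message);
    }
}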
How to download a file that gets processed over an HTTP request in Java?
Solved! I used a different API and it works perfectly.
https://github.com/YtoTech/latex-on-http
Download file from web in Python 3
If you want to obtain the contents of a web page into a variable, just read the response of urllib.request.urlopen:
import urllib.request
...
url = 'http://example.com/'
response = urllib.request.urlopen(url)
data = response.read() # a `bytes` object
text = data.decode('utf-8') # a `str`; this step can't be used if data is binary
The easiest way to download and save a file is to use the urllib.request.urlretrieve function:
import urllib.request
...
# Download the file from `url` and save it locally under `file_name`:
urllib.request.urlretrieve(url, file_name)
import urllib.request
...
# Download the file from `url`, save it in a temporary directory and get the
# path to it (e.g. '/tmp/tmpb48zma.txt') in the `file_name` variable:
file_name, headers = urllib.request.urlretrieve(url)
But keep in mind that urlretrieve is considered legacy and might become deprecated (not sure why, though).
So the most correct way to do this would be to use the urllib.request.urlopen function to return a file-like object that represents an HTTP response and copy it to a real file using shutil.copyfileobj.
import urllib.request
import shutil
...
# Download the file from `url` and save it locally under `file_name`:
with urllib.request.urlopen(url) as response, open(file_name, 'wb') as out_file:
    shutil.copyfileobj(response, out_file)
If this seems too complicated, you may want to go simpler and store the whole download in a bytes object and then write it to a file. But this works well only for small files.
import urllib.request
...
# Download the file from `url` and save it locally under `file_name`:
with urllib.request.urlopen(url) as response, open(file_name, 'wb') as out_file:
    data = response.read() # a `bytes` object
    out_file.write(data)
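For larger files, a middle ground is to read the response in fixed-size chunks so that only one chunk is in memory at a time. The sketch below is roughly what shutil.copyfileobj does, with a byte count added; the name copy_in_chunks and the 64 KiB chunk size are arbitrary choices for illustration.

```python
def copy_in_chunks(source, dest, chunk_size=64 * 1024):
    """Copy from one binary file-like object to another; return bytes copied."""
    total = 0
    while True:
        chunk = source.read(chunk_size)
        if not chunk:  # an empty bytes object signals end of stream
            break
        dest.write(chunk)
        total += len(chunk)
    return total
```

It would be used in place of the read/write pair above, for example copy_in_chunks(response, out_file) inside the same with block.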
It is possible to extract .gz (and maybe other formats) compressed data on the fly, but such an operation probably requires the HTTP server to support random access to the file.
import urllib.request
import gzip
...
# Read the first 64 bytes of the file inside the .gz archive located at `url`
url = 'http://example.com/something.gz'
with urllib.request.urlopen(url) as response:
    with gzip.GzipFile(fileobj=response) as uncompressed:
        file_header = uncompressed.read(64) # a `bytes` object
        # Or do anything shown above using `uncompressed` instead of `response`.
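The same pattern can be tried without a network connection by wrapping any binary file-like object. In the sketch below an in-memory io.BytesIO buffer stands in for the HTTP response; the payload and the name read_uncompressed_prefix are invented for the example, and because BytesIO is seekable it says nothing either way about what a real server has to support.

```python
import gzip
import io


def read_uncompressed_prefix(fileobj, size=64):
    """Return the first `size` bytes of the data gzip-compressed in `fileobj`."""
    with gzip.GzipFile(fileobj=fileobj) as uncompressed:
        return uncompressed.read(size)


# In-memory stand-in for an HTTP response body (hypothetical payload).
fake_response = io.BytesIO(gzip.compress(b"hello, world" * 100))
prefix = read_uncompressed_prefix(fake_response, 5)
```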
Basic http file downloading and saving to disk in python?
A clean way to download a file (this is Python 2 code; urllib.URLopener does not exist under that name in Python 3) is:
import urllib
testfile = urllib.URLopener()
testfile.retrieve("http://randomsite.com/file.gz", "file.gz")
This downloads a file from a website and names it file.gz. This is one of my favorite solutions, from Downloading a picture via urllib and python.
This example uses the urllib library, and it will directly retrieve the file from a source.