Remote file size without downloading file
Here's the best way (that I've found) to get the size of a remote
file. Note that a HEAD request doesn't retrieve the body of the response,
only the headers. So a HEAD request to a resource that is 100MB should
take about the same amount of time as a HEAD request to a resource that
is 1KB.
<?php
/**
* Returns the size of a file without downloading it, or -1 if the file
* size could not be determined.
*
* @param $url - The location of the remote file to download. Cannot
* be null or empty.
*
* @return The size of the file referenced by $url, or -1 if the size
* could not be determined.
*/
function curl_get_file_size( $url ) {
    // Assume failure.
    $result = -1;

    $curl = curl_init( $url );

    // Issue a HEAD request and follow any redirects.
    curl_setopt( $curl, CURLOPT_NOBODY, true );
    curl_setopt( $curl, CURLOPT_HEADER, true );
    curl_setopt( $curl, CURLOPT_RETURNTRANSFER, true );
    curl_setopt( $curl, CURLOPT_FOLLOWLOCATION, true );
    // get_user_agent_string() is not defined in this snippet; supply your
    // own helper or pass a literal user-agent string instead.
    curl_setopt( $curl, CURLOPT_USERAGENT, get_user_agent_string() );

    $data = curl_exec( $curl );
    curl_close( $curl );

    if( $data ) {
        // Stay at -1 unless a Content-Length header is found.
        $content_length = -1;
        $status = "unknown";

        // When redirects are followed, $data holds one header block per
        // response; both patterns below pick up the first match.
        // Accept status lines such as "HTTP/1.1 200" and "HTTP/2 200".
        if( preg_match( "/^HTTP\/[\d.]+ (\d\d\d)/", $data, $matches ) ) {
            $status = (int)$matches[1];
        }

        // Header names are case-insensitive.
        if( preg_match( "/Content-Length: (\d+)/i", $data, $matches ) ) {
            $content_length = (int)$matches[1];
        }

        // http://en.wikipedia.org/wiki/List_of_HTTP_status_codes
        if( $status == 200 || ($status > 300 && $status <= 308) ) {
            $result = $content_length;
        }
    }

    return $result;
}
?>
Usage:
$file_size = curl_get_file_size( "http://stackoverflow.com/questions/2602612/php-remote-file-size-without-downloading-file" );
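When redirects are followed and the headers are returned as text, the output contains one header block per response, so the first Content-Length seen may belong to a redirect rather than to the file. As a rough sketch (in Python, with a hypothetical function name `size_from_raw_headers` that is not part of the answer above), one way to read only the final block could look like this:

```python
import re

# Matches status lines such as "HTTP/1.1 200 OK" and "HTTP/2 200".
_STATUS_RE = re.compile(r"^HTTP/[\d.]+ (\d{3})")
# Header names are case-insensitive, so match "content-length" too.
_LENGTH_RE = re.compile(r"^Content-Length:\s*(\d+)\s*$",
                        re.IGNORECASE | re.MULTILINE)


def size_from_raw_headers(raw: str) -> int:
    """Return Content-Length from the last header block, or -1.

    `raw` is assumed to be the header text of a HEAD request that
    followed redirects: blocks separated by a blank line.
    """
    blocks = [b for b in re.split(r"\r?\n\r?\n", raw.strip()) if b.strip()]
    if not blocks:
        return -1
    final = blocks[-1]
    status = _STATUS_RE.match(final)
    length = _LENGTH_RE.search(final)
    # Only trust the size when the final response is a plain 200.
    if status is None or int(status.group(1)) != 200 or length is None:
        return -1
    return int(length.group(1))
```

Only accepting a final 200 is a design choice of this sketch; it is stricter than the status check in the PHP function.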
How can I get the file size from a link without downloading it in python?
To do this, use the HTTP HEAD method, which just grabs the header information for the URL and doesn't download the content like an HTTP GET request does.
$ curl -I https://sra-download.ncbi.nlm.nih.gov/traces/sra46/SRR/005150/SRR5273887
HTTP/1.1 200 OK
Server: nginx
Date: Mon, 18 Mar 2019 16:56:35 GMT
Content-Type: application/octet-stream
Content-Length: 578220087
Last-Modified: Tue, 21 Feb 2017 12:13:19 GMT
Connection: keep-alive
Accept-Ranges: bytes
The file size is in the 'Content-Length' header. In Python 3.6:
>>> import urllib.request
>>> req = urllib.request.Request('https://sra-download.ncbi.nlm.nih.gov/traces/sra46/SRR/005150/SRR5273887',
...                              method='HEAD')
>>> f = urllib.request.urlopen(req)
>>> f.status
200
>>> f.headers['Content-Length']
'578220087'
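The interactive session above could be wrapped in a small function that returns an integer, or -1 on failure. This is only a sketch: the names `parse_content_length` and `remote_file_size` and the 10-second timeout are assumptions, not part of the original answer.

```python
import urllib.error
import urllib.request
from typing import Optional


def parse_content_length(value: Optional[str]) -> int:
    """Convert a Content-Length header value to an int, or -1 if unusable."""
    if value is None:
        return -1
    value = value.strip()
    return int(value) if value.isdecimal() else -1


def remote_file_size(url: str, timeout: float = 10.0) -> int:
    """Issue a HEAD request and return the reported size, or -1."""
    try:
        # Request() raises ValueError for strings that are not URLs.
        request = urllib.request.Request(url, method="HEAD")
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return parse_content_length(response.headers.get("Content-Length"))
    except (urllib.error.URLError, ValueError):
        # URLError also covers HTTPError (4xx/5xx responses).
        return -1
```

A server is not obliged to send Content-Length, which is why the sketch treats a missing header as "unknown" instead of raising.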
Get file size without downloading it
Found the answer: HEAD Request.
http.Response r = await http.head(url);
r.headers["content-length"]
Note: reading r.contentLength directly doesn't work.
Get remote file size via HTTP without downloading said file
Try making a HEAD request; it returns only the headers, including Content-Length.
~$ curl -I http://google.com/
HTTP/1.1 301 Moved Permanently
Location: http://www.google.com/
Content-Length: 219
...
Find file size of a remote file without downloading it
This probably has to do with the GitHub URL being https. You can tell cURL to ignore the certificate check by doing:
curl_setopt($curl, CURLOPT_SSL_VERIFYPEER, 0);
If this doesn't solve the issue, you can add echo curl_error($curl); before curl_close( $curl ); since it might help you with debugging.
How to determine the file size of a remote download without reading the entire file with R
Nowadays a straightforward approach might be
response = httr::HEAD(url)
httr::headers(response)[["Content-Length"]]
My original answer was: A more 'by hand' approach is to set the CURLOPT_NOBODY option (see man curl_easy_setopt on Linux; basically inspired by looking at the answers to the linked question) and tell getURL and friends to return the header along with the request
library(RCurl)
url = "http://stackoverflow.com/questions/20921593/how-to-determine-the-file-size-of-a-remote-download-without-reading-the-entire-f"
xx = getURL(url, nobody=1L, header=1L)
strsplit(xx, "\r\n")
## [[1]]
## [1] "HTTP/1.1 200 OK"
## [2] "Cache-Control: public, max-age=60"
## [3] "Content-Length: 60848"
## [4] "Content-Type: text/html; charset=utf-8"
## [5] "Expires: Sat, 04 Jan 2014 14:09:58 GMT"
## [6] "Last-Modified: Sat, 04 Jan 2014 14:08:58 GMT"
## [7] "Vary: *"
## [8] "X-Frame-Options: SAMEORIGIN"
## [9] "Date: Sat, 04 Jan 2014 14:08:57 GMT"
## [10] ""
A peek at url.exists suggests parseHTTPHeader(xx) for parsing HTTP headers. getURL also works with ftp URLs.
url = "ftp://ftp2.census.gov/AHS/AHS_2004/AHS_2004_Metro_PUF_Flat.zip"
getURL(url, nobody=1L, header=1L)
## [1] "Content-Length: 21288307\r\nAccept-ranges: bytes\r\n"
how to get file size without downloading it in python requests
Instead of using a GET request, do a HEAD request:
resp = requests.request('HEAD', "https://Whereever.user.wants.com/THEFILE.zip")
The HTTP HEAD method requests the headers that would be returned if the HEAD request's URL was instead requested with the HTTP GET method. In your case, where the URL produces a large download, a HEAD request lets you read the Content-Length header to get the file size without actually downloading the file.
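The snippet above stops at the request itself. A possible continuation, sketched under a few assumptions, reads the header from the response. The function names are hypothetical; `allow_redirects=True` is passed because `requests.head` does not follow redirects by default, and the `Accept-Encoding: identity` header is an assumption meant to ask for the uncompressed size.

```python
from typing import Mapping, Optional


def content_length_or_none(headers: Mapping[str, str]) -> Optional[int]:
    """Return Content-Length as an int, or None if absent or malformed."""
    value = headers.get("Content-Length")
    if value is None or not value.strip().isdecimal():
        return None
    return int(value)


def remote_size_with_requests(url: str, timeout: float = 10.0) -> Optional[int]:
    """HEAD the URL with requests and return the reported size, if any."""
    # Third-party dependency; imported here so the helper above
    # stays usable without it.
    import requests

    resp = requests.head(
        url,
        allow_redirects=True,
        timeout=timeout,
        headers={"Accept-Encoding": "identity"},
    )
    resp.raise_for_status()  # raise on 4xx/5xx instead of guessing a size
    return content_length_or_none(resp.headers)
```

Returning None instead of 0 keeps "size unknown" distinct from an empty file.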
Get file size of remote url without downloading in Google app engine(php)
Just use the http streams API:
function csize($url) {
    $options = [
        'http' => [
            'method' => 'HEAD',
        ],
    ];
    $ctx = stream_context_create($options);
    $result = file_get_contents($url, false, $ctx);
    if ($result !== false) {
        foreach ($http_response_header as $header) {
            if (preg_match("/Content-Length: (\d+)/i", $header, $matches)) {
                return $matches[1];
            }
        }
    }
}
Get size of a remote file
If the site redirects via the Location header you can use:
// get the redirect url
$headers = get_headers("http://somedomain.com/files/34", 1);
$redirectUrl = $headers['Location'];
// get the filesize
$headers = get_headers($redirectUrl, 1);
$filesize = $headers["Content-Length"];
Please note that this code should not be used in production as there are no checks for existing array keys or error handling.
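For comparison, here is a rough Python sketch of the same two-step idea with the missing checks added: it follows Location headers up to a hop limit, resolves relative redirects, and returns None when no size is reported. All names (`head_once`, `remote_size_following_redirects`, `max_hops`) are hypothetical, and the `fetch` parameter exists only so the redirect logic can be exercised without a network.

```python
import http.client
from typing import Callable, Dict, Optional, Tuple
from urllib.parse import urljoin, urlsplit

REDIRECT_CODES = {301, 302, 303, 307, 308}
HeadResult = Tuple[int, Dict[str, str]]


def head_once(url: str, timeout: float = 10.0) -> HeadResult:
    """Send one HEAD request; return (status, lower-cased headers)."""
    parts = urlsplit(url)
    if parts.scheme == "https":
        conn = http.client.HTTPSConnection(parts.netloc, timeout=timeout)
    else:
        conn = http.client.HTTPConnection(parts.netloc, timeout=timeout)
    try:
        path = parts.path or "/"
        if parts.query:
            path += "?" + parts.query
        conn.request("HEAD", path)
        resp = conn.getresponse()
        return resp.status, {k.lower(): v for k, v in resp.getheaders()}
    finally:
        conn.close()


def remote_size_following_redirects(
    url: str,
    max_hops: int = 5,
    fetch: Callable[[str], HeadResult] = head_once,
) -> Optional[int]:
    """Follow redirects by hand and return the final Content-Length."""
    for _ in range(max_hops + 1):
        status, headers = fetch(url)
        if status in REDIRECT_CODES and "location" in headers:
            # Location may be relative, so resolve it against the current URL.
            url = urljoin(url, headers["location"])
            continue
        length = headers.get("content-length", "")
        if status == 200 and length.isdecimal():
            return int(length)
        return None
    return None  # too many redirects
```

The hop limit guards against redirect loops, which the two-call version would not notice.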
Get remote file size from HTTPS url
You need to add the cURL option below to your existing set:
curl_setopt( $curl, CURLOPT_SSL_VERIFYPEER, 0);
Adding this will stop cURL from verifying the peer's certificate.
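For readers doing the same thing from Python, a rough counterpart of this option with the standard library might look like the sketch below. The function names are hypothetical, and verification stays on unless `verify=False` is passed explicitly.

```python
import ssl
import urllib.request
from typing import Optional


def make_ssl_context(verify: bool = True) -> ssl.SSLContext:
    """Build an SSL context, optionally with certificate checks disabled."""
    context = ssl.create_default_context()
    if not verify:
        # check_hostname must be switched off before verify_mode
        # can be set to CERT_NONE.
        context.check_hostname = False
        context.verify_mode = ssl.CERT_NONE
    return context


def head_content_length(url: str, context: ssl.SSLContext,
                        timeout: float = 10.0) -> Optional[str]:
    """HEAD the URL using the given context; return the raw header value."""
    request = urllib.request.Request(url, method="HEAD")
    with urllib.request.urlopen(request, timeout=timeout,
                                context=context) as response:
        return response.headers.get("Content-Length")
```

Usage would be along the lines of `head_content_length(url, make_ssl_context(verify=False))`.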