Remote File Size Without Downloading File

Remote file size without downloading file

Found something about this here:

Here's the best way (that I've found) to get the size of a remote
file. Note that HEAD requests don't retrieve the body of the response;
they just retrieve the headers. So a HEAD request to a resource
that is 100MB should take about the same amount of time as a HEAD request to a
resource that is 1KB.

<?php
/**
* Returns the size of a file without downloading it, or -1 if the file
* size could not be determined.
*
* @param $url - The location of the remote file to download. Cannot
* be null or empty.
*
* @return The size of the file referenced by $url, or -1 if the size
* could not be determined.
*/
function curl_get_file_size( $url ) {
    // Assume failure.
    $result = -1;

    $curl = curl_init( $url );

    // Issue a HEAD request and follow any redirects.
    curl_setopt( $curl, CURLOPT_NOBODY, true );
    curl_setopt( $curl, CURLOPT_HEADER, true );
    curl_setopt( $curl, CURLOPT_RETURNTRANSFER, true );
    curl_setopt( $curl, CURLOPT_FOLLOWLOCATION, true );
    // get_user_agent_string() is not defined in this snippet; supply your own
    // helper or pass a literal user agent string instead.
    curl_setopt( $curl, CURLOPT_USERAGENT, get_user_agent_string() );

    $data = curl_exec( $curl );
    curl_close( $curl );

    if( $data ) {
        $content_length = -1;
        $status = -1;

        // With redirects followed, $data holds one header block per response.
        // Only the last block describes the final resource.
        $blocks = preg_split( "/\r?\n\r?\n/", trim( $data ) );
        $headers = end( $blocks );

        // Matches "HTTP/1.1 200" as well as "HTTP/2 200".
        if( preg_match( "/^HTTP\/\d(?:\.\d)? (\d\d\d)/", $headers, $matches ) ) {
            $status = (int)$matches[1];
        }

        // Header names are case-insensitive.
        if( preg_match( "/^Content-Length: (\d+)/mi", $headers, $matches ) ) {
            $content_length = (int)$matches[1];
        }

        // http://en.wikipedia.org/wiki/List_of_HTTP_status_codes
        if( $status == 200 || ($status > 300 && $status <= 308) ) {
            $result = $content_length;
        }
    }

    return $result;
}
?>

Usage:

$file_size = curl_get_file_size( "http://stackoverflow.com/questions/2602612/php-remote-file-size-without-downloading-file" );
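
When redirects are followed and headers are included in the output, cURL returns one header block per hop, so the size of the file itself sits in the last block. Here is a minimal Python sketch of that selection logic, assuming the blocks are separated by a blank line. The function name final_size and the sample header text are made up for illustration and are not part of the answer above:

```python
import re


def final_size(raw_headers: str) -> int:
    """Return the Content-Length of the last response in a redirect chain, or -1."""
    # Normalise line endings, then split into one block per response.
    text = raw_headers.replace("\r\n", "\n").strip()
    blocks = [b for b in text.split("\n\n") if b.strip()]
    if not blocks:
        return -1
    last = blocks[-1]

    # Status line, e.g. "HTTP/1.1 200 OK" or "HTTP/2 200".
    status = re.match(r"HTTP/\d(?:\.\d)? (\d{3})", last)
    if status is None or status.group(1) != "200":
        return -1

    # Header names are case-insensitive.
    length = re.search(
        r"^content-length:\s*(\d+)\s*$", last, re.IGNORECASE | re.MULTILINE
    )
    return int(length.group(1)) if length else -1


# Illustrative header dump: a redirect followed by the real response.
sample = (
    "HTTP/1.1 301 Moved Permanently\r\n"
    "Location: http://www.example.com/file.zip\r\n"
    "Content-Length: 219\r\n"
    "\r\n"
    "HTTP/1.1 200 OK\r\n"
    "Content-Length: 60848\r\n"
    "\r\n"
)
print(final_size(sample))  # 60848
```

The 219 in the first block is the size of the redirect page, not of the file, which is why the sketch ignores every block except the last.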

How can I get the file size from a link without downloading it in python?

To do this, use the HTTP HEAD method, which fetches only the header information for the URL and doesn't download the content the way an HTTP GET request does.

$ curl -I https://sra-download.ncbi.nlm.nih.gov/traces/sra46/SRR/005150/SRR5273887
HTTP/1.1 200 OK
Server: nginx
Date: Mon, 18 Mar 2019 16:56:35 GMT
Content-Type: application/octet-stream
Content-Length: 578220087
Last-Modified: Tue, 21 Feb 2017 12:13:19 GMT
Connection: keep-alive
Accept-Ranges: bytes

The file size is in the 'Content-Length' header. In Python 3.6:

>>> import urllib.request
>>> req = urllib.request.Request('https://sra-download.ncbi.nlm.nih.gov/traces/sra46/SRR/005150/SRR5273887',
...                              method='HEAD')
>>> f = urllib.request.urlopen(req)
>>> f.status
200
>>> f.headers['Content-Length']
'578220087'
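
The interactive session above can be wrapped in a small function. This is a sketch rather than a definitive implementation: the names build_head_request, parse_length and remote_size are made up for illustration, and it assumes the server answers HEAD requests and reports Content-Length, which not every server does.

```python
import urllib.request
from typing import Optional


def build_head_request(url: str) -> urllib.request.Request:
    """Create a request that asks for headers only."""
    return urllib.request.Request(url, method="HEAD")


def parse_length(value: Optional[str]) -> Optional[int]:
    """Convert a Content-Length header value to int; None if absent or malformed."""
    if value is None:
        return None
    try:
        size = int(value.strip())
    except ValueError:
        return None
    return size if size >= 0 else None


def remote_size(url: str, timeout: float = 10.0) -> Optional[int]:
    """Return the size in bytes reported by the server, or None if unknown."""
    # urlopen raises urllib.error.HTTPError for 4xx/5xx responses.
    with urllib.request.urlopen(build_head_request(url), timeout=timeout) as resp:
        return parse_length(resp.headers.get("Content-Length"))
```

Calling remote_size() with the URL from the session above should give 578220087 as an int, provided the server still responds the way it did in 2019.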

Get file size without downloading it

Found the answer: HEAD Request.

http.Response r = await http.head(url);
r.headers["content-length"]

Note: reading r.contentLength directly doesn't work.

Get remote file size via HTTP without downloading said file

Try making a HEAD request; it returns only the headers, including the Content-Length header.

~$ curl -I http://google.com/
HTTP/1.1 301 Moved Permanently
Location: http://www.google.com/
Content-Length: 219
...

Find file size of a remote file without downloading it

This probably has to do with the GitHub URL being https. You can tell cURL to skip the certificate check by doing:

curl_setopt($curl, CURLOPT_SSL_VERIFYPEER, 0);

If this doesn't solve the issue, you can add an echo curl_error($curl) before curl_close( $curl ); it might help you with debugging.

How to determine the file size of a remote download without reading the entire file with R

Nowadays a straightforward approach might be

response = httr::HEAD(url)
httr::headers(response)[["Content-Length"]]

My original answer was: a more 'by hand' approach is to set the CURLOPT_NOBODY option (see man curl_easy_setopt on Linux; this was inspired by the answers to the linked question) and tell getURL and friends to return the headers along with the response

library(RCurl)
url = "http://stackoverflow.com/questions/20921593/how-to-determine-the-file-size-of-a-remote-download-without-reading-the-entire-f"
xx = getURL(url, nobody=1L, header=1L)
strsplit(xx, "\r\n")

## [[1]]
## [1] "HTTP/1.1 200 OK"
## [2] "Cache-Control: public, max-age=60"
## [3] "Content-Length: 60848"
## [4] "Content-Type: text/html; charset=utf-8"
## [5] "Expires: Sat, 04 Jan 2014 14:09:58 GMT"
## [6] "Last-Modified: Sat, 04 Jan 2014 14:08:58 GMT"
## [7] "Vary: *"
## [8] "X-Frame-Options: SAMEORIGIN"
## [9] "Date: Sat, 04 Jan 2014 14:08:57 GMT"
## [10] ""

A peek at url.exists suggests parseHTTPHeader(xx) for parsing HTTP headers. getURL also works with ftp URLs.

url = "ftp://ftp2.census.gov/AHS/AHS_2004/AHS_2004_Metro_PUF_Flat.zip"
getURL(url, nobody=1L, header=1L)
## [1] "Content-Length: 21288307\r\nAccept-ranges: bytes\r\n"
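
For comparison, the FTP case can be sketched in Python with the standard ftplib module, which exposes the SIZE command. This is only a sketch under assumptions: the helper names split_ftp_url and ftp_size are hypothetical, the server is assumed to allow anonymous login, and SIZE is not supported by every FTP server.

```python
from ftplib import FTP
from typing import Optional, Tuple
from urllib.parse import urlparse


def split_ftp_url(url: str) -> Tuple[str, str]:
    """Split an ftp:// URL into (host, path)."""
    parts = urlparse(url)
    if parts.scheme != "ftp" or not parts.hostname:
        raise ValueError("not an ftp URL: " + url)
    return parts.hostname, parts.path


def ftp_size(url: str, timeout: float = 10.0) -> Optional[int]:
    """Ask the FTP server for a file's size without retrieving the file."""
    host, path = split_ftp_url(url)
    with FTP(host, timeout=timeout) as ftp:
        ftp.login()            # anonymous login (assumption)
        ftp.voidcmd("TYPE I")  # binary mode; some servers refuse SIZE in ASCII mode
        return ftp.size(path)  # sends SIZE; the file itself is not transferred
```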

how to get file size without downloading it in python requests

Instead of a GET request, make a HEAD request:

resp = requests.request('HEAD', "https://Whereever.user.wants.com/THEFILE.zip")

The HTTP HEAD method requests the headers that would be returned if the HEAD request's URL was instead requested with the HTTP GET method. In your case, where the URL produces a large download, a HEAD request lets you read the Content-Length header to get the file size without actually downloading the file.
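
A slightly fuller sketch of the same idea follows. Two details are worth knowing: requests does not follow redirects for HEAD unless allow_redirects=True is passed, and a server may omit Content-Length altogether. The helper name head_size and its session parameter are assumptions for illustration; any object with a requests-style head method works, such as requests.Session().

```python
from typing import Any, Optional


def head_size(url: str, session: Any, timeout: float = 10.0) -> Optional[int]:
    """Return the Content-Length reported for url, or None if unavailable.

    `session` is expected to behave like requests.Session (a hypothetical
    parameter for this sketch); it is passed in so the function can be
    exercised without network access.
    """
    # HEAD requests are not redirected by default in requests.
    resp = session.head(url, allow_redirects=True, timeout=timeout)
    resp.raise_for_status()  # turn 4xx/5xx responses into an exception

    # resp.headers is a case-insensitive mapping in requests.
    value = resp.headers.get("Content-Length")
    if value is None:
        return None
    try:
        return int(value)
    except ValueError:
        return None
```

With the real library installed, this would be called as head_size(url, requests.Session()) after import requests.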

Get file size of remote url without downloading in Google app engine(php)

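Just use the HTTP streams API:

function csize($url) {
    // Send a HEAD request through the http stream wrapper.
    $options = ['http' => [
        'method' => 'HEAD',
    ]];
    $ctx = stream_context_create($options);
    $result = file_get_contents($url, false, $ctx);
    if ($result !== false) {
        // file_get_contents() populates $http_response_header in the local scope.
        foreach ($http_response_header as $header) {
            if (preg_match("/^Content-Length: (\d+)/i", $header, $matches)) {
                // Returned as a numeric string.
                return $matches[1];
            }
        }
    }
    // The request failed or no Content-Length header was sent.
    return null;
}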

Get size of a remote file

If the site redirects via the Location header you can use:

// get the redirect url
$headers = get_headers("http://somedomain.com/files/34", 1);
$redirectUrl = $headers['Location'];

// get the filesize
$headers = get_headers($redirectUrl, 1);
$filesize = $headers["Content-Length"];

Please note that this code should not be used in production as there are no checks for existing array keys or error handling.

Get remote file size from HTTPS url

You need to add the cURL option below to your existing set.

curl_setopt( $curl, CURLOPT_SSL_VERIFYPEER, 0);

Adding this stops cURL from verifying the peer's certificate.

