How to Get the File Size from HTTP Headers

How to get the file size from HTTP headers

Yes, assuming the HTTP server you're talking to supports/allows this:

public long GetFileSize(string url)
{
    // -1 signals that the size could not be determined
    long result = -1;

    System.Net.WebRequest req = System.Net.WebRequest.Create(url);
    req.Method = "HEAD"; // ask for the headers only, not the body
    using (System.Net.WebResponse resp = req.GetResponse())
    {
        // Content-Length may be absent, so parse it defensively
        if (long.TryParse(resp.Headers.Get("Content-Length"), out long contentLength))
        {
            result = contentLength;
        }
    }

    return result;
}

If the HEAD method is not allowed, or the server's reply does not include a Content-Length header, the only way to determine the size of the content is to download it. Because that is rarely practical, most servers do include this information.
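For comparison, the same idea can be sketched in Python using only the standard library. The names `parse_content_length` and `remote_size` are hypothetical, and the fallback simply downloads the body and counts the bytes, which is the last resort described above.

```python
import urllib.error
import urllib.request


def parse_content_length(value):
    """Return the header value as an int, or None if it is missing or malformed."""
    if value is None:
        return None
    value = value.strip()
    if value.isascii() and value.isdigit():
        return int(value)
    return None


def remote_size(url, timeout=10):
    """Try HEAD first; if that yields no size, download the body and count bytes."""
    head = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(head, timeout=timeout) as resp:
            size = parse_content_length(resp.headers.get("Content-Length"))
            if size is not None:
                return size
    except urllib.error.HTTPError:
        pass  # e.g. 405 Method Not Allowed: fall through to GET

    # Last resort: read the whole body in chunks and count the bytes
    total = 0
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        while True:
            chunk = resp.read(64 * 1024)
            if not chunk:
                break
            total += len(chunk)
    return total
```

Calling `remote_size("http://your-url")` would return the size in bytes either way; only the cost differs.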

get file size before downloading using HTTP header not matching with one retrieved from urlopen

By default, requests sends an Accept-Encoding request header that includes gzip, so the server may respond with compressed content, and Content-Length then reports the compressed size:

>>> r = requests.head('http://pymotw.com/2/urllib/index.html')
>>> r.headers['content-encoding'], r.headers['content-length']
('gzip', '8201')

But if you set Accept-Encoding to identity in the request headers, you'll get the size of the uncompressed content:

>>> r = requests.head('http://pymotw.com/2/urllib/index.html', headers={'Accept-Encoding': 'identity'})
>>> r.headers['content-length']
'38227'
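
The gap between the two numbers is just the effect of compression. As a rough local illustration that needs no network, the sketch below gzips a made-up HTML body and compares the two lengths a server could report. The sample body and the variable names are invented for the example.

```python
import gzip

# Hypothetical page body; repetitive markup compresses well
body = b"<li>An example list item</li>\n" * 1000

identity_length = len(body)             # Content-Length for Accept-Encoding: identity
gzip_length = len(gzip.compress(body))  # Content-Length for Content-Encoding: gzip

print(identity_length, gzip_length)
```

The second number is much smaller than the first, which is why a size check done with the default headers can disagree with the number of bytes you end up with after decoding.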

How to get the real file size of a file in a multipart/form-data request


  1. You cannot expect that the Content-Length header is set for the multipart message (some multipart request types even forbid a content length header). It's only required that either a Content-Length header or a Transfer-Encoding header is present.

  2. There must be no Content-Length header in a part of a multipart message.

However, your sender may set an additional parameter in the disposition header. You could leverage this to set the file size as follows:

Content-Disposition: form-data; name="file"; filename="3byte.txt"; size=123

But this would be purely optional, the parameter's name would be arbitrary, and no sender is required to include it. That means an HTTP server engine can never rely on it.
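
If you control both the sender and the receiver, reading such a parameter might look like the sketch below. It leans on the standard library's `email.message.Message` to parse the header. The `size` parameter is the non-standard one from the example above, and `disposition_size` is a hypothetical helper name.

```python
from email.message import Message


def disposition_size(header_value):
    """Return the optional, non-standard size parameter as an int, or None."""
    msg = Message()
    msg["Content-Disposition"] = header_value
    size = msg.get_param("size", header="content-disposition")
    if isinstance(size, str) and size.isascii() and size.isdigit():
        return int(size)
    return None  # parameter absent or not a plain number
```

Even when the parameter is present, it is only a hint from the client; the receiver still has to count the bytes it actually reads for that part.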

Get file size using python-requests, while only getting the header

Send a HEAD request:

>>> import requests
>>> response = requests.head('http://example.com')
>>> response.headers
{'connection': 'close',
'content-encoding': 'gzip',
'content-length': '606',
'content-type': 'text/html; charset=UTF-8',
'date': 'Fri, 11 Jan 2013 02:32:34 GMT',
'last-modified': 'Fri, 04 Jan 2013 01:17:22 GMT',
'server': 'Apache/2.2.3 (CentOS)',
'vary': 'Accept-Encoding'}

A HEAD request is like a GET request that only downloads the headers. Note that it's up to the server to actually honor your HEAD request. Some servers will only respond to GET requests, so you'll have to send a GET request and just close the connection instead of downloading the body. Other times, the server just never specifies the total size of the file.
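
When HEAD is refused, the GET-and-close approach could look roughly like this with the standard library's `urllib.request`, which returns as soon as the headers have arrived and reads the body only on demand. `size_via_get` and its `opener` parameter are hypothetical names; the parameter exists only so the function can be exercised without a network.

```python
import urllib.request


def size_via_get(url, opener=urllib.request.urlopen):
    """Send a GET, look only at the headers, and close before reading the body."""
    with opener(url) as resp:
        value = resp.headers.get("Content-Length")
    # Leaving the with-block closed the response without read() being called
    if value is not None and value.strip().isdigit():
        return int(value)
    return None  # the server did not declare a size
```

The server may already have started sending the body by the time the connection is closed, so this saves most of the transfer rather than all of it.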

What's the Content-Length field in HTTP header?

From RFC 2616, section 14.13:

The Content-Length entity-header field indicates the size of the
entity-body, in decimal number of OCTETs, sent to the recipient or, in
the case of the HEAD method, the size of the entity-body that would
have been sent had the request been a GET.

It doesn't matter what the content-type is.
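
Because the unit is octets, the value is the byte length of the encoded body, not its character count. A small illustration with a made-up string:

```python
text = "na\u00efve caf\u00e9"   # 10 characters, two of them non-ASCII
body = text.encode("utf-8")     # the bytes that would go over the wire

char_count = len(text)
content_length = len(body)      # each accented letter takes 2 bytes in UTF-8

print(char_count, content_length)
```

Here the body is 12 octets long even though the text has 10 characters, so 12 is the value a server would put in Content-Length.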


How to get file size of http file path

con.getContentLength() may give you what you want, but only if the server provided a Content-Length response header; otherwise it returns -1. If the server used "chunked" transfer encoding instead of providing a Content-Length header, the total length is not available up front.
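
The same check can be expressed as a small Python helper over a plain header mapping. `declared_length` is a hypothetical name, and the sketch assumes the header names have already been normalised to lower case.

```python
def declared_length(headers):
    """Return the up-front body length, or None when it is not announced."""
    transfer_encoding = headers.get("transfer-encoding", "")
    if "chunked" in transfer_encoding.lower():
        return None  # chunked bodies carry no total length
    value = headers.get("content-length")
    if value is None or not value.strip().isdigit():
        return None
    return int(value)
```

A caller that gets `None` back has to read the body to the end to learn its size.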

How can I get the file size on the Internet knowing only the URL


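import urllib.request

# urlopen sends a GET; the headers are available before the body is read
with urllib.request.urlopen("http://your-url") as f:
    size = f.headers["Content-Length"]  # None if the server omitted the header
print(size)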

Python | HTTP - How to check file size before downloading it

If the server supplies a Content-Length header, then you can use that to determine if you'd like to continue downloading the remainder of the body or not. If the server does not provide the header, then you'll need to stream the response until you decide you no longer want to continue.

To do this, you'll need to make sure that you're not preloading the full response.

from urllib3 import PoolManager

url = "http://example.com"  # placeholder: the resource to fetch

pool = PoolManager()
# preload_content=False defers reading the body until we ask for it
response = pool.request("GET", url, preload_content=False)

# Maximum amount we want to read
max_bytes = 1000000

content_bytes = response.headers.get("Content-Length")
if content_bytes and int(content_bytes) < max_bytes:
    # Expected body is smaller than our maximum, read the whole thing
    data = response.read()
    # Do something with data
    ...
elif content_bytes is None:
    # Alternatively, stream until we hit our limit
    amount_read = 0
    for chunk in response.stream():
        amount_read += len(chunk)
        # Save chunk
        ...
        if amount_read > max_bytes:
            break

# Release the connection back into the pool
response.release_conn()

How can I get the file size from a link without downloading it in python?

To do this, use the HTTP HEAD method, which fetches only the header information for the URL and doesn't download the content the way an HTTP GET request does.

$ curl -I https://sra-download.ncbi.nlm.nih.gov/traces/sra46/SRR/005150/SRR5273887
HTTP/1.1 200 OK
Server: nginx
Date: Mon, 18 Mar 2019 16:56:35 GMT
Content-Type: application/octet-stream
Content-Length: 578220087
Last-Modified: Tue, 21 Feb 2017 12:13:19 GMT
Connection: keep-alive
Accept-Ranges: bytes

The file size is in the 'Content-Length' header. In Python 3.6:

>>> import urllib.request
>>> req = urllib.request.Request('https://sra-download.ncbi.nlm.nih.gov/traces/sra46/SRR/005150/SRR5273887',
...                              method='HEAD')
>>> f = urllib.request.urlopen(req)
>>> f.status
200
>>> f.headers['Content-Length']
'578220087'

