How to Read Only the First N Bytes from an HTTP Server Using a Linux Command

Is it possible to read only the first N bytes from an HTTP server using a Linux command?


curl <url> | head -c 499

or

curl <url> | dd bs=1 count=499

should do it. (If the server supports HTTP range requests, curl -r 0-498 <url> asks it outright for just the first 499 bytes.)

There are also simpler utilities with perhaps broader availability, like

    netcat host 80 <<"HERE" | dd bs=1 count=499 of=output.fragment
    GET /urlpath/query?string=more&bloddy=stuff HTTP/1.0
    Host: host

    HERE


read standard input from web server

It is possible that a single read will not return 5000 bytes, due to network delays and other factors. You want to keep reading until you hit EOF, the total number of bytes read reaches 5000, or read returns an error code -- by calling read in a loop, for example. From the read(2) manpage:

On success, the number of bytes read is returned (zero indicates end of file), and the file position is advanced by this number. It is not an error if this number is smaller than the number of bytes requested; this may happen for example because fewer bytes are actually available right now (maybe because we were close to end-of-file, or because we are reading from a pipe, or from a terminal), or because read() was interrupted by a signal. On error, -1 is returned, and errno is set appropriately. In this case it is left unspecified whether the file position (if any) changes.
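The same advice in Python, as a minimal sketch with illustrative names (not from the question): loop until you have the bytes you want or recv() returns an empty bytes object, which signals EOF. Note that Python's socket methods raise an exception on error instead of returning -1.

    import socket

    def read_up_to(sock, want=5000):
        """Read until EOF or until `want` bytes have arrived.
        A single recv() may legitimately return fewer bytes than asked."""
        chunks = []
        got = 0
        while got < want:
            data = sock.recv(want - got)  # may return less than requested
            if not data:                  # b'' means EOF
                break
            chunks.append(data)
            got += len(data)
        return b"".join(chunks)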

socket read()'s byte offset (linux)

You've got it wrong: the third parameter of a read() call isn't an offset, it's the maximum number of bytes you want to read.
The actual number read is returned by the read() call.

Look here: http://man7.org/linux/man-pages/man2/read.2.html


Also, you should pass read() a pointer to the position in the buffer where the next bytes should be stored.

So message[recv_len] isn't going to do it -- that's the value of a single byte. You're looking for message + recv_len, the address of that position.
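The C fix (passing message + recv_len) has a direct analogue in Python, sketched here with illustrative names: recv_into() with a memoryview slice writes each chunk at the current offset of one pre-allocated buffer.

    import socket

    def recv_into_buffer(sock, size=5000):
        message = bytearray(size)
        view = memoryview(message)  # lets us write at an offset, like message + recv_len in C
        recv_len = 0
        while recv_len < size:
            n = sock.recv_into(view[recv_len:])  # start writing at the current offset
            if n == 0:                           # EOF
                break
            recv_len += n
        return bytes(message[:recv_len])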

What is the best way to open a URL and get up to X bytes in Python?

This is probably what you're looking for:

import urllib

def download(url, num_bytes=1024):
    """Copy the first num_bytes of the file at the given URL
    to a local file named after the last path component."""
    web_file = urllib.urlopen(url)  # Python 2; see the Python 3 version below
    local_file = open(url.split('/')[-1], 'wb')  # 'wb': we are writing raw bytes
    local_file.write(web_file.read(num_bytes))
    web_file.close()
    local_file.close()
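The snippet above is Python 2 (urllib.urlopen is gone in Python 3). A rough Python 3 equivalent, assuming the same save-to-basename behavior:

    import urllib.request

    def download(url, num_bytes=1024):
        """Copy up to num_bytes from the given URL to a local file."""
        with urllib.request.urlopen(url) as web_file:
            with open(url.split('/')[-1], 'wb') as local_file:
                local_file.write(web_file.read(num_bytes))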

Using sed to read byte count of a website from wget


sed usually requires an input file, right? Piping the output from the wget command doesn't make it a file. How come it works without one?

Like most Unix utilities, sed will process files if they're given as arguments; otherwise it processes its standard input.

I don't understand what -e means. I've looked it up in the Linux man pages, and they say it is for a "script"? What does that mean? Also, what is happening in the line with the quotes?

-e is used to indicate that the next argument is a string of sed operations to execute (the documentation calls this a "script"). This is the default for the first argument to sed, but the script you got happens to use it explicitly. It's mostly useful when you're giving multiple commands, because if you didn't use -e before the additional commands, they would be treated as filenames. See also

what does dash e(-e) mean in sed commands?

In your command (evidently something like sed -n -e '/Content-Length/ {s/.*: //; p}'), the -n option means that sed should not print its input lines by default -- you use the p operation to print selected lines explicitly. /Content-Length/ matches lines that contain that string, and it is followed by a set of operations, in {}, to perform on those matching lines. The first operation is s/.*: //, which replaces everything up to and including the colon and the space after it with nothing. The second operation is p, which prints the modified line. So the script prints just the number after Content-Length:.
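If the sed syntax is still opaque, here is the same filter written out in Python (a line-for-line translation for illustration, not part of the original answer):

    import re
    import sys

    # Equivalent of: sed -n -e '/Content-Length/ {s/.*: //; p}'
    for line in sys.stdin:                            # -n: print nothing by default
        if "Content-Length" in line:                  # the /Content-Length/ address
            print(re.sub(r".*: ", "", line), end="")  # s/.*: //, then p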

Validating or reading a remote image type in javascript

JavaScript is, in the main, a client-side language, so you can't simply read files hosted on other sites programmatically from your own page. XMLHttpRequest will let you retrieve remote data, but by default it retrieves the whole file, not the first few bytes. Doing anything more fancy is going to require a server-side helper, e.g. a PHP script or similar.

You should also be aware that some HTTP servers will not allow you to retrieve a range of bytes from the output anyway - they'll always return the whole file. So, even with a PHP script using, say, curl or wget, you may not be able to get this to work for all remote sites:

Is it possible to read only the first N bytes from an HTTP server using a Linux command?

(The above covers command-line curl, but the conclusions are broadly applicable.)

Edit: Sergiu points out that adding a Range header to an XMLHttpRequest will work, at least for some servers (Range support is optional in HTTP/1.1). However, cross-domain image retrieval will still need further support, such as CORS.
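To make the server-side helper idea concrete, here is a minimal sketch in Python, standing in for the suggested PHP script (the function name and the signature table are mine): request only the leading bytes with a Range header and compare them against known image signatures. The read(16) keeps the download small even if the server ignores Range.

    import urllib.request

    # A few well-known magic numbers (not exhaustive)
    SIGNATURES = {
        b"\x89PNG\r\n\x1a\n": "png",
        b"\xff\xd8\xff": "jpeg",
        b"GIF87a": "gif",
        b"GIF89a": "gif",
    }

    def sniff_image_type(url):
        """Fetch the first few bytes of `url` and guess the image type."""
        req = urllib.request.Request(url, headers={"Range": "bytes=0-15"})
        with urllib.request.urlopen(req) as resp:
            head = resp.read(16)  # only 16 bytes are read either way
        for sig, kind in SIGNATURES.items():
            if head.startswith(sig):
                return kind
        return None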

How to tell the HTTP server to stop sending redundant bytes that I previously requested through GET?

So imagine you want to download a 4 MB file using two threads. Wouldn't it be nice to have the first thread getting bytes from 0 towards the middle, and the second thread getting bytes from the end towards the middle? That's the first thing that came to mind when looking at this problem.

Since you can request ranges, you can either terminate the connection once you've got enough or work in blocks. A 4 MB file would be 1024 blocks of 4 KB. The two threads take blocks from either end, and you stop when you've got all 1024 blocks, without caring which thread got more.

But that scheme is really only visually pleasing. If you have a counter c over n blocks, you can use it to drive any number of threads, chosen based on the size of the download up to some maximum limit. Each thread claims the next free block by incrementing the counter and storing the result as its job. If a thread fails, you need to make sure its block still gets downloaded. Other than that, you're done when the counter reaches n and all threads have finished.

I was thinking of a bitset, but I think you have enough control just by knowing each thread's current job plus that counter. Make sure the method incrementing it is synchronized, though; a sketch follows.
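A compact sketch of the counter scheme in Python (the names and helper are illustrative, and it assumes the server honors Range requests):

    import threading
    import urllib.request

    def fetch_blocks(url, total_size, block=4096, workers=2):
        """Download `total_size` bytes of `url` in `block`-sized range
        requests, each worker claiming the next free block from a
        shared counter."""
        n_blocks = (total_size + block - 1) // block
        result = bytearray(total_size)
        counter = 0
        lock = threading.Lock()

        def worker():
            nonlocal counter
            while True:
                with lock:              # the synchronized increment
                    if counter >= n_blocks:
                        return
                    job = counter
                    counter += 1
                start = job * block
                end = min(start + block, total_size) - 1
                req = urllib.request.Request(
                    url, headers={"Range": "bytes=%d-%d" % (start, end)})
                with urllib.request.urlopen(req) as resp:
                    result[start:end + 1] = resp.read()

        threads = [threading.Thread(target=worker) for _ in range(workers)]
        for t in threads:
            t.start()
        for t in threads:
            t.join()
        return bytes(result)

If a thread dies mid-job, its block is simply lost here; a real implementation would requeue it, per the caveat above.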

How to get remote file size from a shell script?

You can download the file and get its size. But we can do better.

Use curl to get only the response header using the -I option.

In the response headers, look for Content-Length:, which is followed by the size of the file in bytes.

$ URL="http://api.twitter.com/1/statuses/public_timeline.json"
$ curl -sI "$URL" | grep -i Content-Length
Content-Length: 134

To get just the size, add a filter to extract the numeric part (the tr strips the carriage return that terminates each HTTP header line):

$ curl -sI "$URL" | grep -i Content-Length | awk '{print $2}' | tr -d '\r'
134
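The same HEAD trick from Python, in case you need it in a script rather than a pipeline (a sketch that assumes the server actually sends Content-Length):

    import urllib.request

    def remote_size(url):
        req = urllib.request.Request(url, method="HEAD")  # headers only, no body
        with urllib.request.urlopen(req) as resp:
            return int(resp.headers["Content-Length"])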

php method to read only part of remote file

If anyone stumbles upon this: a good way to handle it is to use cURL with CURLOPT_WRITEFUNCTION, a callback that receives each chunk of the body as it arrives and can abort the transfer once it has seen enough.
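For reference, here is what that looks like through pycurl, libcurl's Python binding, since the same write-callback trick applies (a sketch, assuming pycurl is installed; libcurl aborts the transfer with a write error when the callback returns a count other than the length of the data it was given):

    import pycurl

    def fetch_first_bytes(url, limit=1024):
        """Abort the transfer once `limit` bytes have arrived."""
        chunks = []
        got = 0

        def on_write(data):
            nonlocal got
            chunks.append(data)
            got += len(data)
            if got >= limit:
                return 0  # any value != len(data) makes libcurl abort

        c = pycurl.Curl()
        c.setopt(pycurl.URL, url)
        c.setopt(pycurl.WRITEFUNCTION, on_write)
        try:
            c.perform()            # raises pycurl.error when we abort
        except pycurl.error:
            pass
        finally:
            c.close()
        return b"".join(chunks)[:limit]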


