How to Read Only X Number of Bytes of the Body Using Net::HTTP

Is it possible to read only the first N bytes from an HTTP server using a Linux command?

curl <url> | head -c 499

or

curl <url> | dd bs=1 count=499

should do

There are also simpler utilities with perhaps broader availability, such as netcat (note bs=1, so that count is measured in bytes rather than 512-byte blocks):

netcat host 80 <<"HERE" | dd bs=1 count=499 of=output.fragment
GET /urlpath/query?string=more&bloddy=stuff

HERE

Or

GET /urlpath/query?string=more&bloddy=stuff
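
The same "stop after N bytes" idea can be sketched with Ruby's Net::HTTP, which is what the page title asks about. This is a minimal sketch, not a definitive implementation: the method name `read_first_bytes` and its parameters are hypothetical, and it assumes the server sends an uncompressed body. It streams the body in chunks and leaves the request block as soon as enough bytes have arrived.

```ruby
require 'net/http'
require 'uri'

# Hypothetical helper: returns at most `limit` bytes of the response body.
def read_first_bytes(url, limit)
  uri = URI(url)
  data = String.new(encoding: Encoding::BINARY)

  Net::HTTP.start(uri.host, uri.port, use_ssl: uri.scheme == 'https') do |http|
    http.request(Net::HTTP::Get.new(uri)) do |response|
      # read_body with a block yields the body piece by piece.
      response.read_body do |chunk|
        data << chunk
        break if data.bytesize >= limit   # stop streaming
      end
      # Leave the request block too; otherwise Net::HTTP would go on
      # to read the remainder of the body after the block returns.
      break
    end
  end

  # The last chunk may have overshot the limit, so trim it.
  data.byteslice(0, limit)
end
```

Something like `read_first_bytes('http://example.com/', 499)` would then mirror the `head -c 499` pipeline. The connection is closed when the `Net::HTTP.start` block exits. If the server supports range requests, sending a `Range: bytes=0-498` header is an alternative that avoids transferring the extra bytes at all.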

What's the Content-Length field in HTTP header?

From RFC 2616:

The Content-Length entity-header field indicates the size of the
entity-body, in decimal number of OCTETs, sent to the recipient or, in
the case of the HEAD method, the size of the entity-body that would
have been sent had the request been a GET.

It doesn't matter what the content-type is.
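
Because the unit is octets, the value can differ from the number of characters in a text body. A small illustration in Ruby (the sample string is made up for this example):

```ruby
# Content-Length counts octets (bytes), not characters.
body = "h\u00E9llo w\u00F6rld"   # UTF-8 text with two 2-byte characters

char_count = body.length      # number of characters
byte_count = body.bytesize    # number of octets, the value Content-Length needs

header = "Content-Length: #{byte_count}"
```

Here `char_count` is 11 while `byte_count` is 13, so a header built from the character count would be too small.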

Access HTTP response as string in Go

bs := string(body) should be enough to give you a string.

From there, you can use it as a regular string.

A bit as in this thread

(updated after Go 1.16 -- Q1 2021 -- ioutil deprecation: ioutil.ReadAll() => io.ReadAll()):

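// Assumes the imports "io", "log", and "net/http", and a url string defined elsewhere.
var client http.Client
resp, err := client.Get(url)
if err != nil {
	log.Fatal(err)
}
defer resp.Body.Close()

if resp.StatusCode == http.StatusOK {
	bodyBytes, err := io.ReadAll(resp.Body)
	if err != nil {
		log.Fatal(err)
	}
	bodyString := string(bodyBytes) // the conversion copies the bytes into a new string
	log.Println(bodyString)         // the standard log package has no Info function
}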

See also GoByExample.

As commented below (and in zzn's answer), this is a conversion (see spec).

See "How expensive is []byte(string)?" (the reverse problem, but the same conclusion applies), where zzzz mentioned:

Some conversions are the same as a cast, like uint(myIntvar), which just reinterprets the bits in place.

Sonia adds:

Making a string out of a byte slice, definitely involves allocating the string on the heap. The immutability property forces this.

Sometimes you can optimize by doing as much work as possible with []byte and then creating a string at the end. The bytes.Buffer type is often useful.

How do I read the header without reading the rest of the HTTP resource?

Depending on what you actually want to accomplish:

Don't care about the body at all:

Use HEAD instead of GET:

uri = URI('http://example.com')
http = Net::HTTP.start uri.host, uri.port
request = Net::HTTP::Head.new uri
response = http.request request
response.body
# => nil

Load the body conditionally

Using blocks with net/http will let you hook before the body is actually loaded:

uri = URI('http://example.com')
res = nil

Net::HTTP.start(uri.host, uri.port) do |http|
  request = Net::HTTP::Get.new uri

  http.request request do |response|
    res = response
    break
  end
end

res
# => #<Net::HTTPOK 200 OK readbody=false>
res['Content-Type']
# => "text/html; charset=UTF-8"

How can I properly read the sequence of bytes from a hyper::client::Request and print it to the console as a UTF-8 string?

We can confirm with the iconv command that the data returned from http://www.google.com is not valid UTF-8:

$ wget http://google.com -O page.html
$ iconv -f utf-8 page.html > /dev/null
iconv: illegal input sequence at position 5591

For some other urls (like http://www.reddit.com) the code works fine.

If we assume that most of the data is valid UTF-8, we can use String::from_utf8_lossy to work around the problem:

pub fn print_html(url: &str) {
    let client = Client::new();
    let req = client.get(url).send();

    match req {
        Ok(mut res) => {
            println!("{}", res.status);

            let mut body = Vec::new();

            match res.read_to_end(&mut body) {
                Ok(_) => println!("{:?}", String::from_utf8_lossy(&*body)),
                Err(why) => panic!("String conversion failure: {:?}", why),
            }
        }
        Err(why) => panic!("{:?}", why),
    }
}

Note that Read::read_to_string and Read::read_to_end return Ok with the number of bytes read on success, not the data itself.
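
For readers following the Net::HTTP examples elsewhere on this page, a rough Ruby counterpart to String::from_utf8_lossy is String#scrub. The byte string below is invented for illustration; it stands in for a response body that is mostly valid UTF-8 with one stray byte.

```ruby
# Bytes as they might arrive from a server: "café", then a stray 0xFF byte.
raw = "caf\xC3\xA9 \xFF ok".b

# Relabel the bytes as UTF-8; this neither validates nor changes them.
text = raw.force_encoding(Encoding::UTF_8)
was_valid = text.valid_encoding?

# Replace each invalid sequence with U+FFFD, similar to a lossy conversion.
clean = text.scrub("\uFFFD")
```

After this, `was_valid` is false while `clean.valid_encoding?` is true, so `clean` can be printed or processed as ordinary UTF-8 text.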

Maximum length of HTTP GET request

The limit is dependent on both the server and the client used (and if applicable, also the proxy the server or the client is using).

Most web servers have a limit of 8192 bytes (8 KB), which is usually configurable somewhere in the server configuration. As to the client side matter, the HTTP 1.1 specification even warns about this. Here's an extract of chapter 3.2.1:

Note: Servers ought to be cautious about depending on URI lengths above 255 bytes, because some older client or proxy implementations might not properly support these lengths.

The limit in Internet Explorer and Safari is about 2 KB, in Opera about 4 KB, and in Firefox about 8 KB. We may thus assume that 8 KB is the maximum possible length, that 2 KB is a more realistic length to rely on at the server side, and that 255 bytes is the safest length at which to assume the entire URL will arrive intact.

If the limit is exceeded in either the browser or the server, most will just truncate the characters outside the limit without any warning. Some servers however may send an HTTP 414 error.

If you need to send large amounts of data, use POST instead of GET. Its limit is much higher and depends more on the server than on the client. The average web server usually allows up to around 2 GB.

This is also configurable somewhere in the server settings. The average server will display a server-specific error/exception when the POST limit is exceeded, usually as an HTTP 500 error.
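
One way to apply the figures above is to measure the final URL before deciding between GET and POST. This is only a sketch: `fits_in_get?` and `MAX_SAFE_URL_BYTES` are hypothetical names, and the 2048-byte default is taken from the roughly 2 KB browser figure quoted above, not from any standard.

```ruby
require 'uri'

# Assumed conservative limit, based on the ~2 KB figure mentioned above.
MAX_SAFE_URL_BYTES = 2048

# Hypothetical helper: true if the URL with its encoded query string
# stays within the given byte limit.
def fits_in_get?(base_url, params, limit = MAX_SAFE_URL_BYTES)
  uri = URI(base_url)
  uri.query = URI.encode_www_form(params)
  uri.to_s.bytesize <= limit
end
```

For example, `fits_in_get?('http://example.com/search', { 'q' => 'ruby' })` is true, while a parameter holding several thousand characters pushes the URL past the limit and suggests switching to POST.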

read() is not reading the complete HTTP response

Don't use str* functions on arbitrary data. These are made to operate on C strings, which are zero-terminated. Binary data (most image formats) can contain zeros in the middle.

You should be using memcpy/memmove, and you have to rely on the return value of read to know how much data you actually got. strlen on binary data is meaningless.

Try replacing this part:

bytesloaded += readed;
strcat(buffer, tbuf);

With something like:

if (bytesloaded + readed >= buf_size) {
    // do the realloc now
}
memcpy(buffer + bytesloaded, tbuf, readed);
bytesloaded += readed;

buffer + x (with x an integer type whose value is less than the allocated buffer size) is a pointer to the xth char in buffer. (This is pointer arithmetic. The type of buffer matters. In this case, it is invalid if x is negative.)

You need to perform the re-allocation before you attempt the memcpy otherwise you risk writing past the end of the buffer.

memcpy is safe here because you know that buffer and tbuf don't overlap.

Can HTTP POST be limitless?

EDIT (2019) This answer is now pretty redundant but there is another answer with more relevant information.

It rather depends on the web server and web browser:

Internet Explorer (all versions): 2 GB - 1
Mozilla Firefox (all versions): 2 GB - 1
IIS 1-5: 2 GB - 1
IIS 6: 4 GB - 1

However, IIS only supports 200 KB by default; the metabase needs amending to increase this.

http://www.motobit.com/help/scptutl/pa98.htm

The POST method itself does not have any limit on the size of data.

HTTP GET with request body

Roy Fielding's comment about including a body with a GET request.

Yes. In other words, any HTTP request message is allowed to contain a message body, and thus must parse messages with that in mind. Server semantics for GET, however, are restricted such that a body, if any, has no semantic meaning to the request. The requirements on parsing are separate from the requirements on method semantics.

So, yes, you can send a body with GET, and no, it is never useful to do so.

This is part of the layered design of HTTP/1.1 that will become clear again once the spec is partitioned (work in progress).

....Roy

Yes, you can send a request body with GET but it should not have any meaning. If you give it meaning by parsing it on the server and changing your response based on its contents, then you are ignoring this recommendation in the HTTP/1.1 spec, section 4.3:

...if the request method does not include defined semantics for an entity-body, then the message-body SHOULD be ignored when handling the request.

And the description of the GET method in the HTTP/1.1 spec, section 9.3:

The GET method means retrieve whatever information ([...]) is identified by the Request-URI.

which states that the request-body is not part of the identification of the resource in a GET request, only the request URI.

Update

The RFC 2616 referenced as "HTTP/1.1 spec" is now obsolete. In 2014 it was replaced by RFCs 7230-7237. The quote "the message-body SHOULD be ignored when handling the request" has been deleted. It is now just "Request message framing is independent of method semantics, even if the method doesn't define any use for a message body". The second quote, "The GET method means retrieve whatever information ... is identified by the Request-URI", was also deleted. (From a comment.)

From the HTTP 1.1 2014 Spec:

A payload within a GET request message has no defined semantics; sending a payload body on a GET request might cause some existing implementations to reject the request.


