Is it possible to read only first N bytes from the HTTP server using Linux command?
curl <url> | head -c 499
or
curl <url> | dd bs=1 count=499
should do. (If the server supports range requests, curl -r 0-498 <url> asks for just the first 499 bytes instead of truncating locally.)
Also there are simpler utils with perhaps broader availability, like
netcat host 80 <<"HERE" | dd bs=1 count=499 of=output.fragment
GET /urlpath/query?string=more&bloddy=stuff HTTP/1.0

HERE
What's the Content-Length field in HTTP header?
From RFC 2616:
The Content-Length entity-header field indicates the size of the
entity-body, in decimal number of OCTETs, sent to the recipient or, in
the case of the HEAD method, the size of the entity-body that would
have been sent had the request been a GET.
It doesn't matter what the content-type is.
Access HTTP response as string in Go
bs := string(body)
should be enough to give you a string.
From there, you can use it as a regular string.
A bit as in this thread (updated after Go 1.16, Q1 2021, and the ioutil deprecation: ioutil.ReadAll() => io.ReadAll()):
var client http.Client
resp, err := client.Get(url)
if err != nil {
log.Fatal(err)
}
defer resp.Body.Close()
if resp.StatusCode == http.StatusOK {
bodyBytes, err := io.ReadAll(resp.Body)
if err != nil {
log.Fatal(err)
}
bodyString := string(bodyBytes)
log.Println(bodyString)
}
See also GoByExample.
As commented below (and in zzn's answer), this is a conversion (see spec).
See "How expensive is []byte(string)?" (the reverse problem, but the same conclusion applies), where zzzz mentioned:
Some conversions are the same as a cast, like uint(myIntvar), which just reinterprets the bits in place.
Sonia adds:
Making a string out of a byte slice definitely involves allocating the string on the heap; the immutability property forces this. Sometimes you can optimize by doing as much work as possible with []byte and then creating a string at the end. The bytes.Buffer type is often useful.
How do I read the header without reading the rest of the HTTP resource?
Depending on what you actually want to accomplish:
Don't care about the body at all: use HEAD instead of GET:
uri = URI('http://example.com')
http = Net::HTTP.start uri.host, uri.port
request = Net::HTTP::Head.new uri
response = http.request request
response.body
# => nil
Load the body conditionally: using blocks with net/http lets you hook in before the body is actually loaded:
uri = URI('http://example.com')
res = nil
Net::HTTP.start(uri.host, uri.port) do |http|
request = Net::HTTP::Get.new uri
http.request request do |response|
res = response
break
end
end
res
# => #<Net::HTTPOK 200 OK readbody=false>
res['Content-Type']
# => "text/html; charset=UTF-8"
How can I properly read the sequence of bytes from a hyper::client::Request and print it to the console as a UTF-8 string?
We can confirm with the iconv command that the data returned from http://www.google.com is not valid UTF-8:
$ wget http://google.com -O page.html
$ iconv -f utf-8 page.html > /dev/null
iconv: illegal input sequence at position 5591
For some other URLs (like http://www.reddit.com) the code works fine.
If we assume that most of the data is valid UTF-8, we can use String::from_utf8_lossy to work around the problem:
pub fn print_html(url: &str) {
let client = Client::new();
let req = client.get(url).send();
match req {
Ok(mut res) => {
println!("{}", res.status);
let mut body = Vec::new();
match res.read_to_end(&mut body) {
Ok(_) => println!("{:?}", String::from_utf8_lossy(&*body)),
Err(why) => panic!("String conversion failure: {:?}", why),
}
}
Err(why) => panic!("{:?}", why),
}
}
Note that Read::read_to_string and Read::read_to_end return Ok with the number of bytes read on success, not the data itself.
Maximum length of HTTP GET request
The limit is dependent on both the server and the client used (and if applicable, also the proxy the server or the client is using).
Most web servers have a limit of 8192 bytes (8 KB), which is usually configurable somewhere in the server configuration. On the client side, the HTTP 1.1 specification even warns about this. Here's an extract of chapter 3.2.1:
Note: Servers ought to be cautious about depending on URI lengths above 255 bytes, because some older client or proxy implementations might not properly support these lengths.
The limit in Internet Explorer and Safari is about 2 KB, in Opera about 4 KB, and in Firefox about 8 KB. We may thus assume that 8 KB is the maximum possible length, that 2 KB is a safer length to rely on at the server side, and that 255 bytes is the safest length if you must assume the entire URL will come through.
If the limit is exceeded in either the browser or the server, most will just truncate the characters outside the limit without any warning. Some servers however may send an HTTP 414 error.
If you need to send large data, then better use POST instead of GET. Its limit is much higher, but depends more on the server used than on the client. Usually up to around 2 GB is allowed by the average web server.
This is also configurable somewhere in the server settings. The average server will display a server-specific error/exception when the POST limit is exceeded, usually as an HTTP 500 error.
read() is not reading the complete http response
Don't use str* functions on arbitrary data. These are made to operate on C strings, which are zero-terminated; binary data (most image formats) can contain zeros in the middle. You should be using memcpy/memmove, and you have to rely on the return value of read to know how much data you actually got. strlen on binary data is meaningless.
Try replacing this part:
bytesloaded += readed;
strcat(buffer, tbuf);
With something like:
if (bytesloaded+readed >= buf_size) {
// do the realloc now
}
memcpy(buffer+bytesloaded, tbuf, readed);
bytesloaded += readed;
buffer + x (with x an integer type whose value is less than the allocated buffer size) is a pointer to the x-th char in buffer. (This is pointer arithmetic. The type of buffer matters. In this case, it is invalid if x is negative.)
You need to perform the re-allocation before you attempt the memcpy, otherwise you risk writing past the end of the buffer. memcpy is safe here because you know that buffer and tbuf don't overlap.
Can HTTP POST be limitless?
EDIT (2019) This answer is now pretty redundant but there is another answer with more relevant information.
It rather depends on the web server and web browser:
Internet Explorer: all versions, 2GB-1
Mozilla Firefox: all versions, 2GB-1
IIS 1-5: 2GB-1
IIS 6: 4GB-1
(Although IIS only supports 200KB by default; the metabase needs amending to increase this.)
http://www.motobit.com/help/scptutl/pa98.htm
The POST method itself does not have any limit on the size of data.
HTTP GET with request body
Roy Fielding's comment about including a body with a GET request:
Yes. In other words, any HTTP request message is allowed to contain a message body, and thus must parse messages with that in mind. Server semantics for GET, however, are restricted such that a body, if any, has no semantic meaning to the request. The requirements on parsing are separate from the requirements on method semantics.
So, yes, you can send a body with GET, and no, it is never useful to do so.
This is part of the layered design of HTTP/1.1 that will become clear again once the spec is partitioned (work in progress).
....Roy
Yes, you can send a request body with GET but it should not have any meaning. If you give it meaning by parsing it on the server and changing your response based on its contents, then you are ignoring this recommendation in the HTTP/1.1 spec, section 4.3:
...if the request method does not include defined semantics for an entity-body, then the message-body SHOULD be ignored when handling the request.
And the description of the GET method in the HTTP/1.1 spec, section 9.3:
The GET method means retrieve whatever information ([...]) is identified by the Request-URI.
which states that the request-body is not part of the identification of the resource in a GET request, only the request URI.
Update
The RFC 2616 referenced above as the "HTTP/1.1 spec" is now obsolete; in 2014 it was replaced by RFCs 7230-7237. The quote "the message-body SHOULD be ignored when handling the request" has been deleted; it is now just "Request message framing is independent of method semantics, even if the method doesn't define any use for a message body". The second quote, "The GET method means retrieve whatever information ... is identified by the Request-URI", was also deleted. (From a comment.)
From the HTTP 1.1 2014 Spec:
A payload within a GET request message has no defined semantics; sending a payload body on a GET request might cause some existing implementations to reject the request.