Receiving Chunked HTTP Data With Winsock

You need to change your reading code. You cannot read chunked data using a fixed-length buffer the way you are trying to. The data is sent in variable-length chunks: each chunk is preceded by a header line that specifies the chunk's length in bytes (in hexadecimal), and the final chunk has a length of 0. You need to read those chunk headers in order to process the chunks properly. Please read RFC 2616 Section 3.6.1. Your logic needs to be more like the following pseudo-code:

send request;

status = recv() a line of text until CRLF;
parse status as needed;
response-code = extract response-code from status;
response-version = extract response-version from status;

do
{
    line = recv() a line of text until CRLF;
    if (line is blank)
        break;
    store line in headers list;
}
while (true);

parse headers list as needed;

if ((response-code is not in [1xx, 204, 304]) and (request was not "HEAD"))
{
    if (Transfer-Encoding header is present and not "identity")
    {
        do
        {
            line = recv() a line of text until CRLF;
            length = extract length from line;
            extensions = extract extensions from line;
            process extensions as needed; // optional
            if (length == 0)
                break;
            recv() length number of bytes into destination buffer;
            recv() and discard bytes until CRLF;
        }
        while (true);

        do
        {
            line = recv() a line of text until CRLF;
            if (line is blank)
                break;
            store line in headers list as needed;
        }
        while (true);

        re-parse headers list as needed;
    }
    else if (Content-Length header is present)
    {
        recv() Content-Length number of bytes into destination buffer;
    }
    else if (Content-Type header starts with "multipart/")
    {
        boundary = extract boundary from Content-Type's "boundary" attribute;
        recv() data into destination buffer until MIME termination boundary is reached;
    }
    else
    {
        recv() data into destination buffer until disconnected;
    }
}

if (not disconnected)
{
    if (response-version is "HTTP/1.1")
    {
        if (Connection header is "close")
            close connection;
    }
    else
    {
        if (Connection header is not "keep-alive")
            close connection;
    }
}

check response-code for errors;
process destination buffer, per info in headers list;
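To make the two decisions in that pseudo-code concrete (how the body is delimited, and whether to close the connection afterwards), here is a minimal C++ sketch of just those decisions, with no socket I/O. The names `BodyFraming`, `choose_framing` and `should_close` are hypothetical, and it assumes the header names have already been lower-cased into a `std::map`.

```cpp
#include <cctype>
#include <map>
#include <string>

// Hypothetical names; headers are assumed to be stored with lower-cased
// names, e.g. headers["transfer-encoding"].
enum class BodyFraming { None, Chunked, ContentLength, Multipart, UntilClose };

static std::string lower(std::string s)
{
    for (char& c : s)
        c = static_cast<char>(std::tolower(static_cast<unsigned char>(c)));
    return s;
}

static std::string header_or_empty(const std::map<std::string, std::string>& headers,
                                   const std::string& name)
{
    auto it = headers.find(name);
    return it == headers.end() ? std::string() : it->second;
}

// Mirrors the if/else chain that decides how the response body is delimited.
BodyFraming choose_framing(int status, const std::string& method,
                           const std::map<std::string, std::string>& headers)
{
    if ((status >= 100 && status < 200) || status == 204 || status == 304 || method == "HEAD")
        return BodyFraming::None;                     // no body at all
    std::string te = lower(header_or_empty(headers, "transfer-encoding"));
    if (!te.empty() && te != "identity")
        return BodyFraming::Chunked;
    if (headers.count("content-length"))
        return BodyFraming::ContentLength;
    if (lower(header_or_empty(headers, "content-type")).rfind("multipart/", 0) == 0)
        return BodyFraming::Multipart;
    return BodyFraming::UntilClose;                   // read until disconnected
}

// Mirrors the final "close connection?" decision.
bool should_close(const std::string& version, const std::map<std::string, std::string>& headers)
{
    std::string conn = lower(header_or_empty(headers, "connection"));
    if (version == "HTTP/1.1")
        return conn == "close";                       // HTTP/1.1 defaults to keep-alive
    return conn != "keep-alive";                      // older versions default to close
}
```

The ordering matters: `Transfer-Encoding` is checked before `Content-Length`, exactly as in the pseudo-code, so a chunked response is never read as a fixed-length one.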

WinSock Chunked Data Encoding

See this wiki page: http://en.wikipedia.org/wiki/Chunked_transfer_encoding

Each of these hex numbers (the chunk length) is followed by the actual chunk data (payload) of the specified size, and then by the next chunk length. If the chunk length is zero, no further data bytes follow (end of data). These elements are separated by line breaks (CRLF), so each chunk's data is both preceded and followed by a CRLF.
I'm not sure whether the content you posted can be concatenated correctly; it seems you'd need to handle multiple contiguous line breaks. Just look at the page and its source in a browser.
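As an illustration of that layout, the following sketch (the function name `encode_chunked` is hypothetical) builds the chunked wire format from a list of payload pieces. The third piece deliberately contains its own blank line: only the chunk lengths, not the line breaks, tell the receiver where each chunk's data ends.

```cpp
#include <cstdio>
#include <string>
#include <vector>

// Builds the chunked wire format:
//   <hex length> CRLF <data> CRLF ... then "0" CRLF and a final CRLF.
std::string encode_chunked(const std::vector<std::string>& pieces)
{
    std::string out;
    for (const std::string& piece : pieces)
    {
        if (piece.empty())
            continue;  // a zero-length chunk would terminate the body early
        char size_line[32];
        std::snprintf(size_line, sizeof(size_line), "%zx\r\n", piece.size());
        out += size_line;  // chunk length in hex, then CRLF
        out += piece;      // payload, which may itself contain CRLFs
        out += "\r\n";     // CRLF after the payload
    }
    out += "0\r\n\r\n";    // last chunk, no trailer fields, blank line
    return out;
}
```

For example, `encode_chunked({"Wiki", "pedia", " in\r\n\r\nchunks."})` produces `"4\r\nWiki\r\n5\r\npedia\r\ne\r\n in\r\n\r\nchunks.\r\n0\r\n\r\n"`, where `e` is the hex length (14) of the third piece.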

EDIT:

Just found this sniffing tool; it displays all the details I'd want to know in your situation:

http://web-sniffer.net/

Receiving HTTP requests with Winsock

It is, because HTTP is a protocol that (usually) uses TCP as the underlying transport protocol.

But trying to build a real HTTP layer on top of a plain Win32 socket is a bit too much, even for an experienced C++ developer.

Many inexperienced C++ developers would probably dismiss this task as "well, you just need to read some data, parse the headers, assemble your own HTTP response and send it back".

But you will also have to support:

  • TLS, with all the nasty private/public key handling
  • Redirection
  • Chunked transfer encoding
  • Gzip-compressed transfers

and the list goes on and on.

So, practically speaking: if you just want to accept a socket connection, read some data and send back a basic HTTP response, then yes. If you want a reliable, professional HTTP library, probably not.
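For that basic case, the response itself can be as simple as the following sketch (the helper name `build_basic_response` is hypothetical). It sets `Content-Length` so the client knows exactly where the body ends, which sidesteps chunked encoding entirely, and `Connection: close` to announce that the server will close the socket afterwards.

```cpp
#include <string>

// Builds a minimal HTTP/1.1 response with a fixed-length body.
std::string build_basic_response(int status, const std::string& reason,
                                 const std::string& content_type,
                                 const std::string& body)
{
    std::string resp = "HTTP/1.1 " + std::to_string(status) + " " + reason + "\r\n";
    resp += "Content-Type: " + content_type + "\r\n";
    resp += "Content-Length: " + std::to_string(body.size()) + "\r\n";
    resp += "Connection: close\r\n";
    resp += "\r\n";  // blank line separates the headers from the body
    resp += body;
    return resp;
}
```

The resulting string would still have to be written with a `send()` loop until every byte has gone out, since a single `send()` call may transmit only part of it.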

Weird numbers in HTTP response body during Python socket programming

I am missing something, but what?

You are not taking into account that these responses are using the chunked transfer encoding format (via the Transfer-Encoding: chunked header) to send the body data in chunks instead of a single byte stream, as you are expecting. See RFC 2616 Section 3.6.1 and RFC 7230 Section 4.1 for more details on the chunked format.

What are they?

The numbers you are referring to are chunk size indicators.

  • In the 1st response shown, there is a single chunk of data whose byte size is 4a0 (0x4A0 hex, 1184 decimal), followed by a terminating chunk whose byte size is 0.

  • In the 2nd response shown, there is a single chunk of data whose byte size is bcd (0xBCD hex, 3021 decimal), followed by a terminating chunk whose byte size is 0.

The 0-length chunk ends the body data (there is no Content-Length or Connection: close header present to end the responses otherwise).
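The size arithmetic above can be checked with a small sketch (the helper name `parse_chunk_size` is hypothetical) that parses a chunk-size line the way a receiver would: strip any `;extension`, then read the rest as hexadecimal.

```cpp
#include <cctype>
#include <cerrno>
#include <cstdlib>
#include <stdexcept>
#include <string>

// Parses a chunk-size line (without its trailing CRLF).
unsigned long parse_chunk_size(std::string line)
{
    std::string::size_type semi = line.find(';');
    if (semi != std::string::npos)
        line.erase(semi);                        // drop chunk extensions
    while (!line.empty() && (line.back() == ' ' || line.back() == '\t'))
        line.pop_back();                         // optional whitespace before ';'
    if (line.empty() || !std::isxdigit(static_cast<unsigned char>(line[0])))
        throw std::runtime_error("bad chunk size: '" + line + "'");
    errno = 0;
    char* end = nullptr;
    unsigned long size = std::strtoul(line.c_str(), &end, 16);
    if (*end != '\0' || errno == ERANGE)
        throw std::runtime_error("bad chunk size: '" + line + "'");
    return size;
}
```

With this, `parse_chunk_size("4a0")` returns 1184 and `parse_chunk_size("bcd")` returns 3021, matching the two responses above, and `parse_chunk_size("0")` returns 0 for the terminating chunk.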

You won't be able to use a simple recv() loop to read chunked bodies. You have to detect the chunked header, and if present then read and parse each chunk individually. Read a chunk size, skip up to the following CRLF, read the specified number of bytes, skip up to the following CRLF. Repeat until a 0-length chunk is reached. Then read a set of trailing HTTP headers that may follow the chunks, overwriting any headers that you read before the body.
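Here is a minimal in-memory sketch of that loop in C++ (the name `decode_chunked` is hypothetical). To keep it self-contained it works on a `std::string` that already holds the whole chunked body instead of calling `recv()`, and it assumes header names are stored lower-cased in a `std::map`, so that trailer fields overwrite earlier headers of the same name.

```cpp
#include <cctype>
#include <cstdlib>
#include <map>
#include <stdexcept>
#include <string>

// Decodes a complete chunked body; trailer fields are merged into `headers`.
std::string decode_chunked(const std::string& wire, std::map<std::string, std::string>& headers)
{
    std::string body;
    std::string::size_type pos = 0;

    // Returns the next CRLF-terminated line (without the CRLF) and advances pos.
    auto read_line = [&]() {
        std::string::size_type eol = wire.find("\r\n", pos);
        if (eol == std::string::npos)
            throw std::runtime_error("truncated chunked data");
        std::string line = wire.substr(pos, eol - pos);
        pos = eol + 2;
        return line;
    };

    for (;;)
    {
        std::string size_line = read_line();
        std::string::size_type semi = size_line.find(';');
        if (semi != std::string::npos)
            size_line.erase(semi);                 // ignore chunk extensions
        char* end = nullptr;
        unsigned long size = std::strtoul(size_line.c_str(), &end, 16);
        if (size_line.empty() || *end != '\0')
            throw std::runtime_error("bad chunk size");
        if (size == 0)
            break;                                 // last chunk reached
        if (wire.size() - pos < size + 2)
            throw std::runtime_error("truncated chunk");
        body.append(wire, pos, size);              // the chunk's payload
        pos += size;
        if (wire.compare(pos, 2, "\r\n") != 0)
            throw std::runtime_error("missing CRLF after chunk data");
        pos += 2;
    }

    // Trailer section: header lines until a blank line.
    for (std::string line = read_line(); !line.empty(); line = read_line())
    {
        std::string::size_type colon = line.find(':');
        if (colon == std::string::npos)
            continue;                              // skip malformed trailer lines
        std::string name = line.substr(0, colon);
        for (char& c : name)
            c = static_cast<char>(std::tolower(static_cast<unsigned char>(c)));
        std::string::size_type v = line.find_first_not_of(" \t", colon + 1);
        headers[name] = (v == std::string::npos) ? std::string() : line.substr(v);
    }
    return body;
}
```

A socket-based version would replace `read_line()` and `body.append()` with calls that pull exactly that many bytes from `recv()`, but the order of operations stays the same.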

See the pseudo code I present in this answer and this answer.

How to Strip the header from data received using recv in c++?

OK, I came up with a function that strips the header before saving.

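#include <algorithm>    // std::search

// Writes everything after the first "\r\n\r\n" (the end of the HTTP headers) to pFile.
// Note: this does not decode a chunked body, and it misses a separator that is
// split across two recv() calls.
int nDataLength;
bool headerStripped = false;
static char buffer[4096];
static const char separator[] = "\r\n\r\n";
while ((nDataLength = recv(Socket, buffer, sizeof(buffer), 0)) > 0)
{
    if (headerStripped)
    {
        fwrite(buffer, 1, nDataLength, pFile);
        continue;
    }

    // buffer is not null-terminated, so search only the bytes actually received
    char* end = buffer + nDataLength;
    char* content = std::search(buffer, end, separator, separator + 4);
    if (content != end)
    {
        headerStripped = true;
        content += 4;
        fwrite(content, 1, end - content, pFile);
    }
    // otherwise this buffer is still part of the headers and is skipped
}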
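The loop above assumes the whole `\r\n\r\n` separator arrives in a single `recv()` call. A more robust approach, sketched here as a hypothetical `HeaderStripper` class, buffers header bytes until the separator has been seen and then passes everything after it through unchanged. Like the original, it does not decode a chunked body; its output would still need a chunk decoder when `Transfer-Encoding: chunked` is present.

```cpp
#include <cstddef>
#include <string>

// Feed it each buffer exactly as recv() returned it; it returns the bytes
// that belong to the body (possibly none).
class HeaderStripper
{
public:
    std::string feed(const char* data, std::size_t len)
    {
        if (done_)
            return std::string(data, len);           // already past the headers
        pending_.append(data, len);
        std::string::size_type sep = pending_.find("\r\n\r\n");
        if (sep == std::string::npos)
            return std::string();                    // still inside the headers
        done_ = true;
        std::string body = pending_.substr(sep + 4); // bytes after the separator
        headers_ = pending_.substr(0, sep);          // status line + header lines
        pending_.clear();
        return body;
    }

    bool header_complete() const { return done_; }
    const std::string& headers() const { return headers_; }

private:
    bool done_ = false;
    std::string pending_;   // header bytes received so far
    std::string headers_;
};
```

Each chunk of output from `feed()` could then be passed to `fwrite()` as in the loop above.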

Why recv() in winsock client cannot get any data once httpRetransmition happens?

Here is a second answer with more dynamic buffer handling and more error checking:

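#define NOMINMAX            // stop <windows.h> from defining min()/max() macros that break std::min
#include <winsock2.h>
#include <windows.h>
#include <algorithm>
#include <cctype>
#include <cstdio>
#include <cstdlib>
#include <cstring>
#include <string>
#include <vector>

// die_with_error() (assumed not to return) and BUFFERSIZE are assumed to be defined elsewhere.

// Sends all data_len bytes, looping because send() may send fewer bytes than requested.
void send_data(SOCKET sock, const void *data, unsigned int data_len)
{
    const char *ptr = static_cast<const char*>(data);

    while (data_len > 0)
    {
        int num_to_send = (int) std::min(1024u*1024u, data_len);

        int num_sent = send(sock, ptr, num_to_send, 0);
        if (num_sent == SOCKET_ERROR)
        {
            // a non-blocking socket is not ready yet, so try again
            if (WSAGetLastError() == WSAEWOULDBLOCK)
                continue;

            die_with_error("send() failed");
        }

        if (num_sent == 0)
            die_with_error("socket disconnected");

        ptr += num_sent;
        data_len -= num_sent;
    }
}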

// Receives exactly data_len bytes, unless the peer disconnects first and
// error_on_disconnect is false; returns the number of bytes received.
unsigned int recv_data(SOCKET sock, void *data, unsigned int data_len, bool error_on_disconnect = true)
{
    char *ptr = static_cast<char*>(data);
    unsigned int total = 0;

    while (data_len > 0)
    {
        int num_to_recv = (int) std::min(1024u*1024u, data_len);

        int num_recvd = recv(sock, ptr, num_to_recv, 0);
        if (num_recvd == SOCKET_ERROR)
        {
            // a non-blocking socket has no data yet, so try again
            if (WSAGetLastError() == WSAEWOULDBLOCK)
                continue;

            die_with_error("recv() failed");
        }

        if (num_recvd == 0)
        {
            // the peer closed the connection
            if (error_on_disconnect)
                die_with_error("socket disconnected");

            break;
        }

        ptr += num_recvd;
        data_len -= num_recvd;
        total += num_recvd;
    }

    return total;
}

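// Reads one line terminated by CRLF (or a bare LF); the terminator is not included.
std::string recv_line(SOCKET sock)
{
    std::string line;
    char c;

    do
    {
        recv_data(sock, &c, 1);

        if (c == '\r')
        {
            recv_data(sock, &c, 1);

            if (c == '\n')
                break;

            line += '\r';   // lone CR: keep it, then store the byte that followed it
        }
        else if (c == '\n')
            break;

        line += c;
    }
    while (true);

    return line;
}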

// Reads header lines until a blank line; stores them in hdrs if it is not NULL.
void recv_headers(SOCKET sock, std::vector<std::string> *hdrs)
{
    do
    {
        std::string line = recv_line(sock);
        if (line.length() == 0)
            return;

        if (hdrs)
            hdrs->push_back(line);
    }
    while (true);
}

// Reads a chunk-size line and parses it as hex, ignoring any ";extension" part.
unsigned int recv_chunk_size(SOCKET sock)
{
    std::string line = recv_line(sock);
    size_t pos = line.find(';');
    if (pos != std::string::npos)
        line.erase(pos);
    char *endptr;
    unsigned int value = (unsigned int) strtoul(line.c_str(), &endptr, 16);
    if (*endptr != '\0')
        die_with_error("bad Chunk Size received");
    return value;
}

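// Returns the value of the first header named hdr_name (case-insensitive), or "" if absent.
std::string find_header(const std::vector<std::string> &hdrs, const std::string &hdr_name)
{
    for (size_t i = 0; i < hdrs.size(); ++i)
    {
        const std::string &hdr = hdrs[i];

        size_t pos = hdr.find(':');
        if (pos == std::string::npos || pos != hdr_name.length())
            continue;

        // HTTP header names are case-insensitive
        bool same_name = std::equal(hdr_name.begin(), hdr_name.end(), hdr.begin(),
            [](char a, char b) { return std::tolower((unsigned char) a) == std::tolower((unsigned char) b); });
        if (same_name)
        {
            pos = hdr.find_first_not_of(" \t", pos + 1);
            if (pos != std::string::npos)
                return hdr.substr(pos);

            break;
        }
    }

    return "";
}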

{
    // send request ...

    std::string request = ...;

    send_data(sock, request.c_str(), (unsigned int) request.length());

    // Record the time of httpRequestSent
    ::QueryPerformanceCounter(&httpRequestSent);
    ::QueryPerformanceFrequency(&frequency);

    // get response ...

    std::vector<std::string> resp_headers;
    std::vector<unsigned char> resp_data;

    recv_headers(sock, &resp_headers);

    std::string transfer_encoding = find_header(resp_headers, "Transfer-Encoding");
    if (transfer_encoding.find("chunked") != std::string::npos)
    {
        unsigned int chunk_len = recv_chunk_size(sock);
        while (chunk_len != 0)
        {
            size_t offset = resp_data.size();
            resp_data.resize(offset + chunk_len);
            recv_data(sock, &resp_data[offset], chunk_len);

            recv_line(sock);    // consume the CRLF that follows the chunk data
            chunk_len = recv_chunk_size(sock);
        }

        recv_headers(sock, NULL);   // read and discard any trailer headers
    }
    else
    {
        std::string content_length = find_header(resp_headers, "Content-Length");
        if (content_length.length() != 0)
        {
            char *endptr;
            unsigned int content_length_value = (unsigned int) strtoul(content_length.c_str(), &endptr, 10);

            if (*endptr != '\0')
                die_with_error("bad Content-Length value received");

            if (content_length_value > 0)
            {
                resp_data.resize(content_length_value);
                recv_data(sock, &resp_data[0], content_length_value);
            }
        }
        else
        {
            // no length information: read until the server disconnects
            unsigned char buffer[BUFFERSIZE];
            do
            {
                unsigned int buffer_len = recv_data(sock, buffer, BUFFERSIZE, false);
                if (buffer_len == 0)
                    break;

                size_t offset = resp_data.size();
                resp_data.resize(offset + buffer_len);
                memcpy(&resp_data[offset], buffer, buffer_len);
            }
            while (true);
        }
    }

    ::QueryPerformanceCounter(&httpResponseGot);

    // process resp_data as needed
    // may be compressed, encoded, etc...

    // Display the HTTP duration
    httpDuration = (double)(httpResponseGot.QuadPart - httpRequestSent.QuadPart) / (double)frequency.QuadPart;
    printf("The HTTP duration is %lf\n", httpDuration);
}
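The `strtoul()` check on `Content-Length` above lets some bad values through: a leading `-` is accepted and silently wrapped to a huge unsigned value, and an out-of-range value saturates to `ULONG_MAX` (setting `errno`, which is not checked). Either way `resp_data.resize()` would then be asked for an enormous buffer. A stricter sketch, assuming C++17 for `std::from_chars` and `std::optional` (the helper name `parse_content_length` is hypothetical):

```cpp
#include <charconv>
#include <cstdint>
#include <optional>
#include <string>
#include <system_error>

// Accepts only a non-empty run of decimal digits that fits in 64 bits.
std::optional<std::uint64_t> parse_content_length(const std::string& value)
{
    std::uint64_t n = 0;
    const char* first = value.data();
    const char* last = first + value.size();
    // from_chars() skips no whitespace and accepts no sign for unsigned types
    auto [ptr, ec] = std::from_chars(first, last, n, 10);
    if (ec != std::errc() || ptr != last)
        return std::nullopt;   // empty, non-digit, trailing junk, or overflow
    return n;
}
```

A caller would treat `std::nullopt` as a protocol error (the equivalent of `die_with_error("bad Content-Length value received")`) and could additionally cap the value before resizing the buffer.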

