How Does Ancillary Data in Sendmsg() Work

How does sendmsg work?

The manpage speaks of a message (singular) and multiple elements (plural):

For send() and sendto(), the message is found in buf and has length
len. For sendmsg(), the message is pointed to by the elements of the
array msg.msg_iov. The sendmsg() call also allows sending ancillary
data (also known as control information).

For a stream socket, it wouldn't matter either way. Any data you send will just end up as one long stream of data on the other side.

For datagram or message sockets, I can see why a bit more clarity would be helpful. But it appears that you send just one datagram or message with a single sendmsg call, not one per buffer element.
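To illustrate the point about one datagram per call, here is a minimal sketch that passes two iovec elements to a single sendmsg() on an AF_UNIX datagram socketpair and reads the result back. The function name gathered_datagram_len and the sample strings are made up for this example; the socket calls themselves are the standard ones.

```c
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/* Sends "hello, " and "world" as two iovec elements in ONE sendmsg() call,
   then receives a single datagram into out. Returns the length of that
   datagram, or -1 on error. (Hypothetical helper for illustration.) */
ssize_t gathered_datagram_len(char *out, size_t outlen)
{
    int sv[2];
    if (socketpair(AF_UNIX, SOCK_DGRAM, 0, sv) == -1)
        return -1;

    char part1[] = "hello, ";
    char part2[] = "world";
    struct iovec iov[2] = {
        { .iov_base = part1, .iov_len = sizeof part1 - 1 },
        { .iov_base = part2, .iov_len = sizeof part2 - 1 },
    };
    struct msghdr msg;
    memset(&msg, 0, sizeof msg);
    msg.msg_iov = iov;
    msg.msg_iovlen = 2;

    ssize_t n = -1;
    ssize_t want = (ssize_t)(iov[0].iov_len + iov[1].iov_len);
    if (sendmsg(sv[0], &msg, 0) == want)
        n = recv(sv[1], out, outlen, 0);  /* one recv() yields one datagram */

    close(sv[0]);
    close(sv[1]);
    return n;
}
```

If the elements were sent as separate datagrams, the single recv() would only return the first piece. Here it should return both pieces joined together.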

I actually went digging in the Linux source code out of curiosity and to get a better feeling about this answer. It looks like send and sendto are just wrappers for sendmsg in Linux that build the struct msghdr for you. And in fact, the UDP sendmsg implementation makes room for one UDP header per sendmsg call.

If performance is what you're worried about, it doesn't look like you'll benefit from sendmsg if you pass in just a single iovec. If you're concatenating buffers in user space, though, passing them as separate iovec elements could save you that copy.

It's a bit similar to writev, with the added benefit that you can specify a destination address for use with connectionless sockets like UDP. You can also add ancillary data, if you're into that sort of thing. (Commonly used to send file descriptors across UNIX domain sockets.)

socket.recvmsg is ignoring ancbufsize, ancillary data

How does ancillary data in sendmsg() work?

I think that ancillary data is never sent over a TCP socket. Frustratingly, this is not stated in many documents, including the Python socket docs, the C message docs, or even the APUE book.

iov and msg_control in sendmsg and recvmsg

Ancillary data or control messages (.msg_controllen bytes at .msg_control) is data provided or verified by the kernel, whereas the normal payload (in iovecs) is just data received from the other endpoint, unverified and unchecked by the kernel (except for checksum, if the protocol has one).

For IP sockets (see man 7 ip), there are several socket options that cause the kernel to provide ancillary data on received messages. For example:

  • IP_RECVORIGDSTADDR socket option tells the kernel to provide an IP_ORIGDSTADDR type ancillary message (with a struct sockaddr_in as data), identifying the original destination address of the datagram received

  • IP_RECVOPTS socket option tells the kernel to provide an IP_OPTIONS type ancillary message containing all IP option headers (up to 40 bytes for IPv4) for incoming datagrams
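As a rough, Linux-specific sketch of the first option, the function below enables IP_RECVORIGDSTADDR on a UDP socket bound to the loopback address, sends itself one datagram, and pulls the IP_ORIGDSTADDR control message out of recvmsg(). The name recv_orig_dst, the loopback setup, and the two-second receive timeout are assumptions made for the example.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <sys/uio.h>
#include <unistd.h>

/* Copies the kernel-reported original destination of one loopback UDP
   datagram into *dst. Returns 0 on success, -1 on failure.
   (Hypothetical helper for illustration.) */
int recv_orig_dst(struct sockaddr_in *dst)
{
    union {  /* the union gives the buffer struct cmsghdr alignment */
        char buf[CMSG_SPACE(sizeof(struct sockaddr_in))];
        struct cmsghdr align;
    } ctl;
    char payload[16];
    struct iovec iov = { .iov_base = payload, .iov_len = sizeof payload };
    struct msghdr msg;
    struct sockaddr_in addr;
    socklen_t alen = sizeof addr;
    struct timeval tv = { .tv_sec = 2, .tv_usec = 0 };
    int on = 1, rc = -1;
    int rx = socket(AF_INET, SOCK_DGRAM, 0);
    int tx = socket(AF_INET, SOCK_DGRAM, 0);

    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);  /* port 0: kernel picks */

    memset(&msg, 0, sizeof msg);
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = ctl.buf;
    msg.msg_controllen = sizeof ctl.buf;

    if (rx != -1 && tx != -1 &&
        setsockopt(rx, IPPROTO_IP, IP_RECVORIGDSTADDR, &on, sizeof on) == 0 &&
        setsockopt(rx, SOL_SOCKET, SO_RCVTIMEO, &tv, sizeof tv) == 0 &&
        bind(rx, (struct sockaddr *)&addr, sizeof addr) == 0 &&
        getsockname(rx, (struct sockaddr *)&addr, &alen) == 0 &&
        sendto(tx, "x", 1, 0, (struct sockaddr *)&addr, sizeof addr) == 1 &&
        recvmsg(rx, &msg, 0) == 1) {
        /* Walk the control messages the kernel attached to the datagram. */
        for (struct cmsghdr *c = CMSG_FIRSTHDR(&msg); c != NULL;
             c = CMSG_NXTHDR(&msg, c)) {
            if (c->cmsg_level == IPPROTO_IP && c->cmsg_type == IP_ORIGDSTADDR) {
                memcpy(dst, CMSG_DATA(c), sizeof *dst);
                rc = 0;
            }
        }
    }
    if (rx != -1) close(rx);
    if (tx != -1) close(tx);
    return rc;
}
```

The payload byte "x" plays no role here. The address comes entirely from the control message, which is the distinction between kernel-provided ancillary data and sender-provided payload.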

Ping and traceroute use ICMP messages over IP; see man 7 icmp (and man 7 raw) for details.

Because most ICMP responses do not contain useful data filled in by the sender, the iovecs don't usually contain anything interesting. Instead, the interesting data is in the IP message headers and options.

For example, the header of an ICMP Echo Reply packet is just 8 bytes (64 bits): 8-bit type (0), 8-bit code (0), 16-bit checksum, 16-bit id, and 16-bit sequence number. To get the IP headers with the interesting fields, you need the kernel to provide them as ancillary data control messages.


The background:

As described in the sendmsg() and related man pages, we have

ssize_t sendmsg(int sockfd, const struct msghdr *msg, int flags);

struct msghdr {
    void         *msg_name;       /* Optional address */
    socklen_t     msg_namelen;    /* Size of address */
    struct iovec *msg_iov;        /* Scatter/gather array */
    size_t        msg_iovlen;     /* # elements in msg_iov */
    void         *msg_control;    /* Ancillary data */
    size_t        msg_controllen; /* Ancillary data buffer len */
    int           msg_flags;      /* Flags (unused) */
};

struct iovec {
    void  *iov_base;              /* Starting address */
    size_t iov_len;               /* Number of bytes to transfer */
};

with man 3 cmsg describing how to construct and access such ancillary data,

struct cmsghdr {
    size_t cmsg_len;              /* Data byte count, including header
                                     (type is socklen_t in POSIX) */
    int    cmsg_level;            /* Originating protocol */
    int    cmsg_type;             /* Protocol-specific type */
    unsigned char cmsg_data[];    /* Data itself */
};

struct cmsghdr *CMSG_FIRSTHDR(struct msghdr *msgh);
struct cmsghdr *CMSG_NXTHDR(struct msghdr *msgh, struct cmsghdr *cmsg);
size_t CMSG_ALIGN(size_t length);
size_t CMSG_SPACE(size_t length);
size_t CMSG_LEN(size_t length);
unsigned char *CMSG_DATA(struct cmsghdr *cmsg);

Ancillary data messages must always be sufficiently aligned for the current architecture (so that the data items can be accessed directly). The macros take care of that alignment, so you have to use them to construct a proper ancillary message (SCM_CREDENTIALS to pass user, group, and process ID information over a Unix domain socket, or SCM_RIGHTS to pass file descriptors). The man 3 cmsg man page contains example code for these.
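As a minimal sketch of the SCM_RIGHTS case, the two functions below pass a single file descriptor across a Unix domain socket. The names send_fd, recv_fd and union fd_ctl, and the one-byte 'F' payload, are made up for this example; the cmsg macros are used as described in man 3 cmsg.

```c
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/* Control buffer sized for exactly one descriptor; the union gives it the
   alignment struct cmsghdr needs. */
union fd_ctl {
    char buf[CMSG_SPACE(sizeof(int))];
    struct cmsghdr align;
};

/* Sends one payload byte with fd attached as SCM_RIGHTS. 0 on success. */
int send_fd(int sock, int fd)
{
    union fd_ctl ctl;
    char byte = 'F';
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
    struct msghdr msg;

    memset(&ctl, 0, sizeof ctl);
    memset(&msg, 0, sizeof msg);
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = ctl.buf;
    msg.msg_controllen = sizeof ctl.buf;

    struct cmsghdr *cmsg = CMSG_FIRSTHDR(&msg);
    cmsg->cmsg_level = SOL_SOCKET;
    cmsg->cmsg_type = SCM_RIGHTS;
    cmsg->cmsg_len = CMSG_LEN(sizeof fd);
    memcpy(CMSG_DATA(cmsg), &fd, sizeof fd);  /* memcpy avoids alignment traps */

    return sendmsg(sock, &msg, 0) == 1 ? 0 : -1;
}

/* Receives one payload byte and one descriptor. Returns the new fd or -1. */
int recv_fd(int sock)
{
    union fd_ctl ctl;
    char byte;
    int fd = -1;
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
    struct msghdr msg;

    memset(&msg, 0, sizeof msg);
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = ctl.buf;
    msg.msg_controllen = sizeof ctl.buf;

    if (recvmsg(sock, &msg, 0) != 1)
        return -1;

    struct cmsghdr *cmsg = CMSG_FIRSTHDR(&msg);
    if (cmsg != NULL && cmsg->cmsg_level == SOL_SOCKET &&
        cmsg->cmsg_type == SCM_RIGHTS &&
        cmsg->cmsg_len == CMSG_LEN(sizeof fd))
        memcpy(&fd, CMSG_DATA(cmsg), sizeof fd);
    return fd;
}
```

The descriptor number returned by recv_fd is generally different from the one passed to send_fd. Both refer to the same open file description, so data written through one end is visible through the other.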

Suffice it to say that to loop over each ancillary data part in a given message (struct msghdr msg), you use something that boils down to

char *const end = (char *)msg.msg_control + msg.msg_controllen;

for (char *ptr = (char *)msg.msg_control; ptr < end;
     ptr += CMSG_ALIGN(((struct cmsghdr *)ptr)->cmsg_len)) {
    struct cmsghdr *const cmsg = (struct cmsghdr *)ptr;

    /* A length shorter than the header means the buffer is corrupt
       (and would make this loop spin forever). */
    if (cmsg->cmsg_len < sizeof (struct cmsghdr))
        break;

    /* level is cmsg->cmsg_level and type is cmsg->cmsg_type, and
       CMSG_DATA(cmsg) is sufficiently aligned for the level and type,
       so you can use ((datatype *)CMSG_DATA(cmsg)) to obtain a pointer
       to the type corresponding to this level and type ancillary payload.
       The exact size of the payload is
           (cmsg->cmsg_len - CMSG_LEN(0))
       so e.g. an SCM_RIGHTS ancillary message, with
           cmsg->cmsg_level == SOL_SOCKET && cmsg->cmsg_type == SCM_RIGHTS
       has exactly
           (cmsg->cmsg_len - CMSG_LEN(0)) / sizeof (int)
       new file descriptors as a payload.
    */
}
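The same traversal can also be written with CMSG_FIRSTHDR and CMSG_NXTHDR, which handle the padding between messages for you. The sketch below counts the descriptors carried by SCM_RIGHTS messages. The names count_rights_fds and count_rights_demo are hypothetical, and the demo builds a control buffer by hand with placeholder descriptor numbers rather than receiving one from a socket.

```c
#include <stddef.h>
#include <string.h>
#include <sys/socket.h>

/* Counts the descriptors in all SCM_RIGHTS messages attached to *msg. */
size_t count_rights_fds(struct msghdr *msg)
{
    size_t total = 0;
    for (struct cmsghdr *c = CMSG_FIRSTHDR(msg); c != NULL;
         c = CMSG_NXTHDR(msg, c)) {
        if (c->cmsg_level == SOL_SOCKET && c->cmsg_type == SCM_RIGHTS)
            total += (c->cmsg_len - CMSG_LEN(0)) / sizeof(int);
    }
    return total;
}

/* Builds a control buffer holding one SCM_RIGHTS message with three
   placeholder descriptor numbers (nothing is sent) and counts them. */
size_t count_rights_demo(void)
{
    union {  /* the union gives the buffer struct cmsghdr alignment */
        char buf[CMSG_SPACE(3 * sizeof(int))];
        struct cmsghdr align;
    } ctl;
    int fds[3] = { 0, 1, 2 };
    struct msghdr msg;

    memset(&ctl, 0, sizeof ctl);
    memset(&msg, 0, sizeof msg);
    msg.msg_control = ctl.buf;
    msg.msg_controllen = sizeof ctl.buf;

    struct cmsghdr *c = CMSG_FIRSTHDR(&msg);
    c->cmsg_level = SOL_SOCKET;
    c->cmsg_type = SCM_RIGHTS;
    c->cmsg_len = CMSG_LEN(sizeof fds);
    memcpy(CMSG_DATA(c), fds, sizeof fds);

    return count_rights_fds(&msg);
}
```

After a real recvmsg() you would pass the received msghdr to count_rights_fds directly, once the call has updated msg_controllen to the number of control bytes actually delivered.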

Partial read/write issue in sendmsg/recvmsg

From a quick gander at the kernel source (Linux, but see below), I believe it's up to you to make sure that the ancillary data only gets sent once. That is, in non-blocking mode, if there is no room in the receiving socket, you will get back EAGAIN/EWOULDBLOCK and neither data nor ancillary data will be sent. But if there is some space on the receiving side, then the initial portion of the data will be sent and the ancillary data will also be sent. You would then receive a return byte count indicating a partial send, but the ancillary data will have been sent.

You need to be aware of that when you attempt to send the rest of your message, because the kernel maintains no memory that you've previously sent a partial buffer with which the subsequent buffer is logically contiguous (really no way it could -- you could be sending entirely different data for all it knows). So if you simply provide the same ancillary data for subsequent buffer parts, I believe the kernel will happily deliver the ancillary data again with your subsequent buffer part(s). Unless you avoid this, it might result in duplicate file descriptors on the receiver side (which you would probably neglect to close, since you wouldn't be expecting them).
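One way to avoid the duplicate delivery, sketched under the assumptions of this answer (Linux, a Unix domain stream socket that may be non-blocking), is to clear msg_control as soon as any part of the buffer has been accepted. The function name send_all_fd_once and the choice to wait with poll() are illustrative, not taken from the kernel or any library.

```c
#include <errno.h>
#include <poll.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/* Sends len bytes from data, attaching fd as SCM_RIGHTS only to the first
   sendmsg() call that succeeds. Returns 0 on success, -1 on error.
   With len == 0 nothing is sent, not even the descriptor. */
int send_all_fd_once(int sock, const char *data, size_t len, int fd)
{
    union {  /* the union gives the buffer struct cmsghdr alignment */
        char buf[CMSG_SPACE(sizeof(int))];
        struct cmsghdr align;
    } ctl;
    struct msghdr msg;
    size_t off = 0;

    memset(&ctl, 0, sizeof ctl);
    memset(&msg, 0, sizeof msg);
    msg.msg_control = ctl.buf;
    msg.msg_controllen = sizeof ctl.buf;
    struct cmsghdr *c = CMSG_FIRSTHDR(&msg);
    c->cmsg_level = SOL_SOCKET;
    c->cmsg_type = SCM_RIGHTS;
    c->cmsg_len = CMSG_LEN(sizeof fd);
    memcpy(CMSG_DATA(c), &fd, sizeof fd);

    while (off < len) {
        struct iovec iov = { .iov_base = (char *)data + off,
                             .iov_len = len - off };
        msg.msg_iov = &iov;
        msg.msg_iovlen = 1;

        ssize_t n = sendmsg(sock, &msg, MSG_NOSIGNAL);
        if (n == -1) {
            if (errno == EINTR)
                continue;
            if (errno == EAGAIN || errno == EWOULDBLOCK) {
                /* Nothing was sent, so the descriptor is still pending. */
                struct pollfd p = { .fd = sock, .events = POLLOUT };
                poll(&p, 1, -1);
                continue;
            }
            return -1;
        }
        off += (size_t)n;
        /* The descriptor went out with this part; do not attach it again. */
        msg.msg_control = NULL;
        msg.msg_controllen = 0;
    }
    return 0;
}
```

The important line is the reset of msg_control after the first successful call. Any retry after a partial send then carries payload only.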

Now if you're in blocking mode on the sending side, and the transfer gets broken into multiple parts, the ancillary data will only be sent once -- with the first buffer part, because the sending of the entire buffer remains within kernel control.

On the receiving side, you would therefore need to be aware that the ancillary data accompanies the first chunk of received data, if you haven't received the entire logical message.

I believe this behavior is consistent with that reported in the stackexchange reference given by @Klas-Lindbäck (https://unix.stackexchange.com/questions/185011/what-happens-with-unix-stream-ancillary-data-on-partial-reads). (That question didn't deal with non-blocking-mode though.)

This answer is specific to Linux. So it's certainly possible that results would differ slightly on other OSes, though it's difficult for me to see how they could be significantly different and still maintain sane semantics. The kernel can't reasonably maintain a memory of what has been sent previously, and the sendmsg prototype doesn't allow it to overwrite the user's msghdr to reflect that the msg_control part has already been sent.


