How does sendmsg work?
The manpage speaks of a message (singular) and multiple elements (plural):
For send() and sendto(), the message is found in buf and has length len. For sendmsg(), the message is pointed to by the elements of the array msg.msg_iov. The sendmsg() call also allows sending ancillary data (also known as control information).
For a stream socket, it wouldn't matter either way. Any data you send will just end up as one long stream of data on the other side.
For datagram or message sockets, I can see why a bit more clarity would be helpful. But it appears that you send just one datagram or message with a single sendmsg call, not one per buffer element.
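The "one datagram per call" behaviour can be checked with a small experiment. The following is a minimal sketch, assuming Linux and an AF_UNIX datagram socketpair; the function name gather_roundtrip is made up for illustration. It sends two separate buffers with a single sendmsg() call and verifies that the receiver sees exactly one datagram containing both.

```c
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/* Hypothetical helper: sends two buffers with ONE sendmsg() call over an
   AF_UNIX datagram socketpair and reads the result back into out.
   Returns the size of the single datagram received, or -1 on error
   (including the case where a second datagram unexpectedly shows up). */
ssize_t gather_roundtrip(char *out, size_t outlen)
{
    int sv[2];
    if (socketpair(AF_UNIX, SOCK_DGRAM, 0, sv) == -1)
        return -1;

    char head[] = "hello, ";
    char body[] = "world";
    struct iovec iov[2] = {
        { .iov_base = head, .iov_len = strlen(head) },
        { .iov_base = body, .iov_len = strlen(body) },
    };
    struct msghdr msg;
    memset(&msg, 0, sizeof msg);
    msg.msg_iov = iov;
    msg.msg_iovlen = 2;

    ssize_t total = (ssize_t)(strlen(head) + strlen(body));
    ssize_t got = -1;
    if (sendmsg(sv[0], &msg, 0) == total) {
        got = recv(sv[1], out, outlen, 0);      /* reads one datagram */
        char extra;
        if (recv(sv[1], &extra, 1, MSG_DONTWAIT) != -1)
            got = -1;   /* a second datagram would contradict the claim */
    }

    close(sv[0]);
    close(sv[1]);
    return got;
}
```

Calling gather_roundtrip(buf, sizeof buf) should return the combined length of both buffers, with the bytes of the two iovec elements back to back in buf.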
I actually went digging in the Linux source code out of curiosity and to get a better feeling about this answer. It looks like send and sendto are just wrappers for sendmsg in Linux that build the struct msghdr for you. And in fact, the UDP sendmsg implementation makes room for one UDP header per sendmsg call.
If performance is what you're worried about, it doesn't look like you'll benefit from sendmsg if you pass in just a single iovec. If you're currently concatenating buffers in user space before sending, though, passing them as separate iovec elements could potentially win you some performance by avoiding that copy.
It's a bit similar to writev, with the added benefit that you can specify a destination address for use with connectionless sockets like UDP. You can also add ancillary data, if you're into that sort of thing. (Commonly used to send file descriptors across UNIX domain sockets.)
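To illustrate the destination address part, here is a hedged sketch, assuming Linux with a working loopback interface; the function name udp_sendmsg_to is hypothetical. An unconnected UDP socket sends two buffers as one datagram, with the destination given in msg_name rather than through connect() or sendto().

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>
#include <unistd.h>

/* Hypothetical helper: binds a UDP receiver to an ephemeral loopback port,
   then sends to it from an unconnected socket using msg_name.
   Returns the number of bytes received into out, or -1 on error. */
ssize_t udp_sendmsg_to(char *out, size_t outlen)
{
    ssize_t got = -1;
    struct sockaddr_in addr;
    socklen_t alen = sizeof addr;
    char part1[] = "head:";
    char part2[] = "payload";
    struct iovec iov[2] = {
        { .iov_base = part1, .iov_len = strlen(part1) },
        { .iov_base = part2, .iov_len = strlen(part2) },
    };
    struct msghdr msg;
    int rx = socket(AF_INET, SOCK_DGRAM, 0);
    int tx = socket(AF_INET, SOCK_DGRAM, 0);

    if (rx == -1 || tx == -1)
        goto done;

    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = 0;                      /* let the kernel pick a port */
    if (bind(rx, (struct sockaddr *)&addr, sizeof addr) == -1 ||
        getsockname(rx, (struct sockaddr *)&addr, &alen) == -1)
        goto done;

    memset(&msg, 0, sizeof msg);
    msg.msg_name = &addr;                   /* destination, as in sendto() */
    msg.msg_namelen = sizeof addr;
    msg.msg_iov = iov;                      /* gather list, as in writev() */
    msg.msg_iovlen = 2;

    if (sendmsg(tx, &msg, 0) != -1)
        got = recv(rx, out, outlen, 0);

done:
    if (rx != -1) close(rx);
    if (tx != -1) close(tx);
    return got;
}
```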
socket.recvmsg is ignoring ancbufsize, ancillary data
How does ancillary data in sendmsg() work?
I think that ancillary data is never sent over a TCP socket. Frustratingly, that is not stated in many documents, including the Python socket docs, the C message docs, or even the APUE book.
iov and msg_control in sendmsg and recvmsg
Ancillary data or control messages (.msg_controllen bytes at .msg_control) is data provided or verified by the kernel, whereas the normal payload (in iovecs) is just data received from the other endpoint, unverified and unchecked by the kernel (except for checksum, if the protocol has one).
For IP sockets (see man 7 ip), there are several socket options that cause the kernel to provide ancillary data on received messages. For example:
The IP_RECVORIGDSTADDR socket option tells the kernel to provide an IP_ORIGDSTADDR type ancillary message (with a struct sockaddr_in as data), identifying the original destination address of the datagram received.
The IP_RECVOPTS socket option tells the kernel to provide an IP_OPTIONS type ancillary message containing all IP option headers (up to 40 bytes for IPv4) for incoming datagrams.
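As an illustration of the first option, the following sketch (assuming Linux and a working loopback interface; the function name recv_orig_dst is hypothetical) enables IP_RECVORIGDSTADDR on a UDP socket, receives one datagram with recvmsg(), and extracts the IP_ORIGDSTADDR control message.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>
#include <unistd.h>

/* Hypothetical helper: receives one loopback UDP datagram and copies the
   kernel-provided IP_ORIGDSTADDR control message into *orig. *bound is set
   to the address the receiver was bound to, for comparison.
   Returns 0 if the control message was found, -1 otherwise. */
int recv_orig_dst(struct sockaddr_in *orig, struct sockaddr_in *bound)
{
    int rc = -1, on = 1;
    socklen_t blen = sizeof *bound;
    char payload[16];
    union {                      /* union forces cmsghdr alignment */
        struct cmsghdr align;
        char buf[CMSG_SPACE(sizeof(struct sockaddr_in))];
    } ctl;
    struct iovec iov = { .iov_base = payload, .iov_len = sizeof payload };
    struct msghdr msg;
    int rx = socket(AF_INET, SOCK_DGRAM, 0);
    int tx = socket(AF_INET, SOCK_DGRAM, 0);

    if (rx == -1 || tx == -1)
        goto done;

    memset(bound, 0, sizeof *bound);
    bound->sin_family = AF_INET;
    bound->sin_addr.s_addr = htonl(INADDR_LOOPBACK);

    if (setsockopt(rx, IPPROTO_IP, IP_RECVORIGDSTADDR, &on, sizeof on) == -1 ||
        bind(rx, (struct sockaddr *)bound, sizeof *bound) == -1 ||
        getsockname(rx, (struct sockaddr *)bound, &blen) == -1 ||
        sendto(tx, "x", 1, 0, (struct sockaddr *)bound, sizeof *bound) == -1)
        goto done;

    memset(&msg, 0, sizeof msg);
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = ctl.buf;
    msg.msg_controllen = sizeof ctl.buf;
    if (recvmsg(rx, &msg, 0) == -1)
        goto done;

    for (struct cmsghdr *c = CMSG_FIRSTHDR(&msg); c != NULL;
         c = CMSG_NXTHDR(&msg, c)) {
        if (c->cmsg_level == IPPROTO_IP && c->cmsg_type == IP_ORIGDSTADDR) {
            memcpy(orig, CMSG_DATA(c), sizeof *orig);
            rc = 0;
        }
    }

done:
    if (rx != -1) close(rx);
    if (tx != -1) close(tx);
    return rc;
}
```

On success, orig->sin_addr and orig->sin_port should match the address and port the datagram was sent to, which here is the receiver's bound address.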
Ping and traceroute use ICMP messages over IP; see man 7 icmp (and man 7 raw) for details.
Because most ICMP responses do not contain useful data filled in by the sender, the iovecs don't usually contain anything interesting. Instead, the interesting data is in the IP message headers and options.
For example, an ICMP Echo Reply packet contains just 8 bytes (64 bits) of header: 8-bit type (0), 8-bit code (0), 16-bit checksum, 16-bit id, and 16-bit sequence number. To get the IP headers with the interesting fields, you need the kernel to provide them as ancillary data control messages.
The background:
As described in the sendmsg() and related man pages, we have
ssize_t sendmsg(int sockfd, const struct msghdr *msg, int flags);

struct msghdr {
    void         *msg_name;       /* Optional address */
    socklen_t     msg_namelen;    /* Size of address */
    struct iovec *msg_iov;        /* Scatter/gather array */
    size_t        msg_iovlen;     /* # elements in msg_iov */
    void         *msg_control;    /* Ancillary data */
    size_t        msg_controllen; /* Ancillary data buffer len */
    int           msg_flags;      /* Flags (unused) */
};

struct iovec {
    void  *iov_base;              /* Starting address */
    size_t iov_len;               /* Number of bytes to transfer */
};
with man 3 cmsg describing how to construct and access such ancillary data,
struct cmsghdr {
    size_t cmsg_len;              /* Data byte count, including header
                                     (type is socklen_t in POSIX) */
    int    cmsg_level;            /* Originating protocol */
    int    cmsg_type;             /* Protocol-specific type */
    unsigned char cmsg_data[];    /* Data itself */
};

struct cmsghdr *CMSG_FIRSTHDR(struct msghdr *msgh);
struct cmsghdr *CMSG_NXTHDR(struct msghdr *msgh, struct cmsghdr *cmsg);
size_t CMSG_ALIGN(size_t length);
size_t CMSG_SPACE(size_t length);
size_t CMSG_LEN(size_t length);
unsigned char *CMSG_DATA(struct cmsghdr *cmsg);
These ancillary data messages are always sufficiently aligned for the current architecture (so that the data items can be directly accessed), so to construct a proper ancillary message (SCM_CREDENTIALS to pass user, group, and process ID information over a Unix domain socket, or SCM_RIGHTS to pass file descriptors), these macros have to be used. The man 3 cmsg man page contains example code for these.
Suffice it to say that to loop over each ancillary data part in a given message (struct msghdr msg), you use something that boils down to
char *const end = (char *)msg.msg_control + msg.msg_controllen;
for (char *ptr = (char *)msg.msg_control; ptr < end;
     ptr += CMSG_ALIGN(((struct cmsghdr *)ptr)->cmsg_len)) {
    struct cmsghdr *const cmsg = (struct cmsghdr *)ptr;
    /* level is cmsg->cmsg_level and type is cmsg->cmsg_type, and
       CMSG_DATA(cmsg) is sufficiently aligned for the level and type,
       so you can use ((datatype *)CMSG_DATA(cmsg)) to obtain a pointer
       to the type corresponding to this level and type ancillary payload.
       The exact size of the payload is
           (cmsg->cmsg_len - CMSG_LEN(0))
       so e.g. an SCM_RIGHTS ancillary message, with
           cmsg->cmsg_level == SOL_SOCKET && cmsg->cmsg_type == SCM_RIGHTS
       has exactly
           (cmsg->cmsg_len - CMSG_LEN(0)) / sizeof (int)
       new file descriptors as a payload.
    */
}
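In practice the same walk is usually written with CMSG_FIRSTHDR and CMSG_NXTHDR. The following is a minimal sketch of passing one file descriptor with SCM_RIGHTS over a Unix domain socket, assuming Linux; the helper names send_fd and recv_fd are made up for illustration, and the descriptor is copied with memcpy as the man 3 cmsg example does.

```c
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>
#include <unistd.h>

/* Hypothetical helper: sends fd_to_send as an SCM_RIGHTS control message,
   along with one byte of ordinary payload. Returns 0 on success. */
int send_fd(int sock, int fd_to_send)
{
    char byte = 'F';
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
    union {                      /* union forces cmsghdr alignment */
        struct cmsghdr align;
        char buf[CMSG_SPACE(sizeof(int))];
    } ctl;
    struct msghdr msg;

    memset(&msg, 0, sizeof msg);
    memset(&ctl, 0, sizeof ctl);
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = ctl.buf;
    msg.msg_controllen = sizeof ctl.buf;

    struct cmsghdr *c = CMSG_FIRSTHDR(&msg);
    c->cmsg_level = SOL_SOCKET;
    c->cmsg_type = SCM_RIGHTS;
    c->cmsg_len = CMSG_LEN(sizeof(int));
    memcpy(CMSG_DATA(c), &fd_to_send, sizeof(int));

    return sendmsg(sock, &msg, 0) == 1 ? 0 : -1;
}

/* Hypothetical helper: receives one message and returns the first file
   descriptor found in an SCM_RIGHTS control message, or -1 if none. */
int recv_fd(int sock)
{
    char byte;
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
    union {
        struct cmsghdr align;
        char buf[CMSG_SPACE(sizeof(int))];
    } ctl;
    struct msghdr msg;

    memset(&msg, 0, sizeof msg);
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = ctl.buf;
    msg.msg_controllen = sizeof ctl.buf;

    if (recvmsg(sock, &msg, 0) <= 0)
        return -1;

    for (struct cmsghdr *c = CMSG_FIRSTHDR(&msg); c != NULL;
         c = CMSG_NXTHDR(&msg, c)) {
        if (c->cmsg_level == SOL_SOCKET && c->cmsg_type == SCM_RIGHTS) {
            size_t nfds = (c->cmsg_len - CMSG_LEN(0)) / sizeof(int);
            if (nfds >= 1) {
                int fd;
                memcpy(&fd, CMSG_DATA(c), sizeof fd);
                return fd;
            }
        }
    }
    return -1;
}
```

The descriptor returned by recv_fd is a new descriptor in the receiving process that refers to the same open file as the one passed to send_fd, so the receiver is responsible for closing it.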
Partial read/write issue in sendmsg/recvmsg
From a quick gander at the kernel source (Linux, but see below), I believe it's up to you to make sure that the ancillary data only gets sent once. That is, in non-blocking mode, if there is no room in the receiving socket, you will get back EAGAIN/EWOULDBLOCK and neither data nor ancillary data will be sent. But if there is some space on the receiving side, then the initial portion of the data will be sent and the ancillary data will also be sent. You would then receive a return byte count indicating a partial send, but the ancillary data will have been sent.
You need to be aware of that when you attempt to send the rest of your message, because the kernel maintains no memory that you've previously sent a partial buffer with which the subsequent buffer is logically contiguous (really no way it could -- you could be sending entirely different data for all it knows). So if you simply provide the same ancillary data for subsequent buffer parts, I believe the kernel will happily deliver the ancillary data again with your subsequent buffer part(s). This might result in duplicate file descriptors on the receiver side (which you would probably neglect to close since you wouldn't be expecting them) -- if you don't avoid it.
Now if you're in blocking mode on the sending side, and the transfer gets broken into multiple parts, the ancillary data will only be sent once -- with the first buffer part, because the sending of the entire buffer remains within kernel control.
On the receiving side, you would therefore need to be aware that the ancillary data accompanies the first chunk of received data, if you haven't received the entire logical message.
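One way to avoid resending the ancillary data after a partial send is to drop the control buffer as soon as any payload bytes have gone out. The sketch below follows the behaviour described in this answer rather than anything guaranteed by a specification; it assumes Linux, and the function name send_all_ctl_once is hypothetical.

```c
#include <errno.h>
#include <poll.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>
#include <unistd.h>

/* Hypothetical helper: sends len bytes from buf on sock. The control buffer
   (ctl, ctl_len) is attached only until the first sendmsg() call that
   transfers at least one byte; later calls carry payload only.
   Works for blocking and non-blocking sockets. Returns 0 or -1. */
int send_all_ctl_once(int sock, const char *buf, size_t len,
                      void *ctl, size_t ctl_len)
{
    size_t off = 0;

    while (off < len) {
        struct iovec iov = {
            .iov_base = (char *)buf + off,
            .iov_len = len - off,
        };
        struct msghdr msg;
        memset(&msg, 0, sizeof msg);
        msg.msg_iov = &iov;
        msg.msg_iovlen = 1;
        msg.msg_control = ctl;          /* NULL after the first partial send */
        msg.msg_controllen = ctl_len;

        ssize_t n = sendmsg(sock, &msg, MSG_NOSIGNAL);
        if (n == -1) {
            if (errno == EINTR)
                continue;
            if (errno == EAGAIN || errno == EWOULDBLOCK) {
                /* Nothing was sent, so the ancillary data is still pending;
                   wait until the socket is writable and retry. */
                struct pollfd p = { .fd = sock, .events = POLLOUT };
                if (poll(&p, 1, -1) == -1 && errno != EINTR)
                    return -1;
                continue;
            }
            return -1;
        }

        off += (size_t)n;
        if (n > 0) {
            /* Some payload went out, and the ancillary data went with it. */
            ctl = NULL;
            ctl_len = 0;
        }
    }
    return 0;
}
```

The caller builds the control buffer once (for example an SCM_RIGHTS message) and passes it in; the loop makes sure it accompanies only the first chunk.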
I believe this behavior is consistent with that reported in the stackexchange reference given by @Klas-Lindbäck (https://unix.stackexchange.com/questions/185011/what-happens-with-unix-stream-ancillary-data-on-partial-reads). (That question didn't deal with non-blocking-mode though.)
This answer is specific to Linux. So it's certainly possible that results would differ slightly on other OSes, though it's difficult for me to see how they could be significantly different and still maintain sane semantics. The kernel can't reasonably maintain a memory of what has been sent previously, and the sendmsg prototype doesn't allow it to overwrite the user's msghdr to reflect that the msg_control part has already been sent.