What's the Difference Between sockaddr, sockaddr_in, and sockaddr_in6

What's the difference between sockaddr, sockaddr_in, and sockaddr_in6?

In order to give more information other people may find useful, I have decided to answer my question although I initially did not intend to.

After some digging into the Linux source code, I found the following:
There are multiple protocols, and they all implement getsockname. Each one has its own underlying address data structure. For example, IPv4 has sockaddr_in, IPv6 has sockaddr_in6, and the AF_UNIX socket has sockaddr_un.
sockaddr is used as the common data struct in the signatures of the Linux networking API.

That API copies the sockaddr_in, sockaddr_in6, or sockaddr_un into a sockaddr by memcpy, using a separate length parameter to decide how many bytes to copy.

All of those data structures begin with the same address-family field (sa_family in sockaddr, sin_family in sockaddr_in, sin6_family in sockaddr_in6).

Because of all this, the code snippet is valid: both sockaddr_in and sockaddr_in6 start with the family field, so after a check on that field the pointer can be cast to the correct data structure and used.
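A minimal sketch of that check-then-cast pattern follows. The helper name sockaddr_port is hypothetical and only for illustration; it assumes the caller passes a pointer to a properly initialized address of the family recorded in its first field.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>

/* Hypothetical helper: returns the port in host byte order, or -1 if the
 * address family is neither AF_INET nor AF_INET6. */
int sockaddr_port(const struct sockaddr *sa)
{
    switch (sa->sa_family) {
    case AF_INET: {
        /* The family says IPv4, so the object is really a sockaddr_in. */
        const struct sockaddr_in *in4 = (const struct sockaddr_in *)sa;
        return ntohs(in4->sin_port);
    }
    case AF_INET6: {
        /* The family says IPv6, so the object is really a sockaddr_in6. */
        const struct sockaddr_in6 *in6 = (const struct sockaddr_in6 *)sa;
        return ntohs(in6->sin6_port);
    }
    default:
        return -1;
    }
}
```

A caller would pass the address by pointer, for example sockaddr_port((struct sockaddr *)&my_in6_addr).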

BTW, I'm not sure why sizeof(sockaddr_in6) > sizeof(sockaddr), which means that a buffer sized for sockaddr is too small to hold an IPv6 address (that is error-prone), but I guess it is for historical reasons.
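One common way around the size mismatch is struct sockaddr_storage, which POSIX defines as large enough for any supported address type. The sketch below uses a hypothetical helper, sockaddr_size_for, to make the size differences visible; it is an illustration, not part of any library.

```c
#include <netinet/in.h>
#include <stddef.h>
#include <sys/socket.h>
#include <sys/un.h>

/* Hypothetical helper: bytes needed for an address of the given family,
 * or 0 if this sketch does not know the family. */
size_t sockaddr_size_for(int family)
{
    switch (family) {
    case AF_INET:  return sizeof(struct sockaddr_in);
    case AF_INET6: return sizeof(struct sockaddr_in6);
    case AF_UNIX:  return sizeof(struct sockaddr_un);
    default:       return 0;
    }
}
```

When the family is not known in advance (for example before calling accept() or getsockname()), declaring the buffer as struct sockaddr_storage and passing sizeof of that buffer avoids truncating an IPv6 address.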

Casting between sockaddr and sockaddr_in6

ai_family and ai_addr are fields of the addrinfo struct, so presumably the code you are quoting had called getaddrinfo() beforehand.

The result of getaddrinfo() is a NULL-terminated linked list of addrinfo structs, where the addrinfo::ai_addr field is a pointer to an allocated memory block that is of sufficient size to hold a socket address of the reported addrinfo::ai_family type. The size of the address is reported in the addrinfo::ai_addrlen field.

For AF_INET, the addrinfo::ai_addr field is pointing at a memory block containing a sockaddr_in struct.

For AF_INET6, the addrinfo::ai_addr field is pointing at a memory block containing a sockaddr_in6 struct.

That is why the type-casts work.

The addrinfo::ai_addr field is declared as struct sockaddr* so it can be passed as-is to the addr parameter of the bind() and connect() functions without type-casting. The addrinfo::ai_addrlen field can be passed as-is to their addrlen parameter.
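The relationship between ai_family, ai_addr, and ai_addrlen can be checked with a short sketch. The function name count_consistent_results is hypothetical, and AI_NUMERICHOST is used only so that the lookup stays local and needs no DNS query.

```c
#define _POSIX_C_SOURCE 200112L
#include <netdb.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Hypothetical helper: walks the getaddrinfo() result list and counts the
 * entries whose ai_addr really is the struct that ai_family announces.
 * Returns -1 if the lookup fails. */
int count_consistent_results(const char *numeric_host)
{
    struct addrinfo hints;
    struct addrinfo *res = NULL;
    struct addrinfo *rp;
    int count = 0;

    memset(&hints, 0, sizeof hints);
    hints.ai_family = AF_UNSPEC;      /* accept IPv4 and IPv6 */
    hints.ai_socktype = SOCK_STREAM;  /* one entry per address */
    hints.ai_flags = AI_NUMERICHOST;  /* parse only, no name resolution */

    if (getaddrinfo(numeric_host, NULL, &hints, &res) != 0)
        return -1;

    for (rp = res; rp != NULL; rp = rp->ai_next) {
        if (rp->ai_family == AF_INET &&
            rp->ai_addr->sa_family == AF_INET &&
            rp->ai_addrlen == sizeof(struct sockaddr_in))
            count++;
        else if (rp->ai_family == AF_INET6 &&
                 rp->ai_addr->sa_family == AF_INET6 &&
                 rp->ai_addrlen == sizeof(struct sockaddr_in6))
            count++;
        /* A real client would call
         * connect(fd, rp->ai_addr, rp->ai_addrlen) here, with no cast. */
    }

    freeaddrinfo(res);
    return count;
}
```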

What's the pad to sockaddr_in for?

There are multiple protocol families. Each family has its own address structure.

Example: AF_INET uses sockaddr_in, AF_INET6 uses sockaddr_in6, AF_UNIX uses sockaddr_un, etc. But sockaddr is the base structure. Pointers to all these structures must be type-cast to struct sockaddr * when binding or connecting a socket.

int bind(int sockfd, const struct sockaddr *addr, socklen_t addrlen);

int connect(int sockfd, const struct sockaddr *addr, socklen_t addrlen);

Let's look at the structures of sockaddr_in and sockaddr:

struct sockaddr_in {
    short            sin_family;   /* Protocol family (always AF_INET) */
    unsigned short   sin_port;     /* Port number in network byte order */
    struct in_addr   sin_addr;     /* IP address in network byte order */
    unsigned char    sin_zero[8];  /* Pad to sizeof(struct sockaddr) */
};

struct in_addr {
   uint32_t s_addr; /* Address in network byte order (big-endian) */
};

The structure of sockaddr is:

struct sockaddr 
{
    sa_family_t sa_family;
    char        sa_data[14];
};

Look at the sizes of the elements in the two structures sockaddr_in and sockaddr.

The first element in both structures is the same and occupies the same memory.

sin_port --> 2 bytes

sin_addr --> 4 bytes

sin_zero[8] --> 8 bytes

Total = 14 bytes (equal to size of sa_data[14])

The sin_zero padding bytes are there to make the two structure sizes equal.
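The arithmetic above can be confirmed at run time. This sketch assumes a platform such as Linux where the two sizes are equal; POSIX itself does not promise that, so the hypothetical helper reports the result instead of assuming it.

```c
#include <netinet/in.h>
#include <stddef.h>
#include <sys/socket.h>

/* Hypothetical helper: returns 1 when sockaddr_in has the same size as
 * sockaddr and both family fields sit at the same offset, 0 otherwise. */
int sockaddr_in_matches_sockaddr(void)
{
    return sizeof(struct sockaddr_in) == sizeof(struct sockaddr)
        && offsetof(struct sockaddr_in, sin_family)
               == offsetof(struct sockaddr, sa_family);
}
```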

reference:

https://man7.org/linux/man-pages/man2/bind.2.html

https://man7.org/linux/man-pages/man2/connect.2.html

Why do we cast sockaddr_in to sockaddr when calling bind()?

No, it's not just convention.

sockaddr is a generic descriptor for any kind of socket operation, whereas sockaddr_in is a struct specific to IP-based communication (IIRC, "in" stands for "InterNet"). As far as I know, this is a kind of "polymorphism": the bind() function pretends to take a struct sockaddr *, but in fact it will assume that the appropriate type of structure is passed in, i.e. one that corresponds to the type of socket you give it as the first argument.
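As a rough illustration, the cast at the bind() call site looks like the sketch below. The helper names fill_ipv4_any and bind_ipv4_any are hypothetical, and fd is assumed to be a socket created with AF_INET.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>

/* Hypothetical helper: fills an IPv4 wildcard address for the given port. */
void fill_ipv4_any(struct sockaddr_in *addr, uint16_t port)
{
    memset(addr, 0, sizeof *addr);          /* also clears sin_zero */
    addr->sin_family = AF_INET;
    addr->sin_port = htons(port);           /* network byte order */
    addr->sin_addr.s_addr = htonl(INADDR_ANY);
}

/* Hypothetical helper: binds fd (assumed to be an AF_INET socket) to the
 * wildcard address. Returns whatever bind() returns. */
int bind_ipv4_any(int fd, uint16_t port)
{
    struct sockaddr_in addr;

    fill_ipv4_any(&addr, port);
    /* bind() is declared with struct sockaddr *, so the pointer is cast;
     * the length tells it how large the real structure is. */
    return bind(fd, (const struct sockaddr *)&addr, sizeof addr);
}
```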

How sockaddr holds sockaddr_storage or sockaddr_in6?

Remember that all functions that take a struct sockaddr pointer also take the size of the structure. Together with the metadata on the actual socket, it's easy for the system to know what kind of structure you're passing.

Also note that it's always pointers to the address structures being passed around, not the actual structures, which would not work. So you never do, e.g.,

(struct sockaddr) a_in6_sockaddr

you do

(struct sockaddr *) &a_in6_sockaddr
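A small sketch of a function that accepts any address type through the pointer-plus-length pair is shown here. The wrapper name format_host is hypothetical; getnameinfo() with NI_NUMERICHOST is used because it formats the address without a DNS lookup.

```c
#define _POSIX_C_SOURCE 200112L
#include <arpa/inet.h>
#include <netdb.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>

/* Hypothetical helper: formats any socket address as a numeric host string.
 * Returns 0 on success or a nonzero getnameinfo() error code. */
int format_host(const struct sockaddr *sa, socklen_t salen,
                char *buf, size_t buflen)
{
    /* sa may point at a sockaddr_in, sockaddr_in6, or sockaddr_storage;
     * salen says how many bytes of it are valid. */
    return getnameinfo(sa, salen, buf, (socklen_t)buflen,
                       NULL, 0, NI_NUMERICHOST);
}
```

With a struct sockaddr_in6 named a_in6_sockaddr, the call would be format_host((struct sockaddr *)&a_in6_sockaddr, sizeof a_in6_sockaddr, buf, sizeof buf).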

What is the correct way to convert a struct sockaddr * to struct sockaddr_in6 * with valid C code?

So if the way we do socket programming (and what is also recommended by the books) is a hack, what is the correct way to rewrite the above code so that it is also valid C code as per the C standard?

TL;DR: continue to do what you present in your example.

The code you presented appears to be syntactically correct. It may or may not exhibit undefined behavior under some circumstances. Whether or not it does depends on the behavior of getaddrinfo().

There is no way to do this in C that meets all the functional requirements and is any better protected against undefined behavior than the standard technique you've presented. That's why it's the standard technique. The issue here is that the function must support all conceivable address types, including types that have not yet been defined. It could declare the socket address pointer as a void *, which would not require casting, but that wouldn't actually change anything about whether any given program exhibits undefined behavior.

For its part, getaddrinfo() is designed with exactly such usage in mind, so it is its problem if using the expected cast on the result allows for misbehavior. Moreover, getaddrinfo() is not part of the C standard library -- it is standardized (only) by POSIX, which also incorporates the C standard. Analyzing that function in the light of C alone therefore demonstrates an inappropriate hyperfocus. Though the casts raise some concern in light of C alone, you should expect that in the context of getaddrinfo() and other POSIX networking functions using struct sockaddr *, casting to the correct specific address type and accessing the referenced object produces reliable results.

Additionally, I think AnT's answer to your other question is oversimplified and overly negative. I'm considering whether to write a contrasting answer.

Comparing IPv4 socket (sockaddr_in) with IPv6 socket (sockaddr_in6)

As Joachim Pileborg reasoned, you don't need to care about this when the IPv4 address comes from an earlier packet received on the same socket because you will be comparing one mapped IPv4 address to another. It is only in the case that the IPv4 address was obtained from an external source that you have to care.

As João Augusto pointed out, you neglected to check that the IPv6 address indeed is an IPv4 mapped address before comparing the last 32 bits. There is a macro IN6_IS_ADDR_V4MAPPED that will help you do this:

if (
    IN6_IS_ADDR_V4MAPPED(&(ipv6_clientdata->sin6_addr)) &&
    (ipv6_clientdata->sin6_port == ipv4_storeddata->sin_port) &&
    (ipv6_clientdata->sin6_addr.in6_u.u6_addr32[3] == ipv4_storeddata->sin_addr.s_addr)
) {
    addrfound = true;
}
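Reading the address as 32-bit words relies on member names that POSIX does not specify; s6_addr (a 16-byte array) is the only member of struct in6_addr it guarantees. A more portable sketch of the same comparison, using a hypothetical helper name mapped_addr_equal, compares the last four bytes with memcmp:

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>

/* Hypothetical helper: returns 1 if v6 holds an IPv4-mapped address with
 * the same IPv4 address and port as v4, 0 otherwise. */
int mapped_addr_equal(const struct sockaddr_in6 *v6,
                      const struct sockaddr_in *v4)
{
    if (!IN6_IS_ADDR_V4MAPPED(&v6->sin6_addr))
        return 0;
    if (v6->sin6_port != v4->sin_port)   /* both in network byte order */
        return 0;
    /* Bytes 12..15 of a mapped address carry the IPv4 address, already in
     * network byte order, just like s_addr. */
    return memcmp(&v6->sin6_addr.s6_addr[12], &v4->sin_addr.s_addr, 4) == 0;
}
```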

What is the difference between struct addrinfo and struct sockaddr

getaddrinfo() returns, on success, a linked list of struct addrinfo entries for the specified hostname and/or service.

The ai_addr member isn't actually a struct sockaddr, because that struct is merely a generic one that contains common members for all the others, and is used in order to determine what type of struct you actually have. Depending upon what you pass to getaddrinfo(), and what that function found out, ai_addr might actually be a pointer to struct sockaddr_in, or struct sockaddr_in6, or whatever else, depending upon what is appropriate for that particular address entry. This is one good reason why they're kept "separate", because that member might point to one of a bunch of different types of structs, which it couldn't do if you tried to hardcode all the members into struct addrinfo, because those different structs have different members.

This is probably the easiest way to get this information if you have a hostname, but it's not the only way. For an IPv4 connection, you can just populate a struct sockaddr_in structure yourself, if you want to and you have the data to do so, and avoid going through the rigamarole of calling getaddrinfo(), which you might have to wait for if it needs to go out into the internet to collect the information for you. You don't have to use struct addrinfo at all.
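Populating the structure by hand might look like the following sketch. The helper name make_ipv4_addr is hypothetical, and the input is assumed to be a dotted-quad string, not a hostname.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>

/* Hypothetical helper: builds an IPv4 socket address from a dotted-quad
 * string and a port in host byte order. Returns 0 on success, -1 if the
 * string is not a valid IPv4 address. */
int make_ipv4_addr(const char *dotted_quad, uint16_t port,
                   struct sockaddr_in *out)
{
    memset(out, 0, sizeof *out);
    out->sin_family = AF_INET;
    out->sin_port = htons(port);
    /* inet_pton() returns 1 only when the text parsed successfully. */
    return inet_pton(AF_INET, dotted_quad, &out->sin_addr) == 1 ? 0 : -1;
}
```

The filled structure can then be passed to connect() or bind() as (struct sockaddr *)&addr together with sizeof addr.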
