Understanding Tcpdump Filter & Bit-Masking

It's not the BPF filter that gets http headers but the "-A" switch on your tcpdump command.

Your tcpdump command looks for TCP traffic to a certain destination or from a certain source on eth0, where the final BPF filter involves a calculation that results in a non-zero total. With the "-A" option, it prints each packet in ASCII, minus its link-level header.

I've explained the calculation below, but I believe there are some issues in the actual filter, possibly from copying and pasting. When you use these filters in tcpdump, you're using bit-masking, which is typically used when examining fields that do not fall on byte boundaries.

ip[2:2] refers to the two bytes (i.e. the 3rd & 4th bytes) in the IP header, beginning at byte 2 (remember it starts at offset 0). This value is the total length of the IP packet, which can be a maximum of 65535 bytes.

For the bitmask here, for clarity, I've prepended a '0', so mask 0xf becomes 0x0f. The two are the same value, so the leading '0' can be dropped, as per the comment from Guy Harris.

ip[0]&0x0f refers to the second half (the low nibble) of byte 0 (i.e. the 1st byte) in the IP header, which gives you the IP header length in 32-bit words; as such, it is typically multiplied by 4 for such a calculation.
(tcp[12]&0xf0) refers to the first half (the high nibble) of byte 12 (i.e. the 13th byte) in the TCP header, which is the data offset field. It specifies the size of the TCP header in 32-bit words, but because it sits in the high nibble, the masked value is already 16 times the word count, so it is shifted right by 2 (divide by 16, multiply by 4) rather than multiplied by 4.

Both lengths need converting because they are counted in 32-bit (4-byte) words and must be translated to bytes for the calculation to be correct.
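The nibble arithmetic can be checked outside tcpdump. Below is a minimal Python sketch, assuming two made-up sample bytes (0x45 for the first IP byte and 0x50 for byte 12 of the TCP header); it only illustrates the masking and is not part of the original filter.

```python
# First byte of a typical IPv4 header: version 4, IHL 5 (hypothetical sample value).
ip_byte0 = 0x45
# Byte 12 of a TCP header without options: data offset 5 in the high nibble (hypothetical).
tcp_byte12 = 0x50

# The low nibble of ip[0] is the IHL in 32-bit words; multiply by 4 to get bytes.
ip_header_len = (ip_byte0 & 0x0f) * 4

# The high nibble of tcp[12] is the data offset in 32-bit words.
# Shift right by 4 to get the word count, then multiply by 4 to get bytes.
tcp_header_len = ((tcp_byte12 & 0xf0) >> 4) * 4

print(ip_header_len, tcp_header_len)  # 20 20
```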

Your filter should be calculating:

The IP packet length (in bytes) - The IP header length - The TCP Header Length

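and looking for that value to be non-zero. Because the TCP data offset sits in the high nibble, the 0xf0-masked byte is shifted right by 2 rather than multiplied by 4, i.e. something like this

sudo tcpdump -A -nnpi eth0 '(ip[2:2] - ((ip[0]&0x0f)*4) - ((tcp[12]&0xf0)>>2) != 0)'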

When you perform the subtraction, you're looking for a non-zero total. This non-zero total means that there's data above layer 4, i.e. data in the tcp payload, typically application traffic.
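The subtraction can be sketched in Python against a raw IPv4 packet (no link-layer header). The helper names tcp_payload_len and make_packet are hypothetical, and the packet built here is a checksum-free toy with zeroed addresses and ports, assumed purely for illustration.

```python
import struct

def tcp_payload_len(ip_packet: bytes) -> int:
    """Return the number of TCP payload bytes in a raw IPv4 packet."""
    total_len = struct.unpack("!H", ip_packet[2:4])[0]  # ip[2:2]
    ip_hlen = (ip_packet[0] & 0x0f) * 4                 # (ip[0]&0x0f)*4
    tcp_hlen = (ip_packet[ip_hlen + 12] & 0xf0) >> 2    # (tcp[12]&0xf0)>>2
    return total_len - ip_hlen - tcp_hlen

def make_packet(payload: bytes) -> bytes:
    """Build a hypothetical IPv4+TCP packet: 20-byte IP header, 20-byte TCP header."""
    total_len = 20 + 20 + len(payload)
    ip_hdr = bytes([0x45, 0x00]) + struct.pack("!H", total_len) + bytes(16)
    tcp_hdr = bytes(12) + bytes([0x50, 0x18]) + bytes(6)
    return ip_hdr + tcp_hdr + payload

print(tcp_payload_len(make_packet(b"")))                      # 0  (filter would not match)
print(tcp_payload_len(make_packet(b"GET / HTTP/1.1\r\n")))    # 16 (filter would match)
```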

You may also want to add port 80 assuming most http traffic is over port 80.

Such a filter is commonly used by security folk to detect data on a SYN, which is not normal but is allowed according to the RFCs. So the whole thing would look something like:

'tcp[13]=0x02 and (ip[2:2] - ((ip[0]&0x0f)*4) - ((tcp[12]&0xf0)>>2) != 0)'
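As a rough cross-check of the SYN-with-data logic, here is a Python sketch. The function names is_syn_with_data and sample_packet are hypothetical, and the packets are synthetic (zeroed addresses, ports and checksums), so treat it as an illustration of the comparison rather than a capture tool.

```python
import struct

def is_syn_with_data(ip_packet: bytes) -> bool:
    """Mirror of: tcp[13]=0x02 and (total length - IP header - TCP header != 0)."""
    total_len = struct.unpack("!H", ip_packet[2:4])[0]
    ip_hlen = (ip_packet[0] & 0x0f) * 4
    tcp = ip_packet[ip_hlen:]
    syn_only = tcp[13] == 0x02            # SYN set, every other flag bit clear
    tcp_hlen = (tcp[12] & 0xf0) >> 2
    return syn_only and (total_len - ip_hlen - tcp_hlen) != 0

def sample_packet(flags: int, payload: bytes) -> bytes:
    """Hypothetical IPv4+TCP packet with 20-byte headers and the given flags byte."""
    ip_hdr = bytes([0x45, 0x00]) + struct.pack("!H", 40 + len(payload)) + bytes(16)
    tcp_hdr = bytes(12) + bytes([0x50, flags]) + bytes(6)
    return ip_hdr + tcp_hdr + payload

print(is_syn_with_data(sample_packet(0x02, b"data")))  # True
print(is_syn_with_data(sample_packet(0x02, b"")))      # False
print(is_syn_with_data(sample_packet(0x12, b"data")))  # False (SYN+ACK)
```

Note that comparing the whole flags byte to 0x02 only matches a bare SYN; a SYN with any other flag bit set would not match.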

TCPIPGuide is a very good, free online guide on TCP/IP btw.

Updated: Modify the 'leading zero' section on the bitmask as per the update from Guy Harris.

TCPDUMP Bit Masking

Your second one isn't working because you are masking off the low nibble of offset 12 and preserving the high nibble... which is correct.. but you aren't actually capturing its value.

Effectively, you have said this:

(tcp[12] & 0xf0 != 0)

That will produce a 1 or a zero as a true or a false. Next, you multiply that by 4... which will always work since the TCP header length will always be greater than zero... but it will now be looking for the "GE" letters at offset 4 in the TCP header... the start of the sequence number.

You can still use the 0xf0 mask, but you need to divide or shift the result. Parenthesize the mask so it is applied before the shift. For example:

 ((tcp[12] & 0xf0) >> 2)

Notice that I am taking advantage of the shift to avoid having to multiply by 4. Multiplying by 4 is equivalent to shifting left 2 bits. Since I would normally shift the byte at offset 12 right by 4 bits to get the header length in words, shifting right by only 2 saves a step.
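The difference between the truth-value form and the shifted form can be seen with a quick Python check. The sample byte 0x80 (a 32-byte TCP header) is an assumed value chosen for illustration.

```python
tcp_byte12 = 0x80  # hypothetical: data offset of 8 words, i.e. a 32-byte TCP header

# What the broken expression effectively computes: a truth value (0 or 1) times 4.
broken = int((tcp_byte12 & 0xf0) != 0) * 4

# Mask the high nibble, then shift right by 2.
shifted = (tcp_byte12 & 0xf0) >> 2

# The long form: shift right by 4 to get words, then multiply by 4 to get bytes.
long_form = ((tcp_byte12 & 0xf0) >> 4) * 4

print(broken, shifted, long_form)  # 4 32 32
```

The broken form yields 4 for any header, which is why it ends up pointing into the sequence number instead of the payload.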

TCPDUMP: Bitmasking

This states to set all bits in the first byte of the IP packet header except for the first 4 bits (which is the version number) to 0

More correctly, it selects the first 4 bits of the first byte of the IP packet header, and returns a value in which the lower 4 bits are zero.

So you are correct, in that the tcpdump filter ip[0] & 0xf0 = 4 will NEVER succeed (as ip[0] & 0xf0 is in the range 0x00 through 0xf0, with the low-order nibble being 0, so it can NEVER equal 4), and ip[0] & 0xf0 = 0x40 will succeed only if the IP version number in the IP header is 4 (rather than, for example, 6).
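This is easy to confirm by brute force. The Python sketch below enumerates every possible first byte and collects the values the mask can produce; it is only a sanity check of the arithmetic.

```python
# Every value that (first byte & 0xf0) can take, over all 256 possible first bytes.
masked_values = {b & 0xf0 for b in range(256)}

print(4 in masked_values)     # False: the low nibble is always zero after masking
print(0x40 in masked_values)  # True: produced whenever the version nibble is 4

# All first bytes with version 4 mask to 0x40, whatever the IHL nibble is.
ipv4_first_bytes = [b for b in range(256) if b & 0xf0 == 0x40]
print(len(ipv4_first_bytes))  # 16 (0x40 through 0x4f)
```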

Difference between two similar tcpdump filters

With that syntax you can filter the packets bitwise.

For example, consider the first four bytes of an IP header.

0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|Version|  IHL  |Type of Service|          Total Length         |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

Let's say you want to filter only ip packets with version equal to 4 (indicating IPv4 packets).

You can do something like this

tcpdump -i ethX 'ip[0:1] & 0xf0 = 0x40'

ip[0:1] means "extract 1 byte from offset zero of the IP header"
& 0xf0 masks off the IHL bits of the first byte, keeping only the version bits
= 0x40 will match only if the version bits contain the number 4

et voilà, you built a custom filter digging deeply into the captured frames.

In the two cases you listed, I suppose there's a typo.

I think it should be:

proto[x:y] & z = n   : matches when the bits of proto[x:y] selected by mask z are equal to n
proto[x:y] = n       : matches when proto[x:y] is exactly equal to n
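To make the two forms concrete, here is a small Python sketch. The helper field is a hypothetical stand-in for proto[x:y] (tcpdump itself accepts sizes of 1, 2 or 4 bytes), and the four header bytes are made-up sample values.

```python
def field(header: bytes, offset: int, size: int = 1) -> int:
    """Mimic proto[offset:size]: read `size` bytes, big-endian, starting at `offset`."""
    return int.from_bytes(header[offset:offset + size], "big")

# Hypothetical start of an IPv4 header: version 4, IHL 5, TOS 0, total length 60.
ip_header_start = bytes([0x45, 0x00, 0x00, 0x3c])

masked_match = (field(ip_header_start, 0, 1) & 0xf0) == 0x40  # ip[0:1] & 0xf0 = 0x40
exact_match = field(ip_header_start, 0, 1) == 0x40            # ip[0:1] = 0x40
total_length = field(ip_header_start, 2, 2)                   # ip[2:2]

print(masked_match, exact_match, total_length)  # True False 60
```

The exact comparison fails because the whole byte is 0x45, while the masked comparison ignores the IHL nibble.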

Could someone explain these code snippets?

If you look at the definition of got_packet, you'll see const u_char *packet. packet is a pointer to an unsigned char (or, more generally, to a location in memory), here the first byte of the captured packet.

In both cases, that pointer is cast to a struct sniff_ethernet or struct sniff_tcp pointer respectively. In the first case this is done without manipulation (it accesses the packet from the start); in the second case an offset is added first, i.e. the size of the Ethernet header plus the size of the IP header, so that it accesses the TCP header in the packet.
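The same offset arithmetic can be sketched in Python, using slices of a memoryview in place of C pointer casts. This is an analogy, not the original C code: the 14-byte Ethernet header size is assumed (untagged Ethernet II), and the frame below is synthetic with hypothetical port numbers.

```python
import struct

SIZE_ETHERNET = 14  # assumed: untagged Ethernet II header

# Synthetic frame: Ethernet header, 20-byte IPv4 header, 20-byte TCP header.
eth_hdr = bytes(12) + b"\x08\x00"
ip_hdr = bytes([0x45, 0x00]) + struct.pack("!H", 40) + bytes(16)
tcp_hdr = struct.pack("!HH", 12345, 80) + bytes(8) + bytes([0x50, 0x02]) + bytes(6)
packet = memoryview(eth_hdr + ip_hdr + tcp_hdr)

# "Cast" at offset 0: the Ethernet header starts where the packet starts.
ethernet = packet[:SIZE_ETHERNET]

# The IP header follows the Ethernet header; its length comes from the IHL nibble.
ip = packet[SIZE_ETHERNET:]
size_ip = (ip[0] & 0x0f) * 4

# The TCP header starts after the Ethernet header and the IP header.
tcp = packet[SIZE_ETHERNET + size_ip:]
src_port, dst_port = struct.unpack("!HH", tcp[:4])

print(size_ip, src_port, dst_port)  # 20 12345 80
```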
