What Is The Correct Way to Define a Netfilter Hook Function

In the end I just made an entire module dedicated to this:

/**
 * The kernel API is far from static. In particular, the Netfilter packet entry
 * function keeps changing. nf_hook.c, the file where we declare our packet
 * entry function, has been quite difficult to read for a while now. It's pretty
 * amusing, because we don't even use any of the noisy arguments.
 *
 * This file declares a usable function header that abstracts away all those
 * useless arguments.
 */

#include <linux/version.h>

/* If this is a Red Hat-based kernel (Red Hat, CentOS, Fedora, etc)... */
#ifdef RHEL_RELEASE_CODE

#if RHEL_RELEASE_CODE >= RHEL_RELEASE_VERSION(7, 2)
#define NF_CALLBACK(name, skb) unsigned int name( \
        const struct nf_hook_ops *ops, \
        struct sk_buff *skb, \
        const struct net_device *in, \
        const struct net_device *out, \
        const struct nf_hook_state *state)

#elif RHEL_RELEASE_CODE >= RHEL_RELEASE_VERSION(7, 0)
#define NF_CALLBACK(name, skb) unsigned int name( \
        const struct nf_hook_ops *ops, \
        struct sk_buff *skb, \
        const struct net_device *in, \
        const struct net_device *out, \
        int (*okfn)(struct sk_buff *))

#else

/*
 * Sorry, I don't have headers for RHEL 6 and below because I'm in a bit of a
 * deadline right now.
 * If this is causing you trouble, find `nf_hookfn` in your kernel headers
 * (typically in include/linux/netfilter.h) and add your version of the
 * NF_CALLBACK macro here.
 * Also, kernel headers per version can be found here: http://vault.centos.org/
 */
#error "Sorry; this version of RHEL is not supported because it's kind of old."

#endif /* RHEL_RELEASE_CODE >= x */

/* If this is NOT a Red Hat-based kernel (Ubuntu, Debian, SuSE, etc)... */
#else

#if LINUX_VERSION_CODE >= KERNEL_VERSION(4, 4, 0)
#define NF_CALLBACK(name, skb) unsigned int name( \
        void *priv, \
        struct sk_buff *skb, \
        const struct nf_hook_state *state)

#elif LINUX_VERSION_CODE >= KERNEL_VERSION(4, 1, 0)
#define NF_CALLBACK(name, skb) unsigned int name( \
        const struct nf_hook_ops *ops, \
        struct sk_buff *skb, \
        const struct nf_hook_state *state)

#elif LINUX_VERSION_CODE >= KERNEL_VERSION(3, 13, 0)
#define NF_CALLBACK(name, skb) unsigned int name( \
        const struct nf_hook_ops *ops, \
        struct sk_buff *skb, \
        const struct net_device *in, \
        const struct net_device *out, \
        int (*okfn)(struct sk_buff *))

#elif LINUX_VERSION_CODE >= KERNEL_VERSION(3, 0, 0)
#define NF_CALLBACK(name, skb) unsigned int name( \
        unsigned int hooknum, \
        struct sk_buff *skb, \
        const struct net_device *in, \
        const struct net_device *out, \
        int (*okfn)(struct sk_buff *))

#else
#error "Linux < 3.0 isn't supported at all."

#endif /* LINUX_VERSION_CODE > n */

#endif /* RHEL or not RHEL */

So instead of this:

static unsigned int function_name((...), struct sk_buff *skb, (...))
{
    return do_something_with_skb(skb);
}

Do this:

static NF_CALLBACK(function_name, skb)
{
    return do_something_with_skb(skb);
}
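
To see the trick in isolation, here is a small userspace mock-up of the same idea. Everything in it (DEMO_API_VERSION, DEMO_CALLBACK, DEMO_INVOKE, struct demo_packet) is made up for illustration and is not a kernel definition; it only shows how one macro can hide a signature that changes between versions.

```c
#include <stddef.h>

/* Hypothetical stand-ins; these are NOT kernel definitions. */
#define DEMO_API_VERSION 2

struct demo_packet {
    int len;
};

#if DEMO_API_VERSION >= 2
/* "Newer" API: the callback receives an opaque pointer plus the packet. */
#define DEMO_CALLBACK(name, pkt) \
        unsigned int name(void *priv, struct demo_packet *pkt)
#define DEMO_INVOKE(fn, pkt) fn(NULL, pkt)
#else
/* "Older" API: the callback receives a hook number plus the packet. */
#define DEMO_CALLBACK(name, pkt) \
        unsigned int name(unsigned int hooknum, struct demo_packet *pkt)
#define DEMO_INVOKE(fn, pkt) fn(0u, pkt)
#endif

/* The callback body only ever touches pkt, whatever the real signature is. */
static DEMO_CALLBACK(count_bytes, pkt)
{
    return (unsigned int)pkt->len;
}

/* Plays the role of the caller, which the kernel plays in the real thing. */
unsigned int demo_run(int len)
{
    struct demo_packet p = { len };

    return DEMO_INVOKE(count_bytes, &p);
}
```

Flipping DEMO_API_VERSION to 1 changes the generated signature, but count_bytes itself does not need to be touched.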

Netfilter hook registration with networking subsystem

You are correct. For each protocol family, there is indeed a list of hooks, and which hooks exist is determined by the PF itself (e.g. NFPROTO_BRIDGE has a BROUTE hook list, but neither IPv4 nor IPv6 does).

When a packet comes in on a logical network interface (ethernet bridge, ethernet interface, etc), it gets passed around the stack. If it is an IPv4 packet, ip_rcv() will eventually get called. This calls the NF_INET_PRE_ROUTING hooks before continuing on to the packet routing proper. Similarly, ip_output() calls the NF_INET_POST_ROUTING hooks before actually sending the packet on its way.

Putting the Netfilter hooks into the main networking code allows the network interface drivers themselves to be blissfully ignorant of the whole process.

To get a better idea of how this all flows, check out http://lxr.free-electrons.com/source/net/ipv4/ip_input.c and http://lxr.free-electrons.com/source/net/ipv4/ip_output.c. You'll see the NF_HOOK and NF_HOOK_COND macros being called when packets transition to different layers, etc.
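
As a rough mental model of what those macros do, here is a userspace sketch. It is not the kernel's code: all the demo_* names are invented, and the real NF_HOOK takes more arguments. The point is only the shape: run the hooks registered at that point, and call the continuation function only if they all accept.

```c
#include <stddef.h>

/* Hypothetical userspace stand-ins; none of these are kernel types. */
enum demo_verdict { DEMO_DROP = 0, DEMO_ACCEPT = 1 };

struct demo_pkt {
    int bad_checksum;   /* set by the caller to make a hook reject the packet */
    int routed;         /* set once the "routing" continuation has run */
};

typedef enum demo_verdict (*demo_hook_fn)(struct demo_pkt *pkt);
typedef int (*demo_ok_fn)(struct demo_pkt *pkt);

/* Rough analogue of NF_HOOK(): run every hook, then continue via okfn. */
static int demo_nf_hook(const demo_hook_fn *hooks, size_t n,
                        struct demo_pkt *pkt, demo_ok_fn okfn)
{
    size_t i;

    for (i = 0; i < n; i++) {
        if (hooks[i](pkt) != DEMO_ACCEPT)
            return -1;          /* packet dropped; okfn never runs */
    }
    return okfn(pkt);           /* every hook accepted; carry on */
}

/* An example hook, as some module might register it. */
static enum demo_verdict demo_checksum_hook(struct demo_pkt *pkt)
{
    return pkt->bad_checksum ? DEMO_DROP : DEMO_ACCEPT;
}

/* Stands in for the step that follows the hook point (routing proper). */
static int demo_rcv_finish(struct demo_pkt *pkt)
{
    pkt->routed = 1;
    return 0;
}

/* Stands in for ip_rcv(): a driver would only ever call this entry point. */
int demo_rcv(struct demo_pkt *pkt)
{
    static const demo_hook_fn pre_routing[] = { demo_checksum_hook };

    return demo_nf_hook(pre_routing, 1, pkt, demo_rcv_finish);
}
```

Note that the caller of demo_rcv() never sees the hook list, which is the "blissfully ignorant driver" property described above.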

Netfilter hook stateful connection packet filtering

It seems that in your hook you want to decide a packet's fate based on conntrack (CT) information about the connection state: specifically, to block (drop) every TCP packet that arrives in the middle of a connection, i.e. a packet that has no SYN flag and no matching connection entry in CT.

So if you want to reap the benefits of CT, you have to let it do its work first.

Right now your hook sits at NF_INET_PRE_ROUTING with NF_IP_PRI_FIRST priority. Just look at a diagram of the Linux kernel packet flow: in the pre-routing chain, CT handling comes somewhere after the RAW table (i.e. it has a lower priority).

The list of priorities you can see here:

enum nf_ip_hook_priorities {
    NF_IP_PRI_FIRST = INT_MIN,
    NF_IP_PRI_CONNTRACK_DEFRAG = -400,
    NF_IP_PRI_RAW = -300,
    NF_IP_PRI_SELINUX_FIRST = -225,
    NF_IP_PRI_CONNTRACK = -200,
    NF_IP_PRI_MANGLE = -150,
    NF_IP_PRI_NAT_DST = -100,
    NF_IP_PRI_FILTER = 0,
    NF_IP_PRI_SECURITY = 50,
    NF_IP_PRI_NAT_SRC = 100,
    NF_IP_PRI_SELINUX_LAST = 225,
    NF_IP_PRI_CONNTRACK_HELPER = 300,
    NF_IP_PRI_CONNTRACK_CONFIRM = INT_MAX,
    NF_IP_PRI_LAST = INT_MAX,
};

Thus, to run after CT (that is, after nf_conntrack_in()), you must register your hook with a priority lower than NF_IP_PRI_CONNTRACK (i.e. with a greater number, e.g. -50).

So you do:

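static struct nf_hook_ops hooks[] __read_mostly = {
     {
          .hook = hfunc,
          .pf = PF_INET,
          .hooknum = NF_INET_PRE_ROUTING,
          /* -200 + 150 = -50: runs after conntrack, before the filter table */
          .priority = NF_IP_PRI_CONNTRACK + 150
     },
     // ...
};
// ... (e.g. inside your module's init function)
int ret;

/*
 * Note: as far as I know, kernels from about 4.13 on dropped
 * nf_register_hooks(); there you would use nf_register_net_hooks() instead.
 */
ret = nf_register_hooks(hooks, ARRAY_SIZE(hooks));
if (ret < 0)
    return ret;    /* registration failed */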

Then you should access the CT info from within your hook:

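#include <linux/ip.h>
#include <linux/tcp.h>
#include <linux/netfilter.h>
#include <net/netfilter/nf_conntrack.h>

static unsigned int hfunc(void *priv, struct sk_buff *skb,
                          const struct nf_hook_state *state) {
    struct iphdr *iph;

    iph = ip_hdr(skb);
    if (iph->protocol == IPPROTO_TCP) {
        struct nf_conn *ct;
        enum ip_conntrack_info ctinfo;
        struct tcphdr *tcph;

        /* No CT entry attached: conntrack did not handle this packet. */
        ct = nf_ct_get(skb, &ctinfo);
        if (!ct)
            return NF_ACCEPT;

        tcph = tcp_hdr(skb);
        if (tcph->syn) { // && !tcph->ack ???
            /* A SYN that opens a new connection is fine. */
            if (ctinfo == IP_CT_NEW)
                return NF_ACCEPT;
        } else {
            /* Non-SYN segment that CT sees as a new flow: drop it. */
            if (ctinfo == IP_CT_NEW)
                return NF_DROP;
        }
    }
    return NF_ACCEPT;
}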

Also remember that CT must actually be involved in your kernel's network processing: the CT modules must be loaded into the kernel and an appropriate iptables rule added.
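
If it helps to check the logic without loading a module, the decision made by hfunc can be boiled down to a pure function. This is only a userspace sketch: struct demo_view and demo_decide are hypothetical names, and each field stands for the kernel expression named in its comment.

```c
#include <stdbool.h>

enum demo_verdict { DEMO_DROP = 0, DEMO_ACCEPT = 1 };

/* Hypothetical, reduced view of a packet as the hook sees it. */
struct demo_view {
    bool is_tcp;    /* iph->protocol == IPPROTO_TCP */
    bool tracked;   /* nf_ct_get() returned a non-NULL entry */
    bool is_new;    /* ctinfo == IP_CT_NEW */
    bool syn;       /* tcph->syn */
};

/* Same branches as the hook above, minus all the skb handling. */
enum demo_verdict demo_decide(struct demo_view v)
{
    if (!v.is_tcp || !v.tracked)
        return DEMO_ACCEPT;     /* not TCP, or CT has nothing to say */
    if (!v.syn && v.is_new)
        return DEMO_DROP;       /* mid-connection segment of an unknown flow */
    return DEMO_ACCEPT;
}
```

The only combination that ends in a drop is a tracked TCP packet without SYN that CT classifies as new; everything else is accepted.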

Can multiple kernel modules use the same netfilter hook without affecting each other?

As you know, the hooks are just places in the TCP/IP stack where you can insert functions that do whatever they like with the skbs. Each function usually returns one of the following (see include/uapi/linux/netfilter.h):

NF_DROP ---- This is the end of this skb. Drop it and do not pass it to the rest of the hooks (or, of course, to higher layers).
NF_ACCEPT -- I am done with this skb; forward it to the next hook.
NF_STOLEN -- I hijacked this skb (the module queued the skb for later processing).

IPtables uses these hooks to implement the required firewall rules. You can of course coexist with IPtables (and any other hooks), but if your function is called before the IPtables hooks and returns NF_DROP, the skb will never reach IPtables. On the other hand, if you always return NF_ACCEPT, then IPtables and the other hooks in the system will not be affected at all.

As for the order of the hooks, the following priorities are used when the netfilter system traverses the hooks (from include/uapi/linux/netfilter_ipv4.h):

enum nf_ip_hook_priorities {
    NF_IP_PRI_FIRST = INT_MIN,
    NF_IP_PRI_CONNTRACK_DEFRAG = -400,
    NF_IP_PRI_RAW = -300,
    NF_IP_PRI_SELINUX_FIRST = -225,
    NF_IP_PRI_CONNTRACK = -200,
    NF_IP_PRI_MANGLE = -150,
    NF_IP_PRI_NAT_DST = -100, 
    NF_IP_PRI_FILTER = 0,
    NF_IP_PRI_SECURITY = 50,
    NF_IP_PRI_NAT_SRC = 100,
    NF_IP_PRI_SELINUX_LAST = 225,
    NF_IP_PRI_CONNTRACK_HELPER = 300,
    NF_IP_PRI_CONNTRACK_CONFIRM = INT_MAX,
    NF_IP_PRI_LAST = INT_MAX,
};

This means that IPtables mangle table hooks will be executed before FILTER hooks. You can use any of these values or your own when you register with nf_register_hooks().
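
Here is a small userspace model of that behaviour, in case it makes the interaction between modules clearer. It is a sketch under my own assumptions, not the kernel's implementation: the demo_* names are invented, each hook just returns a fixed verdict, and the chain is kept sorted by ascending priority and abandoned at the first drop.

```c
#include <stddef.h>
#include <string.h>

#define DEMO_MAX_HOOKS 8

enum demo_verdict { DEMO_DROP = 0, DEMO_ACCEPT = 1 };

struct demo_hook {
    const char *name;
    int priority;                /* lower value runs earlier */
    enum demo_verdict verdict;   /* fixed verdict, to keep the model small */
};

struct demo_chain {
    struct demo_hook hooks[DEMO_MAX_HOOKS];
    size_t count;
};

/* Insert a hook, keeping the chain sorted by ascending priority. */
int demo_register(struct demo_chain *c, struct demo_hook h)
{
    size_t i;

    if (c->count == DEMO_MAX_HOOKS)
        return -1;
    i = c->count;
    while (i > 0 && c->hooks[i - 1].priority > h.priority) {
        c->hooks[i] = c->hooks[i - 1];   /* shift later hooks up by one */
        i--;
    }
    c->hooks[i] = h;
    c->count++;
    return 0;
}

/* Walk the chain, recording each visited hook in trace; stop on a drop. */
enum demo_verdict demo_run(const struct demo_chain *c, char *trace, size_t cap)
{
    size_t i;

    trace[0] = '\0';
    for (i = 0; i < c->count; i++) {
        strncat(trace, c->hooks[i].name, cap - strlen(trace) - 1);
        strncat(trace, " ", cap - strlen(trace) - 1);
        if (c->hooks[i].verdict == DEMO_DROP)
            return DEMO_DROP;    /* later hooks never see the packet */
    }
    return DEMO_ACCEPT;
}
```

Registering hooks named "filter" (0), "mine" (-50) and "conntrack" (-200) in any order gives the traversal conntrack, mine, filter. If "mine" returns a drop, "filter" is never reached, which is exactly the way one module can affect another.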

netfilter hook is not retrieving complete packet

It appears that sometimes you are getting a linear skb, and sometimes your skb is not linear. In the latter case you are not reading the full data contents of an skb.

If skb->data_len is zero, then your skb is linear and the full data contents of the skb are in skb->data. If skb->data_len is not zero, then your skb is not linear, and skb->data contains just the first (linear) part of the data. The length of this area is skb->len - skb->data_len. The skb_headlen() helper function calculates that for convenience. The skb_is_nonlinear() helper function tells whether an skb is linear or not.

The rest of the data can be in paged fragments, and in skb fragments, in this order.

skb_shinfo(skb)->nr_frags tells the number of paged fragments. Each paged fragment is described by a data structure in the array skb_shinfo(skb)->frags, at indices 0 through skb_shinfo(skb)->nr_frags - 1. The skb_frag_size() and skb_frag_address() helper functions help in dealing with this data. They accept the address of the structure that describes a paged fragment. There are other useful helper functions depending on your kernel version.

If the total size of data in paged fragments is less than skb->data_len, then the rest of the data is in skb fragments. These are a list of skbs attached to this skb at skb_shinfo(skb)->frag_list (see skb_walk_frags() in the kernel).

Please note that there may be no data in the linear part and/or no data in the paged fragments. You just need to process the data piece by piece in the order just described.
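
To make the order concrete, here is a userspace model of the three areas and a function that gathers them. The struct and the demo_* functions are simplified stand-ins that I made up; a real sk_buff looks different, and in a real module the kernel helpers skb_copy_bits() or skb_linearize() can do this walk for you.

```c
#include <stddef.h>
#include <string.h>

#define DEMO_MAX_FRAGS 4

struct demo_frag {
    const unsigned char *addr;
    size_t size;
};

/* Simplified stand-in for an skb; field names only echo the real ones. */
struct demo_skb {
    const unsigned char *data;    /* linear part */
    size_t len;                   /* total bytes across all areas */
    size_t data_len;              /* bytes outside the linear part */
    struct demo_frag frags[DEMO_MAX_FRAGS];
    size_t nr_frags;
    struct demo_skb *frag_list;   /* first chained skb, if any */
    struct demo_skb *next;        /* next skb on a frag_list */
};

/* Same arithmetic as skb_headlen(): size of the linear part. */
static size_t demo_headlen(const struct demo_skb *skb)
{
    return skb->len - skb->data_len;
}

/* Append n bytes at offset off, truncating instead of overflowing. */
static size_t demo_put(unsigned char *out, size_t cap, size_t off,
                       const unsigned char *src, size_t n)
{
    if (n > cap - off)
        n = cap - off;
    if (n)
        memcpy(out + off, src, n);
    return off + n;
}

/* Gather: linear part, then paged fragments, then chained skbs. */
size_t demo_copy_all(const struct demo_skb *skb, unsigned char *out, size_t cap)
{
    const struct demo_skb *it;
    size_t off, i;

    off = demo_put(out, cap, 0, skb->data, demo_headlen(skb));
    for (i = 0; i < skb->nr_frags; i++)
        off = demo_put(out, cap, off, skb->frags[i].addr, skb->frags[i].size);
    for (it = skb->frag_list; it; it = it->next)
        off += demo_copy_all(it, out + off, cap - off);
    return off;
}
```

A hook that only reads the linear part would see just the first demo_headlen() bytes of this, which matches the symptom of the packet looking incomplete.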

What is the difference between NF_DROP and NF_STOLEN in Netfilter hooks?

This document gives a thorough overview of how netfilter works and why.

My understanding is that returning NF_DROP tells netfilter to drop the packet, whereas returning NF_STOLEN basically means that you're assuming responsibility for the packet from now on. Netfilter stops processing it but does not free it: the packet still exists in the kernel, and you're now responsible for cleaning it up (eventually freeing the skb) after you've done whatever else you're doing with it.

For most applications, you'll want to use NF_DROP rather than NF_STOLEN.
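
The ownership difference can be shown with a toy userspace model. None of this is kernel code: demo_core plays the part of the netfilter core, demo_hook is a hypothetical hook that keeps odd-numbered packets, and demo_live simply counts packets that have not been freed yet.

```c
#include <stdlib.h>

enum demo_verdict { DEMO_DROP, DEMO_ACCEPT, DEMO_STOLEN };

struct demo_pkt {
    int id;
};

static int demo_live;                  /* packets allocated but not yet freed */
static struct demo_pkt *demo_queued;   /* packet held by the stealing hook */

struct demo_pkt *demo_alloc(int id)
{
    struct demo_pkt *p = malloc(sizeof(*p));

    if (p) {
        p->id = id;
        demo_live++;
    }
    return p;
}

void demo_free(struct demo_pkt *p)
{
    if (p) {
        free(p);
        demo_live--;
    }
}

/* Hook that keeps odd-numbered packets and rejects the rest. */
static enum demo_verdict demo_hook(struct demo_pkt *p)
{
    if (p->id % 2) {
        demo_queued = p;        /* the hook owns this packet from now on */
        return DEMO_STOLEN;
    }
    return DEMO_DROP;
}

/* Stands in for the core: frees on DROP, leaves the packet alone on STOLEN. */
enum demo_verdict demo_core(struct demo_pkt *p)
{
    enum demo_verdict v = demo_hook(p);

    if (v == DEMO_DROP)
        demo_free(p);
    return v;
}

/* Later, the stealing module finishes its work and releases the packet. */
void demo_flush(void)
{
    demo_free(demo_queued);
    demo_queued = NULL;
}
```

If demo_flush() were never called, the stolen packet would simply leak, which is the practical reason to prefer the drop verdict unless you really need to hold on to the packet.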
