Linux Slab Allocator and Cache Performance

I think I got it; the answer is related to cache associativity.

A cache is divided into sets, and each set can only hold memory blocks that map to it. For example, with N sets, set 0 holds blocks whose block address is a multiple of N, set 1 holds blocks whose block address leaves a remainder of 1 when divided by N, and so on. The reason for this is cache performance: instead of searching the whole cache for every address, only one set has to be searched.

Now, from the link Understanding CPU Caching and performance

From page 377 of Hennessy and Patterson, the cache placement formula is as follows:
(Block address) MOD (Number of sets in cache)

Let's take memory block address 0x10000008 (from slab X with color C) and memory block address 0x20000009 (from slab Y with color Z). For most values of N (the number of sets in the cache), <address> MOD N yields different values, hence different sets in which to cache the data. If the addresses had the same least significant bits (for example 0x10000008 and 0x20000008), then for most values of N the calculation would yield the same value, and the blocks would collide in the same cache set.

So, by keeping a different offset (color) for the objects in different slabs, objects from different slabs will tend to reach different sets in the cache instead of colliding in the same one, and overall cache performance improves.
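
A minimal user-space sketch of that mapping (the 64-byte line size, the 1024-set cache and the example addresses are assumptions chosen only for illustration; the block address is the byte address divided by the line size):

#include <stdint.h>
#include <stdio.h>

#define LINE_SIZE 64u    /* assumed cache line size, in bytes    */
#define NUM_SETS  1024u  /* assumed number of sets in the cache  */

/* (Block address) MOD (Number of sets in cache) */
static unsigned int cache_set(uintptr_t addr)
{
    return (unsigned int)((addr / LINE_SIZE) % NUM_SETS);
}

int main(void)
{
    /* Same slab-relative position, different colouring offsets:
     * the two objects land in different sets. */
    printf("slab X (colour 0x00) -> set %u\n", cache_set(0x10000000u + 0x00u));
    printf("slab Y (colour 0x40) -> set %u\n", cache_set(0x20000000u + 0x40u));

    /* Identical offsets: both map to the same set and compete for it. */
    printf("no colouring         -> sets %u and %u\n",
           cache_set(0x10000000u), cache_set(0x20000000u));
    return 0;
}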

EDIT: Furthermore, if the cache is direct-mapped, then according to the Wikipedia article CPU cache, there is no cache replacement policy and the modulo calculation directly yields the cache block into which the memory block will be stored:

Direct-mapped cache
In this cache organization, each location in main memory can go in only one entry in the cache. Therefore, a direct-mapped cache can also be called a "one-way set associative" cache. It does not have a replacement policy as such, since there is no choice of which cache entry's contents to evict. This means that if two locations map to the same entry, they may continually knock each other out. Although simpler, a direct-mapped cache needs to be much larger than an associative one to give comparable performance, and it is more unpredictable. Let x be block number in cache, y be block number of memory, and n be number of blocks in cache, then mapping is done with the help of the equation x = y mod n.

what's the relationship between memcached slab and linux kernel slab

memcached uses its own implementation of a slab allocator, written entirely in user space; it does not use the Linux kernel's slab allocator, although both follow the same general idea of caching fixed-size objects.

SLAB memory management


The slab allocator is an abstraction layer that makes it easier to allocate numerous objects of the same type.
The interface offers the function

struct kmem_cache *kmem_cache_create(const char *name,
                                     size_t size, size_t align,
                                     unsigned long flags,
                                     void (*ctor)(void *));

This function creates a new slab allocator that will be able to handle allocations of size-byte-long objects. If the creation is successful, you get back a pointer to the related struct kmem_cache. This structure holds information about the slabs it manages.
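
As a rough sketch (not taken from any particular driver), a kernel-side user of this interface typically pairs kmem_cache_create with its companion functions kmem_cache_alloc, kmem_cache_free and kmem_cache_destroy. The struct my_object type and the cache name below are made up for the illustration:

#include <linux/errno.h>
#include <linux/gfp.h>
#include <linux/slab.h>

/* Hypothetical fixed-size object type, for the example only. */
struct my_object {
    int id;
    char payload[120];
};

static struct kmem_cache *my_cache;

static int my_cache_init(void)
{
    /* One allocator dedicated to my_object instances, no constructor. */
    my_cache = kmem_cache_create("my_object_cache",
                                 sizeof(struct my_object),
                                 0, SLAB_HWCACHE_ALIGN, NULL);
    return my_cache ? 0 : -ENOMEM;
}

static void my_cache_use(void)
{
    struct my_object *obj;

    obj = kmem_cache_alloc(my_cache, GFP_KERNEL);  /* take one object */
    if (!obj)
        return;
    obj->id = 42;
    kmem_cache_free(my_cache, obj);                /* give it back to its slab */
}

static void my_cache_exit(void)
{
    kmem_cache_destroy(my_cache);                  /* release all the slabs */
}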

As implied in the Wikipedia description, the purpose of such an extra layer is to prevent the memory fragmentation issues that could arise if allocations were made in a simple, intuitive manner. To do so, it introduces the notion of a slab through the following data structure:

struct slab {
    struct list_head list;   /* embedded list structure */
    unsigned long colouroff; /* colouring offset of the first object */
    void *s_mem;             /* first object in the slab */
    unsigned int inuse;      /* allocated objects in the slab */
    kmem_bufctl_t free;      /* first free object (if any) */
};

Thus, the kmem_cache object holds three lists of its slabs, gathered in three flavours:

  • Empty slabs: these slabs do not contain any in-use object.
  • Partial slabs: these slabs contain objects currently in use, but still have room for new objects.
  • Full slabs: these slabs only contain objects in use and cannot host new ones.

When an object is requested through the slab allocator, it will try to serve it from a partial slab; if it cannot, it will take it from an empty slab, growing the cache with a new slab when none is available.
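
A simplified sketch of that selection logic is shown below. This is illustrative pseudo-C, not the kernel's actual code: the list fields (slabs_partial, slabs_free) and the helpers (grow_cache, take_object) are simplified stand-ins invented for the sketch.

/* Illustrative only: how an allocation request walks the slab lists. */
static void *cache_alloc_sketch(struct kmem_cache *cachep)
{
    struct slab *slabp;

    if (!list_empty(&cachep->slabs_partial)) {
        /* Preferred source: a slab that already holds some objects. */
        slabp = list_first_entry(&cachep->slabs_partial, struct slab, list);
    } else if (!list_empty(&cachep->slabs_free)) {
        /* Next choice: an empty slab kept around from earlier frees. */
        slabp = list_first_entry(&cachep->slabs_free, struct slab, list);
    } else {
        /* No slab has room: grow the cache with a freshly allocated slab. */
        slabp = grow_cache(cachep);
        if (!slabp)
            return NULL;
    }

    /* Hand out one object and move the slab to the partial or full list. */
    return take_object(cachep, slabp);
}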

If you are eager to collect more information about this, you should take a look at Robert Love's Linux Kernel Development.

What to choose between Slab and Slub Allocator in Linux Kernel?

In search of an answer, I posted the same question on Quora, and Robert Love answered it:

I'm assuming you are asking this from the point-of-view of the user of
a system, or perhaps someone building a kernel for a particular
product. As a kernel developer, you don't care what "slab" allocator
is in use; the API is the same.

First, "slab" has become a generic name referring to a memory
allocation strategy employing an object cache, enabling efficient
allocation and deallocation of kernel objects. It was first documented
by Sun engineer Jeff Bonwick [1] and implemented in the Solaris 2.4
kernel.

Linux currently offers three choices for its "slab" allocator:

Slab is the original, based on Bonwick's seminal paper and available
since Linux kernel version 2.2. It is a faithful implementation of
Bonwick's proposal, augmented by the multiprocessor changes described
in Bonwick's follow-up paper [2].

Slub is the next-generation replacement memory allocator, which has
been the default in the Linux kernel since 2.6.23. It continues to
employ the basic "slab" model, but fixes several deficiencies in
Slab's design, particularly around systems with large numbers of
processors. Slub is simpler than Slab.

SLOB (Simple List Of Blocks) is a memory allocator optimized for
embedded systems with very little memory—on the order of megabytes. It
applies a very simple first-fit algorithm on a list of blocks, not
unlike the old K&R-style heap allocator. In eliminating nearly all of
the overhead from the memory allocator, SLOB is a good fit for systems
under extreme memory constraints, but it offers none of the benefits
described in [1] and can suffer from pathological fragmentation.

What should you use? Slub, unless you are building a kernel for an
embedded device with limited memory. In that case, I would
benchmark Slub versus SLOB and see what works best for your workload.
There is no reason to use Slab; it will likely be removed from future
Linux kernel releases.

Please refer to this link for the original response.

Why Linux kernel memory subsystem is named slab

A slab, in common usage, refers to a big flat block of solid material. By analogy, a slab allocator manages large, contiguous chunks of memory, dividing them into smaller pieces for allocation.


