author	Nick Piggin <npiggin@suse.de>	2009-01-23 17:21:39 +0100
committer	Pekka Enberg <penberg@cs.helsinki.fi>	2009-01-26 10:34:20 +0200
commit	32efb35b328d0e87fa5358239c54c889226cc6e7
tree	2d9efcc6c8789b1722216344e8f2a509c1c9725c /init
parent	a6525042bfdfcab128bd91fad264de10fd24a55e
SLQB slab allocator (try 2)
Introducing the SLQB slab allocator.
SLQB takes code and ideas from all other slab allocators in the tree.
The primary method for keeping lists of free objects within the allocator
is a singly-linked list, storing a pointer within the object memory itself
(or a small additional space in the case of RCU-destroyed slabs). This is
like SLOB and SLUB, and unlike SLAB, which uses separate arrays of objects
and metadata. Storing the link in the object reduces memory consumption and
makes smaller object sizes more practical, as there is less per-object
overhead.
Using lists rather than arrays can reduce the cacheline footprint. When moving
objects around, SLQB can move a list of objects from one CPU to another by
simply manipulating a head pointer, whereas SLAB needs to memcpy arrays. Some
SLAB per-CPU arrays can be up to 1K in size, which is a lot of cachelines that
can be touched during alloc/free. Newly freed objects tend to be cache hot,
and newly allocated ones tend to soon be touched anyway, so often there is
little cost to using metadata in the objects.
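The in-object link and the O(1) list handover can be sketched in plain
userspace C. This is an illustrative sketch only, not the kernel code;
list_sketch, list_push, list_splice and demo_splice are names invented
for the example:

```c
#include <stddef.h>

/*
 * Sketch: a free object's first word is reused as the "next" pointer,
 * so the list needs no external metadata, and moving a whole list of
 * objects between CPUs is a couple of pointer assignments, not a memcpy.
 */
struct list_sketch {
	void *head;
	void *tail;
	unsigned long nr;
};

static void list_push(struct list_sketch *l, void *obj)
{
	*(void **)obj = l->head;	/* link lives inside the object */
	l->head = obj;
	if (!l->tail)
		l->tail = obj;
	l->nr++;
}

static void *list_pop(struct list_sketch *l)
{
	void *obj = l->head;

	if (!obj)
		return NULL;
	l->head = *(void **)obj;
	if (!l->head)
		l->tail = NULL;
	l->nr--;
	return obj;
}

/* Splice all of @src onto the front of @dst: O(1), no per-object copies. */
static void list_splice(struct list_sketch *dst, struct list_sketch *src)
{
	if (!src->head)
		return;
	*(void **)src->tail = dst->head;
	if (!dst->tail)
		dst->tail = src->tail;
	dst->head = src->head;
	dst->nr += src->nr;
	src->head = src->tail = NULL;
	src->nr = 0;
}

/* Push 3 + 2 objects onto two lists, splice, and count what pops out. */
static unsigned long demo_splice(void)
{
	static void *objs[5];	/* each "object" is one pointer-sized slot */
	struct list_sketch a = { NULL, NULL, 0 }, b = { NULL, NULL, 0 };
	unsigned long popped = 0;
	int i;

	for (i = 0; i < 3; i++)
		list_push(&a, &objs[i]);
	for (i = 3; i < 5; i++)
		list_push(&b, &objs[i]);
	list_splice(&a, &b);
	while (list_pop(&a))
		popped++;
	return popped;
}
```

The splice touches only the head, tail and count of each list, which is the
cacheline argument made above.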
SLQB has a per-CPU LIFO freelist of objects like SLAB (but using lists rather
than arrays). Freed objects are returned to this freelist if they belong to
the same node as the freeing CPU. So objects allocated on one CPU can be
added to the freelist of another CPU on the same node. When LIFO freelists need
to be refilled or trimmed, SLQB takes or returns objects from a list of slabs.
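The refill/trim policy can be illustrated with a counts-only sketch (in SLQB
the moves are list splices; the watermark and batch values here are invented
for the example, as are all names):

```c
#include <stddef.h>

/* Made-up watermarks for illustration only. */
enum { FREE_BATCH = 16, FREE_HIGH = 64 };

struct cpu_cache_sketch {
	unsigned long freelist_nr;	/* objects on the CPU LIFO freelist */
	unsigned long slab_free_nr;	/* free objects sitting in slabs */
};

/* Allocation with an empty freelist pulls a batch from the slab lists. */
static int cache_alloc(struct cpu_cache_sketch *c)
{
	if (!c->freelist_nr) {
		unsigned long take = c->slab_free_nr < FREE_BATCH ?
				     c->slab_free_nr : FREE_BATCH;
		if (!take)
			return 0;	/* would allocate a new slab here */
		c->slab_free_nr -= take;
		c->freelist_nr += take;
	}
	c->freelist_nr--;
	return 1;
}

/* Freeing past the high watermark returns a batch to the slab lists. */
static void cache_free(struct cpu_cache_sketch *c)
{
	c->freelist_nr++;
	if (c->freelist_nr > FREE_HIGH) {
		c->freelist_nr -= FREE_BATCH;
		c->slab_free_nr += FREE_BATCH;
	}
}

/* 100 frees against an empty freelist: trims fire at 65, 81 and 97. */
static unsigned long demo_trim(void)
{
	struct cpu_cache_sketch c = { 0, 32 };
	int i;

	for (i = 0; i < 100; i++)
		cache_free(&c);
	return c.freelist_nr;
}
```

Batching means the slab lists are touched once per FREE_BATCH operations
rather than on every alloc/free.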
SLQB has per-CPU lists of slabs (which use struct page as their metadata
including list head for this list). Each slab contains a singly-linked list of
objects that are free in that slab (free, and not on a LIFO freelist). Slabs
are freed as soon as all their objects are freed, and only allocated when there
are no slabs remaining. They are taken off this slab list when there are no
free objects left. So the slab lists only ever contain "partial" slabs: those
slabs which are not completely full and not completely empty. SLQB slabs can be
manipulated with no locking, unlike other allocators, which tend to use
per-node locks. As the number of threads per socket increases, this should
help improve the scalability of slab operations.
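The partial-list invariant can be sketched as a tiny state machine (names
and the 4-objects-per-slab figure are invented for the example; in SLQB the
linkage lives in struct page):

```c
#include <stddef.h>

enum { OBJS_PER_SLAB = 4 };	/* illustrative only */

struct slab_sketch {
	unsigned int nr_free;	/* free objects still inside this slab */
	int on_partial_list;	/* stands in for list_head membership */
};

/* Invariant: on the partial list iff neither full nor empty. */
static void slab_update_list(struct slab_sketch *s)
{
	s->on_partial_list = s->nr_free > 0 && s->nr_free < OBJS_PER_SLAB;
	/* nr_free == OBJS_PER_SLAB: all objects free, page is released */
	/* nr_free == 0: fully allocated, taken off the list entirely */
}

static void slab_alloc_obj(struct slab_sketch *s)
{
	s->nr_free--;
	slab_update_list(s);
}

static void slab_free_obj(struct slab_sketch *s)
{
	s->nr_free++;
	slab_update_list(s);
}

/* Drain a fresh slab, then free one object back. */
static int demo_partial(void)
{
	struct slab_sketch s = { OBJS_PER_SLAB, 0 };
	int i;

	for (i = 0; i < OBJS_PER_SLAB; i++)
		slab_alloc_obj(&s);
	if (s.on_partial_list)	/* a full slab must be off the list */
		return -1;
	slab_free_obj(&s);
	return s.on_partial_list;	/* one free object: back on the list */
}
```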
Freeing objects to remote slab lists first batches up the objects on the
freeing CPU, then moves them over at once to a list on the allocating CPU. The
allocating CPU will then notice those objects and pull them onto the end of its
freelist. This remote freeing scheme is designed to minimise the number of
cross CPU cachelines touched, short of going to a "crossbar" arrangement like
SLAB has. SLAB's "crossbars" are NR_CPUS*MAX_NUMNODES arrays of objects,
which can become very bloated on huge systems (this could amount to hundreds
of GBs of kmem caches on a 4096-CPU, 1024-node system).
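The remote-free batching can again be sketched with counts only (names and
the batch threshold are invented; in SLQB the handover is a list splice under
a lock on the target CPU's remote list):

```c
#include <stddef.h>

enum { REMOTE_BATCH = 8 };	/* illustrative threshold */

struct remote_sketch {
	unsigned long batch_nr;		/* batched on the freeing CPU */
	unsigned long remote_nr;	/* handed to the allocating CPU */
	unsigned long freelist_nr;	/* allocating CPU's own freelist */
	unsigned long handovers;	/* cross-CPU transfers performed */
};

/* Free an object whose home CPU is remote: batch locally first. */
static void remote_free(struct remote_sketch *r)
{
	r->batch_nr++;
	if (r->batch_nr >= REMOTE_BATCH) {
		/* one cross-CPU transfer moves the whole batch */
		r->remote_nr += r->batch_nr;
		r->batch_nr = 0;
		r->handovers++;
	}
}

/* The allocating CPU notices pending remote frees and pulls them in. */
static void claim_remote(struct remote_sketch *r)
{
	r->freelist_nr += r->remote_nr;
	r->remote_nr = 0;
}

/* 20 remote frees cost only 2 cross-CPU handovers of 8 objects each. */
static unsigned long demo_remote(void)
{
	struct remote_sketch r = { 0, 0, 0, 0 };
	int i;

	for (i = 0; i < 20; i++)
		remote_free(&r);
	claim_remote(&r);
	return r.handovers * 100 + r.freelist_nr;
}
```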
SLQB also has similar freelist, slablist structures per-node, which are
protected by a lock, and usable by any CPU in order to do node specific
allocations. These allocations tend not to be too frequent (short lived
allocations should be node local, long lived allocations should not be
too frequent).
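The locked per-node path can be sketched with a spinlock around the same
kind of in-object list (illustrative userspace code, not the kernel source;
the C11 atomic_flag stands in for the kernel spinlock):

```c
#include <stdatomic.h>
#include <stddef.h>

/* Unlike the per-CPU lists, the node lists take a lock, because any
 * CPU may use them for node-specific allocations. */
struct node_list_sketch {
	atomic_flag lock;	/* a spinlock in the kernel */
	void *head;
	unsigned long nr;
};

static void node_free(struct node_list_sketch *n, void *obj)
{
	while (atomic_flag_test_and_set(&n->lock))
		;		/* spin */
	*(void **)obj = n->head;
	n->head = obj;
	n->nr++;
	atomic_flag_clear(&n->lock);
}

static void *node_alloc(struct node_list_sketch *n)
{
	void *obj;

	while (atomic_flag_test_and_set(&n->lock))
		;		/* spin */
	obj = n->head;
	if (obj) {
		n->head = *(void **)obj;
		n->nr--;
	}
	atomic_flag_clear(&n->lock);
	return obj;
}

/* Free three objects to the node list, allocate one back. */
static unsigned long demo_node(void)
{
	static void *objs[3];
	struct node_list_sketch n = { ATOMIC_FLAG_INIT, NULL, 0 };
	int i;

	for (i = 0; i < 3; i++)
		node_free(&n, &objs[i]);
	node_alloc(&n);
	return n.nr;
}
```

The lock is acceptable here precisely because, as noted above, this path is
not expected to be hot.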
There is a good overview and illustration of the design here:
http://lwn.net/Articles/311502/
By using LIFO freelists like SLAB, SLQB tries to be very page-size agnostic.
It tries very hard to use order-0 pages. This is good for both page allocator
fragmentation, and slab fragmentation.
SLQB initialisation code attempts to be as simple and un-clever as possible.
There are no multiple phases where different things come up. There is no
weird self bootstrapping stuff. It just statically allocates the structures
required to create the slabs that allocate other slab structures.
SLQB uses much of the debugging infrastructure and the fine-grained sysfs
statistics from SLUB. There is also Documentation/vm/slqbinfo.c, derived
from SLUB's slabinfo.c, which can query the sysfs data.
Documentation/vm/slqbinfo.c | 1054 +++++++++++++
arch/x86/include/asm/page.h | 1
include/linux/mm.h | 4
include/linux/rcu_types.h | 18
include/linux/rcupdate.h | 11
include/linux/slab.h | 10
include/linux/slqb_def.h | 295 +++
init/Kconfig | 9
lib/Kconfig.debug | 20
mm/Makefile | 1
mm/slqb.c | 3562 ++++++++++++++++++++++++++++++++++++++++++++
11 files changed, 4971 insertions(+), 14 deletions(-)
Signed-off-by: Nick Piggin <npiggin@suse.de>
Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi>
Diffstat (limited to 'init')
-rw-r--r--	init/Kconfig	9
1 files changed, 7 insertions, 2 deletions
diff --git a/init/Kconfig b/init/Kconfig
index a724a149bf3f..3c9a1ca3b3f9 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -807,7 +807,7 @@ config SLUB_DEBUG
 choice
 	prompt "Choose SLAB allocator"
-	default SLUB
+	default SLQB
 	help
 	   This option allows to select a slab allocator.
 
@@ -828,6 +828,11 @@ config SLUB
 	   and has enhanced diagnostics. SLUB is the default choice for
 	   a slab allocator.
 
+config SLQB
+	bool "SLQB (Queued allocator)"
+	help
+	  SLQB is a proposed new slab allocator.
+
 config SLOB
 	depends on EMBEDDED
 	bool "SLOB (Simple Allocator)"
@@ -869,7 +874,7 @@ config HAVE_GENERIC_DMA_COHERENT
 config SLABINFO
 	bool
 	depends on PROC_FS
-	depends on SLAB || SLUB_DEBUG
+	depends on SLAB || SLUB_DEBUG || SLQB
 	default y
 
 config RT_MUTEXES