
This is a new slab allocator which was motivated by the complexity of the
existing code in mm/slab.c. It attempts to address a variety of concerns
with the existing implementation.

A. Management of object queues

   A particular concern was the complex management of the numerous object
   queues in SLAB. SLUB has no such queues. Instead we dedicate a slab for
   each allocating CPU and use objects from a slab directly instead of
   queueing them up.

B. Storage overhead of object queues

   SLAB object queues exist per node, per CPU. The alien cache queue even
   has a queue array that contains a queue for each processor on each
   node. For very large systems the number of queues and the number of
   objects that may be caught in those queues grows rapidly. On our
   systems with 1k nodes / processors we have several gigabytes just tied
   up for storing references to objects for those queues. This does not
   include the objects that could be on those queues. One fears that the
   whole memory of the machine could one day be consumed by those queues.

C. SLAB meta data overhead

   SLAB has overhead at the beginning of each slab. This means that data
   cannot be naturally aligned at the beginning of a slab block. SLUB
   keeps all meta data in the corresponding struct page. Objects can be
   naturally aligned in the slab. For example, a 128 byte object will be
   aligned at 128 byte boundaries and can fit tightly into a 4k page with
   no bytes left over. SLAB cannot do this.

D. SLAB has a complex cache reaper

   SLUB does not need a cache reaper for UP systems. On SMP systems the
   per CPU slab may be pushed back into the partial list but that
   operation is simple and does not require an iteration over a list of
   objects. SLAB expires per CPU, shared and alien object queues during
   cache reaping which may cause strange hold offs.

E. SLAB has complex NUMA policy layer support

   SLUB pushes NUMA policy handling into the page allocator.
   This means that allocation is coarser (SLUB does interleave on a page
   level) but that situation was also present before 2.6.13. SLAB's
   application of policies to individual slab objects is certainly a
   performance concern due to the frequent references to memory policies
   which may lead a sequence of objects to come from one node after
   another. SLUB will get a slab full of objects from one node and then
   will switch to the next.

F. Reduction of the size of partial slab lists

   SLAB has per node partial lists. This means that over time a large
   number of partial slabs may accumulate on those lists. These can only
   be reused if allocations occur on specific nodes. SLUB has a global
   pool of partial slabs and will consume slabs from that pool to
   decrease fragmentation.

G. Tunables

   SLAB has sophisticated tuning abilities for each slab cache. One can
   manipulate the queue sizes in detail. However, filling the queues still
   requires the use of the spin lock to check out slabs. SLUB has a global
   parameter (slub_min_order) for tuning. Increasing the minimum slab
   order can decrease the locking overhead. The bigger the slab order,
   the fewer page movements between the per CPU and partial lists occur
   and the better SLUB will scale.

G. Slab merging

   We often have slab caches with similar parameters. SLUB detects those
   on boot up and merges them into the corresponding general caches. This
   leads to more effective memory use. About 50% of all caches can be
   eliminated through slab merging. This will also decrease slab
   fragmentation because partially allocated slabs can be filled up
   again. Slab merging can be switched off by specifying slub_nomerge on
   boot up.

   Note that merging can expose heretofore unknown bugs in the kernel
   because corrupted objects may now be placed differently and corrupt
   differing neighboring objects. Enable sanity checks to find those.

H. Diagnostics

   The current slab diagnostics are difficult to use and require a
   recompilation of the kernel.
   SLUB contains debugging code that is always available (but is kept out
   of the hot code paths). SLUB diagnostics can be enabled via the
   "slub_debug" option. Parameters can be specified to select a single
   slab cache or a group of slab caches for diagnostics. This means that
   the system is running with the usual performance and it is much more
   likely that race conditions can be reproduced.

I. Resiliency

   If basic sanity checks are on then SLUB is capable of detecting common
   error conditions and recovering as best as possible to allow the
   system to continue.

J. Tracing

   Tracing can be enabled via the slub_debug=T,<slabcache> option during
   boot. SLUB will then log all actions on that slabcache and dump the
   object contents on free.

K. On demand DMA cache creation

   Generally DMA caches are not needed. If kmalloc is used with
   __GFP_DMA then only the single slabcache that is needed is created.
   For systems that have no ZONE_DMA requirement the support is
   completely eliminated.

L. Performance increase

   Some benchmarks have shown speed improvements on kernbench in the
   range of 5-10%. The locking overhead of SLUB is based on the
   underlying base allocation size. If we can reliably allocate larger
   order pages then it is possible to increase SLUB performance much
   further. The anti-fragmentation patches may enable further performance
   increases.

Tested on:

i386 UP + SMP, x86_64 UP + SMP + NUMA emulation, IA64 NUMA + Simulator

SLUB Boot options

slub_nomerge		Disable merging of slabs
slub_min_order=x	Require a minimum order for slab caches. This
			increases the managed chunk size and therefore
			reduces meta data and locking overhead.
slub_min_objects=x	Minimum objects per slab. Default is 8.
slub_max_order=x	Avoid generating slabs larger than order specified.
slub_debug		Enable all diagnostics for all caches
slub_debug=<options>	Enable selective options for all caches
slub_debug=<o>,<cache>	Enable selective options for a certain set of
			caches

Available Debug options

F		Double Free checking, sanity and resiliency
R		Red zoning
P		Object / padding poisoning
U		Track last free / alloc
T		Trace all allocs / frees (only use for individual slabs).

To use SLUB: Apply this patch and then select SLUB as the default slab
allocator.

[hugh@veritas.com: fix an oops-causing locking error]
[akpm@linux-foundation.org: various stupid cleanups and small fixes]
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Hugh Dickins <hugh@veritas.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
234 lines
7.5 KiB
C
/*
 * Written by Mark Hemment, 1996 (markhe@nextd.demon.co.uk).
 *
 * (C) SGI 2006, Christoph Lameter <clameter@sgi.com>
 * 	Cleaned up and restructured to ease the addition of alternative
 * 	implementations of SLAB allocators.
 */

#ifndef _LINUX_SLAB_H
#define _LINUX_SLAB_H

#ifdef __KERNEL__

#include <linux/gfp.h>
#include <linux/types.h>

typedef struct kmem_cache kmem_cache_t __deprecated;

/*
 * Flags to pass to kmem_cache_create().
 * The ones marked DEBUG are only valid if CONFIG_DEBUG_SLAB is set.
 */
#define SLAB_DEBUG_FREE		0x00000100UL	/* DEBUG: Perform (expensive) checks on free */
#define SLAB_DEBUG_INITIAL	0x00000200UL	/* DEBUG: Call constructor (as verifier) */
#define SLAB_RED_ZONE		0x00000400UL	/* DEBUG: Red zone objs in a cache */
#define SLAB_POISON		0x00000800UL	/* DEBUG: Poison objects */
#define SLAB_HWCACHE_ALIGN	0x00002000UL	/* Align objs on cache lines */
#define SLAB_CACHE_DMA		0x00004000UL	/* Use GFP_DMA memory */
#define SLAB_MUST_HWCACHE_ALIGN	0x00008000UL	/* Force alignment even if debugging is active */
#define SLAB_STORE_USER		0x00010000UL	/* DEBUG: Store the last owner for bug hunting */
#define SLAB_RECLAIM_ACCOUNT	0x00020000UL	/* Objects are reclaimable */
#define SLAB_PANIC		0x00040000UL	/* Panic if kmem_cache_create() fails */
#define SLAB_DESTROY_BY_RCU	0x00080000UL	/* Defer freeing slabs to RCU */
#define SLAB_MEM_SPREAD		0x00100000UL	/* Spread some memory over cpuset */
#define SLAB_TRACE		0x00200000UL	/* Trace allocations and frees */

/* Flags passed to constructor functions */
#define SLAB_CTOR_CONSTRUCTOR	0x001UL		/* If not set, then deconstructor */
#define SLAB_CTOR_ATOMIC	0x002UL		/* Tell constructor it can't sleep */
#define SLAB_CTOR_VERIFY	0x004UL		/* Tell constructor it's a verify call */

/*
 * struct kmem_cache related prototypes
 */
void __init kmem_cache_init(void);
int slab_is_available(void);

struct kmem_cache *kmem_cache_create(const char *, size_t, size_t,
			unsigned long,
			void (*)(void *, struct kmem_cache *, unsigned long),
			void (*)(void *, struct kmem_cache *, unsigned long));
void kmem_cache_destroy(struct kmem_cache *);
int kmem_cache_shrink(struct kmem_cache *);
void *kmem_cache_alloc(struct kmem_cache *, gfp_t);
void *kmem_cache_zalloc(struct kmem_cache *, gfp_t);
void kmem_cache_free(struct kmem_cache *, void *);
unsigned int kmem_cache_size(struct kmem_cache *);
const char *kmem_cache_name(struct kmem_cache *);
int kmem_ptr_validate(struct kmem_cache *cachep, const void *ptr);

#ifdef CONFIG_NUMA
extern void *kmem_cache_alloc_node(struct kmem_cache *, gfp_t flags, int node);
#else
static inline void *kmem_cache_alloc_node(struct kmem_cache *cachep,
					gfp_t flags, int node)
{
	return kmem_cache_alloc(cachep, flags);
}
#endif

/*
 * Common kmalloc functions provided by all allocators
 */
void *__kmalloc(size_t, gfp_t);
void *__kzalloc(size_t, gfp_t);
void * __must_check krealloc(const void *, size_t, gfp_t);
void kfree(const void *);
size_t ksize(const void *);

/**
 * kcalloc - allocate memory for an array. The memory is set to zero.
 * @n: number of elements.
 * @size: element size.
 * @flags: the type of memory to allocate.
 */
static inline void *kcalloc(size_t n, size_t size, gfp_t flags)
{
	if (n != 0 && size > ULONG_MAX / n)
		return NULL;
	return __kzalloc(n * size, flags);
}

/*
 * Allocator specific definitions. These are mainly used to establish optimized
 * ways to convert kmalloc() calls to kmem_cache_alloc() invocations by selecting
 * the appropriate general cache at compile time.
 */

#if defined(CONFIG_SLAB) || defined(CONFIG_SLUB)
#ifdef CONFIG_SLUB
#include <linux/slub_def.h>
#else
#include <linux/slab_def.h>
#endif /* !CONFIG_SLUB */
#else

/*
 * Fallback definitions for an allocator not wanting to provide
 * its own optimized kmalloc definitions (like SLOB).
 */

/**
 * kmalloc - allocate memory
 * @size: how many bytes of memory are required.
 * @flags: the type of memory to allocate.
 *
 * kmalloc is the normal method of allocating memory
 * in the kernel.
 *
 * The @flags argument may be one of:
 *
 * %GFP_USER - Allocate memory on behalf of user.  May sleep.
 *
 * %GFP_KERNEL - Allocate normal kernel ram.  May sleep.
 *
 * %GFP_ATOMIC - Allocation will not sleep.
 *   For example, use this inside interrupt handlers.
 *
 * %GFP_HIGHUSER - Allocate pages from high memory.
 *
 * %GFP_NOIO - Do not do any I/O at all while trying to get memory.
 *
 * %GFP_NOFS - Do not make any fs calls while trying to get memory.
 *
 * Also it is possible to set different flags by OR'ing
 * in one or more of the following additional @flags:
 *
 * %__GFP_COLD - Request cache-cold pages instead of
 *   trying to return cache-warm pages.
 *
 * %__GFP_DMA - Request memory from the DMA-capable zone.
 *
 * %__GFP_HIGH - This allocation has high priority and may use emergency pools.
 *
 * %__GFP_HIGHMEM - Allocated memory may be from highmem.
 *
 * %__GFP_NOFAIL - Indicate that this allocation is in no way allowed to fail
 *   (think twice before using).
 *
 * %__GFP_NORETRY - If memory is not immediately available,
 *   then give up at once.
 *
 * %__GFP_NOWARN - If allocation fails, don't issue any warnings.
 *
 * %__GFP_REPEAT - If allocation fails initially, try once more before failing.
 */
static inline void *kmalloc(size_t size, gfp_t flags)
{
	return __kmalloc(size, flags);
}

/**
 * kzalloc - allocate memory. The memory is set to zero.
 * @size: how many bytes of memory are required.
 * @flags: the type of memory to allocate (see kmalloc).
 */
static inline void *kzalloc(size_t size, gfp_t flags)
{
	return __kzalloc(size, flags);
}
#endif

#ifndef CONFIG_NUMA
static inline void *kmalloc_node(size_t size, gfp_t flags, int node)
{
	return kmalloc(size, flags);
}

static inline void *__kmalloc_node(size_t size, gfp_t flags, int node)
{
	return __kmalloc(size, flags);
}
#endif /* !CONFIG_NUMA */

/*
 * kmalloc_track_caller is a special version of kmalloc that records the
 * calling function of the routine calling it for slab leak tracking instead
 * of just the calling function (confusing, eh?).
 * It's useful when the call to kmalloc comes from a widely-used standard
 * allocator where we care about the real place the memory allocation
 * request comes from.
 */
#if defined(CONFIG_DEBUG_SLAB) || defined(CONFIG_SLUB)
extern void *__kmalloc_track_caller(size_t, gfp_t, void*);
#define kmalloc_track_caller(size, flags) \
	__kmalloc_track_caller(size, flags, __builtin_return_address(0))
#else
#define kmalloc_track_caller(size, flags) \
	__kmalloc(size, flags)
#endif /* DEBUG_SLAB */

#ifdef CONFIG_NUMA
/*
 * kmalloc_node_track_caller is a special version of kmalloc_node that
 * records the calling function of the routine calling it for slab leak
 * tracking instead of just the calling function (confusing, eh?).
 * It's useful when the call to kmalloc_node comes from a widely-used
 * standard allocator where we care about the real place the memory
 * allocation request comes from.
 */
#if defined(CONFIG_DEBUG_SLAB) || defined(CONFIG_SLUB)
extern void *__kmalloc_node_track_caller(size_t, gfp_t, int, void *);
#define kmalloc_node_track_caller(size, flags, node) \
	__kmalloc_node_track_caller(size, flags, node, \
			__builtin_return_address(0))
#else
#define kmalloc_node_track_caller(size, flags, node) \
	__kmalloc_node(size, flags, node)
#endif

#else /* CONFIG_NUMA */

#define kmalloc_node_track_caller(size, flags, node) \
	kmalloc_track_caller(size, flags)

#endif /* CONFIG_NUMA */

extern const struct seq_operations slabinfo_op;
ssize_t slabinfo_write(struct file *, const char __user *, size_t, loff_t *);

#endif /* __KERNEL__ */
#endif /* _LINUX_SLAB_H */