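TCMalloc is a memory allocator developed by Google. The name stands for Thread-Caching Malloc, that is, a malloc with per-thread caches. It provides efficient memory management for multithreaded programs and is used as a replacement for the system's memory allocation functions (malloc, free, new, new[], and so on).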
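TCMalloc can be divided into three parts: the front-end, the middle-end, and the back-end.
Note: the front-end can run in either per-CPU mode or the legacy per-thread mode, and the back-end supports either the hugepage-aware pageheap or the legacy pageheap.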
* The front-end is a cache that provides fast allocation and deallocation of
memory to the application.
* The middle-end is responsible for refilling the front-end cache.
* The back-end handles fetching memory from the OS.
The front-end cache is accessible by only a single thread at a time, so it does not require any locks. As a result, most allocations and deallocations are fast.
* Originally it supported per-thread caches of objects (hence the name Thread Caching Malloc). However, this resulted in memory footprints that scaled with the number of threads. Modern applications can have large thread counts, which result in either large amounts of aggregate per-thread memory, or many threads having minuscule per-thread caches.
* More recently TCMalloc has supported per-CPU mode. In this mode each logical CPU in the system has its own cache from which to allocate memory. Note: On x86 a logical CPU is equivalent to a hyperthread.
* Small and Large Object Allocation
* Per-CPU Mode
* Restartable Sequences and Per-CPU TCMalloc
* Legacy Per-Thread mode
* Runtime Sizing of Front-end Caches
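To make the lock-free fast path of the front-end concrete, here is a minimal sketch of a per-thread cache for a single size class. All names (`ThreadCacheSketch`, `LocalCache`, the 64-byte size class) are hypothetical and chosen for illustration; `std::malloc` stands in for a refill from the middle-end. This is not TCMalloc's actual implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>

// A freed object is reused to hold the link to the next free object.
struct FreeObject {
  FreeObject* next;
};

// Hypothetical per-thread cache for one size class (illustration only).
class ThreadCacheSketch {
 public:
  explicit ThreadCacheSketch(std::size_t object_size)
      : object_size_(object_size < sizeof(FreeObject) ? sizeof(FreeObject)
                                                      : object_size) {}
  ThreadCacheSketch(const ThreadCacheSketch&) = delete;
  ThreadCacheSketch& operator=(const ThreadCacheSketch&) = delete;

  ~ThreadCacheSketch() {
    // Release everything still cached when the owning thread exits.
    while (head_ != nullptr) {
      FreeObject* next = head_->next;
      std::free(head_);
      head_ = next;
    }
  }

  // Fast path: pop from the thread's own list, so no lock is needed.
  void* Allocate() {
    if (head_ != nullptr) {
      FreeObject* obj = head_;
      head_ = obj->next;
      --cached_;
      return obj;
    }
    // Slow path: stand-in for asking the middle-end to refill the cache.
    return std::malloc(object_size_);
  }

  // Push the object back onto the thread's own list, again without a lock.
  void Deallocate(void* p) {
    assert(p != nullptr);
    FreeObject* obj = static_cast<FreeObject*>(p);
    obj->next = head_;
    head_ = obj;
    ++cached_;
  }

  std::size_t cached() const { return cached_; }

 private:
  std::size_t object_size_;
  FreeObject* head_ = nullptr;
  std::size_t cached_ = 0;
};

// One cache instance per thread for an assumed 64-byte size class.
inline ThreadCacheSketch& LocalCache() {
  thread_local ThreadCacheSketch cache(64);
  return cache;
}
```

Because each thread owns its own instance, memory cached this way grows with the number of threads, which is the footprint problem that motivated per-CPU mode.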
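The middle-end is responsible for providing memory to the front-end and for returning memory to the back-end. It consists of the transfer cache and the central free list.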
* Transfer Cache
The transfer cache gets its name from situations where one thread is allocating memory that is deallocated by another thread. The transfer cache allows memory to rapidly flow between two different threads.
If the transfer cache is unable to satisfy the memory request, or has insufficient space to hold the returned objects, it will access the central free list.
* Central Free List
The central free list manages memory in spans. A span is a collection of one or more pages of memory. These terms will be explained in the next couple of sections.
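The fall-through from the transfer cache to the central free list can be sketched as follows. The class names, the fixed capacity, and the use of a `std::mutex` per structure are assumptions made for this illustration. A real central free list would fetch spans from the back-end when empty, whereas this sketch simply returns `nullptr`.

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>
#include <vector>

// Hypothetical stand-in for the central free list: an unbounded pool.
class CentralFreeListSketch {
 public:
  void Insert(void* p) {
    std::lock_guard<std::mutex> lock(mu_);
    objects_.push_back(p);
  }

  // Returns nullptr when empty; a real allocator would go to the back-end.
  void* Remove() {
    std::lock_guard<std::mutex> lock(mu_);
    if (objects_.empty()) return nullptr;
    void* p = objects_.back();
    objects_.pop_back();
    return p;
  }

  std::size_t size() const {
    std::lock_guard<std::mutex> lock(mu_);
    return objects_.size();
  }

 private:
  mutable std::mutex mu_;
  std::vector<void*> objects_;
};

// Hypothetical transfer cache: a small bounded pool in front of the
// central free list.
class TransferCacheSketch {
 public:
  TransferCacheSketch(std::size_t capacity, CentralFreeListSketch* central)
      : capacity_(capacity), central_(central) {}

  // Keep the object if there is room, otherwise hand it to the central list.
  void Insert(void* p) {
    {
      std::lock_guard<std::mutex> lock(mu_);
      if (slots_.size() < capacity_) {
        slots_.push_back(p);
        return;
      }
    }
    central_->Insert(p);
  }

  // Serve from the transfer cache if possible, otherwise fall through.
  void* Remove() {
    {
      std::lock_guard<std::mutex> lock(mu_);
      if (!slots_.empty()) {
        void* p = slots_.back();
        slots_.pop_back();
        return p;
      }
    }
    return central_->Remove();
  }

  std::size_t size() const {
    std::lock_guard<std::mutex> lock(mu_);
    return slots_.size();
  }

 private:
  const std::size_t capacity_;
  CentralFreeListSketch* central_;
  mutable std::mutex mu_;
  std::vector<void*> slots_;
};
```

An object freed by one thread and inserted here can be handed straight to another thread's `Remove()` call, which is the cross-thread flow the transfer cache is named for.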
* Pagemap and Spans
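TCMalloc manages the heap by dividing it into pages whose size is fixed at compile time. A run of contiguous pages is represented by a Span object. The pagemap is used to look up the span that an object belongs to, or to identify the size-class of a given object.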
TCMalloc uses a 2-level or 3-level radix tree to map all possible memory locations onto spans. In the accompanying figure, span A covers 2 pages and span B covers 3 pages.
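The lookup can be illustrated with a minimal 2-level radix tree keyed by page number. The bit widths (10 root bits, 10 leaf bits), the 8 KiB page size, and all names below are assumptions for this sketch and cover only a small address range. The real pagemap is sized for the full address space.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>

// Hypothetical span descriptor: a run of contiguous TCMalloc pages.
struct Span {
  std::uintptr_t first_page;  // page number of the first page
  std::size_t num_pages;      // number of pages in the run
};

class PageMap2Sketch {
 public:
  static constexpr int kPageShift = 13;  // assumed 8 KiB pages
  static constexpr int kLeafBits = 10;
  static constexpr int kRootBits = 10;
  static constexpr std::size_t kLeafSize = std::size_t{1} << kLeafBits;
  static constexpr std::size_t kRootSize = std::size_t{1} << kRootBits;

  // Record the span for every page it covers, creating leaves on demand.
  void Insert(Span* span) {
    for (std::size_t i = 0; i < span->num_pages; ++i) {
      const std::uintptr_t page = span->first_page + i;
      assert((page >> kLeafBits) < kRootSize);
      std::unique_ptr<Leaf>& leaf = root_[page >> kLeafBits];
      if (!leaf) leaf.reset(new Leaf());
      leaf->spans[page & (kLeafSize - 1)] = span;
    }
  }

  // Map an address to its owning span, or nullptr if no span covers it.
  Span* Lookup(std::uintptr_t address) const {
    const std::uintptr_t page = address >> kPageShift;
    const std::uintptr_t root_index = page >> kLeafBits;
    if (root_index >= kRootSize) return nullptr;
    const std::unique_ptr<Leaf>& leaf = root_[root_index];
    if (!leaf) return nullptr;
    return leaf->spans[page & (kLeafSize - 1)];
  }

 private:
  struct Leaf {
    std::array<Span*, kLeafSize> spans{};  // zero-initialized to nullptr
  };
  std::array<std::unique_ptr<Leaf>, kRootSize> root_;
};
```

Registering a 2-page span A and a 3-page span B and then looking up any address inside them returns the corresponding span. Every page of a span points at the same descriptor, which is why an interior pointer can be resolved in a fixed number of steps.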
* Storing Small Objects in Spans
A span contains a pointer to the base of the TCMalloc pages that it controls. For small objects, those pages are divided into at most 2**16 objects.
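The arithmetic behind this layout is simple enough to sketch. The struct below is hypothetical: it assumes equal-sized objects packed from the span's base, and it uses a 16-bit index, which is sufficient precisely because a span never holds more than 2**16 small objects. The explicit cap in `ObjectCount` is illustrative only.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical view of a span that has been carved into small objects.
struct SmallObjectSpanSketch {
  std::uintptr_t base;      // address of the first TCMalloc page
  std::size_t span_bytes;   // total bytes covered by the span
  std::size_t object_size;  // size of each object (the size class)

  static constexpr std::size_t kMaxObjects = std::size_t{1} << 16;

  // Number of objects the span holds, capped at 2**16.
  std::size_t ObjectCount() const {
    std::size_t n = span_bytes / object_size;
    if (n > kMaxObjects) n = kMaxObjects;
    return n;
  }

  // Address of the object with the given index.
  std::uintptr_t AddressOf(std::uint16_t index) const {
    assert(index < ObjectCount());
    return base + std::size_t{index} * object_size;
  }

  // Index of the object containing the given address.
  std::uint16_t IndexOf(std::uintptr_t address) const {
    assert(address >= base);
    const std::size_t index = (address - base) / object_size;
    assert(index < ObjectCount());
    return static_cast<std::uint16_t>(index);
  }
};
```

For example, an 8 KiB span of 16-byte objects holds 512 objects, and the object at index 3 starts 48 bytes past the base.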
TCMalloc can be built with several different page sizes. The supported TCMalloc page sizes are 4KiB, 8KiB, 32KiB, and 256KiB.
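Smaller pages let an application meet its memory requirements with less overhead, while larger pages reduce how often memory has to be fetched from and returned to the back-end. Consequently, it makes sense for applications with small memory footprints, or that are sensitive to memory footprint size, to use smaller TCMalloc page sizes. Applications with large memory footprints are likely to benefit from larger TCMalloc page sizes.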
The back-end of TCMalloc has three jobs:
- It manages large chunks of unused memory.
- It is responsible for fetching memory from the OS when there is no suitably sized memory available to fulfill an allocation request.
- It is responsible for returning unneeded memory back to the OS.
There are two backends for TCMalloc:
- The Legacy pageheap which manages memory in TCMalloc page sized chunks.
- The hugepage aware pageheap which manages memory in chunks of hugepage sizes. Managing memory in hugepage chunks enables the allocator to improve application performance by reducing TLB misses.
* Legacy Pageheap
The legacy pageheap is an array of free lists for particular lengths of contiguous pages of available memory. For k < 256, the kth entry is a free list of runs that consist of k TCMalloc pages. The 256th entry is a free list of runs that have length >= 256 pages.
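The indexing rule can be sketched as an array of 257 lists, with slot 0 unused. Only the indexing comes from the description above. The search order (try list k, then k+1, and so on, then the large list) and the policy of re-inserting the unused remainder of a run are assumptions of this sketch, as are all names.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

// A run of contiguous free TCMalloc pages.
struct PageRun {
  std::size_t first_page;
  std::size_t length;
};

class LegacyPageHeapSketch {
 public:
  static constexpr std::size_t kMaxPages = 256;

  // Runs of k < 256 pages go in list k; longer runs share list 256.
  static std::size_t ListIndex(std::size_t length) {
    assert(length > 0);
    if (length >= kMaxPages) return kMaxPages;
    return length;
  }

  void AddFreeRun(PageRun run) { free_[ListIndex(run.length)].push_back(run); }

  // Find a run of at least n pages; returns false if none is available
  // (a real allocator would then request memory from the OS).
  bool Allocate(std::size_t n, PageRun* out) {
    // Exact-length lists: every run in list k has exactly k >= n pages.
    for (std::size_t k = ListIndex(n); k < kMaxPages; ++k) {
      if (!free_[k].empty()) {
        PageRun run = free_[k].back();
        free_[k].pop_back();
        Carve(run, n, out);
        return true;
      }
    }
    // Large list: runs have mixed lengths, so take the first that fits.
    std::vector<PageRun>& large = free_[kMaxPages];
    for (auto it = large.begin(); it != large.end(); ++it) {
      if (it->length >= n) {
        PageRun run = *it;
        large.erase(it);
        Carve(run, n, out);
        return true;
      }
    }
    return false;
  }

  std::size_t FreeRunCount(std::size_t k) const { return free_[k].size(); }

 private:
  // Hand out the first n pages and put any remainder back on a free list.
  void Carve(PageRun run, std::size_t n, PageRun* out) {
    *out = PageRun{run.first_page, n};
    if (run.length > n) {
      AddFreeRun(PageRun{run.first_page + n, run.length - n});
    }
  }

  std::array<std::vector<PageRun>, kMaxPages + 1> free_;  // slot 0 unused
};
```

With this layout a request for a small number of pages is answered by indexing straight into the matching list, and only requests of 256 pages or more need a search.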
* Hugepage Aware Allocator
The objective of the hugepage aware allocator is to hold memory in hugepage size chunks. On x86 a hugepage is 2MiB in size. To do this the back-end has three different caches: