Understanding Memory Metrics: RSS, VSZ, USS, PSS, and Working Sets
A comprehensive guide to memory metrics on Linux: understand RSS, virtual memory, page cache, working sets, and why your numbers don't match. Learn which metric matters for debugging memory issues.
- Categories: Systems, Debugging, Performance
- Status: published
You check your system memory:
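For example, `free -h` might show (illustrative numbers, chosen to match the discussion below):

```
$ free -h
              total        used        free      shared  buff/cache   available
Mem:            15Gi       8.2Gi       1.1Gi       0.3Gi       5.7Gi       6.8Gi
Swap:          2.0Gi          0B       2.0Gi
```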
Then you check your application:
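For instance, via your allocator's own statistics (a jemalloc-style summary; field names vary by allocator, and the numbers are illustrative):

```
allocated: 1.8 GiB    (live application objects)
active:    2.0 GiB    (pages with at least one live allocation)
mapped:    2.1 GiB    (address space backed by the allocator)
```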
And the process itself:
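`ps` reports RSS in KiB (here ~2.1 GiB; `myapp` is a stand-in name):

```
$ ps -o rss= -p "$(pidof myapp)"
2202010
```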
Three different tools. Three different numbers. Your allocator reports 1.8GB in use, but RSS shows 2.1GB. The system says 8.2GB used, but 6.8GB available. What does any of this mean?
This is the memory metrics confusion that every developer encounters. This post builds a complete taxonomy: from physical RAM allocation to virtual address spaces to per-process resident sets to allocator-level tracking. By the end, you’ll know which metric matters for your specific debugging scenario.
Quick Cheat Sheet
Before the deep dive, here’s what matters for common scenarios:
- System low on memory? Check `available` in `free -h` (not `free` - that’s misleading)
- Process memory growing? Track RSS over time via `/proc/[pid]/status` or `htop`
- Heap vs RSS gap? Compare allocator stats to RSS - if the gap grows unbounded, you have a structural leak
- Container OOM’d? Check cgroup memory (`docker stats` or `memory.current` in `/sys/fs/cgroup`)
- Shared memory accounting? Use PSS (not RSS) to fairly divide shared pages across processes
- Performance issues? If Working Set Size > available RAM, you’re thrashing (add RAM or reduce working set)
The rest of this post explains why these metrics exist, how they relate, and when each one matters.
Foundational Concepts
Before diving into metrics, establish the building blocks:
Physical Memory (RAM)
Random Access Memory - the actual hardware chips on your motherboard. Data stored in RAM is lost when power is removed (volatile). Measured in gigabytes (GB). This is the finite resource all processes compete for.
When you see “16GB RAM”, that’s physical memory. The kernel manages which processes get which physical pages.
Virtual Memory
An abstraction that gives each process its own private address space. On x86-64 Linux, user processes see up to 128TB of addressable memory (lower canonical range), regardless of how much physical RAM exists.
Virtual addresses are translated to physical addresses by the Memory Management Unit (MMU) using page tables maintained by the kernel.
Multiple processes can have the same virtual address (e.g., 0x7fff00000000) pointing to different physical pages. Virtual memory provides isolation - one process cannot see another’s memory.
Page
The fundamental unit of memory management. On x86-64 Linux, the default page size is 4KB (4,096 bytes).
Memory is not allocated byte-by-byte. The kernel allocates full pages. When you allocate 1 byte, the kernel maps at least one 4KB page into your address space.
Pages can be:
- Mapped: Associated with a virtual address range in a process
- Resident: Physically present in RAM (vs swapped to disk)
- Shared: Mapped into multiple processes’ address spaces
- Dirty: Modified since being loaded from disk
- Clean: Unmodified, can be discarded and re-read
Memory Mapping
The process of linking a virtual address range to physical pages or files. Created via:
- Anonymous mapping: Backed by RAM (or swap), not a file. Used for heap, stack.
- File-backed mapping: Backed by a file on disk. Used for code, shared libraries, memory-mapped files.
Example: When you load a shared library, the kernel creates a file-backed mapping. Multiple processes loading the same library share the same physical pages.
Address Space
The range of virtual addresses available to a process. On 64-bit Linux:
- User space: `0x0000000000000000` to `0x00007fffffffffff` (lower 128TB)
- Kernel space: `0xffff800000000000` to `0xffffffffffffffff` (upper 128TB)
Each process has its own user space address range. Kernel space is shared across all processes but only accessible in kernel mode.
Process
An executing program with:
- Private address space (virtual memory)
- Code (instructions)
- Data (global variables)
- Heap (dynamic allocations via `malloc`)
- Stack (local variables, function call frames)
- Open files, network sockets, etc.
Each process sees its own isolated memory view. The kernel manages the mapping between virtual addresses (what the process sees) and physical pages (actual RAM).
Kernel Space vs User Space
User space: Where application code runs. Cannot directly access hardware or other processes’ memory. Uses system calls to request kernel services.
Kernel space: Where the kernel runs with full hardware access. Manages physical memory, schedules processes, handles I/O.
When you call malloc(), your user space code eventually makes a system call (like brk() or mmap()) that crosses into kernel space to allocate pages.
With these foundations established, we can now explore why measuring memory is complex.
Why Multiple Memory Metrics Exist
Memory measurement happens at different layers of the system:
- Hardware layer: Physical DRAM chips and their organization
- Kernel layer: Physical pages, page cache, kernel allocations
- Process layer: Virtual address spaces, mapped pages, shared memory
- Allocator layer: Heap structures, freed vs allocated, internal fragmentation
Each layer sees memory differently. A page might be allocated at the kernel level (included in “used”), belong to a process’s virtual address space (counted in VmSize), be physically mapped (counted in RSS), but the backing memory is freed at the allocator level (not in heap usage).
Understanding which layer you’re measuring is the first step to interpreting memory metrics correctly.
System-Level Memory Taxonomy
Start with what the kernel sees: physical RAM and how it’s partitioned.
Total Memory
The total amount of installed physical RAM. On most systems, a small portion is reserved by firmware/BIOS and never visible to the OS.
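The kernel's view of installed RAM (minus firmware reservations) is visible in `/proc/meminfo`; the value below is illustrative:

```
$ grep MemTotal /proc/meminfo
MemTotal:       16248816 kB
```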
Used Memory
All pages allocated by the kernel or mapped to userspace processes. This includes:
- Kernel code and data structures
- Slab caches (kernel object allocators)
- Anonymous pages (process heaps, stacks)
- File-backed pages (memory-mapped files, shared libraries)
- Page cache (cached file data)
- Buffer cache (filesystem metadata, block device buffers)
The misleading part: Page cache and buffers are included in “used” but are instantly reclaimable. Linux aggressively caches recently accessed files in RAM. This makes “used” appear high even when the system has plenty of available memory.
Free Memory
Pages that are completely unallocated. No process has mapped them, no kernel structure uses them, they contain no cached data.
On a healthy system, “free” memory is typically small (< 5% of total). This doesn’t mean the system is low on memory - it means Linux is doing its job by caching data.
Available Memory
This is the metric that matters for “will my application OOM?”
Available memory estimates how much RAM can be allocated to new processes without swapping. It includes:
- Free pages (completely unallocated)
- Reclaimable cache (page cache that can be dropped)
- Reclaimable slab caches (kernel allocator caches)
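A `free -h` snapshot with these characteristics (illustrative numbers):

```
$ free -h
              total        used        free      shared  buff/cache   available
Mem:            15Gi       8.2Gi       1.1Gi       0.3Gi       5.7Gi       6.8Gi
```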
In this example:
- 8.2GB “used” sounds bad
- 1.1GB “free” sounds worse
- But 6.8GB “available” is fine - the system has plenty of memory
The math: available ≈ free + reclaimable_cache + reclaimable_slab
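As a sketch, the approximation can be computed from `/proc/meminfo` fields. The snapshot below is made up, and the kernel's real `MemAvailable` estimate is more careful (it discounts cache that cannot actually be reclaimed without thrashing), so expect the two numbers to differ:

```python
# Approximate "available" = free + reclaimable page cache + reclaimable slab.
sample = """\
MemFree:         1153433 kB
Cached:          5472312 kB
Buffers:          204800 kB
SReclaimable:     412000 kB
MemAvailable:    7130316 kB
"""

info = {}
for line in sample.splitlines():
    key, rest = line.split(":")
    info[key] = int(rest.split()[0])  # values are in kB

approx = info["MemFree"] + info["Cached"] + info["Buffers"] + info["SReclaimable"]
print(f"approximation: {approx} kB, kernel's MemAvailable: {info['MemAvailable']} kB")
```

On a live system you would read `/proc/meminfo` directly instead of the embedded sample.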
Page Cache
File data cached in RAM. When you read a file, Linux keeps it in the page cache. Subsequent reads are served from RAM instead of disk.
The page cache is:
- Included in “used” - pages are allocated
- Included in “available” - pages can be instantly reclaimed
- Shared across processes - multiple processes mapping the same file share cache pages
Buffer Cache
Metadata and block buffers for filesystems. Includes:
- Directory structures
- Inode caches
- Superblock caches
- Device block buffers
Like page cache, buffers are reclaimable but counted in “used”.
Process-Level Memory Taxonomy
Each process has its own view of memory through virtual address spaces.
Virtual Memory Size (VmSize)
The total address space reserved by a process. This includes:
- Code and data segments
- Heap (grows via `brk()` or `mmap()`)
- Thread stacks
- Memory-mapped files
- Shared libraries
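Both numbers can be read from `/proc/[pid]/status` (values illustrative; `myapp` is a placeholder):

```
$ grep -E '^Vm(Size|RSS):' /proc/$(pidof myapp)/status
VmSize:  4260544 kB
VmRSS:   2202010 kB
```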
Important: VmSize is an address space reservation, not physical memory usage. You can reserve terabytes of address space without using any RAM.
Example:
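A minimal demonstration (Linux-only, since it reads `/proc/self/status`): reserving 1 GiB of anonymous address space inflates VmSize without consuming RAM, because no page is touched:

```python
import mmap

def status_kb(field: str) -> int:
    # Read a "VmSize:  1234 kB"-style field from /proc/self/status.
    with open("/proc/self/status") as f:
        for line in f:
            if line.startswith(field + ":"):
                return int(line.split()[1])
    raise KeyError(field)

before = status_kb("VmSize")
region = mmap.mmap(-1, 1 << 30)   # reserve 1 GiB, anonymous, never touched
after = status_kb("VmSize")
print(f"VmSize grew by ~{(after - before) // 1024} MB without using RAM")
region.close()
```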
Resident Set Size (RSS)
The amount of physical RAM currently mapped to the process’s address space. This is the actual memory consumption.
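A small Linux-only demonstration: mapping memory does not raise RSS, but writing to it does. Numbers are approximate, since the interpreter itself also allocates:

```python
import mmap

def vmrss_kb() -> int:
    # Current resident set size of this process, from /proc/self/status.
    with open("/proc/self/status") as f:
        for line in f:
            if line.startswith("VmRSS:"):
                return int(line.split()[1])
    raise RuntimeError("no VmRSS line")

region = mmap.mmap(-1, 64 << 20)       # 64 MiB mapped, not yet resident
mapped_only = vmrss_kb()
chunk = b"\x00" * (1 << 20)
for _ in range(64):
    region.write(chunk)                # write faults make the pages resident
touched = vmrss_kb()
print(f"RSS grew by ~{(touched - mapped_only) // 1024} MB after writing")
region.close()
```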
RSS includes:
- Anonymous pages: Heap allocations, stack, not backed by files
- File-backed pages: Code, shared libraries, memory-mapped files
- Shared pages: Libraries shared with other processes (fully counted, not divided)
RSS grows when:
- You access newly allocated anonymous pages (write faults for heap/stack)
- You read or write memory-mapped files (demand paging)
- You create threads (new stack pages)
- Pages are copied for copy-on-write (forked processes)
RSS shrinks when:
- The kernel reclaims pages under memory pressure
- You call `madvise(MADV_DONTNEED)` to release pages
- You unmap memory (`munmap()`)
If many processes map `libc.so.6`, each process’s RSS includes the full library size. The total RSS across processes can exceed physical RAM because shared pages are counted multiple times.
Proportional Set Size (PSS)
RSS but with shared pages divided proportionally among processes.
If libc.so.6 is 2MB and shared by 10 processes, each process’s PSS includes 200KB (2MB / 10).
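The arithmetic can be sketched as a toy model (not a kernel interface; in practice the kernel reports PSS per mapping in `/proc/[pid]/smaps`):

```python
# PSS = private pages + (shared pages / number of processes sharing them).
def pss_kb(private_kb: int, shared_regions: list[tuple[int, int]]) -> int:
    # shared_regions: (region size in kB, number of processes mapping it)
    return private_kb + sum(size // sharers for size, sharers in shared_regions)

# A process with 1 MB private memory plus a 2 MB library shared 10 ways:
print(pss_kb(1024, [(2048, 10)]))   # 1024 private + ~204 shared
```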
PSS gives a more accurate picture of per-process memory usage. Sum all processes’ PSS and you get a number close to actual system memory usage.
Unique Set Size (USS)
Memory that is completely private to the process. No shared pages counted.
USS shows what would be freed if the process exited. It’s the truest measure of per-process memory cost.
Calculating USS requires walking /proc/[pid]/smaps and summing Private_Clean and Private_Dirty:
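A sketch of that summation over an abridged, made-up smaps excerpt (a real file repeats these fields once per mapping):

```python
# USS = sum of Private_Clean + Private_Dirty across all mappings.
smaps = """\
Private_Clean:       128 kB
Private_Dirty:      4096 kB
Shared_Clean:       2048 kB
Shared_Dirty:          0 kB
Private_Clean:         0 kB
Private_Dirty:       512 kB
"""

uss_kb = sum(
    int(line.split()[1])
    for line in smaps.splitlines()
    if line.startswith(("Private_Clean", "Private_Dirty"))
)
print(f"USS: {uss_kb} kB")   # 128 + 4096 + 0 + 512
```

On modern kernels, `/proc/[pid]/smaps_rollup` gives you the same sums without walking every mapping.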
USS < PSS < RSS: USS excludes all shared pages, PSS divides shared pages, RSS counts all pages.
Working Set Size (WSS)
The set of pages actively accessed by the process over a time window. Not directly reported by the kernel but critical for performance.
WSS represents the minimum RAM needed to avoid thrashing. If WSS > available RAM, the process will constantly page fault.
Measuring WSS is approximate. Tools estimate it via:
- Sampling page faults with perf events
- Checking referenced bits in `/proc/kpageflags` (requires root)
- Using `mincore()` to track page residency changes
- Instrumenting page table access bits (kernel support varies)
Tools like `wss` (from Brendan Gregg) approximate WSS using these techniques:
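For example (invocation takes a PID and a sampling interval; the output columns and values shown here are illustrative and may differ by tool version):

```
$ ./wss.pl $(pidof myapp) 1
Est(s)   RSS(MB)  PSS(MB)  Ref(MB)
1.007    2104.12  2034.97   587.23
```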
WSS < RSS is normal. Not all resident pages are actively used. The gap represents cold data (old allocations, rarely accessed structures).
The Gap: Heap vs RSS
This is where confusion often occurs and where structural memory issues appear.
Your allocator (malloc/jemalloc/tcmalloc) tracks heap allocations. The kernel tracks RSS (physical pages). These numbers don’t match.
Why RSS > Heap
- Allocator overhead: Metadata, alignment, guard pages
- Granularity: Allocators work in large chunks (arenas, slabs), not individual allocations
- Fragmentation: Freed memory stays mapped until coarse-grained structures drain
- Retained memory: Allocators cache freed memory for reuse instead of returning to OS
Example:
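A jemalloc-style snapshot of the situation (illustrative; field names differ between allocators):

```
allocated: 1.8 GiB   # live objects the application holds
active:    2.0 GiB   # pages containing at least one live allocation
mapped:    2.1 GiB   # what the kernel sees - roughly matches RSS
```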
This gap - memory that’s freed at the allocator level but still resident at the kernel level - is where structural memory leaks occur.
Decision Rule: Stable vs Growing Gap
The RSS - heap gap tells you about allocator health:
Stable gap (normal):
Hour 1: Heap 1.8GB, RSS 2.1GB (gap: 300MB)
Hour 6: Heap 1.9GB, RSS 2.2GB (gap: 300MB)
Hour 24: Heap 1.8GB, RSS 2.1GB (gap: 300MB)
This is expected allocator overhead. Memory use is proportional to load.
Growing gap (structural leak):
Hour 1: Heap 1.8GB, RSS 2.1GB (gap: 300MB)
Hour 6: Heap 1.9GB, RSS 2.7GB (gap: 800MB)
Hour 24: Heap 1.8GB, RSS 4.2GB (gap: 2.4GB)
Heap usage is stable but RSS grows. The allocator cannot return freed memory because coarse-grained structures (slabs, arenas, epochs) remain partially full. This is a drainability failure.
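The decision rule can be automated from periodic samples. The data below is hypothetical, mirroring the growing-gap example above:

```python
# Classify an RSS-vs-heap gap trend from periodic samples: a monotonically
# growing gap while heap stays flat suggests a structural leak.
samples = [  # (hour, heap_gb, rss_gb) - hypothetical measurements
    (1, 1.8, 2.1),
    (6, 1.9, 2.7),
    (24, 1.8, 4.2),
]

gaps = [rss - heap for _, heap, rss in samples]
growing = all(later > earlier for earlier, later in zip(gaps, gaps[1:]))
print("growing gap: structural leak suspected" if growing else "gap stable")
```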
Page Granularity
Memory is managed in pages, not bytes. On x86-64 Linux:
- Standard pages: 4KB
- Huge pages: 2MB (enabled via Transparent Huge Pages or explicit allocation)
- Giant pages: 1GB (rare, usually explicit)
Why Pages Matter
Allocation granularity: When you malloc(1), the allocator might allocate from an existing arena, but if it needs more memory from the kernel:
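The kernel-facing request is rounded up to whole pages. A sketch of the arithmetic:

```python
# The kernel hands out whole pages: even a 1-byte request pins 4 KB.
PAGE = 4096

def pages_for(nbytes: int) -> int:
    return (nbytes + PAGE - 1) // PAGE   # round up to a page boundary

for size in (1, 4096, 4097, 100_000):
    print(f"{size:>7} bytes -> {pages_for(size)} page(s)")
```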
Fragmentation: Partially used pages can’t be returned. One live allocation pins the entire page.
TLB pressure: The CPU’s Translation Lookaside Buffer caches virtual-to-physical mappings. More pages = more TLB misses = slower memory access. Huge pages (2MB) reduce this pressure.
Page faults: First access to a mapped page triggers a page fault (soft fault if already in RAM, hard fault if needs disk I/O). RSS increases when pages become resident - on read faults for file-backed mappings, on write faults for anonymous pages or copy-on-write.
Transparent Huge Pages (THP)
Transparent Huge Pages is a Linux kernel feature that automatically promotes standard 4KB pages to 2MB huge pages to reduce TLB (Translation Lookaside Buffer) pressure and improve memory access performance.
The TLB Problem
The CPU’s TLB caches virtual-to-physical address translations. TLB capacity is limited (typically 64-512 entries for data, similar for instructions). When your application uses gigabytes of memory with 4KB pages, the TLB can’t hold all the translations.
Example without THP:
Application uses 4GB of memory
4GB ÷ 4KB pages = 1,048,576 page table entries needed
TLB capacity: ~512 entries
TLB coverage: only ~0.05% of those pages can be mapped at once (most accesses miss)
Result: Constant page table walks (4-5 memory accesses per TLB miss)
With 2MB huge pages:
Application uses 4GB of memory
4GB ÷ 2MB pages = 2,048 page table entries needed
TLB capacity: ~512 entries
TLB coverage: ~25% of the pages can be mapped at once (far fewer misses)
Result: Fewer page table walks, faster memory access
Each TLB miss costs 100-200 CPU cycles. For memory-intensive workloads, THP can improve performance by 5-30%.
How THP Works
The kernel attempts to use 2MB pages automatically without application changes:
- Allocation: When allocating anonymous memory (heap, stack), the kernel tries to find 2MB contiguous physical regions
- Promotion: The kernel scans for opportunities to combine 512 adjacent 4KB pages into one 2MB page
- Compaction: If memory is fragmented, the kernel migrates pages to create contiguous 2MB regions
- Splitting: When necessary (memory pressure, munmap of partial region), the kernel splits 2MB pages back to 4KB
Check THP status:
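The current mode and system-wide huge-page usage are visible in sysfs and `/proc/meminfo` (the bracketed value is the active mode; numbers are illustrative):

```
$ cat /sys/kernel/mm/transparent_hugepage/enabled
always [madvise] never
$ grep AnonHugePages /proc/meminfo
AnonHugePages:   1048576 kB
```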
The Performance Trade-off
Benefits:
- Reduced TLB misses (5-30% performance improvement for memory-intensive workloads)
- Lower page table overhead (fewer entries to manage)
- Fewer page faults (one fault covers 2MB vs 4KB)
Costs:
- Compaction overhead: Kernel pauses allocations to migrate pages and create 2MB contiguous regions
- Higher memory usage: 2MB granularity means more internal fragmentation
- Delayed memory reclaim: Kernel must split huge pages before reclaiming, adding latency
- Unpredictable latency spikes: Compaction can take milliseconds
When THP Causes Problems
Many production systems disable THP because the costs outweigh the benefits:
Databases (Redis, MongoDB, PostgreSQL, MySQL):
Databases prefer predictable latency over throughput. They write randomly to memory, which quickly fragments 2MB regions. THP promotion attempts cause unpredictable pauses.
Recommendation from Redis documentation:
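The commonly quoted form of that recommendation is to disable THP via sysfs (run as root; check the current Redis docs for the exact guidance for your version):

```
# echo never > /sys/kernel/mm/transparent_hugepage/enabled
```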
Containers with memory limits:
THP uses more memory than 4KB pages due to 2MB granularity. For a container with a 1GB limit, THP can trigger OOM kills sooner because the kernel can’t reclaim memory as granularly.
Latency-sensitive applications:
Real-time systems, low-latency services, and interactive applications suffer from unpredictable compaction pauses. For these workloads, consistent 4KB page behavior is preferable.
When THP Helps
Throughput-oriented workloads:
- Batch processing (MapReduce, analytics)
- Scientific computing (simulation, modeling)
- Video encoding/decoding
- Machine learning training (large matrix operations)
These workloads benefit from TLB efficiency and don’t care about millisecond latency spikes.
Sequential memory access patterns:
If your application allocates large contiguous regions and accesses them sequentially, THP works well because:
- Memory is less fragmented
- Compaction is less frequent
- TLB benefits are maximized
Configuration Options
THP has three modes:
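The three modes as they appear in sysfs (the bracketed entry is active):

```
$ cat /sys/kernel/mm/transparent_hugepage/enabled
[always] madvise never

always  - use huge pages for any anonymous region when possible
madvise - only for regions that opt in via madvise(MADV_HUGEPAGE)
never   - THP disabled
```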
Defragmentation policy controls how aggressively the kernel compacts memory:
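The defrag policy lives next to it (the set of available values varies by kernel version):

```
$ cat /sys/kernel/mm/transparent_hugepage/defrag
always defer defer+madvise [madvise] never
```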
Impact on Memory Metrics
THP affects how you interpret RSS and memory usage:
RSS granularity:
With 4KB pages, RSS can decrease in 4KB increments. With 2MB huge pages, the kernel must split the huge page before reclaiming, delaying RSS decreases.
Memory fragmentation:
THP promotion requires 2MB contiguous physical regions. If memory is fragmented, promotion fails and falls back to 4KB pages. This explains why two identical applications might have different AnonHugePages values depending on allocation order.
Allocator behavior:
Allocators see 2MB chunks from the kernel instead of 4KB. This can interact poorly with allocator fragmentation - a few live objects in a 2MB region prevent the entire region from being returned to the OS.
Monitoring THP Effectiveness
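The relevant counters live in `/proc/vmstat` (values illustrative):

```
$ grep -E '^(thp_fault_alloc|thp_fault_fallback|thp_collapse_alloc|compact_stall) ' /proc/vmstat
thp_fault_alloc 182340
thp_fault_fallback 5121
thp_collapse_alloc 934
compact_stall 47
```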
If compact_stall is high and growing, compaction is causing latency spikes.
Decision Framework
| Workload Type | THP Recommendation | Reason |
|---|---|---|
| Databases (Redis, Postgres, MySQL) | Disable | Random writes fragment memory, compaction causes latency spikes |
| Real-time systems | Disable | Unpredictable compaction pauses violate latency SLAs |
| Containers with tight limits | Disable or madvise | 2MB granularity wastes memory, triggers OOM sooner |
| Batch processing / analytics | Enable | Benefits from TLB efficiency, latency spikes don’t matter |
| Scientific computing | Enable | Large sequential memory access patterns benefit from huge pages |
| Machine learning training | Enable | Large matrix operations see significant speedup |
| Web servers (mixed workload) | madvise | Let application control via madvise() for specific allocations |
Watch `compact_stall` and `thp_fault_fallback` in `/proc/vmstat` - if these metrics are high and growing, THP is hurting more than helping.
Dirty vs Clean Pages
Pages are classified by whether they’ve been modified:
Clean Pages
- Original content unchanged
- Can be discarded and re-read from backing store (file on disk)
- Examples: Read-only code, memory-mapped files that haven’t been written
Dirty Pages
- Modified since being loaded or mapped
- Must be written to swap (or dropped via `madvise()`) before reclaiming
- Examples: Written heap allocations, modified memory-mapped files, stack pages that have been used
Anonymous memory is typically dirty once written. File-backed pages become dirty when modified.
Under memory pressure, the kernel prefers to reclaim clean pages (drop immediately) over dirty pages (must write to swap first).
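System-wide dirty page volume is visible in `/proc/meminfo` (values illustrative):

```
$ grep -E '^(Dirty|Writeback):' /proc/meminfo
Dirty:             12340 kB
Writeback:             0 kB
```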
Anonymous vs File-Backed Memory
Anonymous Memory
Not backed by any file. Created via:
- `malloc()` (for large allocations, uses `mmap(MAP_ANONYMOUS)`)
- Stack allocations
- `mmap()` with the `MAP_ANONYMOUS` flag
When swapped out, goes to swap space (if enabled). Otherwise, cannot be reclaimed without killing the process.
File-Backed Memory
Mapped from files on disk:
- Code segments (executables, `.so` libraries)
- Memory-mapped files (`mmap()` without `MAP_ANONYMOUS`)
- Shared libraries
When memory pressure occurs, clean file-backed pages can be discarded and re-read from disk. No swap needed.
Container Memory Accounting
Docker and Kubernetes use cgroups to limit memory. But cgroup accounting differs from process RSS.
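For example, comparing the two views (cgroup v2 path; `mycontainer` is a stand-in name and the numbers are illustrative):

```
$ docker stats --no-stream --format '{{.MemUsage}}' mycontainer
2GiB / 4GiB
$ cat /sys/fs/cgroup/memory.current        # from inside the container
2147483648
```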
Cgroup memory includes:
- All process RSS
- Page cache attributed to the cgroup
- Can include some kernel memory depending on cgroup version and configuration
This can exceed the sum of individual process RSS values within the container because cached file data is shared.
Common Debugging Scenarios
Scenario 1: “Why does free show 1GB free but my app OOM’d?”
Check: available, not free
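Illustrative output for this scenario:

```
$ free -h
              total        used        free      shared  buff/cache   available
Mem:           7.8Gi       6.5Gi       0.8Gi       0.1Gi       0.5Gi       0.9Gi
```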
Available memory is 0.9GB. The system is actually low on memory despite 0.8GB “free” because there’s very little reclaimable cache (only 0.5GB buff/cache, most likely dirty and in active use).
Solution: Add RAM, reduce memory usage, enable swap, or kill memory-heavy processes.
Scenario 2: “Valgrind says no leaks, but RSS keeps growing”
Check: RSS trend over time, compare to heap usage
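A crude way to record the trend (replace `$PID` with your process ID):

```
$ while sleep 60; do date +%s; grep VmRSS /proc/$PID/status; done | tee rss.log
```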
If RSS grows but heap usage stays flat, you have a structural leak. The allocator cannot return freed memory to the kernel because coarse-grained structures (slabs, arenas) remain partially full.
Solution: Profile allocator drainability with tools like libdrainprof or switch to an allocator with better granularity for your workload.
Scenario 3: “Docker says 4GB but my processes’ RSS sums to 2GB”
Check: Sum RSS of all processes, compare to cgroup memory
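Comparing the two numbers (illustrative; `myapp` is a stand-in name):

```
$ cat /sys/fs/cgroup/memory.current                      # inside the container
4294967296
$ ps -o rss= -p "$(pgrep -d, myapp)" | awk '{s+=$1} END {print s " kB"}'
2097152 kB
```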
The gap is page cache. Processes in the container have read files, and the page cache (2GB) is attributed to the cgroup.
Solution: This is normal. If the container is being OOM killed despite low RSS, you may need to increase the memory limit to account for necessary cache.
Scenario 4: “htop shows 60% memory used, system feels fine”
Check: buff/cache and available
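Illustrative output for this scenario:

```
$ free -h
              total        used        free      shared  buff/cache   available
Mem:            15Gi       9.0Gi       0.8Gi       0.2Gi       5.2Gi       5.8Gi
```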
60% “used” but 5.8GB available. Most of the “used” memory is cache (5.2GB buff/cache). The system is healthy.
Solution: No action needed. This is normal Linux behavior.
Scenario 5: “Process RSS is 500MB but heap profiler shows 200MB”
Check: Allocator overhead, fragmentation, retained memory
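An allocator snapshot showing the layering (jemalloc-style; illustrative):

```
allocated: 200 MB   # what the heap profiler reports
active:    500 MB   # pages with live allocations (includes fragmentation)
mapped:    512 MB   # matches the RSS seen by the kernel
```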
Allocated (200MB) is what the application uses. Active (500MB) includes allocator metadata and fragmentation. Mapped (512MB) matches RSS.
Solution: This gap is normal. If it grows unbounded, profile allocator fragmentation and consider tuning allocator parameters.
Decision Framework: Which Metric Matters?
| Debugging Scenario | Metric to Check | What It Tells You | Action If High |
|---|---|---|---|
| System running low on memory | available (from free -h) | RAM that can be allocated without swapping | Add RAM, reduce workload, investigate RSS growth |
| Process memory growth over time | RSS trend | Physical memory footprint increasing | Profile heap usage, check for leaks, measure allocator drainability |
| Suspected memory leak | RSS vs heap usage | Gap between allocated and resident memory | Run Valgrind (finds object leaks), profile allocator (finds structural leaks) |
| Shared memory accounting across processes | PSS (not RSS) | Fair attribution of shared pages | Use PSS for cost accounting, RSS for process limits |
| Container being OOM killed | Cgroup memory (memory.current or memory.usage_in_bytes) | Total memory including cache | Increase container limit or reduce cache pressure |
| Performance degradation (thrashing) | WSS vs available RAM | Working set fits in RAM? | Add RAM, reduce working set, improve locality |
| Understanding allocator behavior | RSS - heap allocations | Allocator overhead and fragmentation | Tune allocator, switch allocators, reduce fragmentation |
When Metrics Don’t Tell the Full Story
You’ve measured everything. RSS is stable. Heap usage tracks with RSS. No leaks detected. But memory issues persist.
This is where you need to look deeper at allocator behavior:
- Drainability: Can the allocator return memory when objects are freed?
- Fragmentation: Are coarse-grained structures (slabs, arenas, epochs) partially full?
- Retention: Is the allocator holding freed memory for reuse instead of returning it?
Traditional tools measure allocation and deallocation events. They don’t measure whether freed memory can actually be reclaimed.
This is the gap that structural memory leaks exploit and why drainability profiling exists.
Tools Summary
System-level memory:
- `free -h` - System memory breakdown (use `available`, ignore `free`)
- `vmstat 1` - Memory stats over time
- `/proc/meminfo` - Detailed kernel memory accounting
Process-level memory:
- `ps aux` - RSS per process (column 6, in KiB)
- `ps -o rss= -p $(pidof app)` - RSS for a specific process (cleaner than `ps aux`)
- `top` / `htop` - Real-time RSS monitoring
- `/proc/[pid]/status` - VmSize, VmRSS, VmData, and more
- `/proc/[pid]/smaps` - Detailed per-mapping breakdown (address ranges, permissions, RSS per mapping)
- `/proc/[pid]/smaps_rollup` - Aggregated PSS, USS, dirty/clean breakdown
Allocator profiling:
- jemalloc stats - `malloc_stats_print()` for internal state
- tcmalloc profiler - Heap profile snapshots
- `valgrind --tool=massif` - Heap over time
- libdrainprof - Drainability satisfaction rate
Container memory:
- `docker stats` - Cgroup memory usage
- `/sys/fs/cgroup/memory/` (v1) or `/sys/fs/cgroup/` (v2) - Raw cgroup memory files
Quick Reference Glossary
RAM (Random Access Memory): Physical memory chips. The finite hardware resource all processes share.
Virtual Memory: Per-process address space abstraction. Each process sees a private, isolated address range.
Page: Fundamental memory unit. 4KB on x86-64 Linux (2MB for huge pages, 1GB for giant pages).
RSS (Resident Set Size): Physical memory currently mapped to a process. Includes anonymous (heap/stack) and file-backed (code/libs) pages. Shared pages counted fully in each process.
VmSize (Virtual Memory Size): Total address space reserved by a process. Includes mapped and unmapped regions. Can vastly exceed physical RAM.
PSS (Proportional Set Size): RSS with shared pages divided proportionally. If 10 processes share a 2MB library, each process’s PSS includes 200KB.
USS (Unique Set Size): Memory private to a process. Excludes all shared pages. Shows what would be freed if the process exits.
WSS (Working Set Size): Pages actively accessed by a process over a time window. The minimum RAM needed to avoid thrashing.
Page Cache: File data cached in RAM. Included in “used” but instantly reclaimable. Makes repeated file reads fast.
Buffer Cache: Filesystem metadata (inodes, superblocks, directory entries) cached in RAM. Also reclaimable.
Anonymous Pages: Memory not backed by files. Created by malloc(), stack allocations. Must be swapped to disk to reclaim.
File-Backed Pages: Memory mapped from files. Code segments, shared libraries, memory-mapped files. Can be discarded and re-read from disk.
Dirty Pages: Modified since loading or mapping. Must be written to swap before reclaiming. Anonymous pages are typically dirty once written.
Clean Pages: Unmodified. Can be discarded immediately and re-read if needed. Read-only code pages are clean.
Page Fault: CPU exception when accessing unmapped or swapped-out memory. Kernel resolves by mapping a physical page.
Swap: Disk space used to store pages when physical RAM is full. Slower than RAM (milliseconds vs nanoseconds).
TLB (Translation Lookaside Buffer): CPU cache for virtual-to-physical address translations. Reduces page table lookup overhead.
Cgroup (Control Group): Linux kernel feature for resource limiting. Docker/Kubernetes use cgroups to enforce memory limits.
OOM (Out Of Memory) Killer: Kernel subsystem that kills processes when memory is exhausted. Selects victims based on memory usage and priority.
Slab Cache: Kernel’s object allocator. Caches frequently allocated structures (inodes, dentries) to reduce allocation overhead.
Huge Pages: 2MB pages (vs standard 4KB). Reduce TLB pressure for memory-intensive applications.
Available Memory: Estimate of RAM that can be allocated without swapping. Includes free pages and reclaimable cache.
Drainability: The ability of a coarse-grained allocator (slab, arena, epoch) to return memory to the OS when objects are freed. Low drainability causes structural leaks.
Wrapping Up
Memory measurement is a layered problem. The kernel sees pages. Processes see virtual address spaces. Allocators see heap structures. Each layer has its own accounting.
The taxonomy:
- System level: total, used, free, available, buff/cache
- Process level: VmSize (virtual), RSS (resident), PSS (proportional), USS (unique), WSS (working set)
- Allocator level: heap allocated, heap overhead, fragmentation, drainability
Most debugging scenarios require checking metrics at multiple layers:
- OOM? Check `available` (system) and cgroup memory (`memory.current` or `memory.usage_in_bytes`)
- Memory leak? Check RSS trend (process) and heap usage (allocator)
- Performance? Check WSS (process) vs available RAM (system)
When metrics look healthy but problems persist, you’re likely dealing with allocator-level issues - fragmentation, retention, or structural leaks that traditional tools can’t see.
That’s when you need to measure drainability: can the allocator actually return memory when objects are freed? For that, see the next post in this series on structural memory leaks and tools like libdrainprof.
Further reading:
- Linux `/proc` filesystem documentation: `man 5 proc`
- Kernel memory management: kernel.org/doc/html/latest/admin-guide/mm/
- Understanding the Linux Virtual Memory Manager: [Gorman, 2004]
- Structural Memory Leaks and Drainability (previous post in this series)