The Price of Everything Being an Object in Python

Why Python stores everything on the heap: examining the memory overhead of Python's object model, comparing int storage across languages, and understanding the performance implications

All Python developers know that everything in Python is an object. Numbers are objects. Strings are objects. Functions are objects. Even None is an object.

But at what cost?

This design decision has profound implications for memory usage and performance. In typical C code, an integer is stored inline (often on the stack or in registers) with no metadata - just 4 bytes. In Python, that same integer is a 28-byte object on the heap, accessed through pointer indirection. This article explores why Python made this choice, what the overhead looks like in practice, and when it matters.

Note: Sizes shown are for 64-bit CPython builds; exact layout varies by platform and build configuration.

What this means: Python’s cost isn’t just “heap vs stack” - it’s pointer indirection, reference counting, and loss of data locality. Every Python value is boxed (stored as an object with metadata), requiring pointer dereference to access the actual value. This enables dynamic typing and flexible programming at the cost of memory overhead and allocation performance.

Memory Layout: C vs Python

C Integer Storage

In typical C code, integers are stored inline without metadata:

int x = 42;              // Automatic storage (typically stack)
static int y = 42;       // Static storage (data segment)
int* z = malloc(sizeof(int)); // Heap (explicit allocation)
*z = 42;

For local variables, the typical layout is:

Memory layout (automatic/stack storage):

Stack:
┌──────────┐
│ 00000000 │
│ 00000000 │
│ 00000000 │
│ 00101010 │  ← 42 in binary
└──────────┘
Total: 4 bytes
Location: Stack (or register)
Allocation: Instant (bump stack pointer)

The CPU can directly operate on this value. No indirection, no metadata, no dynamic allocation.

Python Integer Storage

In Python, the same integer is an object on the heap:

x = 42

Memory layout (CPython 3.11+):

Stack:
┌─────────────┐
│ 0x7f8a3c... │  ← Pointer to PyObject (8 bytes)
└─────────────┘

Heap:
┌─────────────────────┐
│ Reference Count (8) │  ← How many references to this object
├─────────────────────┤
│ Type Pointer (8)    │  ← Points to PyLong_Type
├─────────────────────┤
│ Size (8)            │  ← Number of digits (for arbitrary precision)
├─────────────────────┤
│ Value (4)           │  ← Actual integer value: 42
└─────────────────────┘
Total: 28 bytes
Location: Heap
Allocation: CPython allocator (pymalloc), refcount initialization

Seven times larger. And this doesn’t include the pointer on the stack (8 bytes) that references this object.


The PyObject Structure

Every Python object starts with a PyObject header:

// CPython source: Include/object.h
typedef struct _object {
    Py_ssize_t ob_refcnt;    // Reference count (8 bytes on 64-bit)
    PyTypeObject *ob_type;   // Pointer to type object (8 bytes)
} PyObject;

For integers specifically (PyLongObject):

typedef struct {
    PyObject ob_base;        // 16 bytes (refcnt + type)
    Py_ssize_t ob_size;      // 8 bytes (number of digits for bigint)
    digit ob_digit[1];       // 4+ bytes (actual value)
} PyLongObject;

Breakdown for x = 42:

  • Reference count: 8 bytes
  • Type pointer: 8 bytes
  • Size field: 8 bytes
  • Value: 4 bytes
  • Total: 28 bytes

Compare this to C:

  • Value: 4 bytes
  • Total: 4 bytes
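
You can confirm the breakdown from the interpreter itself; sys.getsizeof reports these figures exactly (CPython, 64-bit):

import sys

sys.getsizeof(42)      # 28: refcount (8) + type ptr (8) + size (8) + one 4-byte digit
sys.getsizeof(0)       # 24: zero stores no digits at all
sys.getsizeof(2**100)  # 40: four 30-bit digits for a larger value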
[Diagram: a C int is 4 bytes of direct value on the stack; a Python int is an 8-byte stack pointer to a 28-byte heap PyLongObject holding refcount, type pointer, size, and value.]

Comparing Across Languages

Integer Storage Comparison

| Language | Storage Location | Size (bytes) | Metadata | Allocation |
|----------|------------------|--------------|----------|------------|
| C | Inline (stack/register) | 4 | None | Instant |
| Go | Inline (escape analysis) | 8 | None (stack); GC metadata if escaped | Stack or heap (escape analysis) |
| Rust | Inline (stack) | 4 or 8 | None | Instant |
| Java | Stack (primitive) | 4 | None | Instant |
| Java | Heap (Integer object) | ~16 (typical HotSpot) | Object header (~12) + value (4) | Allocator |
| Python | Heap (always boxed) | 28 | Refcount (8) + type (8) + size (8) + value (4) | pymalloc |

Code Examples

C:

int x = 42;              // Stack: 4 bytes
int arr[100];            // Stack: 400 bytes (100 * 4)

Go:

x := 42                  // Stack: 8 bytes (int is 64-bit)
arr := [100]int{}        // Stack: 800 bytes (100 * 8)

Rust:

let x: i32 = 42;         // Stack: 4 bytes
let arr = [0i32; 100];   // Stack: 400 bytes

Java:

int x = 42;              // Stack: 4 bytes (primitive)
Integer y = 42;          // Stack: 8-byte ref, Heap: 16-byte object
int[] arr = new int[100]; // Stack: 8-byte ref, Heap: ~416 bytes (16-byte header + 400 data)

Python:

x = 42                   # Stack: 8-byte ref, Heap: 28-byte object
arr = [0] * 100          # Heap: ~856-byte list object; all 100 slots reference the one cached 0
                         # (100 distinct large ints would add ~28 bytes each, ~3.6 KB total)

Why Everything is on the Heap

The Design Rationale

Python’s creators made a deliberate choice: simplicity and flexibility over raw performance.

1. Dynamic Typing:

In C, the compiler knows types at compile time:

int x = 42;      // Compiler: x is int, allocate 4 bytes on stack
float y = 3.14;  // Compiler: y is float, allocate 4 bytes on stack

In Python, types are determined at runtime:

x = 42       # Runtime: x references PyLongObject
x = "hello"  # Runtime: x now references PyUnicodeObject
x = [1, 2]   # Runtime: x now references PyListObject

The variable x is just a name bound to an object. The object carries its own type information. This requires objects to be heap-allocated with metadata.

2. Everything is a Reference:

Python variables are not values - they’re references to objects:

x = 42
y = x       # y and x both reference the same object

# Prove it:
id(x) == id(y)  # True (same memory address)

Compare to C:

int x = 42;
int y = x;  // y is a copy of x's value, not a reference

This reference model requires heap allocation so objects can be shared across scopes.
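
A short illustration of the sharing this enables - a function mutates the very object the caller holds, because both names reference the same heap allocation:

def append_item(target):
    target.append(4)   # mutates the shared heap object; nothing is copied

items = [1, 2, 3]
append_item(items)
print(items)           # [1, 2, 3, 4]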

3. Garbage Collection:

Python uses reference counting (and cyclic GC) to manage memory. Every object needs a reference count:

x = 42          # Create PyLongObject, refcount = 1
y = x           # refcount = 2
del x           # refcount = 1
del y           # refcount = 0, object deallocated

This requires every value to be an object with a reference count field.
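
You can watch the count change with sys.getrefcount (using a list rather than a small int, since cached or immortal integers report misleading counts):

import sys

x = []                 # fresh object, referenced only by x
sys.getrefcount(x)     # 2: x plus the temporary argument reference
y = x
sys.getrefcount(x)     # 3: x, y, and the argument
del y
sys.getrefcount(x)     # 2 again; at zero the object is deallocated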

4. Uniform Object Interface:

Every Python object has a consistent interface:

x = 42
x.__class__      # <class 'int'>
x.__sizeof__()   # 28
dir(x)           # ['__abs__', '__add__', ...]

Even integers have methods:

(42).bit_length()   # 6
(42).to_bytes(4, 'big')  # b'\x00\x00\x00*'

This requires integers to be full objects with type information and method tables.
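
Method lookup runs through the ob_type pointer: an int's methods live on the int type object that every PyLongObject references. A quick way to see the dispatch:

x = 42
type(x) is int     # True: ob_type points at the int type object
int.bit_length(x)  # 6: calling the method through the type directly
x.bit_length()     # 6: the usual form resolves through type(x)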


The Performance Cost

Allocation Speed

Benchmark: Creating 1 million integers

// C: Stack allocation
clock_t start = clock();
for (int i = 0; i < 1000000; i++) {
    int x = i;
    // x automatically deallocated
}
clock_t end = clock();
// Time: ~1ms (stack pointer bump)

# Python: Heap allocation
import time
start = time.time()
for i in range(1000000):
    x = i
    # x reference decremented, object may be deallocated
end = time.time()
# Time: ~50ms (heap allocation + GC)

50x slower. Most of this overhead is heap allocation and reference counting.

Note: Exact timings vary by platform, Python version, and allocator behavior; the ratios shown are illustrative of typical overhead.
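To reproduce the Python side on your own machine (absolute numbers vary widely), timeit gives a cleaner measurement than manual time.time() deltas - it disables the cyclic garbage collector during the run by default:

import timeit

# One pass over a million integer bindings: mostly allocation plus refcounting.
ms = timeit.timeit("for i in range(1_000_000): x = i", number=1) * 1000
print(f"{ms:.1f} ms")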

Memory Bandwidth

Array of 1 million integers:

| Language | Memory Used | Notes |
|----------|-------------|-------|
| C | 4 MB | Contiguous array on heap |
| Go | 8 MB | Contiguous array |
| Rust | 4 MB | Contiguous Vec<i32> |
| Java | 4 MB + overhead | Primitive array, contiguous |
| Python | ~36-40+ MB | List pointers (~8 MB) + per-int objects (~28-32 MB) + allocator overhead |

Python uses roughly 9-10x more memory than C for the same logical data.
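
You can reproduce the Python estimate with shallow sizes alone (pymalloc rounding and fragmentation add a little more):

import sys

data = list(range(1_000_000))
pointers = sys.getsizeof(data)                    # ~8 MB of pointer slots
objects = sum(sys.getsizeof(n) for n in data)     # ~28 MB of PyLongObjects
print((pointers + objects) / 1e6, "MB")           # ~36 MB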

Cache Performance

Modern CPUs rely on cache locality. Contiguous, inline data benefits from:

  • Sequential access (contiguous memory layout)
  • Small size (fits in L1 cache: 32-64 KB)
  • Prefetching (CPU predicts access patterns)

Heap-allocated Python objects suffer from:

  • Scattered allocation (objects not contiguous)
  • Large size (cache misses)
  • Pointer chasing (follow reference to find value)

Example:

// C: Sum array (cache-friendly)
int arr[1000];
int sum = 0;
for (int i = 0; i < 1000; i++) {
    sum += arr[i];  // Sequential memory access, cache-friendly
}

# Python: Sum list (cache-unfriendly)
arr = list(range(1000))
total = sum(arr)
# Each arr[i] is a pointer to a PyLongObject
# Dereference pointer, follow to heap object (cache miss)
[Diagram: a C array packs 4-byte values contiguously; a Python list holds pointers to scattered 28-byte PyLong objects on the heap.]

Python’s Optimizations

Python doesn’t leave performance entirely on the table. CPython includes several optimizations:

Small Integer Caching

Python pre-allocates integers from -5 to 256:

a = 10
b = 10
a is b  # True - same object

x = 1000
y = 1000
x is y  # False in the REPL - different objects
        # (within one module, the compiler may reuse equal constants)

Why: Small integers are so common that pre-allocating them saves repeated heap allocations.

Implementation:

// CPython maintains an array of cached integer objects
static PyLongObject small_ints[NSMALLPOSINTS + NSMALLNEGINTS];
// NSMALLNEGINTS = 5, NSMALLPOSINTS = 257

When you create x = 10, Python returns a pointer to the cached object instead of allocating a new one.
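
The cache also applies to values constructed at runtime, not just literals. A small check (building the ints with int() avoids the compiler folding equal constants together):

n = int("10")     # constructed at runtime
m = 10
n is m            # True: both reference the cached object for 10

p = int("1000")
q = int("1000")
p is q            # False: each call allocated a fresh PyLongObject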

String Interning

String literals are automatically interned:

s1 = "hello"
s2 = "hello"
s1 is s2  # True - same object

Runtime strings can be manually interned:

import sys
part = "hel"
s3 = part + "lo"       # built at runtime, so not automatically interned
s3 is s1               # False
s4 = sys.intern(s3)    # returns the canonical interned object
s4 is s1               # True

Object Pooling (Tuples, Dicts, etc.)

CPython maintains free lists for frequently used types:

  • Tuples: up to 2,000 cached per size class, for sizes 1-20
  • Dicts: 80 dict objects
  • Lists: 80 list objects
  • Floats: 100 float objects

When you delete these objects, they’re returned to the pool instead of being freed. Next allocation reuses them.
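
This reuse is occasionally visible through id(), though it is a CPython implementation detail and nothing you should rely on:

a = [1, 2, 3]
addr = id(a)
del a             # the list object returns to the free list
b = [4, 5, 6]
id(b) == addr     # often True: the freed slot was handed straight back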


When the Overhead Matters

CPU-Bound Number Crunching

Problem: Tight loops processing millions of numbers

# Python: Slow
total = 0
for i in range(10_000_000):
    total += i * 2
# Time: ~500ms

Solution: Use NumPy (C arrays under the hood):

import numpy as np
arr = np.arange(10_000_000)
total = (arr * 2).sum()
# Time: ~20ms (25x faster)

Large Data Structures

Problem: Storing millions of small objects

# Python: ~350-400 MB for 10 million integers
data = list(range(10_000_000))

Solution: Use array module for primitive arrays:

import array
data = array.array('i', range(10_000_000))
# Memory: ~40 MB (roughly 9-10x smaller)

Or use NumPy:

import numpy as np
data = np.arange(10_000_000, dtype=np.int32)  # pin a 4-byte dtype; the default is platform-dependent
# Memory: ~40 MB + negligible overhead
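
NumPy can report its buffer size directly, which makes the claim easy to verify:

import numpy as np

data = np.arange(10_000_000, dtype=np.int32)
data.nbytes    # 40000000: one 4-byte element per value, no per-element objects
data.itemsize  # 4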

Embedded Systems

Problem: Python on resource-constrained devices

Python’s memory overhead is prohibitive for microcontrollers with only a few kilobytes of RAM.

Solution: Use MicroPython or CircuitPython (optimized for embedded), or use C/Rust for critical paths.


When the Overhead Doesn’t Matter

I/O-Bound Programs

If your program spends most time waiting for network, disk, or user input, Python’s overhead is negligible:

# Network request: 100ms
# Python overhead: 0.1ms
# Overhead is 0.1% of total time
import requests
response = requests.get("https://api.example.com")

Business Logic and Glue Code

Most Python code is high-level orchestration:

# Overhead insignificant compared to database/API calls
def process_order(order_id):
    order = db.get_order(order_id)       # 10ms (database)
    payment = charge_card(order.amount)  # 50ms (payment API)
    send_email(order.email)              # 20ms (email service)
    # Python overhead: < 1ms

Rapid Development

Python’s productivity gains often outweigh performance costs:

  • Faster development (dynamic typing, no compilation)
  • Easier debugging (runtime introspection)
  • Rich ecosystem (millions of packages)

Cost-benefit:

  • Write Python in 1 day vs C in 1 week
  • Python runs in 100ms vs C in 10ms
  • If code runs infrequently, 1 day saved » 90ms per execution

Comparing Object Overhead Across Languages

Java: Compromise Between C and Python

Java has both primitives (stack) and objects (heap):

// Primitive: stack-allocated, no overhead
int x = 42;  // 4 bytes on stack

// Boxed: heap-allocated, object overhead
Integer y = 42;  // 8-byte ref + 16-byte object (header + value)

Java object header (HotSpot JVM):

  • Mark word: 8 bytes (hash code, GC info, lock state)
  • Class pointer: 4-8 bytes (compressed oops)
  • Value: 4 bytes
  • Padding: align to 8 bytes
  • Total: 16 bytes

Java’s Integer object (16 bytes) is smaller than Python’s PyLongObject (28 bytes) because:

  • No explicit reference count (GC manages lifetimes)
  • No size field (integers are fixed-size)

Go: Stack-First Philosophy

Go aggressively stack-allocates via escape analysis:

func stackInt() {
    x := 42  // Stack: 8 bytes
    fmt.Println(x)
}

func heapInt() *int {
    x := 42
    return &x  // Escapes to heap: 8 bytes + allocation overhead
}

Go has no primitive/object distinction - the compiler decides based on usage.

Rust: Zero-Cost Abstractions

Rust provides control without overhead:

let x: i32 = 42;            // Stack: 4 bytes
let y = Box::new(42);       // Heap: 4 bytes (no metadata)
let z = Rc::new(42);        // Heap: 4 bytes + 16-byte Rc header

Rust’s Box<i32> is just the value on the heap (4 bytes). No reference count unless you use Rc (reference counted) or Arc (atomic reference counted).


Profiling Python Memory Usage

Measuring Object Size

import sys

x = 42
sys.getsizeof(x)  # 28 bytes

s = "hello"
sys.getsizeof(s)  # 54 bytes (PyUnicode overhead + 5 chars)

lst = [1, 2, 3]
sys.getsizeof(lst)  # 80 bytes (list object itself)
# But this doesn't include the PyLongObjects in the list!

# Total memory for list + elements:
total = sys.getsizeof(lst) + sum(sys.getsizeof(x) for x in lst)
# 80 + (28 * 3) = 164 bytes for 3 integers
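
sys.getsizeof is shallow by design. A minimal recursive sketch for common containers (the pympler library shown below handles far more cases):

import sys

def deep_sizeof(obj, seen=None):
    """Shallow size of obj plus everything it references. Minimal sketch."""
    if seen is None:
        seen = set()
    if id(obj) in seen:        # count shared objects only once
        return 0
    seen.add(id(obj))
    size = sys.getsizeof(obj)
    if isinstance(obj, dict):
        size += sum(deep_sizeof(k, seen) + deep_sizeof(v, seen)
                    for k, v in obj.items())
    elif isinstance(obj, (list, tuple, set, frozenset)):
        size += sum(deep_sizeof(item, seen) for item in obj)
    return size

deep_sizeof([1, 2, 3])  # 164: the 80-byte list plus three 28-byte ints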

Memory Profiling Tools

memory_profiler:

from memory_profiler import profile

@profile
def create_list():
    return [i for i in range(10000)]

create_list()
# Output shows line-by-line memory usage

tracemalloc (built-in):

import tracemalloc

tracemalloc.start()

# Your code here
data = list(range(100000))

snapshot = tracemalloc.take_snapshot()
top_stats = snapshot.statistics('lineno')

for stat in top_stats[:10]:
    print(stat)

pympler:

from pympler import asizeof

data = list(range(1000))
asizeof.asizeof(data)  # Deep size (includes referenced objects)

Practical Implications

Choosing the Right Tool

| Use Case | Python | NumPy | C Extension | Other Language |
|----------|--------|-------|-------------|----------------|
| Web API | + Good | - Overkill | - Overkill | Consider Go/Rust |
| Data processing | - Slow | + Good | + Good | Consider Rust |
| Machine learning | + Good (with NumPy/PyTorch) | + Core | + Core | Julia for research |
| System tool | - Slow startup | - Overkill | + Good | Go/Rust better |
| Scripting | + Excellent | - Overkill | - Overkill | - |

Optimization Strategy

1. Profile first:

python -m cProfile script.py
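
The same profile can be captured programmatically with the standard library’s cProfile and pstats (hot_path is a placeholder for your own workload):

import cProfile
import pstats

def hot_path():
    return sum(i * 2 for i in range(1_000_000))

cProfile.run("hot_path()", "profile.out")
pstats.Stats("profile.out").sort_stats("cumulative").print_stats(5)  # top 5 by cumulative time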

2. Identify bottlenecks:

  • CPU-bound loops processing numbers
  • Large collections of small objects
  • Repeated allocations

3. Optimize selectively:

  • Use NumPy for numeric arrays
  • Use array.array for primitive arrays
  • Move hot paths to C extensions (Cython, ctypes)
  • Consider Rust/Go for performance-critical services

4. Don’t over-optimize:

  • Python’s overhead matters in < 10% of code
  • Premature optimization wastes development time
  • Profile, then optimize only what matters

The Trade-Off

Python made a conscious choice: developer productivity over raw performance.

What you gain:

  • Dynamic typing (flexibility)
  • Everything is an object (uniform interface)
  • Rich runtime introspection (debugging, metaprogramming)
  • Automatic memory management (no manual free/delete)
  • Rapid development (no compilation, simple syntax)

What you pay:

  • 7x memory overhead (vs C)
  • 10-50x slower execution (pure Python vs C)
  • Values are boxed objects (typically heap-allocated) accessed via references
  • Pointer indirection and reference counting overhead
  • GC pauses

For most Python code (web services, data pipelines, scripting), the overhead is acceptable. For performance-critical inner loops, drop down to NumPy, C extensions, or another language.

Best Practice: Write your application in Python. Profile to find bottlenecks. Optimize only the hot paths with NumPy, Cython, or Rust extensions. You get 90% of the development speed with 90% of C’s performance where it matters.

Conclusion

Python’s “everything is an object” design carries a real cost:

  • 28 bytes for a simple integer (vs 4 bytes in C)
  • Values are boxed objects (typically heap-allocated) accessed via references
  • Pointer indirection for every value access
  • Reference counting overhead

But this cost buys Python’s greatest strength: simplicity. No manual memory management. No type declarations. No compilation. A uniform object model that makes metaprogramming trivial.

For the vast majority of Python code - web APIs, data pipelines, glue scripts - this trade-off is worth it. The developer time saved dwarfs the CPU cycles lost.

When performance matters, Python offers escape hatches: NumPy for arrays, Cython for hot loops, ctypes for C libraries. You get the best of both worlds - Python’s productivity where it matters, C’s performance where it matters.

The price of everything being an object? Acceptable for most code, optimizable for performance-critical paths.

