The Price of Everything Being an Object in Python

Why Python stores everything on the heap: examining the memory overhead of Python's object model, comparing int storage across languages, and understanding the performance implications

All Python developers know that everything in Python is an object. Numbers are objects. Strings are objects. Functions are objects. Even None is an object.

But at what cost?

This design decision has profound implications for memory usage and performance. In typical C code, an integer is stored inline (often on the stack or in registers) with no metadata - just 4 bytes. In Python, that same integer is a 28-byte object on the heap, accessed through pointer indirection. This article explores why Python made this choice, what the overhead looks like in practice, and when it matters.

Note: Sizes shown are for 64-bit CPython builds; exact layout varies by platform and build configuration.

What this means: Python’s cost isn’t just “heap vs stack” - it’s pointer indirection, reference counting, and loss of data locality. Every Python value is boxed (stored as an object with metadata), requiring pointer dereference to access the actual value. This enables dynamic typing and flexible programming at the cost of memory overhead and allocation performance.

Memory Layout: C vs Python

C Integer Storage

In typical C code, integers are stored inline without metadata:

int x = 42;              // Automatic storage (typically stack)
static int y = 42;       // Static storage (data segment)
int* z = malloc(sizeof(int)); // Heap (explicit allocation)
*z = 42;

For local variables, the typical layout is:

Memory layout (automatic/stack storage):

Stack:
┌──────────┐
│ 00000000 │
│ 00000000 │
│ 00000000 │
│ 00101010 │  ← 42 in binary
└──────────┘
Total: 4 bytes
Location: Stack (or register)
Allocation: Instant (bump stack pointer)

The CPU can directly operate on this value. No indirection, no metadata, no dynamic allocation.

Python Integer Storage

In Python, the same integer is an object on the heap:

x = 42

Memory layout (CPython 3.11+):

Stack:
┌─────────────┐
│ 0x7f8a3c... │  ← Pointer to PyObject (8 bytes)
└─────────────┘

Heap:
┌─────────────────────┐
│ Reference Count (8) │  ← How many references to this object
├─────────────────────┤
│ Type Pointer (8)    │  ← Points to PyLong_Type
├─────────────────────┤
│ Size (8)            │  ← Number of digits (for arbitrary precision)
├─────────────────────┤
│ Value (4)           │  ← Actual integer value: 42
└─────────────────────┘
Total: 28 bytes
Location: Heap
Allocation: CPython allocator (pymalloc), refcount initialization

Seven times larger. And this doesn’t include the pointer on the stack (8 bytes) that references this object.


The PyObject Structure

Every Python object starts with a PyObject header:

// CPython source: Include/object.h
typedef struct _object {
    Py_ssize_t ob_refcnt;    // Reference count (8 bytes on 64-bit)
    PyTypeObject *ob_type;   // Pointer to type object (8 bytes)
} PyObject;

For integers specifically (PyLongObject):

typedef struct {
    PyObject ob_base;        // 16 bytes (refcnt + type)
    Py_ssize_t ob_size;      // 8 bytes (number of digits for bigint)
    digit ob_digit[1];       // 4+ bytes (actual value)
} PyLongObject;

Breakdown for x = 42:

  • Reference count: 8 bytes
  • Type pointer: 8 bytes
  • Size field: 8 bytes
  • Value: 4 bytes
  • Total: 28 bytes

Compare this to C:

  • Value: 4 bytes
  • Total: 4 bytes
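
You can confirm the breakdown from the interpreter itself; sys.getsizeof reports these figures exactly (CPython, 64-bit):

import sys

sys.getsizeof(42)      # 28: refcount (8) + type ptr (8) + size (8) + one 4-byte digit
sys.getsizeof(0)       # 24: zero stores no digits at all
sys.getsizeof(2**100)  # 40: four 30-bit digits for a larger value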
[Diagram: a C int is 4 bytes of direct value on the stack; a Python int is an 8-byte stack pointer to a 28-byte heap PyLongObject holding refcount, type pointer, size, and value.]

Comparing Across Languages

Integer Storage Comparison

| Language | Storage Location | Size (bytes) | Metadata | Allocation |
|----------|------------------|--------------|----------|------------|
| C | Inline (stack/register) | 4 | None | Instant |
| Go | Inline (escape analysis) | 8 | None (stack); GC metadata if escaped | Stack or heap (escape analysis) |
| Rust | Inline (stack) | 4 or 8 | None | Instant |
| Java | Stack (primitive) | 4 | None | Instant |
| Java | Heap (Integer object) | ~16 (typical HotSpot) | Object header (~12) + value (4) | Allocator |
| Python | Heap (always boxed) | 28 | Refcount (8) + type (8) + size (8) + value (4) | pymalloc |

Code Examples

C:

int x = 42;              // Stack: 4 bytes
int arr[100];            // Stack: 400 bytes (100 * 4)

Go:

x := 42                  // Stack: 8 bytes (int is 64-bit)
arr := [100]int{}        // Stack: 800 bytes (100 * 8)

Rust:

let x: i32 = 42;         // Stack: 4 bytes
let arr = [0i32; 100];   // Stack: 400 bytes

Java:

int x = 42;              // Stack: 4 bytes (primitive)
Integer y = 42;          // Stack: 8-byte ref, Heap: 16-byte object
int[] arr = new int[100]; // Stack: 8-byte ref, Heap: ~416 bytes (16-byte header + 400 data)

Python:

x = 42                   # Stack: 8-byte ref, Heap: 28-byte object
arr = [0] * 100          # Heap: ~856-byte list object; all 100 slots reference the one cached 0
                         # (100 distinct large ints would add ~28 bytes each, ~3.6 KB total)

Why Everything is on the Heap

The Design Rationale

Python’s creators made a deliberate choice: simplicity and flexibility over raw performance.

1. Dynamic Typing:

In C, the compiler knows types at compile time:

int x = 42;      // Compiler: x is int, allocate 4 bytes on stack
float y = 3.14;  // Compiler: y is float, allocate 4 bytes on stack

In Python, types are determined at runtime:

x = 42       # Runtime: x references PyLongObject
x = "hello"  # Runtime: x now references PyUnicodeObject
x = [1, 2]   # Runtime: x now references PyListObject

The variable x is just a name bound to an object. The object carries its own type information. This requires objects to be heap-allocated with metadata.

2. Everything is a Reference:

Python variables are not values - they’re references to objects:

x = 42
y = x       # y and x both reference the same object

# Prove it:
id(x) == id(y)  # True (same memory address)

Compare to C:

int x = 42;
int y = x;  // y is a copy of x's value, not a reference

This reference model requires heap allocation so objects can be shared across scopes.
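
A short illustration of the sharing this enables - a function mutates the very object the caller holds, because both names reference the same heap allocation:

def append_item(target):
    target.append(4)   # mutates the shared heap object; nothing is copied

items = [1, 2, 3]
append_item(items)
print(items)           # [1, 2, 3, 4]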

3. Garbage Collection:

Python uses reference counting (and cyclic GC) to manage memory. Every object needs a reference count:

x = 42          # Create PyLongObject, refcount = 1
y = x           # refcount = 2
del x           # refcount = 1
del y           # refcount = 0, object deallocated

This requires every value to be an object with a reference count field.
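
You can watch the count change with sys.getrefcount (using a list rather than a small int, since cached or immortal integers report misleading counts):

import sys

x = []                 # fresh object, referenced only by x
sys.getrefcount(x)     # 2: x plus the temporary argument reference
y = x
sys.getrefcount(x)     # 3: x, y, and the argument
del y
sys.getrefcount(x)     # 2 again; at zero the object is deallocated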

4. Uniform Object Interface:

Every Python object has a consistent interface:

x = 42
x.__class__      # <class 'int'>
x.__sizeof__()   # 28
dir(x)           # ['__abs__', '__add__', ...]

Even integers have methods:

(42).bit_length()   # 6
(42).to_bytes(4, 'big')  # b'\x00\x00\x00*'

This requires integers to be full objects with type information and method tables.
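
Method lookup runs through the ob_type pointer: an int's methods live on the int type object that every PyLongObject references. A quick way to see the dispatch:

x = 42
type(x) is int     # True: ob_type points at the int type object
int.bit_length(x)  # 6: calling the method through the type directly
x.bit_length()     # 6: the usual form resolves through type(x)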


The Performance Cost

Allocation Speed

Benchmark: Creating 1 million integers

// C: Stack allocation
clock_t start = clock();
for (int i = 0; i < 1000000; i++) {
    int x = i;
    // x automatically deallocated
}
clock_t end = clock();
// Time: ~1ms (stack pointer bump)

# Python: Heap allocation
import time
start = time.time()
for i in range(1000000):
    x = i
    # x reference decremented, object may be deallocated
end = time.time()
# Time: ~50ms (heap allocation + GC)

50x slower. Most of this overhead is heap allocation and reference counting.

Note: Exact timings vary by platform, Python version, and allocator behavior; the ratios shown are illustrative of typical overhead.
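To reproduce the Python side on your own machine (absolute numbers vary widely), timeit gives a cleaner measurement than manual time.time() deltas - it disables the cyclic garbage collector during the run by default:

import timeit

# One pass over a million integer bindings: mostly allocation plus refcounting.
ms = timeit.timeit("for i in range(1_000_000): x = i", number=1) * 1000
print(f"{ms:.1f} ms")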

Memory Bandwidth

Array of 1 million integers:

| Language | Memory Used | Notes |
|----------|-------------|-------|
| C | 4 MB | Contiguous array on heap |
| Go | 8 MB | Contiguous array |
| Rust | 4 MB | Contiguous Vec<i32> |
| Java | 4 MB + overhead | Primitive array, contiguous |
| Python | ~36-40+ MB | List pointers (~8 MB) + per-int objects (~28-32 MB) + allocator overhead |

Python uses roughly 9-10x more memory than C for the same logical data.
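
You can reproduce the Python estimate with shallow sizes alone (pymalloc rounding and fragmentation add a little more):

import sys

data = list(range(1_000_000))
pointers = sys.getsizeof(data)                    # ~8 MB of pointer slots
objects = sum(sys.getsizeof(n) for n in data)     # ~28 MB of PyLongObjects
print((pointers + objects) / 1e6, "MB")           # ~36 MB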

Cache Performance

Modern CPUs rely on cache locality. Contiguous, inline data benefits from:

  • Sequential access (contiguous memory layout)
  • Small size (fits in L1 cache: 32-64 KB)
  • Prefetching (CPU predicts access patterns)

Heap-allocated Python objects suffer from:

  • Scattered allocation (objects not contiguous)
  • Large size (cache misses)
  • Pointer chasing (follow reference to find value)

Example:

// C: Sum array (cache-friendly)
int arr[1000];
int sum = 0;
for (int i = 0; i < 1000; i++) {
    sum += arr[i];  // Sequential memory access, cache-friendly
}

# Python: Sum list (cache-unfriendly)
arr = list(range(1000))
total = sum(arr)
# Each arr[i] is a pointer to a PyLongObject
# Dereference pointer, follow to heap object (cache miss)
[Diagram: a C array packs 4-byte values contiguously; a Python list holds pointers to scattered 28-byte PyLong objects on the heap.]

Python’s Optimizations

Python doesn’t leave performance entirely on the table. CPython includes several optimizations:

Small Integer Caching

Python pre-allocates integers from -5 to 256:

a = 10
b = 10
a is b  # True - same object

x = 1000
y = 1000
x is y  # False in the REPL - different objects
        # (within one module, the compiler may reuse equal constants)

Why: Small integers are so common that pre-allocating them saves repeated heap allocations.

Implementation:

// CPython maintains an array of cached integer objects
static PyLongObject small_ints[NSMALLPOSINTS + NSMALLNEGINTS];
// NSMALLNEGINTS = 5, NSMALLPOSINTS = 257

When you create x = 10, Python returns a pointer to the cached object instead of allocating a new one.
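
The cache also applies to values constructed at runtime, not just literals. A small check (building the ints with int() avoids the compiler folding equal constants together):

n = int("10")     # constructed at runtime
m = 10
n is m            # True: both reference the cached object for 10

p = int("1000")
q = int("1000")
p is q            # False: each call allocated a fresh PyLongObject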

String Interning

String literals are automatically interned:

s1 = "hello"
s2 = "hello"
s1 is s2  # True - same object

Runtime strings can be manually interned:

import sys
part = "hel"
s3 = part + "lo"       # built at runtime, so not automatically interned
s3 is s1               # False
s4 = sys.intern(s3)    # returns the canonical interned object
s4 is s1               # True

Object Pooling (Tuples, Dicts, etc.)

CPython maintains free lists for frequently used types:

  • Tuples: up to 2,000 cached per size class, for sizes 1-20
  • Dicts: 80 dict objects
  • Lists: 80 list objects
  • Floats: 100 float objects

When you delete these objects, they’re returned to the pool instead of being freed. Next allocation reuses them.
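
This reuse is occasionally visible through id(), though it is a CPython implementation detail and nothing you should rely on:

a = [1, 2, 3]
addr = id(a)
del a             # the list object returns to the free list
b = [4, 5, 6]
id(b) == addr     # often True: the freed slot was handed straight back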


When the Overhead Matters

CPU-Bound Number Crunching

Problem: Tight loops processing millions of numbers

# Python: Slow
total = 0
for i in range(10_000_000):
    total += i * 2
# Time: ~500ms

Solution: Use NumPy (C arrays under the hood):

import numpy as np
arr = np.arange(10_000_000)
total = (arr * 2).sum()
# Time: ~20ms (25x faster)

Large Data Structures

Problem: Storing millions of small objects

# Python: ~350-400 MB for 10 million integers
data = list(range(10_000_000))

Solution: Use array module for primitive arrays:

import array
data = array.array('i', range(10_000_000))
# Memory: ~40 MB (roughly 9-10x smaller)

Or use NumPy:

import numpy as np
data = np.arange(10_000_000, dtype=np.int32)  # pin a 4-byte dtype; the default is platform-dependent
# Memory: ~40 MB + negligible overhead
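
NumPy can report its buffer size directly, which makes the claim easy to verify:

import numpy as np

data = np.arange(10_000_000, dtype=np.int32)
data.nbytes    # 40000000: one 4-byte element per value, no per-element objects
data.itemsize  # 4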

Embedded Systems

Problem: Python on resource-constrained devices

Python’s memory overhead is prohibitive for microcontrollers with only a few kilobytes of RAM.

Solution: Use MicroPython or CircuitPython (optimized for embedded), or use C/Rust for critical paths.


When the Overhead Doesn’t Matter

I/O-Bound Programs

If your program spends most time waiting for network, disk, or user input, Python’s overhead is negligible:

# Network request: 100ms
# Python overhead: 0.1ms
# Overhead is 0.1% of total time
import requests
response = requests.get("https://api.example.com")

Business Logic and Glue Code

Most Python code is high-level orchestration:

# Overhead insignificant compared to database/API calls
def process_order(order_id):
    order = db.get_order(order_id)       # 10ms (database)
    payment = charge_card(order.amount)  # 50ms (payment API)
    send_email(order.email)              # 20ms (email service)
    # Python overhead: < 1ms

Rapid Development

Python’s productivity gains often outweigh performance costs:

  • Faster development (dynamic typing, no compilation)
  • Easier debugging (runtime introspection)
  • Rich ecosystem (millions of packages)

Cost-benefit:

  • Write Python in 1 day vs C in 1 week
  • Python runs in 100ms vs C in 10ms
  • If code runs infrequently, 1 day saved » 90ms per execution

Comparing Object Overhead Across Languages

Java: Compromise Between C and Python

Java has both primitives (stack) and objects (heap):

// Primitive: stack-allocated, no overhead
int x = 42;  // 4 bytes on stack

// Boxed: heap-allocated, object overhead
Integer y = 42;  // 8-byte ref + 16-byte object (header + value)

Java object header (HotSpot JVM):

  • Mark word: 8 bytes (hash code, GC info, lock state)
  • Class pointer: 4-8 bytes (compressed oops)
  • Value: 4 bytes
  • Padding: align to 8 bytes
  • Total: 16 bytes

Java’s Integer object (16 bytes) is smaller than Python’s PyLongObject (28 bytes) because:

  • No explicit reference count (GC manages lifetimes)
  • No size field (integers are fixed-size)

Go: Stack-First Philosophy

Go aggressively stack-allocates via escape analysis:

func stackInt() {
    x := 42  // Stack: 8 bytes
    fmt.Println(x)
}

func heapInt() *int {
    x := 42
    return &x  // Escapes to heap: 8 bytes + allocation overhead
}

Go has no primitive/object distinction - the compiler decides based on usage.

Rust: Zero-Cost Abstractions

Rust provides control without overhead:

let x: i32 = 42;            // Stack: 4 bytes
let y = Box::new(42);       // Heap: 4 bytes (no metadata)
let z = Rc::new(42);        // Heap: 4 bytes + 16-byte Rc header

Rust’s Box<i32> is just the value on the heap (4 bytes). No reference count unless you use Rc (reference counted) or Arc (atomic reference counted).


Profiling Python Memory Usage

Measuring Object Size

import sys

x = 42
sys.getsizeof(x)  # 28 bytes

s = "hello"
sys.getsizeof(s)  # 54 bytes (PyUnicode overhead + 5 chars)

lst = [1, 2, 3]
sys.getsizeof(lst)  # 80 bytes (list object itself)
# But this doesn't include the PyLongObjects in the list!

# Total memory for list + elements:
total = sys.getsizeof(lst) + sum(sys.getsizeof(x) for x in lst)
# 80 + (28 * 3) = 164 bytes for 3 integers
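
sys.getsizeof is shallow by design. A minimal recursive sketch for common containers (the pympler library shown below handles far more cases):

import sys

def deep_sizeof(obj, seen=None):
    """Shallow size of obj plus everything it references. Minimal sketch."""
    if seen is None:
        seen = set()
    if id(obj) in seen:        # count shared objects only once
        return 0
    seen.add(id(obj))
    size = sys.getsizeof(obj)
    if isinstance(obj, dict):
        size += sum(deep_sizeof(k, seen) + deep_sizeof(v, seen)
                    for k, v in obj.items())
    elif isinstance(obj, (list, tuple, set, frozenset)):
        size += sum(deep_sizeof(item, seen) for item in obj)
    return size

deep_sizeof([1, 2, 3])  # 164: the 80-byte list plus three 28-byte ints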

Memory Profiling Tools

memory_profiler:

from memory_profiler import profile

@profile
def create_list():
    return [i for i in range(10000)]

create_list()
# Output shows line-by-line memory usage

tracemalloc (built-in):

import tracemalloc

tracemalloc.start()

# Your code here
data = list(range(100000))

snapshot = tracemalloc.take_snapshot()
top_stats = snapshot.statistics('lineno')

for stat in top_stats[:10]:
    print(stat)

pympler:

from pympler import asizeof

data = list(range(1000))
asizeof.asizeof(data)  # Deep size (includes referenced objects)

Practical Implications

Choosing the Right Tool

| Use Case | Python | NumPy | C Extension | Other Language |
|----------|--------|-------|-------------|----------------|
| Web API | + Good | - Overkill | - Overkill | Consider Go/Rust |
| Data processing | - Slow | + Good | + Good | Consider Rust |
| Machine learning | + Good (with NumPy/PyTorch) | + Core | + Core | Julia for research |
| System tool | - Slow startup | - Overkill | + Good | Go/Rust better |
| Scripting | + Excellent | - Overkill | - Overkill | - |

Optimization Strategy

1. Profile first:

python -m cProfile script.py
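
The same profile can be captured programmatically with the standard library’s cProfile and pstats (hot_path is a placeholder for your own workload):

import cProfile
import pstats

def hot_path():
    return sum(i * 2 for i in range(1_000_000))

cProfile.run("hot_path()", "profile.out")
pstats.Stats("profile.out").sort_stats("cumulative").print_stats(5)  # top 5 by cumulative time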

2. Identify bottlenecks:

  • CPU-bound loops processing numbers
  • Large collections of small objects
  • Repeated allocations

3. Optimize selectively:

  • Use NumPy for numeric arrays
  • Use array.array for primitive arrays
  • Move hot paths to C extensions (Cython, ctypes)
  • Consider Rust/Go for performance-critical services

4. Don’t over-optimize:

  • Python’s overhead matters in < 10% of code
  • Premature optimization wastes development time
  • Profile, then optimize only what matters

The Trade-Off

Python made a conscious choice: developer productivity over raw performance.

What you gain:

  • Dynamic typing (flexibility)
  • Everything is an object (uniform interface)
  • Rich runtime introspection (debugging, metaprogramming)
  • Automatic memory management (no manual free/delete)
  • Rapid development (no compilation, simple syntax)

What you pay:

  • 7x memory overhead (vs C)
  • 10-50x slower execution (pure Python vs C)
  • Values are boxed objects (typically heap-allocated) accessed via references
  • Pointer indirection and reference counting overhead
  • GC pauses

For most Python code (web services, data pipelines, scripting), the overhead is acceptable. For performance-critical inner loops, drop down to NumPy, C extensions, or another language.

Best Practice: Write your application in Python. Profile to find bottlenecks. Optimize only the hot paths with NumPy, Cython, or Rust extensions. You get 90% of the development speed with 90% of C’s performance where it matters.

Conclusion

Python’s “everything is an object” design carries a real cost:

  • 28 bytes for a simple integer (vs 4 bytes in C)
  • Values are boxed objects (typically heap-allocated) accessed via references
  • Pointer indirection for every value access
  • Reference counting overhead

But this cost buys Python’s greatest strength: simplicity. No manual memory management. No type declarations. No compilation. A uniform object model that makes metaprogramming trivial.

For the vast majority of Python code - web APIs, data pipelines, glue scripts - this trade-off is worth it. The developer time saved dwarfs the CPU cycles lost.

When performance matters, Python offers escape hatches: NumPy for arrays, Cython for hot loops, ctypes for C libraries. You get the best of both worlds - Python’s productivity where it matters, C’s performance where it matters.

The price of everything being an object? Acceptable for most code, optimizable for performance-critical paths.

