The Price of Everything Being an Object in Python
Why Python stores everything on the heap: examining the memory overhead of Python's object model, comparing int storage across languages, and understanding the performance implications
- tags
- #Python #Memory-Management #Performance #Cpython #Internals #Optimization #Heap #Stack #Data-Structures #Object-Model #Memory-Layout #Pyobject #Reference-Counting #Garbage-Collection #C #Go #Rust #Java #Comparison #Systems-Programming #Profiling
- categories
- Python Performance
- published
- reading time
- 14 minutes
All Python developers know that everything in Python is an object. Numbers are objects. Strings are objects. Functions are objects. Even None is an object.
But at what cost?
This design decision has profound implications for memory usage and performance. In typical C code, an integer is stored inline (often on the stack or in registers) with no metadata - just 4 bytes. In Python, that same integer is a 28-byte object on the heap, accessed through pointer indirection. This article explores why Python made this choice, what the overhead looks like in practice, and when it matters.
Note: Sizes shown are for 64-bit CPython builds; exact layout varies by platform and build configuration.
Memory Layout: C vs Python
C Integer Storage
In typical C code, integers are stored inline without metadata:
For local variables, the typical layout is:
Memory layout (automatic/stack storage):
Stack:
┌──────────┐
│ 00000000 │
│ 00000000 │
│ 00000000 │
│ 00101010 │ ← 42 in binary
└──────────┘
Total: 4 bytes
Location: Stack (or register)
Allocation: Instant (bump stack pointer)
The CPU can directly operate on this value. No indirection, no metadata, no dynamic allocation.
Python Integer Storage
In Python, the same integer is an object on the heap:
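You can see the object size directly; `sys.getsizeof` reports the full `PyLongObject`, header included:

```python
import sys

x = 42
# Header (refcount + type pointer + size) plus one 4-byte digit
print(sys.getsizeof(x))  # 28 on 64-bit CPython
```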
Memory layout (CPython 3.11+):
Stack:
┌─────────────┐
│ 0x7f8a3c... │ ← Pointer to PyObject (8 bytes)
└─────────────┘
Heap:
┌─────────────────────┐
│ Reference Count (8) │ ← How many references to this object
├─────────────────────┤
│ Type Pointer (8) │ ← Points to PyLong_Type
├─────────────────────┤
│ Size (8) │ ← Number of digits (for arbitrary precision)
├─────────────────────┤
│ Value (4) │ ← Actual integer value: 42
└─────────────────────┘
Total: 28 bytes
Location: Heap
Allocation: CPython allocator (pymalloc), refcount initialization
Seven times larger. And this doesn’t include the pointer on the stack (8 bytes) that references this object.
The PyObject Structure
Every Python object starts with a PyObject header:
For integers specifically (PyLongObject):
Breakdown for x = 42:
- Reference count: 8 bytes
- Type pointer: 8 bytes
- Size field: 8 bytes
- Value: 4 bytes
- Total: 28 bytes
Compare this to C:
- Value: 4 bytes
- Total: 4 bytes
(Diagram: the C integer is a direct 4-byte value on the stack; the Python integer is an 8-byte stack pointer to a 28-byte heap PyLongObject holding refcount, type pointer, size, and value.)
Comparing Across Languages
Integer Storage Comparison
| Language | Storage Location | Size (bytes) | Metadata | Allocation |
|---|---|---|---|---|
| C | Inline (stack/register) | 4 | None | Instant |
| Go | Inline (escape analysis) | 8 | None (stack); GC metadata if escaped | Stack or heap (escape analysis) |
| Rust | Inline (stack) | 4 or 8 | None | Instant |
| Java | Stack (primitive) | 4 | None | Instant |
| Java | Heap (Integer object) | ~16 (typical HotSpot) | Object header (~12) + value (4) | Allocator |
| Python | Heap (always boxed) | 28 | Refcount (8) + type (8) + size (8) + value (4) | pymalloc |
Code Examples
C:
Go:
Rust:
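In Rust, the size is part of the type:

```rust
fn main() {
    let x: i32 = 42; // inline 4-byte value on the stack
    assert_eq!(std::mem::size_of_val(&x), 4);
    println!("{x}");
}
```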
Java:
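A sketch showing both forms (class name is illustrative):

```java
public class IntDemo {
    public static void main(String[] args) {
        int x = 42;                      // primitive: 4 bytes, no header
        Integer y = Integer.valueOf(42); // boxed: heap object with header
                                         // (small values come from a cache)
        System.out.println(x + " " + y);
    }
}
```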
Python:
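And in Python, the value is always a boxed object behind a reference:

```python
x = 42  # a heap-allocated PyLongObject; x is just a name bound to it
print(type(x), id(x))  # the object knows its own type and identity
```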
Why Everything is on the Heap
The Design Rationale
Python’s creators made a deliberate choice: simplicity and flexibility over raw performance.
1. Dynamic Typing:
In C, the compiler knows types at compile time:
In Python, types are determined at runtime:
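The same name can be rebound to objects of different types:

```python
x = 42           # x is bound to an int object
print(type(x))   # <class 'int'>
x = "hello"      # the same name now refers to a str object
print(type(x))   # <class 'str'>
```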
The variable x is just a name bound to an object. The object carries its own type information. This requires objects to be heap-allocated with metadata.
2. Everything is a Reference:
Python variables are not values - they’re references to objects:
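Two names bound to one object see each other's mutations:

```python
a = [1, 2, 3]
b = a            # b is another reference to the same list object
b.append(4)
print(a)         # [1, 2, 3, 4] - the change is visible through a
print(a is b)    # True: one object, two names
```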
Compare to C:
This reference model requires heap allocation so objects can be shared across scopes.
3. Garbage Collection:
Python uses reference counting (and cyclic GC) to manage memory. Every object needs a reference count:
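The count is observable with `sys.getrefcount` (which reports one extra reference for its own argument):

```python
import sys

x = [1, 2, 3]
before = sys.getrefcount(x)
y = x                      # a second name for the same object
after = sys.getrefcount(x)
print(after - before)      # 1: binding y bumped the refcount
```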
This requires every value to be an object with a reference count field.
4. Uniform Object Interface:
Every Python object has a consistent interface:
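Every value, including functions and None, answers the same basic protocol:

```python
# type(), id(), and isinstance() work uniformly on every value
for obj in (42, "hi", [1], None, len):
    print(type(obj), isinstance(obj, object))  # always True
```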
Even integers have methods:
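For example:

```python
print((42).bit_length())        # 6: 42 is 101010 in binary
print((42).to_bytes(2, "big"))  # b'\x00*'
print((1024).bit_length())      # 11
```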
This requires integers to be full objects with type information and method tables.
The Performance Cost
Allocation Speed
Benchmark: Creating 1 million integers
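A rough way to observe the Python side (absolute numbers vary by machine; the C equivalent is a plain loop that a compiler reduces to register arithmetic):

```python
import time

N = 1_000_000
start = time.perf_counter()
values = [i + 1 for i in range(N)]  # allocates roughly 1M int objects
elapsed = time.perf_counter() - start
print(f"created {len(values):,} ints in {elapsed:.3f}s")
```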
50x slower. Most of this overhead is heap allocation and reference counting.
Note: Exact timings vary by platform, Python version, and allocator behavior; the ratios shown are illustrative of typical overhead.
Memory Bandwidth
Array of 1 million integers:
| Language | Memory Used | Notes |
|---|---|---|
| C | 4 MB | Contiguous array on heap |
| Go | 8 MB | Contiguous array |
| Rust | 4 MB | Contiguous Vec<i32> |
| Java | 4 MB + overhead | Primitive array, contiguous |
| Python | ~36-40+ MB | List pointers (~8MB) + per-int objects (~28-32MB) + allocator overhead |
Counting both the list's pointer array and the integer objects themselves, Python uses roughly nine to ten times the memory C needs for the same logical data.
Cache Performance
Modern CPUs rely on cache locality. Contiguous, inline data benefits from:
- Sequential access (contiguous memory layout)
- Small size (fits in L1 cache: 32-64 KB)
- Prefetching (CPU predicts access patterns)
Heap-allocated Python objects suffer from:
- Scattered allocation (objects not contiguous)
- Large size (cache misses)
- Pointer chasing (follow reference to find value)
Example:
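A sketch of the Python side - each iteration chases a pointer to a separate heap object, where the equivalent C loop over a contiguous `int` array streams straight through cache:

```python
import time

N = 1_000_000
data = list(range(N))       # 1M pointers to scattered int objects

start = time.perf_counter()
total = 0
for v in data:              # each step dereferences a pointer
    total += v              # and produces a new (or cached) int object
elapsed = time.perf_counter() - start
print(f"sum={total} in {elapsed:.3f}s")
```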
(Diagram: the C array is contiguous 4-byte values; the Python list holds 8-byte pointers, each to a separate 28-byte PyLong object scattered across the heap.)
Python’s Optimizations
Python doesn’t leave performance entirely on the table. CPython includes several optimizations:
Small Integer Caching
Python pre-allocates integers from -5 to 256:
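You can observe the cache with `is`, provided the ints are constructed at runtime (literals in one module can be merged by the compiler, which would mask the effect). This is a CPython implementation detail:

```python
# Construct ints at runtime so constant folding can't interfere
a = int("100")
b = int("100")
print(a is b)    # True: both point at the cached object for 100

c = int("10000")
d = int("10000")
print(c is d)    # False on CPython: two separate heap objects
```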
Why: Small integers are so common that pre-allocating them saves repeated heap allocations.
Implementation:
When you create x = 10, Python returns a pointer to the cached object instead of allocating a new one.
String Interning
String literals are automatically interned:
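Another CPython implementation detail, visible with `is`:

```python
a = "hello"
b = "hello"
print(a is b)    # True in CPython: identifier-like literals are interned

c = "".join(["hel", "lo"])   # built at runtime: not interned
print(c == a, c is a)        # True False - equal, but a distinct object
```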
Runtime strings can be manually interned:
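For example:

```python
import sys

c = "".join(["hel", "lo"])       # runtime string, not interned
d = sys.intern(c)                # canonical copy from the intern table
print(d is sys.intern("hello"))  # True: one shared object
```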
Object Pooling (Tuples, Dicts, etc.)
CPython maintains free lists for frequently used types:
- Tuples: free lists for sizes 1-20, up to 2,000 saved per size
- Dicts: 80 dict objects
- Lists: 80 list objects
- Floats: 100 float objects
When you delete these objects, they’re returned to the pool instead of being freed. Next allocation reuses them.
When the Overhead Matters
CPU-Bound Number Crunching
Problem: Tight loops processing millions of numbers
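For example, a pure-Python inner loop where every `i * i` produces a new int object:

```python
def sum_of_squares(n):
    total = 0
    for i in range(n):
        total += i * i   # boxing, refcounting, pointer chasing per step
    return total

print(sum_of_squares(1_000_000))
```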
Solution: Use NumPy (C arrays under the hood):
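Assuming NumPy is installed, the vectorized version keeps the data in one contiguous buffer and runs the loop in C:

```python
import numpy as np

a = np.arange(1_000_000, dtype=np.int64)  # one contiguous int64 buffer
total = int((a * a).sum())                # elementwise multiply + sum in C
print(total)
```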
Large Data Structures
Problem: Storing millions of small objects
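A sketch of where the memory goes - note that `sys.getsizeof` on the list reports only the pointer array, not the int objects behind it:

```python
import sys

n = 1_000_000
nums = list(range(n))
print(f"list of pointers: {sys.getsizeof(nums) / 1e6:.1f} MB")
print(f"one int object:   {sys.getsizeof(nums[-1])} bytes")  # ~28 each
```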
Solution: Use array module for primitive arrays:
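`array.array` stores the values as raw 4-byte C ints, one contiguous buffer, no per-element objects:

```python
import sys
from array import array

n = 1_000_000
packed = array("i", range(n))   # contiguous 4-byte C ints
print(f"{sys.getsizeof(packed) / 1e6:.1f} MB")  # ~4 MB vs ~36+ MB as a list
```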
Or use NumPy, whose `ndarray` likewise stores values in one contiguous machine-typed buffer (`np.zeros(1_000_000, dtype=np.int32).nbytes` is exactly 4 MB).
Embedded Systems
Problem: Python on resource-constrained devices
Python’s memory overhead is prohibitive for microcontrollers with KB of RAM.
Solution: Use MicroPython or CircuitPython (optimized for embedded), or use C/Rust for critical paths.
When the Overhead Doesn’t Matter
I/O-Bound Programs
If your program spends most of its time waiting for network, disk, or user input, Python's per-object overhead is negligible: a few microseconds of boxing disappears next to a 50 ms network round-trip.
Business Logic and Glue Code
Most Python code is high-level orchestration - fetching records, reshaping dictionaries, calling libraries - where object overhead is dwarfed by I/O and library time.
Rapid Development
Python’s productivity gains often outweigh performance costs:
- Faster development (dynamic typing, no compilation)
- Easier debugging (runtime introspection)
- Rich ecosystem (millions of packages)
Cost-benefit:
- Write Python in 1 day vs C in 1 week
- Python runs in 100ms vs C in 10ms
- If code runs infrequently, 1 day saved » 90ms per execution
Comparing Object Overhead Across Languages
Java: Compromise Between C and Python
Java has both primitives (stack) and objects (heap):
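A sketch of the split (class name is illustrative):

```java
public class Primitives {
    public static void main(String[] args) {
        int x = 42;                      // primitive: 4 bytes, no header
        Integer y = Integer.valueOf(42); // heap object (~16 bytes)
        long sum = 0;
        for (int i = 0; i < 1000; i++) {
            sum += i;                    // primitives: no per-step allocation
        }
        System.out.println(x + " " + y + " " + sum);
    }
}
```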
Java object header (HotSpot JVM):
- Mark word: 8 bytes (hash code, GC info, lock state)
- Class pointer: 4-8 bytes (compressed oops)
- Value: 4 bytes
- Padding: align to 8 bytes
- Total: 16 bytes
Java’s Integer object (16 bytes) is smaller than Python’s PyLongObject (28 bytes) because:
- No explicit reference count (GC manages lifetimes)
- No size field (integers are fixed-size)
Go: Stack-First Philosophy
Go aggressively stack-allocates via escape analysis:
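A sketch; run `go build -gcflags=-m` to see the compiler's escape decisions:

```go
package main

import "fmt"

// stays on the stack: the value does not outlive the call
func local() int {
	x := 42
	return x
}

// escapes to the heap: the pointer outlives the call
func escapes() *int {
	x := 42
	return &x
}

func main() {
	fmt.Println(local(), *escapes())
}
```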
Go has no primitive/object distinction - the compiler decides based on usage.
Rust: Zero-Cost Abstractions
Rust provides control without overhead:
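Heap allocation and reference counting are both explicit, opt-in choices:

```rust
use std::rc::Rc;

fn main() {
    let x: i32 = 42;           // inline on the stack: 4 bytes
    let boxed = Box::new(42);  // explicit heap allocation, no header
    let shared = Rc::new(42);  // refcounting only when you ask for it
    assert_eq!(x, *boxed);
    assert_eq!(*shared, 42);
    println!("{x} {boxed} {shared}");
}
```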
Rust’s Box<i32> is a plain 8-byte pointer to a 4-byte heap value - no header, and no reference count unless you opt into Rc (reference counted) or Arc (atomic reference counted).
Profiling Python Memory Usage
Measuring Object Size
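`sys.getsizeof` reports the object's own footprint; note that it is shallow, so containers report only their pointer array:

```python
import sys

print(sys.getsizeof(42))       # 28 on 64-bit CPython
print(sys.getsizeof(2**100))   # more digits, bigger object

# Shallow: a list reports pointers only, so sum the elements for deep size
nums = [1, 2, 3]
deep = sys.getsizeof(nums) + sum(sys.getsizeof(n) for n in nums)
print(deep)
```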
Memory Profiling Tools
memory_profiler (third-party): decorate functions with `@profile` and run `python -m memory_profiler script.py` for line-by-line memory deltas.
tracemalloc (built-in):
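A minimal tracemalloc session:

```python
import tracemalloc

tracemalloc.start()
data = [i * i for i in range(100_000)]   # allocate ~100k int objects
current, peak = tracemalloc.get_traced_memory()
tracemalloc.stop()
print(f"current: {current / 1e6:.1f} MB, peak: {peak / 1e6:.1f} MB")
```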
pympler (third-party): `asizeof.asizeof(obj)` reports the deep size of an object graph, following the references that `sys.getsizeof` ignores.
Practical Implications
Choosing the Right Tool
| Use Case | Python | NumPy | C Extension | Other Language |
|---|---|---|---|---|
| Web API | + Good | - Overkill | - Overkill | Consider Go/Rust |
| Data processing | - Slow | + Good | + Good | Consider Rust |
| Machine learning | + Good (with NumPy/PyTorch) | + Core | + Core | Julia for research |
| System tool | - Slow startup | - Overkill | + Good | Go/Rust better |
| Scripting | + Excellent | - Overkill | - Overkill | - |
Optimization Strategy
1. Profile first:
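A sketch using the built-in cProfile (the `hot` function is a stand-in for your workload; `python -m cProfile script.py` works for whole scripts):

```python
import cProfile
import io
import pstats

def hot():
    return sum(i * i for i in range(100_000))

profiler = cProfile.Profile()
profiler.enable()
hot()
profiler.disable()

out = io.StringIO()
pstats.Stats(profiler, stream=out).sort_stats("cumulative").print_stats(5)
print(out.getvalue())
```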
2. Identify bottlenecks:
- CPU-bound loops processing numbers
- Large collections of small objects
- Repeated allocations
3. Optimize selectively:
- Use NumPy for numeric arrays
- Use array.array for primitive arrays
- Move hot paths to C extensions (Cython) or call C libraries (ctypes)
- Consider Rust/Go for performance-critical services
4. Don’t over-optimize:
- Python’s overhead matters in < 10% of code
- Premature optimization wastes development time
- Profile, then optimize only what matters
The Trade-Off
Python made a conscious choice: developer productivity over raw performance.
What you gain:
- Dynamic typing (flexibility)
- Everything is an object (uniform interface)
- Rich runtime introspection (debugging, metaprogramming)
- Automatic memory management (no manual free/delete)
- Rapid development (no compilation, simple syntax)
What you pay:
- 7x memory overhead (vs C)
- 10-50x slower execution (pure Python vs C)
- Values are boxed objects (typically heap-allocated) accessed via references
- Pointer indirection and reference counting overhead
- GC pauses
For most Python code (web services, data pipelines, scripting), the overhead is acceptable. For performance-critical inner loops, drop down to NumPy, C extensions, or another language.
Conclusion
Python’s “everything is an object” design carries a real cost:
- 28 bytes for a simple integer (vs 4 bytes in C)
- Values are boxed objects (typically heap-allocated) accessed via references
- Pointer indirection for every value access
- Reference counting overhead
But this cost buys Python’s greatest strength: simplicity. No manual memory management. No type declarations. No compilation. A uniform object model that makes metaprogramming trivial.
For the vast majority of Python code - web APIs, data pipelines, glue scripts - this trade-off is worth it. The developer time saved dwarfs the CPU cycles lost.
When performance matters, Python offers escape hatches: NumPy for arrays, Cython for hot loops, ctypes for C libraries. You get the best of both worlds - Python’s productivity where it matters, C’s performance where it matters.
The price of everything being an object? Acceptable for most code, optimizable for performance-critical paths.
Further Reading
CPython Internals:
- CPython source code
- Objects/longobject.c - Integer implementation
- Include/object.h - PyObject definition