Kubernetes Secrets: Should Your Cluster Store Secrets or Just Access Them?
Should your Kubernetes cluster store secrets or just access them? Understanding trust boundaries, blast radius, and the architectural trade-offs between etcd storage and runtime vault access.
A test pod just accessed production database credentials.
The bug wasn’t in application code. It wasn’t in cloud IAM.
It was a Kubernetes RoleBinding - buried, overly broad, and easy to miss.
This happens more often than teams like to admit. Not because Kubernetes RBAC is bad, but because once secrets live in etcd, cluster access becomes secret access. The blast radius is the cluster itself.
That’s the trade-off most teams don’t think about until it bites them.
The fundamental question isn’t “should I use Kubernetes Secrets?” - they’re valid and often the simplest solution. The question is: should your cluster store secrets, or just access them?
There’s no universally correct answer. The choice depends on scale, security requirements, operational model, and team preferences. But understanding the architectural trade-offs - where secrets live, who controls access, what happens when things go wrong - helps you make an informed decision rather than defaulting to the most convenient option.
This article examines three patterns for secret management in Kubernetes: native Kubernetes Secrets (cluster stores secrets), operators with CRDs (sync from external vault to cluster), and runtime APIs (cluster accesses secrets without storage). We’ll analyze trust boundaries, blast radius, operational complexity, and when each pattern makes sense.
What This Article Covers
An architectural deep-dive into Kubernetes secret management patterns: native Kubernetes Secrets (cluster stores secrets), operators with CRDs (sync from external vault), and runtime APIs (cluster accesses secrets without storage). We’ll examine trust boundaries, blast radius, and decision frameworks.
Key Terms
- etcd: Kubernetes’ backing store where cluster state (including Secrets) is stored
- Operators: Kubernetes controllers that extend the API with custom resources
- ESO (External Secrets Operator): Syncs secrets from external vaults into Kubernetes Secrets
- Trust boundary: Where access control is enforced (cluster RBAC vs cloud IAM)
- Blast radius: Scope of impact when access controls are breached or misconfigured
- IRSA: AWS IAM Roles for Service Accounts (GCP: Workload Identity, Azure: Managed Identity)
Understanding the Three Patterns
Before diving deep, here’s the landscape:
Pattern 1: Kubernetes Secrets - Store secrets directly in etcd. Simple, declarative, GitOps-friendly. The cluster becomes compute + secret storage. Access controlled by cluster RBAC.
Pattern 2: Operators (ESO) - Sync secrets from external vaults (AWS, GCP, Azure) into Kubernetes Secrets. You get central vault management but secrets still end up in etcd. Two sources of truth, eventual consistency.
Pattern 3: Runtime Access - Keep secrets in cloud vaults only, fetch at runtime via HTTP API. Cluster is just compute. Access controlled by cloud IAM, not cluster RBAC. No secrets in etcd.
The core trade-off: where do secrets live, and who controls access to them?
Let’s examine each pattern in detail.
Pattern 1: Native Kubernetes Secrets
Let’s examine the first pattern in detail: storing secrets directly in Kubernetes.
Architecture
Lifecycle:
- Secret created via kubectl or YAML manifest
- Stored in etcd (base64-encoded or encrypted at rest)
- Pod references the secret in its spec
- kubelet fetches the secret from the API server
- Secret mounted as a file or exposed as an env var
- Application reads the secret
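A minimal sketch of this lifecycle (all names hypothetical):

```yaml
# A Secret and a pod that consumes one of its keys as an env var.
apiVersion: v1
kind: Secret
metadata:
  name: db-creds
  namespace: prod
type: Opaque
stringData:                # stringData avoids manual base64 encoding
  password: s3cr3t-value
---
apiVersion: v1
kind: Pod
metadata:
  name: app
  namespace: prod
spec:
  containers:
    - name: app
      image: example/app:latest
      env:
        - name: DB_PASSWORD
          valueFrom:
            secretKeyRef:
              name: db-creds
              key: password
```

Anyone who can `get` this Secret via the API server, or read etcd directly, sees the value.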
What You’re Depending On
When you use Kubernetes Secrets, you’re depending on etcd security (encryption at rest, network encryption, access controls), cluster RBAC policies (who can get secrets, namespace isolation), and operational procedures (rotation, backup security, cluster migrations that include secrets). Your security posture is tied to cluster security posture.
When This Works Well
Scenario 1: Small trusted teams
- Team size: 5-10 engineers
- All have production access anyway
- Secret sharing is necessary for collaboration
- Complexity > value of strict isolation

Scenario 2: Single-tenant clusters
- One cluster per environment (dev, staging, prod)
- Separate clusters = separate blast radii
- prod cluster is tightly controlled

Scenario 3: Low-security applications
- Secrets are internal service tokens
- Not customer data or credentials
- Breach impact is limited
When Teams Get Uncomfortable
As scale increases, the architectural consequences become harder to ignore. At 10 namespaces with 50 secrets each, you have 500 secrets in etcd. Any cluster admin can read all 500. Any RBAC misconfiguration potentially exposes all.
Cluster coupling becomes operational burden: migrating clusters means migrating secrets, backups must secure secrets, restores must handle secret restoration. Auditing becomes fragmented: who accessed which secret requires checking Kubernetes audit logs for pod access, etcd logs if enabled, with no cloud provider audit trail to cross-reference.
At 100+ namespaces and 50+ engineers, many teams start looking for alternatives.
Pattern 2: Operators - Syncing External Vaults
The operator pattern attempts to solve Kubernetes Secrets’ limitations by syncing from external vaults.
External Secrets Operator Architecture
Reconciliation loop:
- User creates an ExternalSecret CRD
- ESO controller watches for CRD changes
- Controller fetches the secret from the external vault
- Controller creates/updates a Kubernetes Secret in etcd
- Application consumes the Kubernetes Secret (doesn't know about ESO)
- Controller polls the external vault periodically (e.g., every 5 minutes)
- On change, updates the Kubernetes Secret
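A sketch of the CRD that drives this loop (store and key names are hypothetical; the schema follows ESO's v1beta1 API):

```yaml
apiVersion: external-secrets.io/v1beta1
kind: ExternalSecret
metadata:
  name: db-creds
  namespace: prod
spec:
  refreshInterval: 5m              # the poll interval from the loop above
  secretStoreRef:
    name: aws-secrets-manager      # a SecretStore configured separately
    kind: SecretStore
  target:
    name: db-creds                 # the Kubernetes Secret ESO creates in etcd
  data:
    - secretKey: password
      remoteRef:
        key: prod/database-password  # key in the external vault
```

Note that the CRD holds only metadata; the value itself lands in the target Kubernetes Secret, which is why etcd remains part of the story.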
What ESO Solves
1. Centralized management
- Secrets live in cloud vault (AWS/GCP/Azure native)
- Same vault for Kubernetes and non-Kubernetes workloads
- Cloud provider audit logs (who accessed what)
2. Automatic rotation
- Poll interval (ESO checks for changes)
- Secrets update automatically in pods
- No manual kubectl operations
3. GitOps friendly
- ExternalSecret CRDs in Git
- Secret metadata versioned (not values)
- Declarative secret management
What ESO Doesn’t Solve
The cluster is still part of your secret lifecycle.
After ESO syncs, secrets live in etcd. Everything from the Kubernetes Secrets section still applies:
- Cluster RBAC controls access
- etcd contains secrets (encrypted or not)
- Cluster backup/restore must handle secrets
- Blast radius: cluster access = secret access
Plus, you’ve added complexity: Two sources of truth means AWS Secrets Manager might say password123 while the Kubernetes Secret still has password-old. ESO syncs every 5 minutes, so they’ll converge eventually, but for 5 minutes they differ. Which is correct?
Sync loop failures create staleness: ESO pod crashes and sync stops, credentials expire and sync fails, network partitions leave the cluster with old secrets while the vault has new ones.
And there’s operational confusion: the source of truth is AWS Secrets Manager, but you can also kubectl edit secret db-creds. ESO overwrites your change on the next sync. Which system should you use?
CRDs as Control Plane State Injection
Here’s the fundamental architectural issue:
CRDs inject external state into the Kubernetes control plane. The data lives in etcd, the operator reconciles it.
This is powerful for Kubernetes-native resources (Deployments, Services). But for secrets, it means:
- Kubernetes API becomes part of secret access path
- etcd becomes secret storage (even if “just a cache”)
- Control plane is now coupled to secret lifecycle
For some teams, this is fine. For others, it feels wrong - why should the cluster be involved in secret storage at all?
Pattern 3: Runtime Access - Separating Compute from State
The third pattern takes a different approach: keep secrets outside cluster state entirely.
The Architecture
Instead of storing secrets in etcd, applications fetch secrets at runtime from external vaults. The cluster is just compute; secrets live only in the vault.
No etcd. No sync. No reconciliation.
Secrets are fetched when needed, not stored for later.
How This Works: The Sidecar Pattern
Applications can’t call AWS/GCP/Azure APIs directly (requires SDK, authentication, backend-specific logic). Instead, run a sidecar container that provides an HTTP API for secret access.
This is where a runtime API server becomes necessary. The examples below use vaultmux-server, which wraps the vaultmux library to provide a unified HTTP interface for AWS Secrets Manager, GCP Secret Manager, and Azure Key Vault. The pattern works with any similar implementation.
Deployment: the application pod runs a second container, vaultmux-server, exposing the secrets API on localhost (port 8080 in the examples that follow).
Application code (Python):
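A minimal Python client for such a sidecar might look like this. The `/v1/secrets/<name>` path and the `value` response field match the one-liners later in this article, but treat the exact endpoint shape as an assumption:

```python
import json
import urllib.request

VAULTMUX_URL = "http://localhost:8080"  # assumed sidecar address


def parse_secret_response(body: bytes) -> str:
    """Extract the secret value from the sidecar's JSON response."""
    return json.loads(body)["value"]


def get_secret(name: str) -> str:
    """Fetch a secret from the localhost sidecar over plain HTTP."""
    with urllib.request.urlopen(f"{VAULTMUX_URL}/v1/secrets/{name}") as resp:
        return parse_secret_response(resp.read())
```

No cloud SDK, no credentials in application code: the sidecar handles authentication via its service account.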
Java (via HttpClient) and Node.js (via fetch) are the same shape: one HTTP GET to the sidecar plus JSON parsing.
One HTTP endpoint, any language. No SDK dependencies.
Namespace Isolation via Cloud IAM
Here’s where the trust boundary shifts.
Service account mapping:
test namespace pod uses test-vaultmux-sa
↓
Kubernetes service account annotated with IAM role ARN
↓
AWS IRSA maps service account to test-secrets-role
↓
IAM role policy allows access to test/* secrets only
↓
AWS Secrets Manager enforces policy at API level
What happens when test pod tries to access prod secret:
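Illustratively, the sidecar surfaces the cloud provider's denial (exact status and message depend on the backend; this transcript is hypothetical):

```
GET http://localhost:8080/v1/secrets/prod/database-password
→ 403 Forbidden
  AccessDeniedException: test-secrets-role is not authorized to perform
  secretsmanager:GetSecretValue on resource prod/database-password
```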
The cluster RBAC doesn't matter here. Even if Kubernetes RBAC is misconfigured to be wide open, AWS IAM still denies the test pod access to prod secrets.
The Trust Model
What you’re trusting:
- Cloud provider IAM enforcement (AWS, GCP, Azure)
- Service account to IAM mapping (IRSA, Workload Identity, Managed Identity)
- Sidecar implementation (vaultmux-server or similar)
What you’re NOT trusting:
- Cluster RBAC configuration (can be misconfigured without exposing secrets)
- etcd security (secrets never stored there)
- Cluster backup security (no secrets in backups)
The result: Hard isolation at the cloud boundary, not “best effort” isolation inside Kubernetes.
What You Gain
You get a single source of truth: secrets live in AWS Secrets Manager (or GCP, Azure), cached nowhere, controlled by cloud IAM. No sync loop, no eventual consistency, no “which copy is correct?”
Blast radius shrinks: cluster admin access lets you deploy pods, read logs, exec into containers, but you cannot read secrets without matching IAM policy. Cluster compromise doesn’t automatically mean secret compromise.
Cluster lifecycle decouples: backups contain no secrets, restores don’t need secret restoration, migrations just point the new cluster at the same vault. Secrets and compute are separate systems.
Cloud-native audit trails become definitive: who accessed prod/database-password at 10:05:23? Check AWS CloudTrail. Not buried in Kubernetes audit logs. Tamper-proof audit trail outside the cluster.
What You Lose
The runtime pattern isn’t declarative: secrets aren’t in Git, you can’t see “what secrets exist” from YAML files, and it’s less GitOps-friendly.
You add runtime dependency: applications must make HTTP requests on startup (network hop even to localhost sidecar), and if the sidecar fails, the app can’t start.
Setup complexity increases: you must configure IAM roles per namespace, set up service account annotations, and understand cloud provider IAM models.
You lose Kubernetes-native consumption: no volumeMounts for secrets, no envFrom secretRefs, and you must write code to fetch secrets via HTTP.
Comparing All Three Patterns
| Aspect | K8s Secrets | Operators (ESO) | Runtime API (Sidecar) |
|---|---|---|---|
| Where secrets live | etcd | etcd (synced from vault) | Vault only |
| Trust boundary | Cluster RBAC | Cluster RBAC | Cloud IAM |
| Source of truth | etcd | Vault (with etcd cache) | Vault |
| Blast radius | Entire cluster | Entire cluster | Scoped to IAM policy |
| RBAC misconfiguration | Exposes all secrets | Exposes all secrets | No secret exposure |
| Secret rotation | Manual | Automatic (poll) | Automatic (always latest) |
| Declarative | Yes | Yes | No |
| GitOps friendly | Yes | Yes (metadata only) | No |
| Cluster coupling | High | High | Low |
| Setup complexity | Low | Medium | Medium-High |
| Language requirements | None | None | HTTP client |
| Works outside K8s | No | No | Yes |
| Audit trail | K8s audit logs | K8s + cloud logs | Cloud logs only |
When Separation Doesn’t Matter
Before advocating for runtime patterns, let’s acknowledge when Kubernetes Secrets are perfectly fine.
Small Scale (< 50 engineers, < 100 pods)
Reality check:
- Everyone with production access is trusted
- RBAC is manageable (few roles, few bindings)
- Blast radius is acceptable (limited team size)
- Operational simplicity > security paranoia
At this scale, separating compute from secret state is often premature optimization. The complexity of IAM role management exceeds the security benefit.
Use Kubernetes Secrets. Focus on building your product, not over-engineering infrastructure.
Homogeneous Environments
When you have:
- One language (all Go microservices)
- One cloud provider (all AWS)
- Native SDK usage (already using AWS SDK)
Then:
- Polyglot problem doesn’t exist
- Runtime API adds overhead without value
- Use native SDKs with IAM roles directly
Use Kubernetes Secrets or native SDKs. Runtime APIs solve a polyglot problem you don’t have.
Acceptable Cluster Trust
When your threat model allows:
- Cluster administrators are part of security team
- Auditing cluster access is sufficient
- Secrets in backups are acceptable
Then:
- Cluster as secret storage is architecturally sound
- Blast radius is managed via personnel trust
- Operational simplicity wins
Use Kubernetes Secrets or ESO. Not every team needs cloud boundary isolation.
The Runtime Pattern in Practice
Let’s examine how the runtime pattern works in production with real implementation details.
Sidecar Deployment with Cloud IAM
AWS Example: IRSA (IAM Roles for Service Accounts)
Step 1: Create IAM policy
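For example, a policy scoped to prod/* secrets might look like this (account ID, region, and naming are placeholders):

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": "secretsmanager:GetSecretValue",
      "Resource": "arn:aws:secretsmanager:us-east-1:123456789012:secret:prod/*"
    }
  ]
}
```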
Step 2: Create IAM role with trust policy
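A sketch of the IRSA trust policy (the OIDC provider ID and account ID are placeholders for your cluster's values):

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Federated": "arn:aws:iam::123456789012:oidc-provider/oidc.eks.us-east-1.amazonaws.com/id/EXAMPLE"
      },
      "Action": "sts:AssumeRoleWithWebIdentity",
      "Condition": {
        "StringEquals": {
          "oidc.eks.us-east-1.amazonaws.com/id/EXAMPLE:sub": "system:serviceaccount:prod:prod-vaultmux-sa"
        }
      }
    }
  ]
}
```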
This trust policy allows the Kubernetes service account prod-vaultmux-sa in namespace prod to assume the IAM role.
Step 3: Create Kubernetes service account
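The service account carries the role ARN as an annotation (ARN hypothetical):

```yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: prod-vaultmux-sa
  namespace: prod
  annotations:
    eks.amazonaws.com/role-arn: arn:aws:iam::123456789012:role/prod-secrets-role
```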
Step 4: Deploy pod with sidecar
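A sketch of the pod spec (the vaultmux-server image name is an assumption; substitute your actual build):

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: app
  namespace: prod
spec:
  replicas: 1
  selector:
    matchLabels: {app: app}
  template:
    metadata:
      labels: {app: app}
    spec:
      serviceAccountName: prod-vaultmux-sa     # IRSA-annotated SA from step 3
      containers:
        - name: app
          image: example/app:latest
        - name: vaultmux                       # sidecar
          image: example/vaultmux-server:latest  # image name assumed
          ports:
            - containerPort: 8080              # app calls localhost:8080
```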
Step 5: The application fetches secrets with a plain HTTP GET to the sidecar on localhost:8080 - no AWS SDK or vault credentials in application code.
What just happened:
- Application makes HTTP request to localhost:8080 (sidecar)
- Sidecar uses service account credentials (via IRSA)
- Sidecar calls AWS Secrets Manager with IAM role
- AWS enforces IAM policy (only prod/* secrets allowed)
- Secret returned to application
- Secret never stored in etcd
GCP and Azure Work Similarly
GCP Workload Identity uses the iam.gke.io/gcp-service-account annotation, Azure Managed Identity uses the aadpodidbinding selector. All three clouds follow the same model: service account → cloud identity → vault, enforced by the cloud provider. See the complete setup guide for GCP and Azure configuration.
The Shared Service Alternative
The sidecar pattern (one vaultmux-server per pod) provides maximum isolation but high resource usage. The shared service pattern trades isolation for efficiency.
Shared Service Architecture
Deployment:
- 2-3 replicas of vaultmux-server
- Kubernetes Service for load balancing
- All applications call same endpoint
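A sketch of the shared deployment (namespace, labels, and image name are assumptions):

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: vaultmux
  namespace: secrets-infra
spec:
  replicas: 3
  selector:
    matchLabels: {app: vaultmux}
  template:
    metadata:
      labels: {app: vaultmux}
    spec:
      serviceAccountName: vaultmux-sa            # one shared IAM role for all callers
      containers:
        - name: vaultmux
          image: example/vaultmux-server:latest  # image name assumed
          ports:
            - containerPort: 8080
---
apiVersion: v1
kind: Service
metadata:
  name: vaultmux
  namespace: secrets-infra
spec:
  selector: {app: vaultmux}
  ports:
    - port: 8080
```

Applications then call http://vaultmux.secrets-infra:8080 instead of localhost, which is where the extra network hop and the weaker isolation come from.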
Trade-offs:
Resource usage:
- Sidecar: 1 vaultmux-server per application pod (50 apps = 50 sidecars)
- Shared service: 2-3 total replicas (50 apps share 2-3 servers)
Isolation:
- Sidecar: Each namespace uses different IAM role (namespace = security boundary)
- Shared service: All pods use shared IAM role (network isolation only)
Latency:
- Sidecar: ~1ms (localhost)
- Shared service: ~5-10ms (in-cluster network)
Security:
- Sidecar: Cloud IAM enforces per-namespace boundaries
- Shared service: Relies on network isolation (any pod can call API)
Recommendation: Sidecar for multi-tenant production (hard isolation), shared service for dev/test or single-tenant environments.
Decision Framework: Which Pattern Should You Use?
Start Here: What Are Your Requirements?
Question 1: Do you need multi-tenant namespace isolation?
- Yes → Sidecar + IAM (hard boundary) or Operators with careful RBAC
- No → Any pattern works
Question 2: Can secrets live in etcd?
- Yes → Kubernetes Secrets or Operators
- No (security requirement) → Runtime API
Question 3: Do you need declarative management?
- Yes (GitOps) → Kubernetes Secrets or Operators
- No → Runtime API
Question 4: Is operational simplicity critical?
- Yes → Kubernetes Secrets (simplest)
- No (willing to invest in setup) → Operators or Runtime API
Question 5: Do you have polyglot teams?
- Yes (Python, Java, Node.js, Go, Rust) → Runtime API (no SDKs)
- No (single language) → Any pattern
Decision Tree
```mermaid
flowchart TD
    start([Start]) --> scale{Cluster scale?}
    scale -->|"< 50 pods"| k8s[Kubernetes Secrets]
    scale -->|"> 50 pods"| tenant{Multi-tenant<br/>isolation?}
    tenant -->|No| etcd{Can secrets<br/>live in etcd?}
    tenant -->|Yes| etcd
    etcd -->|Yes| declarative{Need<br/>declarative?}
    etcd -->|No| sidecar[Runtime API<br/>Sidecar + IAM]
    declarative -->|Yes| eso[External Secrets<br/>Operator]
    declarative -->|No| polyglot{Polyglot<br/>teams?}
    polyglot -->|Yes| sidecar
    polyglot -->|No| sdk[Native SDKs<br/>+ IAM roles]

    style k8s fill:#3A4C43,stroke:#6b7280,color:#f0f0f0
    style eso fill:#3A4C43,stroke:#6b7280,color:#f0f0f0
    style sidecar fill:#3A4C43,stroke:#6b7280,color:#f0f0f0
    style sdk fill:#3A4C43,stroke:#6b7280,color:#f0f0f0
```
Scenario-Based Recommendations
Scenario 1: Early-stage startup (5 engineers, 1 cluster)
Use: Kubernetes Secrets
Why: Simplicity wins. You’re moving fast, team is small and trusted, RBAC is manageable. Don’t over-engineer.
Scenario 2: Mid-size company (30 engineers, multiple namespaces)
Use: External Secrets Operator
Why: Centralized secret management (AWS Secrets Manager) with automatic sync. Declarative (GitOps), native K8s consumption, automatic rotation. Team is large enough that centralization matters.
Scenario 3: Large enterprise (200+ engineers, 50+ namespaces, polyglot)
Use: Runtime API with sidecar + IAM
Why: Multi-tenant isolation is critical, polyglot teams don’t want SDK sprawl, blast radius must be minimized. Willing to invest in IAM role setup for security gains.
Scenario 4: High-security / regulated industry (finance, healthcare)
Use: Runtime API with sidecar + IAM
Why: Secrets cannot live in etcd (regulatory requirement). Cloud IAM provides audit trail. Cluster compromise doesn’t automatically expose secrets.
Scenario 5: Hybrid - static config + dynamic secrets
Use: Both Kubernetes Secrets and Runtime API
Why:
- Static config (database URLs, service endpoints) → K8s Secrets (rarely change)
- Dynamic secrets (API keys, tokens) → Runtime API (fetch on-demand)
- Optimize for convenience where it matters, security where it’s critical
Runtime Pattern Implementation Details
The runtime pattern requires an HTTP API server that applications can call. This server handles the complexity of authenticating with different cloud providers and fetching secrets on demand.
Example: Multi-Tenant Production Deployment
Namespace setup: each namespace gets its own service account, annotated with a namespace-scoped IAM role (test-vaultmux-sa in test, prod-vaultmux-sa in prod).
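Sketched as manifests (account ID and role names are placeholders):

```yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: test-vaultmux-sa
  namespace: test
  annotations:
    eks.amazonaws.com/role-arn: arn:aws:iam::123456789012:role/test-secrets-role
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: prod-vaultmux-sa
  namespace: prod
  annotations:
    eks.amazonaws.com/role-arn: arn:aws:iam::123456789012:role/prod-secrets-role
```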
IAM policies: test-secrets-role can read only secrets matching test/*; prod-secrets-role only prod/*. Isolation is enforced by AWS, not cluster RBAC.
Polyglot Access
All languages use the same HTTP endpoint. Python: requests.get('http://localhost:8080/v1/secrets/name').json()['value'], Java: HttpClient with JSON parsing, Node.js: fetch() with await response.json(), Go: http.Get() with json.Decoder.
Zero SDK dependencies. One API. Any language.
Backend Switching
Each environment selects a backend through configuration, not code:
- Development: pass (local, no cloud)
- Staging: GCP Secret Manager
- Production: AWS Secrets Manager
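One way this could look as per-environment configuration. The variable names below are hypothetical, not vaultmux-server's actual interface; the point is that only the sidecar's env block changes:

```yaml
# dev: local pass store, no cloud credentials
env:
  - {name: VAULTMUX_BACKEND, value: pass}
---
# staging: GCP Secret Manager
env:
  - {name: VAULTMUX_BACKEND, value: gcp}
  - {name: VAULTMUX_GCP_PROJECT, value: my-staging-project}
---
# prod: AWS Secrets Manager
env:
  - {name: VAULTMUX_BACKEND, value: aws}
  - {name: VAULTMUX_AWS_REGION, value: us-east-1}
```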
Application code unchanged. Backend configuration determines where secrets come from.
This flexibility is what makes the runtime pattern portable across environments. The same application code works in development (using local pass), staging (using GCP), and production (using AWS) - only the backend configuration changes.
Why Runtime APIs Instead of Operators?
Runtime API servers like vaultmux-server are intentionally not operators. Operators inject external state into the Kubernetes control plane - data lives in etcd, operators reconcile it.
Runtime APIs take the opposite approach: keep secrets outside cluster state entirely. Kubernetes becomes just one runtime among many (VMs, CI, local development), not the system of record.
The architectural difference:
Operator pattern:
K8s API → etcd → operator → external vault
Secrets stored in cluster, declarative reconciliation
Runtime pattern:
App → HTTP API → external vault
No reconciliation, no cluster storage, runtime fetching only
They’re complementary. Use operators for declarative sync, runtime APIs for on-demand access without etcd storage. Complete setup guides for AWS, GCP, and Azure are available in the repository.
Hybrid Approaches: Using Multiple Patterns
Most production systems don’t use a single pattern - they combine them based on use case.
Static Config → Kubernetes Secrets
What qualifies as static config:
- Database connection strings (rarely change)
- Service endpoints (stable)
- Feature flags (low-security)
- Public API keys (not sensitive)
Why Kubernetes Secrets work here:
- Simple consumption (volume mounts, env vars)
- Declarative management (Git-tracked YAML)
- Rotation frequency is low (manual updates are fine)
Example:
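A sketch of static config in a Secret, consumed wholesale via envFrom (all names hypothetical):

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: app-static-config
  namespace: prod
type: Opaque
stringData:
  DATABASE_URL: postgres://db.internal:5432/app
  SERVICE_ENDPOINT: https://payments.internal:8443
---
apiVersion: v1
kind: Pod
metadata:
  name: app
  namespace: prod
spec:
  containers:
    - name: app
      image: example/app:latest
      envFrom:
        - secretRef:
            name: app-static-config   # every key becomes an env var
```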
Dynamic Secrets → Runtime API
What qualifies as dynamic:
- Database passwords (rotate frequently)
- API tokens (expire and refresh)
- Certificates (short-lived)
- OAuth credentials (dynamic grant)
Why runtime access works here:
- Always fetch latest (no sync lag)
- No stale secrets in etcd
- Rotation happens at vault level
- Application always gets current value
Example:
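For instance, a small helper that fetches the current value immediately before use, with an injectable fetcher so the logic is testable without a vault. The endpoint shape is an assumption, as before:

```python
import json
import urllib.request
from typing import Callable


def _http_fetch(name: str) -> str:
    """Default fetcher: ask the localhost sidecar for the current value."""
    url = f"http://localhost:8080/v1/secrets/{name}"
    with urllib.request.urlopen(url) as resp:
        return json.loads(resp.read())["value"]


def with_fresh_secret(name: str, use: Callable[[str], object],
                      fetch: Callable[[str], str] = _http_fetch):
    """Fetch the secret at the point of use, so a rotation at the vault
    level is picked up on the next call without restarting the pod."""
    return use(fetch(name))
```

Because nothing is cached in the cluster, rotating the secret in the vault is immediately visible to the application.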
High-Security Secrets → Sidecar + IAM
What qualifies as high-security:
- Customer PII encryption keys
- Payment processing credentials
- Admin access tokens
- Cross-service authentication secrets
Why sidecar + IAM works here:
- Hard isolation via cloud boundary
- Audit trail in cloud provider logs
- Blast radius limited to IAM policy scope
- Never stored in cluster
Decision Matrix
| Secret Type | Pattern | Why |
|---|---|---|
| Database URL | K8s Secret | Static, low-security, simple |
| Database password | Runtime API | Dynamic, rotate frequently |
| Service endpoint | K8s Secret | Static, declarative |
| API token | Runtime API | Expires, refresh needed |
| Feature flags | K8s Secret or ConfigMap | Not secret, declarative |
| Encryption keys | Runtime API + IAM | High-security, audit required |
| OAuth client ID | K8s Secret | Public-ish, static |
| OAuth client secret | Runtime API | Sensitive, rotate regularly |
Operational Considerations
Secret Rotation
Kubernetes Secrets:
- Manual: kubectl create secret --from-literal (overwrites the existing Secret), or update the YAML manifest and reapply
- Pods don't auto-reload (must restart or watch for changes)
Operators (ESO):
- Automatic: ESO polls vault, updates K8s Secret
- Poll interval (e.g., every 5 minutes)
- Pods reload on Secret change (if watching)
Runtime API:
- Automatic: Every request fetches latest from vault
- No sync lag, always current
- No pod restarts needed
Audit and Compliance
Kubernetes Secrets:
- K8s audit logs (who accessed which Secret resource)
- etcd access logs (if enabled)
- Audit trail is cluster-specific
Operators (ESO):
- K8s audit logs (CRD operations)
- Cloud provider logs (vault access)
- Two separate audit trails to correlate
Runtime API:
- Cloud provider logs only (CloudTrail, Cloud Audit Logs, Azure Monitor)
- Direct API access = single audit trail
- Tamper-proof (logs outside cluster)
Disaster Recovery
Kubernetes Secrets:
- Secrets in cluster backups
- Backup security = secret security
- Restore includes secrets
Operators (ESO):
- CRDs in cluster backups (metadata only)
- Secrets in vault (separate backup)
- Restore: CRDs recreated, ESO syncs from vault
Runtime API:
- No secrets in cluster backups
- Secrets only in vault backups
- Restore: Pods start, fetch from vault
Cost
Kubernetes Secrets:
- Free (native K8s resource)
- etcd storage (negligible)
Operators (ESO):
- Operator pod resources (minimal: ~100MB memory)
- Cloud vault costs (AWS/GCP/Azure secret storage + API calls)
Runtime API (Sidecar):
- Sidecar resources (50-100MB per pod)
- 50 pods × 100MB = 5GB memory overhead
- Cloud vault costs (same as operators)
Runtime API (Shared Service):
- 2-3 replicas (200-300MB total)
- Lower resource usage than sidecar
- Cloud vault costs (same as operators)
Conclusion
Kubernetes Secrets are simple and often sufficient. But at scale or in high-security environments, some teams separate compute from secret storage.
The fundamental question: Should your cluster store secrets, or just access them?
The patterns:
- Kubernetes Secrets: Cluster is compute + storage (simple, declarative, limited isolation)
- Operators (ESO): Cluster stores synced copies (declarative, dual source-of-truth, etcd dependency remains)
- Runtime API (Sidecar): Cluster is just compute (cloud IAM boundary, single source-of-truth, setup complexity)
No universal answer. The choice depends on:
- Scale (small = simplicity wins, large = isolation matters)
- Security requirements (regulated = separate state, startup = pragmatic)
- Operational model (GitOps = declarative, dynamic = runtime)
- Team preferences (trust cluster RBAC vs trust cloud IAM)
For most teams: Start with Kubernetes Secrets. When you outgrow them (scale, security requirements, operational complexity), you’ll know it’s time to separate compute from state.
For platform teams: Offer multiple patterns. Let application teams choose based on their security needs. Static config can live in K8s Secrets while high-security credentials use runtime access.
The best architecture acknowledges trade-offs and chooses deliberately, not by default.
Further Reading
Official Documentation: Kubernetes Secrets, AWS IRSA, GCP Workload Identity, Azure Workload Identity
Tools: External Secrets Operator, vaultmux-server, Sealed Secrets
Have questions about Kubernetes secret management patterns? Open an issue or reach out on LinkedIn.