Beyond Binary Thinking: A Graduated Fidelity Framework for Kubernetes Security Testing
This post was created by my multi-agent organizational system, cosim: the characters are fictional, the outputs are hopefully directionally true, and the platform is described in CoSim: Building a Company Out of AI Agents.
The velocity trap. Engineering teams face a persistent dilemma: comprehensive security testing requires production-like infrastructure, but production-like infrastructure is expensive and slow. The result is predictable: teams choose between velocity (testing locally in lightweight environments) and confidence (testing in production-replica clusters). This binary choice costs them both.
What if the premise is wrong?
Recent research into security testing patterns for AI and ML platforms reveals that security testing exists on a graduated fidelity spectrum, not as binary choices between real and fake environments. Understanding this spectrum unlocks a 10x velocity improvement while maintaining comprehensive coverage, if teams know which tier to use for which tests.
The Problem: False Binary in Security Testing
Engineering teams typically frame security testing as a binary decision:
Option A: Test locally using kind, k3s, or minikube
✅ Fast feedback, in minutes
✅ Zero infrastructure cost
❌ “Not real enough” for production confidence
Option B: Test in production-like clusters using full OpenShift or enterprise Kubernetes
✅ High fidelity to production
✅ Organizational confidence in results
❌ Days of wait time for centralized resources
❌ $500-2,000 per month per cluster
This framing creates a false trade-off: velocity or confidence, pick one.
The reality is more nuanced. Different security tests have different fidelity requirements. RBAC policy validation does not need a production cluster. Container escape testing does. Conflating all security testing into a single fidelity requirement wastes both time and money.
The Framework: Seven Tiers of Testing Fidelity
Security testing environments exist on a graduated spectrum from static analysis to full production, each with distinct cost, time, and coverage characteristics:
Tier 0: Static Analysis
Setup: Seconds
Cost: $0
Coverage: ~40%, including manifest linting, CVE scanning, and policy-as-code
Use Case: Pre-commit validation and IDE integration
What works: Dockerfile security linting, YAML manifest validation, SBOM generation, dependency CVE scanning
Limitations: No runtime behavior validation and no API interaction testing
Tier 1: kind, Kubernetes in Docker
Setup: 5 minutes
Cost: $0
Coverage: 70-80%, including Kubernetes API, RBAC, and NetworkPolicy
Use Case: CI and CD gates, developer local validation
What works:
- RBAC policy validation at roughly 95% fidelity to production
- NetworkPolicy enforcement testing at roughly 90% fidelity with Calico
- CIS Kubernetes Benchmark compliance at roughly 85%
- Admission controller logic testing at roughly 92%
- API authorization testing
- Gateway API configuration validation
Limitations:
- No kernel-level security testing, with container escapes at effectively 0% fidelity
- No SELinux enforcement, only simulation
- No meaningful runtime monitoring with Falco, at roughly 10% syscall visibility due to Docker-in-Docker abstraction
- No platform-specific features such as OpenShift SCCs or enterprise operators
Critical insight: This is the developer velocity unlock. Teams waiting days for nightly builds can validate RBAC, NetworkPolicy, and manifest security in 2-5 minutes locally. This enables pre-commit security gates without centralized infrastructure.
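As a concrete illustration of such a pre-commit gate, here is a minimal sketch: aggregate findings from local Tier 1 scanners and block the commit when anything severe appears. The scanner names, severity labels, and `gate` helper are hypothetical, not part of any particular tool.

```python
# Illustrative pre-commit security gate for Tier 1 local validation.
# Scanner source names and severity labels are hypothetical examples.
from dataclasses import dataclass

@dataclass
class Finding:
    source: str    # e.g. "manifest-lint", "cve-scan", "rbac-check"
    severity: str  # "low" | "medium" | "high" | "critical"
    message: str

def gate(findings, block_at=("high", "critical")):
    """Return (passed, blocking_findings) for a pre-commit decision."""
    blocking = [f for f in findings if f.severity in block_at]
    return (len(blocking) == 0, blocking)

findings = [
    Finding("cve-scan", "critical", "known CVE in base image"),
    Finding("manifest-lint", "low", "missing resource limits"),
]
passed, blocking = gate(findings)
# one critical finding, so the gate blocks the commit
```

In a real pipeline, the findings would come from whatever manifest linter and CVE scanner the team already runs in Tier 1; the gate itself needs no cluster at all.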
Tier 2: k3s, Lightweight Kubernetes
Setup: 10 minutes
Cost: $0-50 per month on a VPS
Coverage: 75-85%, with more realistic networking than kind
Use Case: Integration testing and ingress validation
What works better than kind:
- Better ingress controller support
- More realistic networking rather than Docker bridge networking
- Persistent storage testing
Limitations: Still vanilla Kubernetes, with no platform-specific features and no kernel-level security
Tier 3: Hardened OS VM, CoreOS or Flatcar with CRI-O and SELinux
Setup: 30-60 minutes, automated via Ignition
Cost: $55-140 per month
Coverage: 85-90%, including kernel-level security and production runtime behavior
Use Case: Container runtime security and kernel exploit testing
What works versus Tier 1 and Tier 2:
- Real kernel access with container escape testing at roughly 85% fidelity
- SELinux enforcement at roughly 90% fidelity with native CRI-O integration
- seccomp profile validation with roughly 44 blocked syscalls testable
- CRI-O runtime matching OpenShift rather than containerd
- Falco and eBPF monitoring with roughly 90-95% syscall visibility
- Container escape CVE testing for recent vulnerabilities
Limitations:
- No platform-specific security features such as SCCs
- No platform-specific operators and APIs
- No platform-specific hardening such as FIPS or custom crypto policies
The middle-tier value: For roughly $55 per month in reserved instances, teams get kernel-level security testing without full platform costs. This is the cost-optimized sweet spot for components with elevated privileges or code execution.
Tier 4: Upstream Platform, such as OKD for OpenShift workloads
Setup: 1-2 hours
Cost: $100-300 per month
Coverage: 90-95%, with platform APIs but without enterprise licensing
Use Case: Platform-specific features and operator lifecycle testing
What works versus Tier 3:
- Platform-specific APIs such as SCCs, Routes, and OAuth
- Platform operator ecosystem
- Built-in platform integrations
Limitations: Not enterprise certified and community support only
Tier 5: Production-Like Platform, Enterprise Kubernetes or OpenShift
Setup: 2-4 hours
Cost: $500-2,000 per month
Coverage: 95-99%
Use Case: Certification, pre-release validation, and quarterly penetration testing
What works versus Tier 4:
- Enterprise support and certification
- Production-identical configuration
- Full platform compatibility
Limitations: Still not production scale and not production data
Tier 6: Production
Setup: Weeks
Cost: $2,000-10,000 or more per month
Coverage: 100%
Use Case: Compliance certification, such as SOC 2 or FedRAMP, and final validation
Upstream vs. Downstream: The Critical Distinction
The fidelity spectrum reveals a deeper architectural insight: testing upstream project security is different from testing downstream platform integration.
The Dependency Chain
Many enterprise platforms are downstream of upstream open-source projects:
Upstream Project (e.g., Kubeflow)
↓
Platform Mid-Stream (e.g., Open Data Hub)
↓
Platform Downstream (e.g., RHOAI, enterprise products)
↓
Commercial product plus operators
Critical finding: Upstream projects are typically designed for Kubernetes portability. Platform-specific dependencies are downstream additions through operators, custom resources, and enterprise integrations.
Two Testing Tracks
This architectural layering enables dual-track testing:
Track 1: Upstream Component Security, Tier 1-3
- Test upstream project security using community manifests
- Kubernetes-portable by design, which makes Tier 1 kind viable
- Coverage: 70-85% of component-level security
- Velocity: Minutes to hours
- Cost: $0-140 per month
Examples include RBAC policies, NetworkPolicy enforcement, CVE scanning, secrets handling, API authorization, and admission control.
Track 2: Downstream Platform Integration, Tier 4-6
- Test platform operator deployment and platform-specific features
- Requires full platform infrastructure
- Coverage: 95-100% platform integration assurance
- Velocity: Quarterly validation cycles
- Cost: $100-2,000 per month
Examples include platform-specific security constraints, operator lifecycle vulnerabilities, enterprise authentication flows, and platform-specific hardening.
Strategic Value
Dual-track testing unlocks velocity without sacrificing coverage:
- Upstream testing: Continuous validation in minutes with Tier 1
- Downstream testing: Quarterly certification in full platform with Tier 5
- Combined: 100% comprehensive coverage with a 99% time reduction for most tests
The Decision Framework: Which Tier for Which Test?
Decision Tree
START
│
├─ Testing platform operator deployment?
│ └─ YES → Tier 4-5 (Upstream or Enterprise Platform)
│
├─ Testing platform-specific APIs (SCCs, Routes, OAuth)?
│ └─ YES → Tier 4-5 (Platform Required)
│
├─ Testing upstream project security (RBAC, CVEs, NetworkPolicy)?
│ └─ YES → Tier 1 (kind) VELOCITY UNLOCK
│
├─ Kernel security testing (SELinux, seccomp, container escapes)?
│ └─ YES → Tier 3 (Hardened OS VM) COST-OPTIMIZED
│
├─ RBAC, NetworkPolicy, static manifest analysis?
│ └─ YES → Tier 1 (kind)
│
└─ Individual component testing?
└─ YES → Tier 0 (static analysis)
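The decision tree above can be sketched as a small first-match-wins function. The boolean flag names are hypothetical labels for the tree's questions, evaluated in the same top-to-bottom order.

```python
# The decision tree as a function: first matching branch wins,
# mirroring the top-to-bottom order of the tree above.
def recommend_tier(*, platform_operator=False, platform_apis=False,
                   upstream_security=False, kernel_security=False,
                   rbac_or_manifests=False):
    if platform_operator or platform_apis:
        return "Tier 4-5"        # platform required
    if upstream_security:
        return "Tier 1"          # velocity unlock
    if kernel_security:
        return "Tier 3"          # cost-optimized kernel testing
    if rbac_or_manifests:
        return "Tier 1"
    return "Tier 0"              # static analysis for individual components
```

A test that exercises both an SCC-dependent component (`platform_apis=True`) and a container-escape scenario (`kernel_security=True`) lands on Tier 4-5 and Tier 3 respectively, matching the tree.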
Fidelity Matrix by Test Scenario
| Test Scenario | Tier 1 (kind) | Tier 3 (VM) | Tier 5 (Platform) |
|---|---|---|---|
| Upstream Component Testing | | | |
| RBAC policy validation | ✅ 95% | ✅ 95% | ✅ 100% |
| NetworkPolicy enforcement | ✅ 90% | ✅ 90% | ✅ 95% |
| Container CVE scanning | ✅ 100% | ✅ 100% | ✅ 100% |
| Manifest security scanning | ✅ 100% | ✅ 100% | ✅ 100% |
| API authorization | ✅ 90% | ✅ 95% | ✅ 100% |
| Admission controller testing | ✅ 92% | ✅ 92% | ✅ 100% |
| Runtime/Kernel Testing | | | |
| SELinux enforcement | ❌ 0% | ✅ 90% | ✅ 100% |
| seccomp profile validation | ❌ 20% | ✅ 90% | ✅ 100% |
| Container escape testing | ❌ 0% | ✅ 85% | ✅ 95% |
| Falco runtime monitoring | ❌ 10% | ✅ 90% | ✅ 95% |
| Platform-Specific Testing | | | |
| Platform security constraints | ❌ 0% | ❌ 0% | ✅ 100% |
| Platform OAuth flow | ❌ 0% | ❌ 0% | ✅ 100% |
| Platform operators | ❌ 0% | ❌ 0% | ✅ 100% |
When to Escalate
Tier 1 is sufficient if:
- Testing upstream project security
- Standard Kubernetes RBAC and NetworkPolicy
- No kernel-level vulnerabilities are in scope
- The component is platform-agnostic
Tier 3 is required if:
- The component handles secrets or credentials
- The component executes user-provided code
- The component has elevated Kubernetes privileges
- Container runtime security is in scope
Tier 5 is required if:
- The test involves platform operator deployment
- Platform-specific APIs or features are required
- Enterprise certification is needed
- Quarterly penetration testing is required
Real-World Impact: Cost and Velocity Optimization
Before: Binary Approach
Approach: Use a production-like platform for all security testing
- 5 shared clusters × $1,000 per month = $5,000 per month
- Developer wait time: 24+ hours because of nightly builds
- Coverage: 95-99%
After: Graduated Fidelity Approach
Approach: Match the test to the appropriate tier
Tier 1, kind, continuous validation:
- 20 upstream components validated in kind
- Cost: $0
- Feedback: 2-5 minutes
- Coverage: 70-80% of component security
Tier 3, hardened OS VM, weekly validation:
- 15 components with kernel dependencies
- Cost: $825 per month
- Feedback: 30-60 minutes
- Coverage: 85-90% of runtime security
Tier 5, platform, quarterly validation:
- 11 platform-dependent components
- Cost: $1,000 per month
- Coverage: 95-100% platform integration
Total: $1,825 per month, a 63% cost reduction
Developer feedback: 2-5 minutes continuous, a 99% time reduction
Combined coverage: 100%, with Track 1 upstream plus Track 2 downstream
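The headline percentages above follow from simple arithmetic on the per-tier figures in this section:

```python
# Checking the before/after figures with the numbers given above.
before_cost = 5 * 1000           # 5 shared clusters at $1,000/mo
after_cost = 0 + 825 + 1000      # Tier 1 ($0) + Tier 3 + Tier 5 budgets
cost_reduction = 1 - after_cost / before_cost   # ~0.635, i.e. ~63%

before_feedback_min = 24 * 60    # 24-hour nightly-build wait
after_feedback_min = 5           # upper end of the 2-5 minute Tier 1 loop
time_reduction = 1 - after_feedback_min / before_feedback_min  # > 99%
```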
Actionable Recommendations
1. Audit Your Component Architecture
Question: Is your component upstream-portable or downstream-coupled?
Upstream-portable components:
- Use standard Kubernetes APIs and no platform-specific APIs
- Deploy using community manifests
- Examples: RBAC policies, NetworkPolicy, and standard workloads
This means Tier 1, kind, is viable for 70-80% of security testing.
Downstream-coupled components:
- Require platform operators
- Use platform-specific APIs
- Examples: platform operator lifecycle and custom security constraints
This means Tier 4-5 is required.
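One way to start this audit mechanically is to scan manifests for platform-specific API groups. The sketch below is an illustrative heuristic, not a complete classifier: the group list is a small, deliberately incomplete example, and a real audit would also check operator dependencies.

```python
# Illustrative upstream-vs-downstream heuristic: flag manifests that
# reference platform-specific API groups. The group list is a small,
# incomplete example for demonstration purposes.
PLATFORM_API_GROUPS = (
    "route.openshift.io",       # OpenShift Routes
    "security.openshift.io",    # SCCs
    "operators.coreos.com",     # OLM-managed operators
)

def is_upstream_portable(manifest_text: str) -> bool:
    """True when no platform-specific API group appears in the manifest."""
    return not any(group in manifest_text for group in PLATFORM_API_GROUPS)

portable = is_upstream_portable(
    "apiVersion: networking.k8s.io/v1\nkind: NetworkPolicy")
coupled = not is_upstream_portable(
    "apiVersion: route.openshift.io/v1\nkind: Route")
```

A NetworkPolicy using only standard `networking.k8s.io` APIs classifies as upstream-portable (Tier 1 viable); a Route classifies as downstream-coupled (Tier 4-5 required).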
2. Implement Dual-Track Testing
Track 1: Continuous Upstream Testing, Tier 1
- Pre-commit: RBAC, manifest validation, and CVE scanning in 2-5 minutes
- CI and CD gates: block merges on critical findings
- Developer laptop: local security validation without waiting
Track 2: Periodic Downstream Testing, Tier 5
- Monthly: platform operator integration validation
- Quarterly: full penetration testing and compliance certification
- Annual: production-environment final validation
3. Optimize Costs with Component Mapping
Create a component-to-tier matrix:
| Component Type | Security Surface | Recommended Tier | Cost | Cadence |
|---|---|---|---|---|
| Standard workloads | RBAC, Network, CVE | Tier 1 (kind) | $0 | Continuous |
| Privileged operators | Kernel, SELinux, secrets | Tier 3 (VM) | $55-140/mo | Weekly |
| Platform operators | Platform APIs, lifecycle | Tier 5 (Platform) | $500-2K/mo | Quarterly |
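The matrix above can live as data in the repository so CI can route each component to its tier automatically. The component-type keys below are hypothetical labels, one per row of the table.

```python
# The component-to-tier matrix as a lookup table. Component-type keys
# are hypothetical labels corresponding to the rows above.
TIER_MATRIX = {
    "standard-workload": {
        "tier": "Tier 1 (kind)", "cost_usd_mo": (0, 0), "cadence": "continuous"},
    "privileged-operator": {
        "tier": "Tier 3 (VM)", "cost_usd_mo": (55, 140), "cadence": "weekly"},
    "platform-operator": {
        "tier": "Tier 5 (Platform)", "cost_usd_mo": (500, 2000), "cadence": "quarterly"},
}

def plan_for(component_type: str) -> dict:
    """Return the recommended tier, cost range, and cadence for a component."""
    return TIER_MATRIX[component_type]
```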
4. Establish Escalation Criteria
Define when to escalate from Tier 1 to Tier 3 to Tier 5:
Escalate to Tier 3 if:
- The component requires container escape testing, which Tier 1 cannot exercise (fidelity is effectively 0%)
- SELinux violations are detected in logs
- seccomp profile validation is needed
- Runtime monitoring such as Falco is required
Escalate to Tier 5 if:
- Platform-specific features are needed
- Compliance certification is required
- A customer-reported security issue appears in platform deployment
- The quarterly penetration testing cycle arrives
5. Document Honest Limitations
For each tier, document what can and cannot be tested:
Tier 1, kind, honest scope:
- ✅ Can test: RBAC bypass attempts, NetworkPolicy circumvention, manifest misconfigurations
- ❌ Cannot test: container escapes, SELinux violations, platform OAuth flows
This transparency builds organizational trust in graduated testing.
Common Questions
Q: Why not just use production-like environments for everything?
A: Cost and velocity. Production-like environments cost $500-2,000 per month and take 2-4 hours to set up. For 70-90% of security tests, simulation provides equivalent coverage at more than 90% lower cost and with feedback that arrives 99% faster.
The useful question is not “Is kind as good as production?” It is “Which security tests need production fidelity, and which do not?”
Q: What about false negatives, tests passing in kind but failing in production?
A: This is the right concern. The answer is graduated escalation:
- Tier 1 catches 70-80% of issues, such as RBAC, manifests, and CVEs
- Tier 3 catches 85-90% of issues, adding kernel and SELinux coverage
- Tier 5 catches 95-99% of issues, adding platform specifics
False negatives occur at tier boundaries. Document what each tier cannot test, then use higher tiers for validation. The 70% caught in Tier 1 never consume expensive Tier 5 resources.
Q: How do I know my component’s portability?
Ask:
- Does it use platform-specific APIs? If yes, it is downstream-coupled.
- Does it deploy via platform operators? If yes, it is downstream-coupled.
- Does it use only standard Kubernetes APIs? If yes, it is upstream-portable.
Validate by attempting deployment in kind using community manifests. If it deploys, it is upstream-testable.
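That validation step can be scripted. The sketch below only builds the shell commands rather than running them; the cluster name and manifest path are hypothetical, and in practice you would execute each command with `subprocess.run` and check its exit code.

```python
# Sketch of the "validate by deploying in kind" portability check:
# construct the shell commands (cluster name and manifest path are
# hypothetical; run them with subprocess.run in a real script).
def kind_portability_check_cmds(manifest_path: str,
                                cluster: str = "portability-check"):
    return [
        f"kind create cluster --name {cluster}",
        # server-side dry run validates against the live API server
        # without persisting anything
        f"kubectl apply --dry-run=server -f {manifest_path}",
        f"kind delete cluster --name {cluster}",
    ]

cmds = kind_portability_check_cmds("community-manifests/")
```

If the dry-run apply succeeds against vanilla Kubernetes, the component is upstream-testable; if it fails on unknown API groups, it is downstream-coupled.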
Q: What about the nightly build bottleneck?
This is the primary value proposition. Teams waiting 24 hours or more for centralized DevOps nightly builds can validate security locally in 2-5 minutes with Tier 1 kind. This unlocks:
- Pre-commit security gates that block insecure code before merge
- Developer autonomy without waiting for centralized resources
- Faster iteration where fix, test, and fix again happens in minutes rather than days
Conclusion: From Binary to Spectrum
Security testing does not require a binary choice between velocity and confidence. By understanding the graduated fidelity spectrum and the upstream vs. downstream architectural distinction, teams can optimize for both:
70-80% of security testing → Tier 1, kind: $0, minutes, continuous
85-90% of security testing → Tier 3, hardened VM: $55-140 per month, weekly
95-100% of security testing → Tier 5, platform: $500-2,000 per month, quarterly
Combined: 100% coverage with a 63% cost reduction and a 99% time reduction for most tests.
The future of cloud-native security testing is not choosing between fast or comprehensive. It is knowing which tier to use for which test, and architecting components to maximize upstream portability.
The velocity unlock is real. The cost savings are substantial. The limitations are clear. The framework is actionable.
About This Research
This framework emerged from analyzing security testing patterns for AI and ML platforms on Kubernetes, synthesizing findings from architectural analysis of 46 components, prototype validation, and market research across 15 or more vendors. The upstream vs. downstream distinction was validated through community documentation review and consultant practitioner insights.
Acknowledgments: This research benefited from contributions across technical architecture, OSINT research, market intelligence, and prototype engineering disciplines. Special recognition goes to external consultant perspectives on developer workflow pain points and architectural layering.