Proving the Whole Facility

by J.J. Malloy | Oct 27, 2025

Levels 0–4 are about aligning design, verifying components, and proving systems work. But none of that matters until the entire facility is tested as one. Level 5: Integrated Systems Testing (IST) and Reliability Run is where commissioning shows that a data center can actually survive the failures it was designed to handle.

This is the moment owners, operators, and stakeholders care about most: Will it hold under stress?

Why Level 5 Matters

A data center is only as reliable as its weakest interface. Level 5 exposes those weaknesses by simulating real-world scenarios:

  • Utility outage with generators carrying load.
  • UPS failures and transfers.
  • CRAH/CRAC failures under full IT load.
  • Multiple faults occurring at once, including failures that cascade from one system to another.

IST proves resilience, not just functionality. If the facility doesn’t ride through these tests, it’s not ready to host critical workloads.

What Happens in Level 5

This level validates end-to-end performance of the facility.

1. Integrated Systems Testing (IST)

  • Black Start: Confirm the site can be brought online from a dead utility state.
  • Load Transfers: Simulate utility failure, generator startup, ATS transfer, UPS ride-through.
  • Failure Scenarios: Trip breakers, drop CRAHs, simulate pump or chiller loss. Validate redundancy holds.
  • Simultaneous Faults: Test multiple failures at once to expose cascading risks.
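
One way to avoid stopping at single-fault tests is to generate the scenario list rather than write it by hand. The sketch below is illustrative only: the fault names and the two-fault limit are assumptions, and a real IST script would take its fault list from the design documents.

```python
from itertools import combinations

# Hypothetical single faults; a real project pulls these from the IST script.
SINGLE_FAULTS = ["utility_outage", "ups_module_trip", "crah_loss", "chiller_loss"]


def build_scenarios(faults, max_simultaneous=2):
    """Return every combination of 1..max_simultaneous faults as tuples."""
    scenarios = []
    for size in range(1, max_simultaneous + 1):
        # combinations() keeps the input order and never repeats a fault.
        scenarios.extend(combinations(faults, size))
    return scenarios


scenarios = build_scenarios(SINGLE_FAULTS)
for scenario in scenarios:
    print(" + ".join(scenario))
```

Each line printed is one test case to schedule, run, and document. Raising max_simultaneous grows the list quickly, so teams would likely prune combinations that the design does not claim to survive.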

2. Reliability Run (72–96 Hours)

  • Operate the site at sustained load with continuous monitoring.
  • Capture temperature stability, UPS battery performance, chiller cycling, generator runtime.
  • Expose hidden issues: oil leaks, vibration, overheating, drifting setpoints.
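
Drifting setpoints are easier to spot in trend data than on a live dashboard. As a rough sketch, assuming readings are exported as elapsed hours and values, a least-squares slope can flag a slow drift across the run. The sample readings and the 0.5 °C tolerance are made up for illustration, not taken from any standard.

```python
def drift_per_hour(hours, values):
    """Least-squares slope of values against elapsed hours."""
    n = len(hours)
    if n < 2 or n != len(values):
        raise ValueError("need at least two paired samples")
    mean_h = sum(hours) / n
    mean_v = sum(values) / n
    num = sum((h - mean_h) * (v - mean_v) for h, v in zip(hours, values))
    den = sum((h - mean_h) ** 2 for h in hours)
    if den == 0:
        raise ValueError("all samples share one timestamp")
    return num / den


def is_drifting(hours, values, max_total_drift):
    """True if the fitted trend moves more than max_total_drift over the run."""
    span = max(hours) - min(hours)
    return abs(drift_per_hour(hours, values) * span) > max_total_drift


# Hypothetical supply-air readings in deg C, sampled every 12 h over 72 h.
hours = [0, 12, 24, 36, 48, 60, 72]
supply_temp_c = [22.0, 22.1, 22.3, 22.4, 22.6, 22.7, 22.9]
print(is_drifting(hours, supply_temp_c, max_total_drift=0.5))
```

A straight-line fit is a deliberate simplification. It catches steady creep but not cycling or step changes, which would need their own checks.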

3. Operator Involvement

  • Facility staff run the tests under commissioning supervision.
  • Validate MOPs, SOPs, and EOPs (Methods of Procedure, Standard Operating Procedures, and Emergency Operating Procedures).
  • Train operators through real scenarios before go-live.

Exit Criteria

Level 5 is only complete when:

  • All fault and failover scenarios are tested and documented.
  • A signed IST report exists with deficiencies tracked.
  • Reliability run is completed with trending data and performance reports.
  • Operations team is trained, confident, and signed off on procedures.
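
The four criteria above can be treated as a simple gate. The following is a minimal sketch, with hypothetical field names, of how a project tracker might report which criteria are still open. It is not a standard commissioning tool.

```python
from dataclasses import dataclass


@dataclass
class Level5Status:
    # Hypothetical fields that mirror the four exit criteria.
    scenarios_planned: int
    scenarios_tested_and_documented: int
    ist_report_signed: bool
    deficiencies_tracked: bool
    reliability_run_reported: bool
    operators_signed_off: bool


def unmet_criteria(status):
    """Return a description of every exit criterion that is still open."""
    gaps = []
    if status.scenarios_tested_and_documented < status.scenarios_planned:
        gaps.append("fault and failover scenarios not all tested and documented")
    if not (status.ist_report_signed and status.deficiencies_tracked):
        gaps.append("signed IST report with tracked deficiencies missing")
    if not status.reliability_run_reported:
        gaps.append("reliability run trending data and reports missing")
    if not status.operators_signed_off:
        gaps.append("operations team has not signed off on procedures")
    return gaps


status = Level5Status(
    scenarios_planned=10,
    scenarios_tested_and_documented=9,
    ist_report_signed=True,
    deficiencies_tracked=True,
    reliability_run_reported=True,
    operators_signed_off=False,
)
for gap in unmet_criteria(status):
    print("OPEN:", gap)
```

Level 5 is complete only when the returned list is empty.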

Common Pitfalls

  • Shallow testing. Some projects stop at single-fault scenarios, never testing cascading failures.
  • Skipped reliability run. Without a multi-day run, latent problems (thermal drift, lubrication failures, fuel system issues) stay hidden.
  • No operator buy-in. If staff don’t run the tests, they won’t be ready when a real event occurs.
  • Weak documentation. An IST without a complete, signed package leaves no record to settle future disputes over what was tested and how the facility performed.

The Bigger Picture

Level 5 is the commissioning milestone that proves the facility is ready for production. Done right, it gives owners confidence, equips operators with real training, and validates that redundancy, performance, and resilience are more than marketing claims.

Done poorly or rushed, it leaves a facility unprepared for its first real failure event, which is almost always more expensive than a simulated one.

Commissioning is about reducing risk. Level 5 is where risk is confronted head-on.

Closing Thought

You don’t rise to the occasion; you fall to the level of your training. – attributed to Archilochus

Level 5 is the training ground. It’s where people, systems, and procedures are tested together, proving the entire facility can handle reality before it carries critical load.