Modes We Catch Early

From Level 0

by J.J. Malloy | Nov 17, 2025

In commissioning, the difference between a smooth handover and a costly delay often comes down to one thing: which risks you caught early.

The best commissioning teams don’t wait for problems to show up in Level 5 testing. They find them at Level 0, when they’re still lines on a drawing, not failures in the field. That’s where the risk register becomes more than a document; it becomes a roadmap for proactive problem-solving.

Here are five failure modes we consistently catch before they ever make it to site, and how.

1. Incomplete or Unvalidated Sequences of Operation (SOO)

The failure mode: Control sequences that look perfect on paper but fail under live conditions: chillers short-cycling, generators not syncing, CRAHs fighting each other.

How we catch it: At Level 0, every sequence gets walked through in a “tabletop test.” Engineers, vendors, and commissioning leads talk through every input, output, and interlock. Then we simulate those conditions digitally or during factory acceptance testing (FAT, Level 1).

Why it matters: Testing logic early prevents reactive field fixes that burn time, budget, and trust.
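
One way to picture the tabletop walk-through is as a coverage check: list every combination of inputs and see whether the sequence says what should happen. The sketch below is only an illustration under assumed names; the three inputs, the SOO rows, and the helper undefined_cases are hypothetical and not taken from any real project.

```python
from itertools import product

# Hypothetical inputs to one chiller-enable sequence (names are assumptions).
INPUTS = ("chilled_water_flow", "chiller_available", "cooling_call")

# Hypothetical SOO rows: input combination -> commanded chiller state.
soo = {
    (True, True, True): "RUN",
    (True, True, False): "STOP",
    (False, True, True): "STOP",   # no flow: interlock holds the chiller off
    (False, True, False): "STOP",
    (True, False, True): "STOP",
}


def undefined_cases(table, n_inputs):
    """Return every input combination the sequence does not address."""
    return [combo for combo in product((True, False), repeat=n_inputs)
            if combo not in table]


for combo in undefined_cases(soo, len(INPUTS)):
    # Each printed row is a question to raise in the tabletop session.
    print(dict(zip(INPUTS, combo)))
```

Each combination printed is a condition the written sequence is silent on, which is the kind of gap a tabletop session is meant to surface.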

2. Misaligned Vendor Interfaces

The failure mode: Systems that don’t communicate because the interface points weren’t defined clearly, or were defined twice, differently, by two different contractors.

How we catch it: Before the first I/O point is wired, we map every signal: who sends it, who receives it, and what “normal” looks like. A single sheet, owned by commissioning, prevents dozens of assumptions later.

Why it matters: You can’t integrate what you don’t define. Interface alignment during design saves hundreds of man-hours at startup.
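
As a rough sketch of what that single sheet can check, the example below stores each signal definition as a record and flags two problems: signals defined twice with different details, and signals missing a sender, receiver, or normal state. The field names, signal names, and contractor labels are all made up for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class SignalDef:
    """One contractor's definition of an interface signal (fields are hypothetical)."""
    signal: str        # signal name
    sender: str        # system that sends the signal
    receiver: str      # system that receives it
    normal_state: str  # what "normal" looks like
    defined_by: str    # contractor who supplied this definition


def find_conflicts(defs):
    """Return signals that were defined more than once with differing details."""
    by_signal = defaultdict(set)
    for d in defs:
        by_signal[d.signal].add((d.sender, d.receiver, d.normal_state))
    return sorted(name for name, variants in by_signal.items() if len(variants) > 1)


def find_incomplete(defs):
    """Return signals missing a sender, receiver, or normal state."""
    return sorted({d.signal for d in defs
                   if not (d.sender and d.receiver and d.normal_state)})


# Illustrative data only.
register = [
    SignalDef("CH1_RUN_STATUS", "Chiller PLC", "BMS", "closed = running", "Contractor A"),
    SignalDef("CH1_RUN_STATUS", "Chiller PLC", "BMS", "open = running", "Contractor B"),
    SignalDef("GEN1_READY", "Generator controller", "", "closed = ready", "Contractor A"),
]

print(find_conflicts(register))   # ['CH1_RUN_STATUS']
print(find_incomplete(register))  # ['GEN1_READY']
```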

3. Inconsistent Test Boundaries

The failure mode: One vendor tests only their side of the fence; the other assumes integration isn’t their scope. Nobody catches the gap until the first full system test fails.

How we catch it: During Level 1 and Level 2 planning, we define clear test boundaries and acceptance ownership. Every test script ends with the same question: “Who closes the loop?”

Why it matters: Accountability needs a home. Early boundary definition turns multi-vendor risk into shared confidence.
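
A boundary check like this can be sketched in a few lines. The example assumes a hypothetical list of test scripts, each naming the systems it exercises and the party that accepts the result, plus a list of interfaces between systems. It flags scripts with no owner and interfaces that no script tests across both sides. System names such as "EPMS" are illustrative only.

```python
from dataclasses import dataclass


@dataclass
class Script:
    """A planned test script (all fields are hypothetical)."""
    name: str
    systems: frozenset  # systems exercised by the script
    closes_loop: str    # party that accepts the result; "" if unassigned


def unowned(scripts):
    """Scripts where nobody has been named to close the loop."""
    return [s.name for s in scripts if not s.closes_loop]


def untested_interfaces(interfaces, scripts):
    """Interfaces where no single script exercises both sides together."""
    return [pair for pair in interfaces
            if not any(set(pair) <= s.systems for s in scripts)]


# Illustrative data only.
interfaces = [("Chiller PLC", "BMS"), ("UPS", "EPMS")]
scripts = [
    Script("Chiller standalone", frozenset({"Chiller PLC"}), "Vendor A"),
    Script("BMS points check", frozenset({"BMS"}), ""),
    Script("UPS to EPMS alarms", frozenset({"UPS", "EPMS"}), "Commissioning"),
]

print(unowned(scripts))                          # ['BMS points check']
print(untested_interfaces(interfaces, scripts))  # [('Chiller PLC', 'BMS')]
```

In this made-up data, each vendor tests its own side of the chiller-to-BMS interface, but nothing tests the two together, which is the gap described above.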

4. Control Logic Drift During Construction

The failure mode: A PLC or BMS sequence changes midstream due to field conditions or a firmware update, but the documentation doesn’t. The logic tested isn’t the logic installed.

How we catch it: We run version control on all logic and settings: every FAT report, SAT update, and field firmware version is tracked against the commissioning database.

Why it matters: If the logic doesn’t match the SOO, your entire performance test is a false read.
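
A minimal sketch of drift detection, assuming logic can be exported from each controller as a file: fingerprint the export that was tested, fingerprint what is installed, and compare. The controller names and export contents below are placeholders, and a real commissioning database would track far more than a hash.

```python
import hashlib


def fingerprint(logic_export: bytes) -> str:
    """SHA-256 digest of an exported logic or settings file."""
    return hashlib.sha256(logic_export).hexdigest()


def find_drift(tested, installed):
    """Return controllers whose installed fingerprint is missing or differs
    from the fingerprint recorded at test time."""
    return sorted(c for c in tested if installed.get(c) != tested[c])


# Illustrative data only: fingerprints recorded at FAT vs. read back on site.
tested = {
    "CRAH-01": fingerprint(b"setpoint=22\n"),
    "GEN-01": fingerprint(b"sync_window=5\n"),
}
installed = {
    "CRAH-01": fingerprint(b"setpoint=22\n"),
    "GEN-01": fingerprint(b"sync_window=8\n"),  # changed in the field
}

print(find_drift(tested, installed))  # ['GEN-01']
```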

5. Overlooked Failure Scenarios

The failure mode: Only testing “happy paths” (what happens when everything works) and ignoring “what if” failures like a stuck CRAH valve or a UPS bypass relay failure.

How we catch it: Across Levels 0 through 3, we run structured Failure Mode and Effects Analysis (FMEA) sessions. We simulate multiple failures, not just single ones, and adjust sequences to mitigate them.

Why it matters: Real resilience isn’t proven by uptime; it’s proven by what happens when something breaks.
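
To illustrate why multiple failures matter, here is a toy capacity check, not a full FMEA. It enumerates every single and double unit failure and reports the combinations that leave cooling capacity below the load. The four units, their 100 kW ratings, and the 250 kW load are invented numbers for the sketch.

```python
from itertools import combinations

# Hypothetical cooling units and capacities in kW (assumed values).
units = {"CRAH-1": 100, "CRAH-2": 100, "CRAH-3": 100, "CRAH-4": 100}
load_kw = 250  # assumed design load


def shortfall_scenarios(units, load_kw, max_failures=2):
    """Return each set of failed units that leaves capacity below the load."""
    total = sum(units.values())
    failing = []
    for n in range(1, max_failures + 1):
        for failed in combinations(sorted(units), n):
            remaining = total - sum(units[u] for u in failed)
            if remaining < load_kw:
                failing.append(failed)
    return failing


shortfalls = shortfall_scenarios(units, load_kw)
print(len(shortfalls))  # 6
print(shortfalls[0])    # ('CRAH-1', 'CRAH-2')
```

With these numbers every single failure is survivable, so a review limited to single failures would pass, while all six double failures fall short of the load.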