MicroLink × NVIDIA · Working Session Reference
Prepared 4 May 2026 / Version 0.2 Draft / Confidential
Zone 03 · Build · Cluster A

Monitoring and control

For the NVIDIA platform team · Pazos · Fassiotti

In a conventional data centre, the IT stack and the BMS stack are two systems that share a building. In a closed-loop deployment, they share physics. NVIDIA's stack is the only one that runs across all four layers as one. We deploy it as the unifying control surface.

Owner
NVIDIA platform team
Adjacent: Alex Pazos, Claudio Fassiotti
MicroLink lead
Shane Pather
CTO · Engineering
Stack scope
Four layers · one surface
GPU · cluster · building · twin
Strategic centrepiece
DSX Blueprint
Closed-loop twin extension
Working session
5 May 2026
30 min · Teams
03 The Thesis
IT and BMS are usually two stacks that share a building. In our deployment they share physics. The closed thermodynamic loop only works if monitoring and control are coupled in software. DCGM at the GPU. Mission Control at the cluster. Metropolis at the building. Omniverse and DSX Blueprint at the twin. One stack, four layers, one surface.
01
Layers in the stack
4
GPU · cluster · building · twin. Each anchored to a published NVIDIA product, observed and controlled from one surface.
Architectural Reference
02
IT-BMS coupling
Real time
Sub-second at the GPU layer. Tens of seconds at the building. Minutes at the twin. Coupled cadences, one surface.
Per layer · Honest cadence
03
Twin scope
DSX Blueprint
Extended to model thermal and biogas loops. Not in the published reference today. The novel thing.
Co-developed · Fassiotti
04
Audit metrics
By design
ERE, WUE, heat utilisation factor, attestation: emitted by the same stack that runs operations.
SB 253 / 261 · EU EED 2027

Four layers, one stack

GPU at the bottom, twin at the top, building and cluster between. Each layer is a published NVIDIA product. The novelty is running them as one coupled control surface.

DCGM is table stakes. Metropolis applied to a data centre as a smart building is more interesting. DSX Blueprint extended to the thermal coupling layer is genuinely new.

The four-layer stack is built from existing NVIDIA platform products. At the GPU layer, DCGM and NVML provide telemetry for power, temperature, utilisation, and per-tenant attribution. At the cluster layer, Mission Control handles fleet orchestration, GPU lifecycle, workload scheduling, and observability across pods. At the building layer, Metropolis treats the facility as a managed asset, fusing sensor data from BMS, leak detection, ingress monitoring, and security cameras. At the twin layer, Omniverse with DSX Blueprint models the data centre's physical state as a real-time simulation.
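The GPU layer is concrete enough to sketch. The fragment below is a minimal sample of per-GPU power, temperature, and utilisation through the NVML Python bindings (pynvml); the layer tag and record shape are illustrative assumptions rather than the deployed schema, and per-tenant attribution would happen upstream in the cluster layer.

```python
# Minimal GPU-layer telemetry sample via the NVML Python bindings (pynvml).
# The "layer" tag and record shape are illustrative, not the deployed schema.
import time
import pynvml

pynvml.nvmlInit()
try:
    samples = []
    for i in range(pynvml.nvmlDeviceGetCount()):
        handle = pynvml.nvmlDeviceGetHandleByIndex(i)
        util = pynvml.nvmlDeviceGetUtilizationRates(handle)
        samples.append({
            "layer": "gpu",                  # Layer 01 in the four-layer stack
            "gpu_index": i,
            "timestamp": time.time(),
            "power_w": pynvml.nvmlDeviceGetPowerUsage(handle) / 1000.0,  # mW -> W
            "temp_c": pynvml.nvmlDeviceGetTemperature(
                handle, pynvml.NVML_TEMPERATURE_GPU),
            "util_pct": util.gpu,
        })
    # Upstream, these samples would be attributed per tenant and forwarded
    # to the cluster layer (Mission Control) at roughly one-second cadence.
    print(samples)
finally:
    pynvml.nvmlShutdown()
```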

The bridge between the IT and facility worlds runs on Jetson and IGX Orin at the cabinet edge. These read facility sensors, push BMS state up into Metropolis, and execute control loops back down into chillers, valves, and CDU setpoints. Every cabinet is a Jetson node. The fleet is observable and controllable from one place.
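A minimal sketch of what one cabinet-edge control cycle could look like, assuming hypothetical read_sensor, publish, and write_setpoint helpers and topic names; the 45 °C secondary-loop target comes from the thermal figures used elsewhere in this document, and the deadband and valve step are illustrative placeholders.

```python
# Hypothetical cabinet-edge bridge cycle (Jetson-class node): read facility
# sensors, publish state upward to the building layer, apply a setpoint
# decision downward. The helpers and topic names are assumptions for
# illustration, not a real BMS or Metropolis API.
import time

SECONDARY_LOOP_TARGET_C = 45.0   # PHE secondary target from the thermal design
DEADBAND_C = 1.0                 # hysteresis band to avoid valve hunting

def edge_bridge_cycle(read_sensor, publish, write_setpoint):
    supply_c = read_sensor("cdu/secondary_supply_temp_c")
    flow_lpm = read_sensor("cdu/secondary_flow_lpm")

    # Telemetry up: cabinet-local facility state toward the building layer.
    publish("building/bms/cabinet_state", {
        "ts": time.time(),
        "secondary_supply_c": supply_c,
        "secondary_flow_lpm": flow_lpm,
    })

    # Control down: nudge the CDU mixing valve toward the secondary-loop target.
    if supply_c > SECONDARY_LOOP_TARGET_C + DEADBAND_C:
        write_setpoint("cdu/mixing_valve_pct", delta=+5)
    elif supply_c < SECONDARY_LOOP_TARGET_C - DEADBAND_C:
        write_setpoint("cdu/mixing_valve_pct", delta=-5)
```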

What makes this architectural rather than incremental: the twin layer simulates physical processes that the published DSX Blueprint reference does not currently cover. Heat flow into a host digester loop. Biogas yield response to thermal demand. PHE saturation under workload shifts. Dry-cooler engagement under host-loop drop-off. Building this extension is the white-paper opportunity.

Figure 01 · The four-layer monitoring and control stack
GPU, cluster, building, twin. One operator surface across all four.
Telemetry flows up. Control flows down. Jetson at the edge bridges IT and BMS. Each layer anchored to a published NVIDIA product.
Confidence · architectural
Diagram layers, top to bottom:
  • Layer 04 · Twin · Omniverse + DSX Blueprint · real-time simulation of physical processes, workload-thermal coupling, 24 h forecast · thermal loops, biogas yield, PHE state · extension, co-developed
  • Layer 03 · Building · Metropolis · facility-as-managed-asset, sensor fusion · BMS, leak detection, video, ingress, environment telemetry · smart-building role
  • Layer 02 · Cluster · Mission Control · fleet orchestration, GPU lifecycle, workload scheduling, k0rdent · fleet-level observability, per-tenant attribution
  • Layer 01 · GPU · DCGM + NVML · per-GPU telemetry: power, temperature, utilisation, attribution · table stakes, standard
  • Edge bridge, cabinet-resident · Jetson + IGX Orin · reads sensors → pushes BMS state up → executes control loops down
Source · NVIDIA platform reference architectures · MicroLink monitoring stack v0.4 Method · Four-layer abstraction · NVIDIA products mapped to MicroLink ops surface

The coupled physics made observable

In our deployment the IT load and the host process are physically coupled by the closed thermodynamic loop. Two parallel monitoring stacks cannot represent that coupling. The software has to know what the physics knows.

Every physical node corresponds to a software node. Every software node corresponds to a physical reality. The coupling is the architecture.

In a conventional data centre, IT and BMS are weakly coupled. CRAC setpoints respond to room temperature, which responds to GPU power. The feedback loop is closed by the building, not by software, and the control system does not need to know what the GPUs are doing — just how hot the room is. That model breaks the moment heat becomes an output product, not just waste. When the digester wants 6 MW of thermal at 45 °C continuously, the IT layer is not free to ignore the building layer.

The coupling has to be made observable in software. DCGM tells Mission Control about workload patterns. Mission Control tells Metropolis about expected thermal load over the next ten seconds. Metropolis reads PHE state, digester demand, and dry-cooler engagement. Omniverse runs ahead of all of it as a forecast: if this scheduling decision is taken at the cluster layer, what does the building layer look like in five minutes? That is the kind of question the coupled stack answers.
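As a sketch of the Mission Control to Metropolis hand-off described above: a short expected-thermal-load message derived from planned GPU power over the next scheduling window. The dataclass, field names, and the first-order power-to-heat assumption are illustrative simplifications, not the deployed interface.

```python
# Illustrative coupling message from the cluster layer to the building layer:
# an expected-thermal-load estimate for the next scheduling window, derived
# from planned GPU power. Names and the 1:1 power-to-heat simplification are
# assumptions for this sketch.
from dataclasses import dataclass

@dataclass
class ThermalForecast:
    window_s: int            # horizon of the estimate, e.g. the next 10 s
    expected_heat_kw: float  # heat the building layer should expect to absorb

def forecast_from_schedule(planned_gpu_power_kw: list[float],
                           window_s: int = 10) -> ThermalForecast:
    # Nearly all electrical power drawn by the pod leaves as heat into the
    # liquid loop, so planned power is a usable first-order heat forecast.
    return ThermalForecast(window_s=window_s,
                           expected_heat_kw=sum(planned_gpu_power_kw))
```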

Figure 02 · Coupled physics · physical and software flow
Two flows. One coupling. Vertical correspondence.
Physical flow on top: heat moving through the system. Software flow on bottom: telemetry observing it. Dotted lines mark where the layers correspond.
Confidence · architectural
Diagram flows:
  • Physical (what moves through the system): GPU server heat (35–65 °C) → CDU mix and control (N redundant) → PHE cross-over (45 °C → 37 °C) → host primary delivery (~6 MW thermal) → digester consumption at 37 °C (biogas yield) → biogas to MCFC (fuel feedback, freed from cogen)
  • Software (what observes and controls): DCGM + NVML (~1 s, GPU layer) · Mission Control + k0rdent (~10 s, cluster) · Jetson / IGX Orin edge bridge (cabinet) · Metropolis (~1 min, building) · DSX Blueprint extended (~5 min, twin) · Omniverse forecast (24 h horizon)
  • Correspondence notes: silicon · scheduling input · cabinet sensor · building sensor · demand model · yield forecast
Source · MicroLink monitoring stack v0.4 · WWTP Thermal Memo §03 Method · Physical-software correspondence diagram · 1:1 node mapping
§
The coupling is the architecture
In a conventional deployment, the physical and software stacks operate independently and the building closes the loop. In a closed-loop deployment, the software has to close the loop because the building cannot. Every physical node has a software node that observes or controls it. That correspondence is what the four-layer stack delivers.

The digital twin extended for closed-loop ops

Omniverse and DSX Blueprint already model the IT envelope. We extend the twin to model the thermal and biogas loops as well. This is the architecturally novel work, and the white-paper opportunity.

The twin lets the operator ask: if I take this scheduling decision now, what does the host loop look like in five minutes? What does biogas yield look like in five hours?

The DSX Blueprint reference architecture today covers compute, networking, power, and IT-side cooling. It does not cover the host-coupled thermal loops, the biogas feedback path, or the integrated source mix (MCFC plus LFP plus PEM plus grid) that this deployment runs. The extension we propose adds those elements so the twin reflects the full physical reality of a closed-loop deployment.

What the extended twin lets the operator do: simulate workload pattern shifts and see PHE saturation in advance. Predict digester thermal demand against current host operating state. Forecast biogas yield in response to thermal delivery. Plan dry-cooler engagement under predicted host-loop drop-off. Schedule tenant jobs against predicted thermal availability and gas displacement value. Each of these is a real operations question that the published reference twin cannot currently answer.
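To make the PHE-saturation question concrete, here is a deliberately over-simplified lumped estimate of time-to-saturation under a workload shift. All numbers and the single-buffer model are hypothetical illustrations; the real twin answers the same question as a physics co-simulation in Omniverse.

```python
# Lumped-model sketch of one operator question: under a planned workload
# shift, when does the PHE approach saturation (the host side can no longer
# absorb the delivered heat)? All figures below are hypothetical.
from typing import Optional

def minutes_to_phe_saturation(it_heat_kw: float,
                              host_demand_kw: float,
                              buffer_kwh: float,
                              dry_cooler_kw: float = 0.0) -> Optional[float]:
    """Minutes until the secondary loop's thermal buffer is exhausted, or
    None if host demand plus dry cooler can absorb the load indefinitely."""
    surplus_kw = it_heat_kw - host_demand_kw - dry_cooler_kw
    if surplus_kw <= 0:
        return None
    return buffer_kwh / surplus_kw * 60.0

# Hypothetical numbers: 8 MW of IT heat against ~6 MW of host demand and a
# 500 kWh buffer saturates in ~15 minutes unless the scheduler shifts load
# or the dry cooler engages.
print(minutes_to_phe_saturation(it_heat_kw=8_000, host_demand_kw=6_000, buffer_kwh=500))
```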

Figure 03 · Digital twin scope · what's in the model
Inside the twin. Outside the twin: what the operator does with it.
The boundary box is the load-bearing element. Elements inside are co-modelled in real time. Elements outside are operator workflows that consume the twin.
Confidence · architectural · subject to engineering review
Diagram contents:
  • Inside the twin (DSX Blueprint twin · Omniverse real-time co-simulation): IT envelope (pod compute, 576 GPU, 11.2 MW · published reference) · thermal transfer (CDU + PHE state, 45 °C secondary · extension) · host process (primary loop, ~6 MW thermal demand · extension) · digester model (sludge thermal, 15 → 37 °C lift · extension) · biogas feedback (yield → MCFC, closed-loop physics · extension, novel) · source mix (MCFC, LFP, PEM, grid, 2N power topology · extension) · rejection fallback (dry-cooler engagement, simulated load envelope) · forecast (24 h workload simulation, scheduling decisions)
  • Operator input: workload schedule and tenant job placement · host state and digester demand profile · live telemetry (sensors, BMS, DCGM)
  • Twin outputs: PHE saturation predicted ahead · biogas yield forecast (24 h) · schedule signal to Mission Control
  • Questions the twin answers, not in the published reference: if I take this scheduling decision now, what does host loop temperature look like in five minutes? What is the predicted biogas yield from the digester over the next 24 hours given current thermal delivery? When will the PHE saturate under the current workload pattern, and what scheduling shift defers it? Should the dry cooler engage now, in anticipation of host-loop drop-off?
Source · NVIDIA DSX Blueprint reference · MicroLink twin extension v0.4 Method · Boundary diagram · published vs co-developed scope · operator workflow

Audit metrics by construction

The disclosure regime for state, federal, and international public-sector deployments asks for specific metrics. The four-layer stack emits them as operational telemetry, not as bolt-on reporting.

Public-sector deployments under California SB 253, SB 261, and the EU Energy Efficiency Directive 2027 are required to disclose operational and environmental metrics that the conventional data centre playbook treats as a separate audit exercise. The four-layer stack generates these metrics as a side effect of operations: ERE alongside PUE drops out of the GPU and building layers in real time. WUE drops out of the building layer's loop telemetry. Heat utilisation factor drops out of the PHE state in the twin. Per-tenant attestation drops out of BlueField-3 and DCGM. The audit footing comes by construction.
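As a sketch of how those metrics fall out of telemetry the stack already carries: PUE, ERE, and WUE below follow their standard Green Grid definitions; the heat utilisation factor shown (reused heat over total heat rejected) is one plausible formulation, not a confirmed spec for this deployment.

```python
# Audit metrics as derived quantities over telemetry the stack already emits.
# PUE, ERE, WUE follow standard Green Grid definitions; the heat utilisation
# factor here is an assumed formulation for illustration.

def pue(total_facility_kwh: float, it_kwh: float) -> float:
    # Power Usage Effectiveness: total facility energy over IT energy.
    return total_facility_kwh / it_kwh

def ere(total_facility_kwh: float, reused_kwh: float, it_kwh: float) -> float:
    # Energy Reuse Effectiveness: credit heat exported to the host loop.
    return (total_facility_kwh - reused_kwh) / it_kwh

def wue(water_litres: float, it_kwh: float) -> float:
    # Water Usage Effectiveness in litres per kWh of IT energy.
    return water_litres / it_kwh

def heat_utilisation(reused_heat_kwh: float, total_heat_kwh: float) -> float:
    # Share of rejected heat actually delivered to the host process.
    return reused_heat_kwh / total_heat_kwh
```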

This matters strategically because every additional public-sector deployment of this pattern faces the same disclosure framework. Once we publish the reference for how the four-layer stack maps to the disclosure regime, that mapping becomes part of the deployment template. Pazos's portfolio benefits from a published audit-by-construction approach across the public-sector accounts NVIDIA touches.

Source layer · GPU + Building
ERE · PUE · WUE
Real-time telemetry from DCGM and Metropolis. Published as standard operational metrics.
Source layer · Twin
Heat factor · grid position
DSX Blueprint extension emits heat utilisation factor and net grid energy position continuously.
Source layer · Cluster + GPU
Attestation · CO₂
Per-tenant attestation from BlueField-3 plus per-GPU CO₂ attribution from DCGM, both published as continuous metrics.
§
The audit footing is the operational footing
There is no separate audit data pipeline. There is no quarterly assembly exercise. The metrics required by SB 253, SB 261, and EU EED 2027 are emitted by the same stack that runs the deployment. That alignment is rare in this industry, and it is the right published reference for any public-sector deployment that follows.

The ask and what we bring

Co-develop the four-layer stack as the reference for closed-loop and public-sector deployments. The platform team owns the products. We bring the deployment that exercises all four layers.

Tier · Monitoring and control reference architecture
One stack, four layers · The DSX Blueprint extension for closed-loop deployments

DCGM at the GPU. Mission Control at the cluster. Metropolis at the building. Omniverse and DSX Blueprint at the twin. We deploy them as one surface and contribute the extension that closes the loop.

From the platform team
  • Four-layer stack architecture review
  • Metropolis deployment guidance for facility-as-asset
  • Mission Control + k0rdent integration spec
  • Jetson + IGX Orin reference for facility-side BMS
  • Cadence with the platform team through Q4 2026
From Fassiotti specifically
  • DSX Blueprint extension scope
  • Co-authored closed-loop twin reference
  • Omniverse simulation for thermal coupling
  • Joint white paper opportunity
From Pazos specifically
  • Audit-by-construction template review
  • Public-sector disclosure mapping
  • SB 253 / SB 261 / EU EED 2027 alignment
  • Bridge to public-sector deployments that follow