

A PE-backed banking-software platform had been rolled up from seven companies into one business serving thousands of US financial institutions. Leadership had done the visible integration — one set of policies, one security stack.
But AI was running across three clouds, several engineering teams, and contractors who each held their own access, and no one could say what was live, who was using what, or where customer data was going.
On the engineering side, developers were shipping AI-generated code (Claude Code, Cursor) with no consistent review or security path. Non-engineering teams were using unauthorized MCPs and a sprawl of vendors that each solved only part of a problem, on plans that did not match actual usage. And the AI capabilities already shipping inside the product were weakly governed, with only limited eval harnesses behind them.
The engagement was an AI audit across a PE-backed banking roll-up, anchoring four workstreams: the audit itself, securing how engineering builds with AI, reviewing the AI already in the product, and the governance framework to hold it together. It all runs on the TrustEvals evals platform (Python SDK, ingest gateway, eval engine, dashboard) deployed in the client's AWS environment.
AI Audit: finding every place AI was running across the seven merged companies — sanctioned tools, shadow tools on personal accounts, and MCPs — producing one unified, audit-committee-grade inventory.
Securing how engineering builds with AI: an AI code-review framework, CI/CD pipeline security, sandboxing and isolation, automated flagging plus human review, developer security training, and a vetted tool list.
Reviewing the AI in the product: architecture and multi-cloud data-flow review, prompt-security and output-validation tests, a model-selection and monitoring audit, data-privacy checks, and an AI incident-response playbook — with real evals run against the live pipeline.
The governance framework: an ISO 42001-aligned policy suite, an AI evaluation framework with use-case-specific thresholds, model-drift monitoring, a risk-assessment methodology, a consolidation plan onto a sanctioned toolset, and board-level reporting.
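As an illustration of what "use-case-specific thresholds" can mean in practice, here is a minimal sketch in plain Python. It does not use the TrustEvals SDK, and every name in it (the use cases, the metric names, the threshold values, the `failed_metrics` helper) is a hypothetical chosen for the example, not something taken from the engagement.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Threshold:
    metric: str      # name of an eval metric, scored 0.0 to 1.0
    minimum: float   # lowest acceptable score for this use case


# Hypothetical policy: a customer-facing use case gets stricter
# thresholds than an internal one. Values are illustrative only.
THRESHOLDS: dict[str, list[Threshold]] = {
    "customer_support_summary": [
        Threshold("faithfulness", 0.90),
        Threshold("pii_leak_free", 1.00),
    ],
    "internal_code_assist": [
        Threshold("faithfulness", 0.75),
    ],
}


def failed_metrics(use_case: str, scores: dict[str, float]) -> list[str]:
    """Return the metrics that fall below the use case's thresholds.

    A metric with no reported score counts as a failure, so a gap in
    evaluation coverage cannot pass silently.
    """
    failures = []
    for threshold in THRESHOLDS[use_case]:
        value = scores.get(threshold.metric)
        if value is None or value < threshold.minimum:
            failures.append(threshold.metric)
    return failures


if __name__ == "__main__":
    scores = {"faithfulness": 0.93, "pii_leak_free": 0.98}
    print(failed_metrics("customer_support_summary", scores))
```

The design point is that the same score can pass for one use case and fail for another, which is why the thresholds are keyed by use case and not set globally.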
TrustEvals deployed its eval platform inside the client's AWS environment and ran four compounding workstreams. Discovery came first: external sensors (SentinelOne, Snyk, Cisco MCP Scanner, OAuth exports, git-history and attestation surveys) fed a classification engine that found every place AI was running across the seven merged companies, spanning sanctioned tools, shadow tools on personal accounts, and MCPs, and produced one audit-committee-grade inventory. Next, the team hardened how engineering shipped AI: an AI code-review framework, CI/CD pipeline security, sandboxing, automated flagging with human review, and a vetted tool list. Third, it reviewed the AI already in the product, running prompt-security, output-validation and drift evals against the live AWS Bedrock pipeline before bank customers relied on it. Finally, it stood up the governance framework: an ISO 42001-aligned policy suite, use-case-specific eval thresholds, model-drift monitoring, a risk methodology, and board-level reporting. The team scoped the engagement explicitly as a point-in-time audit feeding a continuous governance program, and delivered it in 4–8 weeks.
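The discovery step, in which several sensors feed one classified inventory, can be sketched roughly as follows. This is an assumption-laden illustration in plain Python, not the actual classification engine: the `Finding` fields, the `SANCTIONED` list, the category names, and the rule that the strictest category seen for a tool wins are all hypothetical.

```python
from dataclasses import dataclass

# Hypothetical vetted-tool list; the real list is not given in the source.
SANCTIONED = {"claude code", "cursor"}

# Higher rank wins when several sightings of one tool disagree.
RANK = {"sanctioned": 0, "shadow": 1, "mcp": 2}


@dataclass(frozen=True)
class Finding:
    source: str   # which sensor reported it, e.g. "oauth_export"
    entity: str   # which merged company it was seen in
    tool: str     # tool name as reported by the sensor
    kind: str     # "app" or "mcp"
    account: str  # "corporate" or "personal"


def classify(finding: Finding) -> str:
    """Assign one sighting to a category (illustrative rules only)."""
    if finding.kind == "mcp":
        return "mcp"
    if finding.account == "personal" or finding.tool.lower() not in SANCTIONED:
        return "shadow"
    return "sanctioned"


def build_inventory(findings: list[Finding]) -> dict:
    """Merge sightings into one row per (entity, tool).

    Each row records every sensor that saw the tool and keeps the
    highest-ranked category across all sightings.
    """
    inventory: dict = {}
    for finding in findings:
        key = (finding.entity, finding.tool.lower())
        category = classify(finding)
        row = inventory.setdefault(key, {"category": category, "sources": set()})
        row["sources"].add(finding.source)
        if RANK[category] > RANK[row["category"]]:
            row["category"] = category
    return inventory


if __name__ == "__main__":
    sample = [
        Finding("oauth_export", "CoA", "Cursor", "app", "corporate"),
        Finding("git_history", "CoA", "cursor", "app", "personal"),
        Finding("mcp_scanner", "CoB", "files-mcp", "mcp", "corporate"),
    ]
    for key, row in build_inventory(sample).items():
        print(key, row["category"], sorted(row["sources"]))
```

Deduplicating on entity and tool is what turns overlapping sensor output into a single inventory; a tool that is sanctioned on corporate accounts but also appears on a personal account is recorded once, as shadow use.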
This approach fits any multi-entity group, especially a PE-backed roll-up where AI has spread across merged companies, multiple clouds, contractors and a sprawl of partial-solution vendors faster than any single view could keep up.
It also fits software companies whose own product ships AI to regulated customers and has to prove that AI is governed and evaluated. The conditions that make it work: several entities under one roof, more than one identity or cloud environment, real spend leaking into overlapping tools, and leadership that wants an audit-grade read of both internal AI use and the AI in the product.






