

An externally managed CRE firm ran its portfolio on a stack of disconnected systems: MRI for accounting, ARGUS for valuation, VTS for leasing and a CMMS for maintenance, with contracts and documents sitting in the managers' SharePoint and no unified view above any of it.
Data was scattered and, in places, had no real workflow around it. Data quality was extremely poor — the "ARGUS bug," later traced to unvalidated inputs, was one symptom of many. Internal data couldn't be stitched together with external market data, and there was no way to take public signals like news, filings and market moves and see how they would hit the portfolio.
So every leasing, capex and procurement decision was made in its own silo, on partial information, and none was tied to NOI or to the year-end valuation it drives.
A decision platform for a CRE firm's operating team, built in three layers: the data, the AI-enabled workflows on top of it, and the evals-and-governance layer underneath that keeps it honest.
Data lakehouse and semantic model: a unified lakehouse with automated extraction across leases, contracts and work orders, the firm's outside managers brought under data-quality agreements, and input validation — for instance, checking ARGUS forecast assumptions against the actual lease documents and flagging anything uncertain for a human.
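As an illustration of that validation step, here is a minimal sketch in Python. The schemas, field names and thresholds (`LeaseTerms`, `ForecastAssumption`, `rent_tolerance`, `min_confidence`) are hypothetical and chosen for the example; they are not the platform's actual data model or any ARGUS interface.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class LeaseTerms:
    """Terms extracted from the signed lease document (hypothetical schema)."""
    lease_id: str
    annual_rent: float
    expiry_year: int
    extraction_confidence: float  # 0..1, as reported by the extraction step


@dataclass
class ForecastAssumption:
    """The matching assumption keyed into the valuation model (hypothetical schema)."""
    lease_id: str
    annual_rent: float
    expiry_year: int


def validate(assumption: ForecastAssumption, lease: LeaseTerms,
             rent_tolerance: float = 0.01,
             min_confidence: float = 0.9) -> List[str]:
    """Return a list of flags; an empty list means the input passed."""
    flags: List[str] = []
    # Uncertain extractions are never auto-accepted: they go to a person.
    if lease.extraction_confidence < min_confidence:
        flags.append("low extraction confidence: route to human review")
    # Rent must agree with the lease within a relative tolerance.
    if abs(assumption.annual_rent - lease.annual_rent) > rent_tolerance * lease.annual_rent:
        flags.append("rent mismatch")
    # Expiry must agree exactly.
    if assumption.expiry_year != lease.expiry_year:
        flags.append("expiry mismatch")
    return flags


lease = LeaseTerms("L-001", 120_000.0, 2030, 0.95)
print(validate(ForecastAssumption("L-001", 150_000.0, 2030), lease))
```

The point of the design is that a mismatch or a low-confidence extraction produces a flag for a reviewer instead of silently overwriting either source.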
AI-enabled workflows with a human in the loop: when a decision comes up (a lease, renovation, vendor contract, tenant feedback, or outside news), agents baseline its impact against parameters like NOI. Purpose-built predictive models forecast; the language model reads documents and explains the reasoning; each is used only where it's reliable. Every recommendation returns a confidence score, a reasoning trace and a predicted year-end valuation, and a qualified human signs off before it lands.
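One way to picture the sign-off requirement is as a record that cannot be released until a named reviewer approves it. The sketch below is an assumption about shape only: the `Recommendation` class, its field names and the `release` function are hypothetical, not the platform's API.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Recommendation:
    """The three outputs named above, plus the sign-off state (hypothetical schema)."""
    decision: str
    confidence: float                    # 0..1
    reasoning_trace: List[str]
    predicted_year_end_valuation: float
    approved_by: Optional[str] = None    # set only by sign_off()

    def sign_off(self, reviewer: str) -> None:
        """Record the qualified human who approved this recommendation."""
        if not reviewer.strip():
            raise ValueError("a named reviewer is required")
        self.approved_by = reviewer


def release(rec: Recommendation) -> str:
    """Refuse to act on anything a human has not approved."""
    if rec.approved_by is None:
        raise PermissionError("recommendation has not been signed off")
    return f"released: {rec.decision} (approved by {rec.approved_by})"


rec = Recommendation(
    decision="renew lease L-001 at current terms",
    confidence=0.82,
    reasoning_trace=["tenant retention history", "comparable rents"],
    predicted_year_end_valuation=41_500_000.0,
)
rec.sign_off("asset manager")
print(release(rec))
```

Putting the gate in `release` rather than in the caller means no workflow can skip the human step by accident.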
Evals and governance: each recommendation traces from output back to the exact inputs that produced it and is benchmarked against custom evals; every model has a named owner and is back-tested on a rolling window; new predictions run in shadow mode until they earn trust. Red-teaming preceded rollout, and an AI governance layer keeps the platform compliant across jurisdictions.
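The rolling back-test and the shadow-mode gate can be sketched as follows. The error metric (mean absolute percentage error), the function names and the promotion rule are assumptions made for illustration; the source does not say which metric or thresholds the platform uses.

```python
from typing import List, Sequence


def rolling_backtest(actuals: Sequence[float], predictions: Sequence[float],
                     window: int) -> List[float]:
    """Mean absolute percentage error over each rolling window.

    Assumes every actual value is non-zero.
    """
    if len(actuals) != len(predictions):
        raise ValueError("actuals and predictions must have the same length")
    if window < 1 or window > len(actuals):
        raise ValueError("window must be between 1 and the series length")
    errors = [abs(p - a) / abs(a) for a, p in zip(actuals, predictions)]
    return [sum(errors[i:i + window]) / window
            for i in range(len(errors) - window + 1)]


def ready_to_leave_shadow(window_errors: Sequence[float], threshold: float,
                          required_windows: int) -> bool:
    """True only if the most recent `required_windows` windows all meet the threshold."""
    recent = list(window_errors[-required_windows:]) if required_windows > 0 else []
    return (required_windows > 0
            and len(recent) == required_windows
            and all(e <= threshold for e in recent))


# Shadow-mode predictions are logged next to realized values, never acted on.
actuals = [100.0, 100.0, 100.0, 100.0]
shadow_predictions = [110.0, 90.0, 105.0, 100.0]
errors = rolling_backtest(actuals, shadow_predictions, window=2)
print(ready_to_leave_shadow(errors, threshold=0.08, required_windows=2))
```

Requiring several consecutive good windows, not a single good score, is one simple reading of "until they earn trust".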
TrustEvals built the platform in three layers. The foundation was a unified Databricks lakehouse: automated extraction across leases, contracts and work orders, with the firm's outside managers brought under data-quality agreements and inputs validated — for instance, checking ARGUS forecast assumptions against the actual lease documents and flagging anything uncertain for a human. On top sat human-in-the-loop AI workflows: when a lease, renovation, vendor contract or market event comes up, agents baseline its impact against NOI, weighing tenant retention, comparable spend, real build costs and the valuation hit. Purpose-built predictive models do the forecasting; the language model reads documents and explains the reasoning; each is used only where it is reliable. Every recommendation returns a confidence score, a reasoning trace and a predicted year-end valuation for a qualified human to sign off. Underneath, an evals-and-governance layer traces each output back to its inputs, benchmarks against custom evals, back-tests every model on a rolling window, and runs new predictions in shadow mode until they earn trust. Red-teaming preceded rollout. Delivered over 4–6 months.
CRE firms, REITs, multifamily and real-estate operators running real portfolios across multiple outsourced managers and legacy systems (ARGUS, MRI, VTS and the like).
The pattern fits any operator who wants AI-enabled workflows tied directly to the numbers they're measured on — NOI, valuation, leverage. The conditions that make it work: deep buy-in from internal and vendor stakeholders, a clear responsibility charter, asset-level data spread across several systems, clear financial objectives to optimize toward, and a team that wants every recommendation defensible at the board or investment-committee level.






