Most enterprise guidance on AI agent governance is written for one of two audiences. Either it assumes the strictest possible regulatory regime (healthcare, defense, banking) and treats every control surface as load-bearing, or it assumes no compliance constraints at all and reads as a pure productivity pitch. The middle is largely unaddressed.
That middle is where the bulk of the mid-market actually lives. Real estate, manufacturing, distribution, professional services, regional finance, technology services, hospitality. These firms typically have SOC-2 obligations attached to their financial reporting and customer data handling, but no HIPAA, no SOX, no PCI exposure beyond what their payment processor absorbs. They are audited, but lightly, and the audit perimeter covers only a subset of the organization.
For these companies, the AI agent governance question is genuinely interesting, because the right answer is not “deploy nothing” and it is not “deploy everywhere.” It is a process of classification, perimeter mapping, and architectural choice that, done well, captures most of the productivity upside while preserving compliance posture. Done poorly, it either freezes the organization out of a meaningful productivity shift or quietly creates audit findings that surface a year later.
This piece lays out the framework we use when advising lightly-regulated clients on AI agent deployment, with specific attention to SOC-2 considerations and the decision points around how to classify processes and grant system access.
Why the governance conversation has changed
The arrival of agentic AI tools (Claude Cowork’s recent launch making it the current category leader, with Microsoft and OpenAI shipping comparable capabilities) shifts the governance question in a way that most published frameworks have not caught up to.
The previous generation of generative AI governance was mostly about input control. Could users paste sensitive data into a chatbot? Did the model retain prompts? Was data leaving the corporate perimeter? These are real questions, but they are answered through familiar mechanisms: data loss prevention rules, approved-tool lists, employee training, contract terms with the vendor. There were, and still are, questions about copyright exposure in model outputs as well.
Agentic AI introduces a different question: not what data the user gives the model, but what actions the model takes on its own. An agent that reads files, writes files, sends messages, navigates browsers, schedules tasks, and triggers connectors to enterprise systems is, functionally, a user. It needs an identity, it accumulates a behavioral footprint, and it can do harm at scale and at speed.
The governance vocabulary that maps to this is closer to Identity and Access Management (IAM), Robotic Process Automation (RPA), and segregation of duties than it is to traditional AI policy. And that vocabulary is exactly where SOC-2 has its strongest opinions.
The advantage of being lightly-regulated
Companies under SOC-2 alone, without HIPAA or SOX layered on top, occupy a position that is more favorable than most leadership teams realize.
SOC-2’s Common Criteria address logical access, change management, monitoring, and availability across the systems in scope. Crucially, “in scope” is something you define in coordination with your auditor based on which systems and processes touch the trust services criteria you have committed to. For most lightly-regulated firms, the in-scope perimeter is the financial reporting stack, the customer data systems that produce trust-relevant outputs, and the IT general controls supporting both. Everything else, by definition, is out of scope.
That out-of-scope surface is where the productivity dividend lives. Lease abstraction, market research synthesis, marketing copy, internal communications, presentation prep, calendar coordination, vendor management drafts, RFP responses, prospect research, knowledge base curation. None of this is SOC-2-relevant for most mid-market firms. All of it is highly automatable with current agent technology.
The mistake we see most often is binary thinking: a firm decides “we have SOC-2 obligations, therefore we cannot deploy AI agents,” when the correct framing is “we have SOC-2 obligations, therefore we cannot deploy AI agents inside the SOC-2 perimeter without compensating controls; outside that perimeter we have substantial latitude.”
The first job of governance, then, is drawing the perimeter explicitly.
Mapping your compliance perimeter
Most mid-market companies have never explicitly mapped their SOC-2 perimeter at the level of granularity required to make agent deployment decisions. Their auditor has a working understanding, IT has a different working understanding, and the lines blur whenever a system serves multiple purposes.
Three questions establish the perimeter for AI agent purposes:
Does this system create or modify data that feeds financial reporting? Anything answering yes is in the perimeter. The general ledger, the property management system in real estate, the ERP in manufacturing, the practice management system in professional services. The actions taken in these systems become the audit evidence.
Does this system produce evidence that auditors directly review? Document repositories holding signed agreements, vendor contracts, customer contracts, board materials, audit workpapers. These are inside the perimeter even if they do not look like financial systems, because their integrity is part of what is being attested.
Does this system modify access, configurations, or controls within other in-scope systems? Identity providers, privileged access management tools, change management systems. These are inside the perimeter as IT general controls.
Anything that does not answer yes to any of these three is outside the perimeter. Most of an organization’s daily work happens outside the perimeter. That is where agents go first.
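The three questions above reduce to a simple screening rule: a yes to any one of them places the system inside the perimeter. A minimal sketch, using illustrative attribute names rather than any standard schema:

```python
def in_soc2_perimeter(system: dict) -> bool:
    """Screen a system against the three perimeter questions.

    A 'yes' to any question places the system inside the SOC-2
    perimeter for agent-deployment purposes. The dict keys are
    hypothetical attribute names, not a standard schema.
    """
    return (
        system.get("feeds_financial_reporting", False)     # Q1: creates/modifies financial data
        or system.get("holds_audit_evidence", False)       # Q2: evidence auditors review
        or system.get("controls_access_or_config", False)  # Q3: IT general controls
    )

# Example: a standalone research wiki answers no to all three;
# the general ledger answers yes to the first.
wiki = {"feeds_financial_reporting": False,
        "holds_audit_evidence": False,
        "controls_access_or_config": False}
ledger = {"feeds_financial_reporting": True}
```

The useful property of writing the rule down this explicitly is the default: a system that has not been assessed yet answers no to nothing, and stays a question mark rather than quietly being treated as out of scope.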
The harder cases are the adjacent systems: feeders, integrators, and dual-purpose tools.
The adjacent-system problem
Customer relationship management is the canonical example. A standalone CRM used to track prospects and broker relationships is not a financial system. The same CRM, configured to feed deal commissions, pipeline-weighted revenue forecasts, or commitment data that reaches investor reporting, becomes a feeder into the financial perimeter.
The perimeter follows the data flow, not the system label.
The right approach is to map data flows, not system inventories. For each system the organization wants to grant agent access to, trace the downstream consumers. If any downstream consumer is in-perimeter, the writes from this system are in-perimeter even if the system itself is not.
This is unglamorous work and most organizations have not done it. It is also the single most valuable input into agent deployment decisions, because it converts an ambiguous “should we let agents touch the CRM?” into a precise “agents can read freely; agents can write to fields A, B, and C, but not to D, E, and F because those feed the revenue forecast that lands in board reports.”
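Because the perimeter follows the data flow, the question "are this system's writes in-perimeter?" is a reachability question over the flow map. A sketch, assuming a hypothetical mid-market flow map (the system names are illustrative):

```python
from collections import deque

def writes_in_perimeter(system: str, feeds: dict, perimeter: set) -> bool:
    """Return True if any downstream consumer of `system` is in the
    SOC-2 perimeter. `feeds` maps each system to the systems it feeds;
    `perimeter` is the set of explicitly in-scope systems."""
    seen, queue = set(), deque(feeds.get(system, []))
    while queue:
        node = queue.popleft()
        if node in seen:
            continue
        seen.add(node)
        if node in perimeter:
            return True
        queue.extend(feeds.get(node, []))
    return False

# Hypothetical flow map: the CRM feeds a revenue forecast,
# which feeds board reporting (in-perimeter).
feeds = {
    "crm": ["revenue_forecast"],
    "revenue_forecast": ["board_reporting"],
    "marketing_wiki": [],
}
perimeter = {"general_ledger", "board_reporting"}
```

Here the CRM's writes land in-perimeter two hops downstream even though the CRM itself is not an in-scope system, which is exactly the adjacent-system trap the mapping exercise exists to catch.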
Classification of processes
Once the perimeter is mapped, processes fall into three categories.
Category 1: Out-of-perimeter, agent-eligible. The default posture for these is permissive. Lease abstraction, marketing content generation, document summarization, calendar coordination, email triage, research synthesis, internal communications, vendor coordination drafts, RFP and proposal responses, knowledge base curation, meeting prep, travel planning. The governance question is not “should we automate this?” but “how do we deploy effectively and broadly?”
Category 2: Adjacent, conditionally agent-eligible. These require process-level judgment. CRM updates that do not touch financial-feeding fields. Document repository organization where some documents are audit evidence and others are not. Email composition where some recipients are external counterparties and some are internal teammates. The right answer is to scope agent permissions narrowly: read-only by default, write access limited to specific fields or folders, no ability to send external email without human approval.
Category 3: In-perimeter, agent-prohibited (for now). General ledger postings. Tenant ledger adjustments in real estate. Billing system modifications. Distribution calculations. Audit workpaper modifications. Privileged access changes. Anything that creates or modifies data that auditors will sample. The “for now” qualifier matters; this category will shrink as platform audit capabilities mature, but as of the current state of agent platforms, these workflows do not have the audit trail or IAM granularity that SOC-2 evidence requires.
The deliverable that comes out of this exercise is a one-page systems classification document. For each major system, it specifies the category and the specific permissions agents may have. This document becomes the operational definition of the firm’s AI agent governance policy. It is also exactly the artifact an auditor will ask for if they ever inquire how AI agent risk is managed.
System access decisions, by category
A few system classes deserve specific treatment because they recur across industries and require nuanced positioning.
Financial and operational systems of record
Property management (Yardi, MRI, RealPage, AppFolio in CRE), enterprise resource planning (NetSuite, Sage, Microsoft Dynamics), practice management, billing systems. These are almost always in-perimeter and almost always Category 3.
The right posture is explicit prohibition through MCP allowlists and connector exclusion. Agents do not get connector access to these systems. If the organization wants AI assistance for workflows in these systems, the right vehicle is a separate API-based engagement with proper audit logging built in, not a desktop agent.
This is also a place where the governance document should leave the door open for future expansion. The phrasing matters: “agent access to [system] is prohibited until platform audit capabilities support SOC-2 evidence requirements” is the right framing. It signals that the prohibition is technical and temporary, not philosophical and permanent.
Customer relationship management
CRMs split based on the data flow analysis above. The pattern that works for most lightly-regulated firms:
Read access: broadly permitted, agents can pull contact data, deal histories, activity logs, and notes for synthesis purposes.
Write access to non-financial fields: permitted with logging. Notes, activity records, contact updates, follow-up tasks, qualitative annotations.
Write access to financial-feeding fields: prohibited or human-approval-gated. Deal stage advancement, commission allocations, forecasted close dates that feed revenue projections, anything that flows into board or investor reporting.
Document repositories
The classification depends entirely on what is in the repository. SharePoint, Google Drive, Box, Dropbox, OneDrive can each host both audit-relevant and routine documents.
The cleanest architectural pattern is separation: a structured set of folders that agents can access freely, and a separate set of folders (signed agreements, audit evidence, board materials, financial close documents) that agents cannot access at all. This is enforced through connector permissions or through the agent-side folder selection on tools like Cowork.
Trying to do this through document-level classification is a losing battle in mid-market firms. Folder-level enforcement is what works.
Email and messaging
Inbound triage, summarization, and draft preparation are low risk. Outbound sending is high risk because the agent is, in effect, signing the email as the user.
The pattern that works: agents read and synthesize email freely, draft responses for internal recipients with the user reviewing before sending, and never auto-send to external recipients without explicit per-message human approval. This is enforceable at the platform level on most current agent products.
Browsers
Browser-driven agents inherit the user’s logged-in sessions. When the agent opens a Chrome tab, it has access to whatever the user is currently authenticated to: the property management system, the bank portal, the CRM, the email client. The agent’s effective permission scope on browser actions is the union of every active session in the user’s browser.
This is the single most underestimated risk in current agent governance. It cannot be controlled at the agent platform layer. It must be controlled by user behavior policy and by browser-level isolation.
The recommended posture: provide users a separate browser profile for agent-driven tasks, with no authenticated sessions to in-perimeter systems. The user logs into their financial systems in their primary browser profile; the agent operates a clean profile that has only the credentials it needs for legitimate task work. This is friction, but it is the only effective control against the cross-application contamination problem.
MCP servers and plugins
The plugin and connector ecosystem is the real attack surface in agent deployments, and most governance frameworks understate it.
Every MCP server an agent connects to is, functionally, third-party code with access to whatever the agent can access. Vetting is essential. The right pattern is a curated marketplace: a small set of approved plugins and connectors, distributed via the platform’s enterprise marketplace function, with installation limited to that marketplace. Public, unvetted MCP servers are not connected.
This is also where ongoing governance work continues to live after the initial deployment. New plugins emerge, new connectors are released by SaaS vendors, internal teams build custom plugins for their workflows. Maintaining the curated marketplace is an ongoing function, not a one-time setup task.
The architecture that works
For a lightly-regulated firm deploying agent technology, the architectural recommendation has four layers.
Native platform controls
Most current agent platforms expose a meaningful set of controls on their enterprise tier: single sign-on integration, role-based access controls, group-based permissions, plugin/MCP allowlists, network egress controls, browser-control toggles, scheduled task governance. These should be configured tightly from day one.
The specific configuration pattern: SSO required, default network egress denied, plugin marketplace curated, browser control limited or disabled by default, scheduled tasks reviewed before activation, role-based access granting agent capability only to users who need it.
The Enterprise tier of the leading platforms is generally worth the price premium over Team-level pricing for any organization deploying at scale, because the granular controls are not available on lower tiers and retrofitting them later is painful.
Endpoint as the audit substrate
Current agent platforms generally store conversation history and local logs on the user’s machine, outside the platform’s centralized audit infrastructure. This is changing, but slowly.
In the interim, endpoint security tooling becomes the primary visibility layer for agent activity. Endpoint detection and response (EDR), full-disk encryption, mobile device management, and standard endpoint hygiene are not optional in an agent-deployed environment; they are the audit substrate. Any organization without mature endpoint controls should fix that before deploying agents broadly, not after.
Telemetry routing
Where platforms emit OpenTelemetry events for agent actions (Claude Cowork does, as do several others on enterprise tiers), routing those events to a SIEM or a basic log retention system is worthwhile. This is not full audit-grade logging yet, but it provides behavioral visibility: which connectors were called, which files were touched, which tools were used, whether actions were approved manually or automatically.
For lightly-regulated firms, even basic log retention into cloud storage with structured query access is sufficient. Heavy SIEM investment is overkill for this risk profile.
Network egress and gateway control
For most lightly-regulated firms, network egress controls at the agent platform level are sufficient: default deny, allowlist the actual SaaS perimeter the organization uses. Heavier MCP gateway infrastructure (TrueFoundry, MintMCP, Fastio) is an option for organizations that grow into needing centralized connector authentication and audit, but it is not a required starting point at this risk profile.
The architectural decision point is scale. At under a few hundred agent users with a stable connector set, native platform controls suffice. Above that, or with rapid plugin proliferation, gateway infrastructure starts paying for itself.
The two risks that most frameworks miss
Two risks deserve specific attention because they are systematically underweighted in published guidance.
The plugin supply chain
Every connector and plugin is third-party code with access to organizational data. The risk profile is closer to a software supply chain attack than to a traditional SaaS integration. A malicious or compromised plugin can read files, send messages, exfiltrate data, all under the legitimate user’s identity.
The mitigation is curatorial discipline. The plugin marketplace must be actively maintained, plugins must be vetted before approval, and there must be a documented process for adding new ones. This is governance work that does not end with the initial deployment; it is ongoing operational responsibility.
Browser session inheritance
Browser-driven agent actions inherit user sessions. This is worth restating because it is so frequently missed: an agent driving Chrome on a user’s machine has access to every system the user is currently logged into in that Chrome instance. There is currently no platform-level control that fixes this. The only effective mitigation is workflow design: dedicated browser profiles, explicit logout from in-perimeter systems before agent tasks, or disabling browser control entirely for users whose role requires authenticated access to sensitive systems.
This risk maps poorly to most existing security frameworks because it is neither a network risk nor an application risk in the traditional sense. It is an identity-and-context risk that emerges from the agent operating within the user’s session boundary.
The deployment pattern that generates ROI
Governance frameworks tend to focus on what should not happen. Equally important is the question of what should happen, and how to deploy in a way that captures the productivity upside that justifies the deployment in the first place.
The pattern that works for lightly-regulated firms is a two-layer deployment.
Org-level enablement
A two-to-three-week sprint to set up the platform correctly: SSO, RBAC, plugin marketplace, MCP allowlists, network egress, endpoint baseline confirmation, written acceptable use policy that names the perimeter, internal champions identified and equipped, the systems classification document drafted and approved.
This is fixed-scope, fixed-fee work that produces a stable foundation. It is also exactly the work that, done poorly or skipped, generates the audit findings and security incidents that derail later deployment.
Per-person automation cycles
The productivity dividend from agent technology is heavily idiosyncratic. Two people in the same role with the same systems access have different workflows, different friction points, and different highest-value automation targets. Generic role-based automation captures a small fraction of the available value.
The pattern that captures most of the value is consultant-led individual workflow analysis: ninety minutes shadowing an individual, identifying the three highest-frequency repetitive tasks, building a custom plugin or skill for the highest-value one, training the user, and returning two weeks later to refine and add the next.
Aggregating patterns across individuals into shared plugins is where the organizational leverage emerges. When the same automation pattern shows up in three or more people, it becomes a candidate for the curated marketplace, where it serves the whole organization.
Plugin productization
The highest-leverage activity in a mature agent deployment is the conversion of individual automations into shared organizational assets. A consultant who builds a market-comp synthesis plugin once for a single broker has done useful work; the same plugin deployed to twenty brokers has done twenty times the useful work. Plugin productization is where the deployment pattern compounds.
This is also the dimension that distinguishes thoughtfully-deployed agent capability from ad-hoc usage. Organizations that invest in this layer end up with a body of organizational IP (workflows captured as durable shared tools) that survives individual employee turnover and creates lasting productivity gains.
Quarterly tune-up
Workflows evolve, business needs shift, new systems come online, plugins age. A quarterly check-in per individual user (ninety minutes, focused on what is working and what new automation opportunities have emerged) keeps the deployment vital. This is also the cadence at which the systems classification document should be reviewed, because organizational data flows shift over time and what was out-of-perimeter last year may have become in-perimeter through a new integration.
The audit gap is temporary, but mitigating it today may be worth the headache
The audit gap that currently constrains agent deployment in regulated workflows is a temporary state. Major platforms are investing in audit logging, compliance API integration, and SIEM-native event streams. Industry guidance from AICPA and SOC-2 auditors is evolving to address agentic AI specifically. Most observers expect platform audit capabilities to meet SOC-2 evidence requirements within the next two to four quarters.
Governance frameworks should be built with this trajectory in mind. The systems classification document should distinguish “agent-prohibited because of the workflow itself” from “agent-prohibited until platform audit capabilities mature.” The first category is durable; the second is transitional. Organizations that build this distinction in from the start can expand agent scope gracefully as platform capabilities catch up, without having to revisit fundamental policy positions.
The strategic posture is graduated expansion: deploy aggressively in Category 1 today, deploy with care and explicit controls in Category 2, plan deliberately for the migration of Category 3 workflows as platform capabilities permit. This is a defensible, auditable, and economically rational approach for lightly-regulated firms.
For mid-market firms with SOC-2 obligations and no heavier regulatory load, AI agent technology is not a question of whether to deploy. The productivity dividend is real and the technology is mature enough to capture it now. The question is how to deploy with discipline that preserves compliance posture while opening the substantial out-of-perimeter surface to genuine automation.
The firms that invest in explicit perimeter mapping, thoughtful systems classification, and ongoing governance discipline will outpace both the over-cautious (who freeze themselves out of the productivity shift entirely) and the reckless (who create audit findings and security incidents that consume any productivity gain in remediation cost).
The framework outlined here is one we deploy in client engagements; the mapping exercise typically takes a few days, the org-level enablement sprint takes two to three weeks, and the per-person automation cycles produce visible ROI within the first month of running. For a CIO or executive evaluating where to start, the first move is the perimeter map. Everything else flows from there.