Data Governance

How Block Uses DataHub's MCP Server to Power AI Agents for Data Governance at Scale

Joshua Garza

Block's production deployment of DataHub's MCP Server with the open-source Goose agent framework shows that AI-driven data governance at enterprise scale is no longer theoretical. It is an architectural pattern built on open standards, one that mature data organizations can adopt to close the persistent gap between governance intent and governance execution.

The Governance Execution Gap

Every mature data organization knows the pattern: governance policies get written, approved, and published — then enforced inconsistently, if at all. The gap between governance intent and governance execution widens with every new platform, warehouse, and pipeline added to the ecosystem. Human teams cannot keep pace with the rate at which data assets proliferate. The problem is structural, not motivational. Teams lack the bandwidth to enforce rules uniformly at scale.

Block's production deployment of DataHub's MCP Server with its open-source Goose agent framework is the clearest evidence yet that AI agents can close this gap operationally. This is not a conference demo. It is an architectural pattern running at enterprise scale — and it carries direct implications for how senior data leaders should evaluate their governance tooling strategy today.

The Integration Problem MCP Was Built to Solve

Before examining Block's specific implementation, it helps to understand the foundational problem that made it possible. Enterprise data ecosystems are federations of warehouses, streaming platforms, transformation layers, observability tools, and access control systems, each with its own API and metadata schema. Human analysts already struggle to maintain context across these boundaries. AI agents face the same fragmentation and run into it faster.

Building bespoke connectors for every agent-tool pairing is unsustainable. The Model Context Protocol (MCP), an open standard for agent-to-tool communication, addresses this directly. DataHub's MCP Server implements it, exposing metadata search, lineage traversal, governance policy access, tagging, classification, and data operations through a single standardized interface. Any MCP-compatible agent can then reach the catalog without custom integration work.
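To make the "single standardized interface" concrete, here is a minimal sketch of the two message shapes an MCP client sends to discover and invoke tools. MCP messages use the JSON-RPC 2.0 envelope, and `tools/list` and `tools/call` are the protocol's method names. The tool name `search` and its `query` argument are assumptions for illustration, not a statement of what DataHub's server exposes.

```python
import json
from itertools import count

_ids = count(1)  # JSON-RPC requests carry an id so responses can be matched


def jsonrpc_request(method, params=None):
    """Build a JSON-RPC 2.0 request, the envelope MCP uses for its messages."""
    msg = {"jsonrpc": "2.0", "id": next(_ids), "method": method}
    if params is not None:
        msg["params"] = params
    return msg


# Ask the server which tools it offers.
list_tools = jsonrpc_request("tools/list")

# Invoke one tool by name. The tool name and arguments here are hypothetical.
call_tool = jsonrpc_request(
    "tools/call",
    {"name": "search", "arguments": {"query": "customer_payments"}},
)

print(json.dumps(list_tools))
print(json.dumps(call_tool, indent=2))
```

Because every tool is reached through the same two methods, an agent needs one client implementation rather than one connector per tool.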

Block's Implementation: Goose Meets DataHub

With MCP providing the integration layer, Block was positioned to put this architecture into practice at significant scale. Block operates a large and complex fintech data ecosystem spanning Square, Cash App, Afterpay, and TIDAL across multiple regulatory jurisdictions. Governing data consistently at that scale demands automation.

Block built Goose, an open-source AI agent framework designed to execute multi-step workflows autonomously by invoking tools through standard interfaces. Connected to DataHub's MCP Server, Goose agents query the metadata catalog, retrieve governance policies, inspect lineage, apply classifications, and flag compliance issues, all without a human in the loop for routine operations.

This inverts the traditional governance model: agents execute well-defined tasks, humans review outcomes.
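The "agents execute, humans review" split can be sketched in a few lines. This is an illustrative model only, not Block's implementation: the `Outcome` type, the keyword-based PII check, and the dataset names are all hypothetical stand-ins for whatever rules and catalog calls a real deployment would use.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Outcome:
    dataset: str
    action: str   # what the agent did, e.g. "tag:PII"
    status: str   # "applied" (done autonomously) or "flagged" (needs a person)


def run_classification(columns_by_dataset, pii_markers=("email", "ssn", "phone")):
    """Agent side: act on every dataset and record what happened."""
    outcomes = []
    for dataset, columns in columns_by_dataset.items():
        if not columns:
            # No schema metadata to reason about: take no action, ask a human.
            outcomes.append(Outcome(dataset, "none", "flagged"))
        elif any(m in col.lower() for col in columns for m in pii_markers):
            outcomes.append(Outcome(dataset, "tag:PII", "applied"))
        else:
            outcomes.append(Outcome(dataset, "tag:non-sensitive", "applied"))
    return outcomes


def review_digest(outcomes):
    """Human side: group outcomes by status so reviewers can scan exceptions."""
    digest = defaultdict(list)
    for outcome in outcomes:
        digest[outcome.status].append(outcome.dataset)
    return dict(digest)


# Hypothetical catalog snapshot: dataset name -> column names.
catalog = {
    "payments.transactions": ["id", "amount", "customer_email"],
    "ops.daily_counts": ["day", "count"],
    "legacy.unknown_dump": [],
}
digest = review_digest(run_classification(catalog))
print(digest)
```

The reviewer's work shifts from performing each classification to checking the digest, with the flagged list as the starting point.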

Technical Architecture: How MCP Exposes DataHub to Agents

Understanding why this works requires a closer look at what happens beneath the surface. DataHub's MCP Server exposes discrete, callable tools — catalog search, entity detail retrieval, tag and glossary term read-write, data contract access, lineage graph navigation — that agents discover dynamically through the Model Context Protocol.

The agent reasoning loop is straightforward: identify the required tool, invoke it via MCP, process the returned context, then act or request additional information. Because agents discover the tool list at run time, new tools that DataHub exposes become available to them without changes to agent code.
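The loop described above can be sketched with a stub in place of a real MCP server and a rule-based planner in place of the model's reasoning. Everything named here (`StubServer`, `plan_next`, the `search` and `get_lineage` tools, the example URNs) is hypothetical and chosen only to show the control flow.

```python
class StubServer:
    """Stand-in for an MCP server: a registry of named callables."""

    def __init__(self):
        self._tools = {}

    def register(self, name, fn):
        self._tools[name] = fn

    def list_tools(self):
        return sorted(self._tools)

    def call_tool(self, name, arguments):
        return self._tools[name](**arguments)


def plan_next(context, available):
    """Hypothetical planner: pick the next tool from what is known so far."""
    if "dataset" not in context and "search" in available:
        return "search", {"query": context["goal"]}
    if ("dataset" in context and "upstream" not in context
            and "get_lineage" in available):
        return "get_lineage", {"urn": context["dataset"]}
    return None  # nothing more to do with the tools on offer


def run_agent(server, goal, max_steps=5):
    context = {"goal": goal}
    for _ in range(max_steps):
        available = set(server.list_tools())     # discover tools at run time
        step = plan_next(context, available)     # identify the required tool
        if step is None:
            break
        name, arguments = step
        result = server.call_tool(name, arguments)  # invoke it
        context.update(result)                      # process returned context
    return context


server = StubServer()
server.register("search", lambda query: {"dataset": f"urn:example:{query}"})
before = run_agent(server, "payments")

# The server gains a tool; run_agent itself is unchanged.
server.register("get_lineage", lambda urn: {"upstream": ["urn:example:raw_events"]})
after = run_agent(server, "payments")

print(before)
print(after)
```

The second run reaches lineage information only because the server's tool list grew, which is the sense in which discovery removes the need for agent-side changes.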

Critically, governance applies to the governance agent itself. Authentication, access control, and audit logging are enforced at the DataHub platform layer, meaning agents operate within the same permission boundaries as human users, with a complete audit trail.
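As a rough sketch of what enforcing permissions and auditing at the platform layer means, the wrapper below checks every request against a grant table and records it whether or not it is allowed. The class, the principal name `goose-agent`, and the action names are assumptions for illustration and do not describe DataHub's actual authorization model.

```python
from datetime import datetime, timezone


class PermissionDenied(Exception):
    pass


class GovernedCatalog:
    """Hypothetical platform layer: one code path for humans and agents."""

    def __init__(self, grants):
        self._grants = grants      # principal -> set of allowed actions
        self.audit_log = []        # every attempt is recorded, allowed or not

    def perform(self, principal, action, target):
        allowed = action in self._grants.get(principal, set())
        self.audit_log.append({
            "at": datetime.now(timezone.utc).isoformat(),
            "principal": principal,
            "action": action,
            "target": target,
            "allowed": allowed,
        })
        if not allowed:
            raise PermissionDenied(f"{principal} may not {action} on {target}")
        return f"{action} ok on {target}"


catalog = GovernedCatalog({"goose-agent": {"read_lineage", "add_tag"}})
catalog.perform("goose-agent", "add_tag", "payments.transactions")
try:
    catalog.perform("goose-agent", "delete_dataset", "payments.transactions")
except PermissionDenied as exc:
    print("denied:", exc)

for entry in catalog.audit_log:
    print(entry["principal"], entry["action"], entry["allowed"])
```

Putting the check in the platform rather than in the agent means a misbehaving or misconfigured agent still cannot exceed its grants, and the denied attempt is visible in the log.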

Why This Matters for Enterprise Data Ecosystems

The technical architecture is sound, but the strategic implications are what should command attention from senior data leaders.

Scale. Manual governance workflows break under load. Classification reviews, access audits, lineage validation, and policy application multiply with every new dataset and pipeline. Agents can absorb that volume without the backlog that builds up when human teams run out of bandwidth.

Consistency. An agent applies identical logic on every execution — no drift, no interpretation variance. In regulated industries, that mechanical consistency produces audit-defensible governance records that human-driven processes struggle to match.

Integration cost reduction. MCP standardization means any MCP-compatible agent can consume DataHub's governance capabilities without custom connector work — a decisive advantage for ecosystems spanning dozens of platforms.

The Signal for the Future of Data Governance

These advantages at Block point toward a broader industry shift. Block's deployment is an early but production-grade indicator of where governance is heading: from periodic, human-operated policy enforcement to a continuously automated capability. The critical detail is the open-standards foundation. MCP is not proprietary. DataHub is open source. Goose is open source. This pattern is reproducible without vendor lock-in.

The near-term recommendation is straightforward: MCP compatibility belongs on every organization's requirements list when evaluating catalog and governance platforms.

The longer-term shift is structural. Governance stops being a cost center that scales linearly with headcount and becomes a capability that scales with the data ecosystem itself — fundamentally changing the ROI calculus for governance investment.

Conclusion

Block's deployment proves that agent-driven data governance is technically feasible and operationally viable today — built entirely on open standards and open-source tooling. The architectural pattern is clear: a metadata platform with MCP exposure, an agent framework capable of multi-step governance workflows, and the organizational willingness to let agents execute while humans review.

For senior data leaders managing distributed ecosystems at scale, the question is no longer whether this approach works. It is how quickly your organization can put these foundations in place before the gap between governance ambitions and governance reality becomes a competitive and regulatory liability.