The End of the Consulting Pyramid
AI doesn't make the consulting pyramid more efficient. It puts the pyramid under structural pressure it was not designed to absorb. The firms that adapt earliest will not optimize the old model — they will begin replacing it.
Part of the Phase III — Decision series
By Michael E. Ruiz
The previous two essays in this series examined what the consulting delivery model looks like when observed closely, and what happens to its economics when production work becomes predictable and separable from judgment. The conclusion was structural: a model built around bundled labor, priced by the hour, and scaled through headcount is under pressure it was not designed to absorb. This essay addresses the question that follows: what does the next delivery structure look like?
The Separation Is Underway
The unbundling of expertise from production is not a future event. It is already visible in how advisory work is being delivered at the margins of the industry. Boutique firms and specialized advisory practices are operating with materially smaller teams, faster delivery cycles, and higher ratios of senior-to-junior involvement per engagement. They are not doing less work. They are doing the same work with a different delivery architecture, one in which AI handles the production substrate and humans focus on the activities where judgment cannot be automated: problem structuring, interpretation, recommendation, and client relationships.
The large firms have noticed. Most are responding by deploying AI tools into the existing pyramid — making analysts faster, compressing some timelines, capturing incremental margin improvement. This is a rational first step. It does not require organizational change, does not challenge the economics that currently run the firm, and produces visible returns on a familiar structure.
But optimization within the existing model does not address the structural question, which is whether the pyramid remains the right shape when the production layer it was designed around is no longer labor-dependent.
The Transition
The shift will not happen overnight. Most firms will operate in hybrid models for some time, maintaining elements of the pyramid where client expectations or delivery complexity require it while selectively compressing the production layer in areas where AI delivers equivalent or better output. The transition is not about removing structure. It is about redesigning where human effort is applied and where it is no longer required.
The firms that navigate this transition well will be the ones that treat it as a design problem rather than a cost problem. Reducing headcount without redesigning the delivery architecture produces a thinner pyramid, not a different model. The structural response requires rethinking not just how many people are involved, but what roles they play, how they are organized, and how the firm prices what it delivers.
The Diamond Model
The structure that emerges from this redesign is a Diamond. It is not a compressed pyramid. It is a different shape, built on different assumptions about where value is created and what role AI plays in delivery.
The Diamond has four layers:
Managing Director — Client relationships, strategy, and accountability for outcomes. This layer is unchanged in function. The scarce resource remains judgment, trust, and the ability to translate complexity into a decision the client can act on. What changes is the economics: fewer managing directors are needed per dollar of revenue because the delivery structure beneath them is substantially more leveraged.
Delivery Orchestration — A small coordination layer that manages execution. This is not project management in the traditional sense. It is orchestration: sequencing AI workflows, managing quality checkpoints, coordinating domain experts across engagements, and maintaining delivery velocity. This layer requires people who understand both the substance of the work and the capabilities of the tools.
Domain Expertise — Deep specialists applied across engagements. Not generalist consultants who rotate across industries and problem types, but people with genuine depth in a specific domain — operational technology security, healthcare transformation, enterprise AI architecture, regulatory compliance. In the Diamond, domain expertise is not a stage in a career progression toward generalist partnership. It is the core intellectual asset of the firm and the center of gravity of the delivery model.
AI Systems / Agents — Analysis, research, synthesis, and production at scale. This is the foundation of the Diamond, and it is infrastructure rather than headcount. AI systems that execute structured workflows, generate first-draft outputs, process large information sets, and surface the inputs that domain experts and managing directors need to do their work. This layer compresses what was previously the analyst pyramid — not by making analysts faster, but by replacing the need for analyst-scale labor in production.
The Judgment Premium
Compressing hours does not compress the value of expertise. The Diamond must be priced differently from the pyramid, and the distinction matters.
The traditional time-and-materials (T&M) model measures effort. When production effort compresses, the invoice shrinks, even when the outcome is identical. A senior advisor who uses AI to produce in three days what previously took three weeks has not delivered less value. The analysis is the same. The recommendation is the same. The client outcome is the same. But a T&M invoice for three days looks like a reduction in deliverable scope.
This is where the judgment premium becomes the central economic argument. A $200 monthly AI subscription does not substitute for domain expertise. It is a production tool that is only as good as the person directing it. The quality of what AI produces is bounded by the quality of how the problem is framed — which requires understanding the problem deeply enough to know what to ask. And the value of the output is bounded by the ability to evaluate whether what came back is correct, complete, and actionable — which requires domain depth to validate.
The scarce resource in the Diamond model is not production capacity. It is the judgment required to ask the right question and recognize the right answer.
The billing model follows. Firms operating a Diamond structure need to shift from pricing hours to pricing outcomes. That does not mean outcome-based pricing in the abstract. It means a deliberate move toward fees that reflect the value of the decision supported, the risk reduced, or the capability delivered. The Diamond firm that continues billing T&M will systematically undervalue its own output. The one that reprices around judgment and results will capture the economic benefit of the compression AI creates.
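The invoice effect described in this section can be made concrete with a few lines of arithmetic. The sketch below is illustrative only: the hourly rate, the eight-hour day, and the five-day working week are assumptions chosen for the example, not figures from this series.

```python
# Illustrative T&M arithmetic. Every number here is a hypothetical assumption.
HOURS_PER_DAY = 8          # assumed billable hours per working day
HOURLY_RATE = 400.0        # assumed senior advisor rate, in dollars


def tm_invoice(days: int, hourly_rate: float = HOURLY_RATE) -> float:
    """Time-and-materials invoice: effort (days) times rate, nothing else."""
    return days * HOURS_PER_DAY * hourly_rate


# Same analysis, same recommendation, same client outcome.
before_ai = tm_invoice(days=15)   # three working weeks, assuming 5-day weeks
with_ai = tm_invoice(days=3)      # three days with AI handling production

shrinkage = 1 - with_ai / before_ai

print(f"T&M invoice before AI: ${before_ai:,.0f}")
print(f"T&M invoice with AI:   ${with_ai:,.0f}")
print(f"Invoice shrinks by {shrinkage:.0%} for an identical outcome")
```

Under these assumptions the T&M invoice falls by 80 percent even though nothing about the deliverable has changed. That gap is what a fee tied to the decision supported, rather than to hours logged, is meant to close.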
What This Enables
The Diamond changes the competitive dynamics of advisory services in ways the pyramid could not support.
A well-configured Diamond firm can cover engagement scope that previously required significantly larger teams. It can deliver analysis in a fraction of the time, not by cutting corners but by removing the production bottleneck that made the timeline long in the first place. It can maintain depth of expertise that large generalist firms struggle to replicate, because their pyramid economics require staffing generalists rather than investing in specialists.
Delivery becomes more consistent because AI systems execute structured workflows reliably. Expertise becomes more concentrated because the firm invests in domain depth rather than analyst breadth. Client relationships become more direct because the managing director is operating a focused delivery engine rather than managing a large hierarchical team.
The boutique model was always constrained by reach. A small firm could not cover the analytical ground required to compete with large firms on comprehensive scope. The Diamond changes that constraint. The leverage that was once the exclusive structural advantage of the large firm becomes available to any firm willing to redesign how it delivers.
What Most Firms Will Do First
Most firms will initially respond by optimizing the existing model rather than redesigning it. That is a rational first step. Deploying AI tools to increase analyst productivity, capturing margin improvement through attrition, and maintaining existing delivery structures while building familiarity with new workflows — this is manageable change.
But over time, the pressure compounds. As AI capability matures and clients develop a clearer understanding of what can be produced without large teams, the economic case for maintaining the pyramid in its traditional form becomes harder to sustain. Revenue per project declines while margin per project increases — and the firms that have already begun experimenting with Diamond-style structures will be positioned to scale through throughput rather than headcount.
The transition is not about predicting exactly when the pyramid becomes unviable. It is about building toward the next model before the current one forces the question.
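The claim that revenue per project can decline while margin per project increases may seem counterintuitive, so a small worked example helps. The sketch below uses hypothetical project figures, invented purely for illustration, in which delivery cost falls faster than the fee.

```python
# Hypothetical project economics. All dollar figures are illustrative
# assumptions, not data from this series or from any firm.
from dataclasses import dataclass


@dataclass(frozen=True)
class Project:
    revenue: float         # fee billed to the client
    delivery_cost: float   # cost of the people and tools used to deliver

    @property
    def margin(self) -> float:
        """Absolute margin in dollars."""
        return self.revenue - self.delivery_cost

    @property
    def margin_pct(self) -> float:
        """Margin as a fraction of revenue."""
        return self.margin / self.revenue


# Assumed pyramid engagement: a large team, most of the fee consumed by labor.
pyramid = Project(revenue=500_000, delivery_cost=350_000)

# Assumed Diamond-style engagement: a lower fee, far lower delivery cost.
diamond = Project(revenue=300_000, delivery_cost=120_000)

for name, p in (("Pyramid", pyramid), ("Diamond", diamond)):
    print(f"{name}: revenue ${p.revenue:,.0f}, "
          f"margin ${p.margin:,.0f} ({p.margin_pct:.0%})")
```

In this illustration the Diamond-style project bills 40 percent less yet keeps more margin, both in dollars and as a share of revenue. Growth then has to come from running more such projects, which is the throughput argument made above.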
Building Toward It
This shift is already underway. The firms moving earliest are small by design, not by constraint. They are building for leverage rather than scale, for domain depth rather than broad coverage, and for AI-native delivery rather than AI-augmented delivery.
The difference between those two orientations is not a technology question. It is a design question — about structure, pricing, talent, and how the firm creates value when production is no longer the bottleneck. The firms that answer it clearly in the next two to three years will be operating with structural advantages that are difficult to replicate quickly from inside a traditional pyramid.
This site documents what that shift looks like in practice — not as a theory, but as an operating model being built and tested.