The 60-Day Window
What changed in AI between February and April 2026, what got quietly more expensive, and why Klarna rebuilt the team it laid off. Three observations every operator should know.
Sixty days. That is how long it took for the AI industry to change shape three different ways at once.
The press covered the headlines. New model releases. Pricing pages. A retired product. Most operators reading the trade coverage missed the part that actually matters for businesses running operations on top of these tools.
Three things shifted between late February and late April 2026, and each one shows up in a different line on a mid-market operator's invoice or a different decision in a vendor pitch deck. We pulled the primary sources to map what actually changed, what it costs, and what to do about it.
This is what we found.
I. What Changed in the Last 60 Days
The market for AI capability is now sold as managed runtime, not raw access. Open-weight models from non-Western labs now trail the closed Western frontier by under two weeks. And the bottleneck on the entire industry has migrated off compute and onto power.
Each of those shifts shows up downstream in the cost and reliability of any AI tool a business is running today.
Frontier capability is now priced per session, not per token
On April 8, 2026, Anthropic launched Claude Managed Agents in public beta with a billing model nobody had used before at this layer: tokens plus a flat $0.08 per agent-runtime-hour. The product ships with a sandboxed container, persistent memory, an ant CLI, and an Anthropic-managed loop. Anthropic's own platform documentation and the Wired coverage of the launch make this explicit.
Two weeks later, OpenAI released GPT-5.5 with a built-in computer-use tool, server-side compaction, hosted shell, and Agent Skills. AWS shipped AgentCore. Google shipped its Agentic Commerce Protocol with deep-research agents.
The pattern across all of them: the unit of billing moved from "tokens per call" to "session per hour" at the layer where the actual work happens. This is a deliberate move up-stack. Per-token pricing has been compressing for two years. Per-session pricing opens a new billing axis whose price the vendors control.
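The shift is easy to make concrete. A minimal sketch, assuming illustrative token rates and task shapes; only the $0.08 per agent-runtime-hour figure comes from the launch pricing above:

```python
# Sketch: how per-session billing changes a task's cost profile.
# The $0.08/agent-runtime-hour figure is from the Managed Agents launch;
# the token counts and per-token rates below are illustrative assumptions.

def task_cost(input_tokens, output_tokens, runtime_hours,
              in_rate_per_m=5.00, out_rate_per_m=25.00,
              session_rate_per_hour=0.08):
    """Cost of one agent task: token billing plus flat runtime billing."""
    token_cost = (input_tokens / 1e6) * in_rate_per_m \
               + (output_tokens / 1e6) * out_rate_per_m
    session_cost = runtime_hours * session_rate_per_hour
    return token_cost, session_cost

# A long-running agent that works for six hours but emits few tokens
# now carries a runtime line item that scales with wall-clock time,
# not with output.
tok, sess = task_cost(input_tokens=200_000, output_tokens=40_000,
                      runtime_hours=6.0)
print(f"token cost ${tok:.2f}, session cost ${sess:.2f}")
```

The point of the sketch: token prices can keep compressing while the session line item grows with agent runtime, and only the vendor sets that rate.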
Open-weight models closed the gap to under two weeks
For most of 2024 and early 2025, the rule of thumb was that open-weight models trailed frontier closed models by six to nine months. In the four-day window from April 20 to April 24, 2026, that rule broke.
DeepSeek V4-Pro shipped April 24 with 1.6 trillion parameters in mixture-of-experts configuration, 49 billion active per inference, MIT license, and a native one-million-token context window. Hugging Face hosts the weights. The DeepSeek API documentation and Simon Willison's technical writeup confirm the configuration and licensing.
Kimi K2.6 from Moonshot AI shipped four days earlier on April 20, with one trillion parameters in MoE configuration, 32 billion active, modified MIT license, 256K context, and a 300-agent swarm orchestration mode. The model targeted long-horizon agentic coding and posted the highest SWE-Bench Pro score among open-weight models at release.
For a business running an AI tool today, the practical implication is simple: the model behind that tool may not be the model that was behind it last month, and the vendor may not have told you. Procurement clauses that assumed model identity was stable are now stale.
The binding constraint is power, not silicon
The most important number from the last 60 days is not a model benchmark. It is $329.17 per megawatt-day.
That is the price at which the PJM Interconnection's 2026/2027 base capacity auction cleared in late 2025. The clearing price is the FERC-imposed cap. Without the cap, the auction would have cleared near $530 per megawatt-day. Two cycles earlier, the 2024/2025 auction cleared at $28.92 per megawatt-day, which makes the capped price an 11.4x increase across two auction cycles. PJM's own market monitor identifies data centers as the driver of 45% of the $47.2 billion in capacity costs across the last three auctions.
This is the bill that lands in residential utility statements over the next four years. The PJM-published numbers and Utility Dive's coverage of the market monitor show data centers as the primary driver of price formation, with residential rates projected to rise 15 to 25 percent through 2030.
For mid-market operators, the operational consequence is that the cost of running cloud-based AI tools is being subsidized today by capital-markets-funded data center buildout that is now politically priced into utility rates. That subsidy is finite. The next four hyperscaler quarterly prints from Microsoft, Google, Amazon, and Meta are the load-bearing data points on whether the build keeps pace.
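To put the capacity-price unit in operational terms, a rough annualization sketch. The 100 MW load and the assumption that its capacity obligation equals its full load are illustrative simplifications (real obligations are set off peak-load contribution); only the three clearing prices come from the auction data above:

```python
# Sketch: annualizing a capacity clearing price for a hypothetical load.
# Clearing prices are from the PJM auction data cited in the text; the
# 100 MW load and full-load obligation are illustrative simplifications.

CLEARED = 329.17      # $ per MW-day, 2026/27 auction (FERC-capped)
UNCAPPED = 530.00     # $ per MW-day, estimated clearing without the cap
PRIOR = 28.92         # $ per MW-day, 2024/25 auction

load_mw = 100
annual = CLEARED * 365 * load_mw
annual_uncapped = UNCAPPED * 365 * load_mw
annual_prior = PRIOR * 365 * load_mw
print(f"capacity cost for 100 MW: ${annual/1e6:.1f}M/yr "
      f"(${annual_uncapped/1e6:.1f}M/yr uncapped, "
      f"vs ${annual_prior/1e6:.1f}M/yr at 2024/25 prices)")
```

Capacity is one line item among several in a utility bill, but the sketch shows why a per-megawatt-day number that small moves bills that large.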
II. What Just Got More Expensive (and Nobody Told You)
The unit cost of running AI rose meaningfully in the last 60 days. Not in the headline pricing pages. In the cache mechanics, the tokenizers, and the tier introductions.
If you are paying a vendor for AI-powered tooling, four changes in the last 60 days affect what you actually pay per task.
The Anthropic cache TTL silent change
On March 6, 2026, Anthropic silently changed the default time-to-live on prompt caching in Claude Code from one hour to five minutes. There was no public release note. Within days, developers documented cost increases of 17 to 32 percent on identical workloads. The byteiota technical writeup, the dev.to writeups, and the open issue against Anthropic's claude-code repository all reproduce the same regression.
Anthropic shipped a fix on April 10 in Claude Code version 2.1.101 and published an engineering post-mortem on April 23 explaining the change as an unintended interaction between an idle-session optimization and the cache layer. The fix is opt-in. To restore the longer cache window, the environment variable ENABLE_PROMPT_CACHING_1H=true must be set.
If your vendor uses Claude under the hood and has not updated to the patched Claude Code release, you are still paying the regression.
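A minimal sketch of why a shorter TTL inflates spend on an unchanged workload. The cache-read and cache-write multipliers follow Anthropic's published prompt-caching pricing (reads at 0.1x base input, five-minute writes at 1.25x, one-hour writes at 2x); the session shape and token rate below are illustrative assumptions:

```python
# Sketch: why a shorter cache TTL raises spend on the same workload.
# Multipliers follow Anthropic's published prompt-caching pricing
# (cache read 0.1x base input, 5-min write 1.25x, 1-hour write 2x);
# the session shape and the token rate are illustrative assumptions.

IN_RATE = 5.00 / 1e6   # $ per input token (illustrative)

def session_cost(context_tokens, n_calls, gap_minutes, ttl_minutes,
                 write_mult):
    """Cost of repeatedly sending the same cached context."""
    cost = context_tokens * IN_RATE * write_mult      # first call writes cache
    for _ in range(n_calls - 1):
        if gap_minutes <= ttl_minutes:
            cost += context_tokens * IN_RATE * 0.10   # cache hit: read price
        else:
            cost += context_tokens * IN_RATE * write_mult  # expired: rewrite
    return cost

# 20 calls, 12 minutes apart, against a 150K-token cached context:
long_ttl = session_cost(150_000, 20, gap_minutes=12, ttl_minutes=60,
                        write_mult=2.00)
short_ttl = session_cost(150_000, 20, gap_minutes=12, ttl_minutes=5,
                         write_mult=1.25)
print(f"1-hour TTL ${long_ttl:.2f} vs 5-minute TTL ${short_ttl:.2f}")
```

Real workloads mix hits and misses rather than missing on every call, which is how the documented 17 to 32 percent increases arise; the sketch shows the direction and the mechanism, not the typical magnitude.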
GPT-5.5 doubled the headline API price
GPT-5.5 was released in ChatGPT on April 23 and in the API on April 24, 2026. Pricing doubled versus GPT-5.4. The new rate is $5 per million input tokens and $30 per million output tokens, up from $2.50 and $15. GPT-5.5 Pro is priced at $30 per million input and $180 per million output, a separate reasoning-priced tier. OpenAI's developer documentation confirms the rates and a 2x input multiplier on context windows above 272,000 tokens.
This is the first frontier API price increase in a multi-year compression cycle. Vendors will pass it through. Procurement teams that budgeted against the GPT-5.4 rate will see token costs come in at double their forecast.
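A quick calculator for the delta, using the rates above. The task shapes are illustrative assumptions; the 2x input multiplier above a 272,000-token context is the documented GPT-5.5 behavior:

```python
# Sketch: per-task cost under the GPT-5.4 vs GPT-5.5 rates cited above.
# Rates are $ per million tokens. The 2x input multiplier above a
# 272,000-token context follows the documentation cited in the text;
# the task shapes are illustrative assumptions.

def cost(input_tokens, output_tokens, in_rate, out_rate,
         long_ctx_threshold=272_000, long_ctx_mult=2.0):
    in_eff = in_rate * (long_ctx_mult
                        if input_tokens > long_ctx_threshold else 1.0)
    return (input_tokens / 1e6) * in_eff + (output_tokens / 1e6) * out_rate

task = dict(input_tokens=120_000, output_tokens=8_000)
old = cost(**task, in_rate=2.50, out_rate=15.00)          # GPT-5.4 rates
new = cost(**task, in_rate=5.00, out_rate=30.00)          # GPT-5.5 rates
big = cost(300_000, 8_000, in_rate=5.00, out_rate=30.00)  # long-context 5.5

print(f"GPT-5.4 ${old:.2f} -> GPT-5.5 ${new:.2f}; long-context task ${big:.2f}")
```

The doubled base rate alone doubles the per-task cost; cross the long-context threshold and the input side doubles again on top of that.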
The Opus 4.7 tokenizer
Anthropic released Claude Opus 4.7 on April 16, 2026. The headline pricing on the model is unchanged at $5 per million input and $25 per million output. The tokenizer changed.
Anthropic's own news release indicates the new tokenizer produces approximately 35 percent more tokens for the same text. The character count of a document is unchanged; the billed token count is not. A tool that processed a 10,000-character contract at one cost under Opus 4.6 now processes the same contract at a meaningfully higher cost under Opus 4.7.
Same input. More tokens. More invoice.
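The arithmetic is worth making explicit. A minimal sketch, assuming a roughly four-characters-per-token ratio under the old tokenizer (an illustrative assumption; only the 35 percent inflation figure comes from Anthropic's release note):

```python
# Sketch: effective price change when per-token rates are unchanged but
# the tokenizer emits ~35% more tokens for the same text, per Anthropic's
# release note. The chars-per-token ratio is an illustrative assumption.

IN_RATE = 5.00 / 1e6           # $ per input token, unchanged across versions

chars = 10_000                 # the same contract, before and after
tokens_old = chars / 4.0       # assume ~4 chars/token under Opus 4.6
tokens_new = tokens_old * 1.35 # ~35% more tokens under Opus 4.7

cost_old = tokens_old * IN_RATE
cost_new = tokens_new * IN_RATE
print(f"billed input: {tokens_old:.0f} -> {tokens_new:.0f} tokens, "
      f"cost +{cost_new / cost_old - 1:.0%}")
```

An unchanged pricing page and a 35 percent higher invoice are fully consistent once the tokenizer is the variable.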
What is quietly retiring
The other half of the cost-shift story is migration risk. Three product retirements landed in the last 60 days that any operator with a dependency needs to plan for now.
- Cerebras standalone API: deprecated May 27, 2026. Access shifts to partner APIs (OpenRouter and others). Cerebras' own pricing page confirms the deprecation date.
- OpenAI Sora 2 web and app: discontinuing April 26, 2026 (today, the day this is published). The OpenAI help center documents the sunset.
- OpenAI Sora 2 API: ends September 24, 2026. Anyone with a content pipeline that depends on Sora 2 has five months to migrate to Veo 3.1, Kling 3.0 Omni, or another video generation provider.
Ask your AI vendor three things this quarter: which model is in their stack today, what has changed in that model's pricing or tokenizer or cache behavior since you signed, and what underlying APIs are scheduled to retire in the next 12 months. If they cannot answer, that is the answer.
III. The Replacement Lie
Every operator running a mid-market business has been pitched "replace your team with AI" at least once in the last 18 months. The data from Q1 2026 says the companies that did this in 2024 are quietly reversing the decision. Meanwhile, three specific industries are getting real ROI from AI in production. They are not the ones that fired their teams.
What stopped working
Klarna is the canonical case. The Swedish fintech publicly replaced approximately 700 customer service agents with AI between 2022 and 2024 and built a corporate narrative around the move. In early 2026, CEO Sebastian Siemiatkowski admitted in public statements that the cuts "went too far." Klarna is now hiring customer service roles back. The reversal was covered by CNBC and MLQ and analyzed in detail by DigitalApplied in early 2026.
Klarna is not an outlier. An Orgvue survey of 1,100 C-suite executives published in 2025 found that 55 percent of companies that made AI-driven redundancies regret the decision. JLL, the commercial real estate services firm, published its 2026 reality-check report showing that the percentage of companies claiming AI created "transformative impact" in their operations dropped from 12 percent in 2025 to 1 percent in 2026.
The MIT NANDA study published in mid-2025 reported that 95 percent of generative AI pilots fail to scale. The methodology has been contested, but the directional finding aligns with McKinsey's broader 2026 organizational AI report: 88 percent of organizations use AI, but two-thirds have not begun the scaling phase.
The pattern is consistent. The thing that does not work is treating AI as a labor-replacement subscription.
What did work
Three industries have crossed from pilot to production with measurable economic outcomes in the last twelve months. They are visible in the data, named, and reproducible.
Healthcare ambient documentation. The Cleveland Clinic deployed Ambience Healthcare's ambient AI scribe to more than 4,000 clinicians. Documented outcome: 14 minutes per clinician per day saved on EHR documentation, with an associated 46 percent improvement in sepsis case identification through Bayesian Health's adjacent deployment. The American Hospital Association's April 2026 reporting and Healthcare IT News coverage both name the deployment, the population, and the time savings.
Heavy-industry predictive maintenance. Siemens Senseye predictive-maintenance software, deployed at the company's flagship plants and at Sachsenmilch's dairy operation, produced a documented 20 percent throughput gain, 30 percent failure reduction, 99.9 percent quality, 15 percent energy reduction, and $35 million per year in savings per flagship plant. The Arm Newsroom and Siemens Newsroom both published detailed case studies in 2025 with reproducible operational metrics.
BigLaw contract review. A&O Shearman, the merged firm formerly known as Allen & Overy and Shearman & Sterling, deployed Harvey across 4,000 lawyers in 43 jurisdictions. The firm's own press release reports lawyers saving 2 to 3 hours per week, a 30 percent reduction in contract-review time, and 7 hours saved on complex documents. Thomson Reuters' CoCounsel reached 1 million users across 107 countries in February 2026 (Thomson Reuters press, LawNext analysis).
What the three winners share
The pattern across the three industries that worked is unambiguous, and it is not what 2024-era AI sales decks promised.
- AI augmented a named workflow. It did not replace headcount. Cleveland Clinic still has clinicians. A&O still has lawyers. Siemens still has technicians. The throughput per existing seat went up. Headcount did not go down.
- Tight feedback loops with the operator running the workflow. The clinician edits the ambient note before it lands in the chart. The lawyer reviews the contract markup. The technician validates the predicted failure. The human stays in the loop on the unit of work.
- ROI is measured in throughput, error rate, and cycle time per existing seat. Not in heads removed. Klarna's 2024 narrative measured success in heads removed. By 2026, that metric did not survive contact with customers.
- Implementation took months to years of integration work. Not a chatbot subscription. Cleveland Clinic's own published guidance on ambient AI deployment emphasizes evaluation, integration, and scale as separate phases over a multi-quarter horizon.
- Oversight and compliance functions stayed intact. No production deployment in the three winning industries removed the audit, compliance, or supervisory layer. The American Medical Association documented in 2026 that AI-driven prior-auth denials are increasing physician concern, which is the opposite signal: deployments that remove oversight are creating new operational pain.
What the broader data says
Goldman Sachs' Q1 2026 earnings commentary, summarized in Fortune on March 3, 2026, found "no relationship between AI and productivity" at the firm-wide level, but a 30 percent productivity boost in two specific use cases. JPMorgan CEO Jamie Dimon, on the same earnings cycle, explicitly cautioned that deploying AI to improve efficiency ratios is not strategically rational because competitors will do the same and the benefit flows to the marketplace, not the firm.
Both of those statements describe the same reality. AI generalized as a productivity solvent does not work. AI deployed against a specific, well-instrumented, named workflow with the operator in the loop produces measurable economic results.
What This Means for an Owner Running Operations
Three takeaways. Each one maps to one of the three sections above.
One. The AI market is restructuring under your tools. Frontier capability moved up-stack into per-session billing. Open-weight models reached parity in two-week windows. Power, not chips, is now the binding cost driver. Ask your vendor what changed in their stack since you signed and what underlying APIs are scheduled to retire in the next 12 months. If they do not know, your infrastructure is on a clock.
Two. The cost lever your vendor controls is moving silently against you. Cache TTLs, tokenizers, tier introductions, and product retirements all happened in the last 60 days, and most of them did not announce themselves on the pricing page. Audit your invoice line items. Compare token consumption per task this quarter against last quarter. If the per-task cost rose without your usage rising, the change came from the vendor side, and you are entitled to ask why.
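That audit is a one-liner once you have two quarters of invoice totals and task counts. A minimal sketch with illustrative figures:

```python
# Sketch: the per-task audit suggested above. Given invoice totals and
# task counts for two quarters, separate "we used more" from "the
# vendor's unit cost moved". All figures below are illustrative.

def per_task(spend_usd, tasks):
    """Blended cost per task for one billing period."""
    return spend_usd / tasks

last_q = per_task(spend_usd=4_200.00, tasks=13_000)
this_q = per_task(spend_usd=5_900.00, tasks=13_400)

usage_growth = 13_400 / 13_000 - 1      # how much more you actually did
unit_cost_change = this_q / last_q - 1  # what moved on the vendor's side
print(f"usage +{usage_growth:.1%}, cost per task {unit_cost_change:+.1%}")
```

In this illustrative case, usage grew about 3 percent while cost per task jumped over 36 percent; the gap between the two numbers is the vendor-side change you are entitled to ask about.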
Three. The use cases that actually work in 2026 are not the ones that were pitched in 2024. Replacement is regret. Three industries proved that augmentation in named, instrumented, oversight-preserving workflows produces real ROI. Every one of those deployments took months to integrate, kept the operator in the loop, and measured outcomes in throughput per existing seat, not heads removed. The companies winning the next 18 months are the ones building AI systems for stewardship and accountability, not the ones whose contracts get marked up every time their vendor changes a default.
The promise to mid-market businesses two years ago was that AI would replace work. The data from April 2026 says it does not, with three named exceptions, and the cost of running it is rising while the ground under it shifts.
Sixty days is a short window for an industry. It is a long window for a business that pays its AI bills monthly.
Sources
Section I · what changed in 60 days
- Anthropic Claude Managed Agents launch (Apr 8, 2026): platform.claude.com/docs/en/managed-agents/overview; Wired coverage
- DeepSeek V4-Pro and V4-Flash release (Apr 24, 2026): huggingface.co/deepseek-ai/DeepSeek-V4-Pro; api-docs.deepseek.com; simonwillison.net Apr 24, 2026
- Kimi K2.6 release (Apr 20, 2026): huggingface.co/moonshotai; marktechpost.com Apr 20, 2026
- PJM 2026/27 capacity auction clearing price ($329.17/MW-day): PJM Inside Lines, "PJM Auction Procures 134,479 MW of Generation Resources"; PJM news release December 17, 2025
- PJM data-center share of capacity costs: Utility Dive, "Data centers were 40% of PJM capacity costs in last auction: market monitor" (Jan 7, 2026); Utility Dive, "Data centers 'primary reason' for high PJM capacity prices: market monitor"
Section II · what got more expensive
- Anthropic cache TTL silent regression (Mar 6, 2026): Anthropic engineering post-mortem (anthropic.com/engineering/april-23-postmortem); GitHub issue anthropics/claude-code#46829; byteiota.com analysis ("Anthropic Cache TTL Downgrade: Silent $2.5K Cost Spike"); dev.to writeup; xda-developers.com coverage
- GPT-5.5 release and pricing (Apr 23-24, 2026): openai.com/index/introducing-gpt-5-5; developers.openai.com/api/docs/pricing; openrouter.ai/openai/gpt-5.5; en.wikipedia.org/wiki/GPT-5.5
- Claude Opus 4.7 release (Apr 16, 2026): anthropic.com/news/claude-opus-4-7
- Cerebras standalone API deprecation (May 27, 2026): cerebras.ai/pricing
- OpenAI Sora 2 web/app sunset (Apr 26, 2026) and API end (Sept 24, 2026): help.openai.com Sora 1 sunset FAQ; help.openai.com Sora 2 transition guide
Section III · the replacement lie
- Klarna AI customer-service reversal: DigitalApplied, "Klarna Reverses AI Layoffs: Replacing 700 Workers Backfired" (digitalapplied.com/blog/klarna-reverses-ai-layoffs-replacing-700-workers-backfired); CNBC reporting on Siemiatkowski's 2026 statements
- Orgvue C-suite regret survey (n=1,100): published 2025; cited by DigitalApplied and multiple operations-research outlets
- JLL real-estate AI reality check (12 percent to 1 percent year-over-year): jll.com/en-us/newsroom/real-estates-ai-reality-check-companies-piloting-only-achieved-all-ai-goals
- Cleveland Clinic Ambience deployment: AHA "6 Health Systems Enhancing Care Delivery with Ambient AI Scribes" (Apr 14, 2026, aha.org); Healthcare IT News, "Cleveland Clinic offers tips on ambient AI deployment"
- Siemens Senseye predictive maintenance case studies: Arm Newsroom, "Siemens Reinvents Factory Reliability with Edge AI Predictive Maintenance"; Siemens Newsroom
- A&O Shearman Harvey deployment: aoshearman.com press release on agentic AI rollout
- Thomson Reuters CoCounsel 1M-user milestone: LawNext, Feb 25, 2026, "CoCounsel Reaches 1 Million Users"
- Goldman Sachs Q1 2026 AI productivity finding: Fortune, Mar 3, 2026, "Goldman finds no relationship between AI and productivity but a 30 percent boost for 2 specific use cases"
- JPMorgan Q1 2026 AI commentary: JPMorgan Q1 2026 earnings call transcript (Motley Fool)
- McKinsey 2026 organizational AI report: McKinsey & Company analysis cited across enterprise-AI press
- MIT NANDA pilot-failure finding (95 percent) and contested methodology: Fortune, Aug 18, 2025, "MIT report: 95 percent of generative AI pilots failing"; Futuriom counter-analysis, "Why we don't believe MIT NANDA's weird AI study"
Methodology
This post is a synthesis of a multi-sector intelligence sweep run by Ena Pragma on April 26, 2026, covering 12 sectors of the AI industry across 752 primary sources. Every load-bearing claim above maps to one or more of those primary sources. The full methodology is the cold-start external research loop that Ena Pragma uses to produce decision-grade intelligence for mid-market operators.
Ena Pragma builds always-on systems for mid-market businesses navigating this transition. We don't replace your team. We replace the manual work that's leaking out of your operation. Get in touch.