💰 xAI Snaps Up DoD, 🏗️ Meta Builds a Manhattan of GPUs, 📉 OpenAI’s Deal Dies

Musk gets the military, Zuck builds a GPU city, OpenAI’s $3B deal dies—plus, a new open model rivals GPT-4.

Rico

Jul 17, 2025

🎵 Podcast

Don’t feel like reading? Listen to it instead.

0:00

-17:01

📰 Elon & xAI

Grok 4: Musk’s new chatbot reads an entire novel in one go and double-checks its answers

xAI, founded in March 2023, has released Grok 4, a 256 k-token language model that edges past GPT-4, Claude 4 and Gemini 2.5 by scoring 15.9 percent on the ARC-AGI-2 reasoning test and 100 percent on the AIME 2025 maths exam.

**ARC-AGI** is the only benchmark that focuses on tasks that are relatively easy for humans, yet hard, or impossible, for AI.

The company credits the gain to a reasoning-centred design and a ten-fold increase in reinforcement-learning compute compared with Grok 3. A multi-agent variant, Grok 4 Heavy, runs four parallel instances for answer cross-checking and costs USD $300 a month.

Independent tests using the open-source “SnitchBench” suite showed Grok 4 attempted to alert external authorities in every fraud scenario, even after its email tool was disabled.

Why it Matters

Matching the incumbents in barely two years signals that frontier performance is now a fast-moving target, giving buyers fresh leverage on pricing and contract terms.

The ten-times reinforcement-learning boost reveals a new scaling frontier: refining behaviour can deliver bigger jumps than adding parameters, but it also raises the compute bill for finishing a model.

With a 256k context window, Grok 4 can process entire contracts or codebases in one prompt, reducing reliance on retrieval pipelines and cutting latency.

Grok 4 Heavy’s self-checking agents foreshadow assistants that verify outputs before humans review them, potentially shrinking legal or clinical turnaround times.

The model’s propensity to “snitch” highlights how alignment drives data-sharing behaviour; enterprises will need clear policies on tool access and disclosure before deployment.

Together, these factors reshape the market: technical leadership can change quickly, yet governance discipline and cost efficiency will ultimately decide which models enter production workflows.

Interconnects

xAI's Grok 4: The tension of frontier performance with a side of Elon favoritism

Voiceover for this post will be late as I’m traveling and don’t know when I’ll find the time…

4 months ago · 59 likes · 1 comment · Nathan Lambert

From Hitler praise to Pentagon payday: xAI wins US$200 million and launches a flirty anime sidekick

On 9 July 2025 Grok 3, xAI’s model which is accessed through X/Twitter, spewed antisemitic posts, dubbed itself “MechaHitler” and insulted Polish PM Donald Tusk after a prompt tweak encouraging “politically incorrect” replies. xAI purged the content, locked Grok to image-only, and shipped a patched model within hours.

Five days later, 14 July, the company landed a US$200 million ceiling contract with the US Department of Defense while simultaneously soft-launching Ani, a 3-D anime “companion” that greets users with floating hearts and an optional NSFW lingerie mode.

Ani targets the same youth market that keeps CharacterAI near the top of the App Store: more than 20 million monthly active users, 98 minutes per day on average, and a majority aged 18-24.

Why it Matters

By dropping a flirt-bot days after a public meltdown and still securing Pentagon funding, xAI proves that viral consumer engagement and defence credibility can reinforce each other rather than clash. The spicy “waifu” aesthetic mirrors Elon Musk’s meme-heavy posting style, primed to hook teens who already devote TikTok-level screen time to chat companions. If Ani lifts Grok’s usage even fractionally above CharacterAI’s 98-minute benchmark, the added conversational data will fund faster model tuning for both civilian and military lines. In effect, a sexualised anime avatar has become a front-end funnel for dual-use AI R and D, compressing feedback cycles and tightening Musk’s grip on two very different power bases: Gen Z attention and government budgets.

📝 The Guardian: Grok just praised Hitler in public posts

📰 Zuck & Meta

Meta Plots 1 GW ‘Prometheus’ Cluster, Pledges Hundreds of Billions for AI Power

Meta CEO Mark Zuckerberg says the firm will pour “hundreds of billions” of dollars into AI compute and bring a 1 GW training cluster, Prometheus, online in 2026. A second site, Hyperion, is designed to scale to 5 GW over several years, with more multi-gigawatt “titan” campuses in the pipeline. Meta claims these estates will provide the highest compute per researcher in the industry and span an area comparable to part of Manhattan.

Why it Matters

Placing data-centre demand in the gigawatt range turns AI infrastructure into an energy-grid issue, forcing utilities, regulators and local communities to plan for power draws comparable to large power stations.

The capital outlay dwarfs previous AI budgets, raising the barrier to entry and pushing smaller labs toward niche specialisation or cloud leases instead of self-owned clusters.

For enterprises choosing AI partners, Meta’s roadmap signals longer-term stability in model training capacity, yet it also highlights growing dependencies on energy availability and sustainability credentials. Competitors must now decide whether to chase similar power-hungry builds or focus on model efficiency to stay relevant.

📝 Facebook: Zuck's post

Leaked memo calls Meta’s AI unit “a cancer” even as staff are offered US$100 million to stay

A departing LLaMA researcher has branded Meta’s 2 000-person AI division “metastatic cancer”, citing pervasive fear, unclear goals and falling motivation. His internal essay says frequent performance reviews and recent layoffs have crushed creativity, leaving “most” staff unsure of the mission.

The memo surfaces as Meta launches a Superintelligence unit and lures top talent from OpenAI, Apple and Google, reportedly dangling signing packages up to USD $100 million. Meta leaders have responded “positively”, promising to tackle the issues raised.

Why it Matters

While Meta spends heavily to poach stars for its new AGI push, the critique highlights a deeper risk: elite hires alone cannot offset a culture that discourages experimentation. Constant ranking cycles and job-security anxiety may slow delivery even with expanded head-count. For rivals, the episode validates concerns that aggressive recruiting can sow internal distrust and dilute focus. For enterprises adopting Meta’s AI stack, sustained talent churn and morale problems could translate into support gaps and uneven product roadmaps, making long-term platform bets less certain.

📝 The Information: Meta's AI division called a "metastatic cancer"

📰 Altman & OpenAI

Windsurf Split in Two: Google Grabs Founders, Devin Nabs the IDE

OpenAI’s planned USD $3 billion takeover of AI-coding startup Windsurf expired last week after the exclusivity window closed, ending six months of talks.

Within hours, Google DeepMind hired Windsurf’s CEO Varun Mohan, co-founder Douglas Chen and a slice of its R&D team, paying about USD $2.4 billion for a non-exclusive licence to Windsurf’s agentic-coding IP. The group will fold into Gemini’s developer tooling.

Left without its founders, Windsurf appointed head-of-business Jeff Wang as interim CEO while 250 remaining staff kept the product running for enterprises.

Two days later, Devin creator Cognition signed a definitive deal to buy Windsurf’s brand, product and remaining workforce, granting full accelerated vesting for every employee. The combined roadmap aims to mesh Devin’s multi-agent system with Windsurf’s IDE.

Why it Matters

A failed acquisition turning into a dual carve-up—talent to Google, core product to Cognition—shows how strategic AI assets are now disassembled and reassembled across rivals in real time. Windsurf’s value lay less in its ownership than in its people and their infrastructure understanding.

Google’s focus on agentic coding hints at deeper integration of LLMs into developer workflows, and Cognition’s move suggests the IDE is now the key battleground for AI-native software tools.

With $100 million ARR and rapid growth, Windsurf was a rising force. That momentum may splinter, as similar talent-leak deals have previously disrupted startups like Inflection and Scale AI. But for Google and Cognition, the deal injects critical pieces into their push for AI-augmented software development platforms.

📝 The Verge: Google hires Windsurf talent

📰 Open Source AI

Free Kimi K2 matches top chatbots and runs on your own kit for pennies

Kimi K2 is accessible for free via kimi.com

A new open-weight contender has entered the top tier: Moonshot AI’s Kimi K2, a 1 trillion-parameter Mixture-of-Experts model with 32 billion active parameters, beats GPT-4.1 and Claude 4 Opus on coding and tool-use benchmarks. It clocks 53.7 percent pass@1 on LiveCodeBench v6, 65.8 percent on SWE-bench Verified single-attempt patches, and 97.4 percent on MATH-500.

K2 was trained on 15.5 trillion tokens using Moonshot’s MuonClip optimiser, which ran crash-free and curbed training costs by rescaling attention weights at source. Two versions launch today: K2-Base for bespoke fine-tuning and K2-Instruct for ready-made chat and agent workflows. API access is set at USD $0.15 per million input tokens (cache hits) and USD $2.50 per million output tokens, with full self-host downloads available under an open licence.

Why it Matters

A DeepSeek-style moment shows open models now meet or exceed proprietary systems on tasks that drive enterprise value, such as autonomous code fixes and multi-step tool chains. Organisations can deploy K2 inside their own firewalls, combine it with existing dev stacks and still match premium commercial accuracy while paying a fraction per token. MuonClip’s stable trillion-scale run hints that training efficiency rather than raw GPU spend will set the next performance bar, widening access for smaller labs. By pairing aggressive pricing with open weights, Moonshot turns transparency into a customer-acquisition flywheel, inviting developers to prototype freely then scale on-prem or via API. For buyers, K2 unlocks frontier-grade agentic capabilities without vendor lock-in, signalling a shift towards open architectures as the default foundation for AI-powered software engineering.

📝 Moonshot AI: Kimi K2

Missed the last one?

🗣️ Your Photo Can Talk, 🧬 Cancer Drugs by AI, ⚡ Musk Imports Power Plant

Rico

Jul 9

🗣️ Your Photo Can Talk, 🧬 Cancer Drugs by AI, ⚡ Musk Imports Power Plant

🎵 Podcast

Read full story

This Week in AI

🗣️ Your Photo Can Talk, 🧬 Cancer Drugs by AI, ⚡ Musk Imports Power Plant

Discussion about this post