Claude 3.7 Dominates 💪 Shifting AI Economics 💰 Grok 3 Hypocrisy? 🤔
PLUS: A new humanoid roommate, Claude plays Pokemon, and meet the new AI-powered Alexa+
👋 This week in AI
🎵 Podcast
Don’t feel like reading? Listen to two synthetic podcast hosts talk about it instead.
📰 Latest news
Anthropic Levels Up Claude: Hybrid Reasoning, Code Agent, and Pokemon
Anthropic has released Claude 3.7 Sonnet, the latest version of its AI model and a preferred choice among developers for coding tasks.
This update further enhances its coding capabilities, achieving 70.3% on the SWE-bench Verified benchmark with scaffolding (63.7% without) and 81.2% on TAU-bench.
The model introduces "hybrid reasoning," letting users switch between fast responses and a deeper "extended thinking" mode that shows the AI's reasoning.
Users also control how long the model "thinks" (up to 128K tokens) via the API.
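As a sketch of what that control looks like, here is how a request body enabling extended thinking might be assembled. The field names follow Anthropic's documented Messages API, but the model string and token budgets are illustrative assumptions, not values from this article.

```python
# Sketch of an extended-thinking request for the Anthropic Messages API.
# The "thinking" block and "budget_tokens" field follow Anthropic's public
# docs; the model string and budget values here are illustrative only.

def build_thinking_request(prompt: str, budget_tokens: int = 16000) -> dict:
    """Assemble a request body with a cap on 'thinking' tokens."""
    return {
        "model": "claude-3-7-sonnet-20250219",
        # max_tokens must exceed the thinking budget, since thinking
        # tokens count toward the overall output limit
        "max_tokens": budget_tokens + 4096,
        "thinking": {
            "type": "enabled",
            "budget_tokens": budget_tokens,
        },
        "messages": [{"role": "user", "content": prompt}],
    }

payload = build_thinking_request("Refactor this function for clarity.")
```

Raising `budget_tokens` trades latency for deeper reasoning; the API caps it at the model's context limits.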
Alongside the model, Anthropic launched Claude Code, a command-line coding agent (limited preview).
Why it Matters
Claude 3.7 Sonnet solidifies Claude's position as a leading AI for developers.
The hybrid reasoning and the Claude Code tool provide more control and transparency, addressing key needs in software development.
The model's ability to balance speed and in-depth analysis is key for practical application in complex coding scenarios. The 45% reduction in unnecessary refusals also helps.
Claude Plays Pokemon
Anthropic demonstrated Claude 3.7 Sonnet playing Pokemon Red on Twitch. This showcased the model's reasoning abilities, as it navigated the game, solved puzzles (with some struggles), and even displayed its "thought process."
While sometimes slow and imperfect, the demonstration highlighted both the progress and current limitations of AI in handling tasks that are relatively simple for humans.
OpenAI's SoftBank Pivot: Why Microsoft Took a Back Seat
OpenAI will shift 75% of its compute to SoftBank's Stargate project by 2030, reducing reliance on Microsoft. This is powered by SoftBank's $40 billion investment (part of a $260 billion valuation), with Stargate aiming for 8GW capacity.
OpenAI's spending on Microsoft data centres will still reach $28 billion by 2028.
Why it Matters
This shift underscores the massive and growing costs of advanced AI.
OpenAI's projected $20 billion cash burn by 2027, with inference costs exceeding training costs by 2030, signals a changing economic landscape. While Microsoft had already invested over $13 billion since 2019, SoftBank's more aggressive $40 billion commitment, aligned with CEO Masayoshi Son's high-risk, high-reward investment style, better suited OpenAI's escalating needs.
Microsoft, facing its own $80 billion annual capital expenditures and emerging tensions with OpenAI over compute demands and governance, loosened its exclusive grip, allowing partnerships with others like Oracle and SoftBank, while retaining key rights and revenue sharing.
SoftBank’s willingness to fund massive projects, like the potentially $500 billion Stargate, likely outstripped what Microsoft was prepared to commit, making SoftBank a more fitting lead partner for this phase of OpenAI's growth.
This highlights a shift from sole-source funding to diversified partnerships as a key strategy for managing the extreme financial demands of cutting-edge AI.
Elon's 'Truth-Seeking' AI Censored: Grok 3 Hypocrisy?
xAI's Grok 3, the latest version of their AI model, was initially promoted as "maximally truth-seeking" and "anti-woke." However, users discovered that the model was briefly censoring information about Donald Trump and Elon Musk, specifically excluding sources linking them to misinformation in its system prompt.
xAI engineering lead, Igor Babuschkin, confirmed the censorship, attributed it to a former OpenAI employee, and stated that the change was quickly reverted after user reports.
Why it Matters
The brief censorship within Grok 3 directly contradicts Elon Musk's publicly stated commitment to free speech, a justification used for his acquisition of Twitter.
Given Musk's connections to the Trump administration and his contentious activity with DOGE, censoring information about these figures carries significant political implications.
The incident underlines a challenge: balancing the ideal of a "truth-seeking" AI with the potential for that AI to highlight uncomfortable truths about its creators or their associates.
Amazon's Alexa+: A More Conversational AI
Amazon's Alexa+ upgrades the Alexa platform, introducing a "model-agnostic system" that incorporates multiple large language models (LLMs).
A key innovation is the "experts" architecture: interconnected APIs, instructions, and capabilities designed for interactions across "tens of thousands" of services and devices.
This allows Alexa+ to act across platforms like smart home control (Philips Hue, Roborock) and service booking (OpenTable, Thumbtack), going beyond traditional voice assistant functionality. "Agentic capabilities" enable autonomous web navigation for task completion. Personalisation uses persistent memory of user preferences and data.
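The "experts" idea above can be pictured as a dispatcher that routes a request to whichever handler owns its domain. This is purely a conceptual sketch: Amazon has not published Alexa+'s internal interfaces, so every name below is hypothetical.

```python
# Toy "experts"-style dispatcher: each domain registers a handler, and a
# router forwards requests to the matching one. Conceptual only; all
# domain names and request shapes here are invented for illustration.

experts = {}

def expert(domain):
    """Decorator that registers a handler for a given domain."""
    def register(fn):
        experts[domain] = fn
        return fn
    return register

@expert("lighting")
def lighting_expert(request):
    # e.g. a smart-home integration like Philips Hue
    return f"set {request['target']} to {request['level']}%"

@expert("booking")
def booking_expert(request):
    # e.g. a service integration like OpenTable
    return f"booked {request['target']} for {request['time']}"

def route(domain, request):
    """Dispatch a request to the expert registered for its domain."""
    return experts[domain](request)

result = route("lighting", {"target": "living room", "level": 40})
```

The appeal of this shape is that adding a new service means registering one more handler, rather than retraining or rewiring the core model.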
Deployment is cross-platform: Echo devices, a mobile app, and a web interface.
Why it Matters
Alexa+ demonstrates large-scale, real-world AI integration. The "experts" architecture and number of integrated services (with a potential user base of 600 million existing Alexa devices) are a considerable engineering feat.
The move to agentic capabilities aligns with AI research trends, pushing towards proactive task completion.
Free availability to Amazon Prime members suggests a strategy for rapid user adoption, potentially making complex AI interactions commonplace.
While long-term performance remains to be seen, Alexa+ offers a case study in practical, large-scale AI application, and a possible path for other AI services.
The "model-agnostic system" ensures that Amazon can seamlessly integrate and leverage the latest, best-performing models as they become available.
DeepSeek's Open Source Gambit: Open Source Week
Chinese AI startup DeepSeek AI has launched "Open Source Week," promising to release five pieces of their AI software code.
The first release is FlashMLA, a performance-boosting tool designed specifically for NVIDIA GPUs, useful for natural language processing and large language models. It achieves this by using BF16 precision and a system called paged KV cache.
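To make the paged KV cache idea concrete, here is a toy sketch: instead of one contiguous buffer per sequence, keys and values live in fixed-size pages allocated on demand, reducing memory fragmentation. This illustrates the general technique only, not DeepSeek's FlashMLA implementation, and the page size is an arbitrary choice for the example.

```python
# Toy paged KV cache: (key, value) pairs are stored in fixed-size pages
# allocated lazily, rather than one large pre-sized buffer per sequence.
# Conceptual sketch only; real kernels like FlashMLA operate on GPU
# tensors with much larger pages.

PAGE_SIZE = 4  # tokens per page; arbitrary for illustration

class PagedKVCache:
    def __init__(self):
        self.pages = []   # each page holds up to PAGE_SIZE (k, v) pairs
        self.length = 0   # total tokens cached

    def append(self, k, v):
        # Allocate a fresh page only when the current one is full
        if self.length % PAGE_SIZE == 0:
            self.pages.append([])
        self.pages[-1].append((k, v))
        self.length += 1

    def get(self, i):
        # Translate a flat token index into (page, offset)
        return self.pages[i // PAGE_SIZE][i % PAGE_SIZE]

cache = PagedKVCache()
for t in range(10):
    cache.append(f"k{t}", f"v{t}")
```

Because pages are allocated only as tokens arrive, memory grows with actual sequence length rather than a worst-case maximum, which is the core efficiency win of the approach.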
This announcement comes amid growing international scrutiny of DeepSeek, a previously little-known company that briefly surpassed ChatGPT in popularity, triggering a nearly US$1 trillion US stock market drop and a US$600 billion fall in Nvidia's market capitalisation.
Why it Matters
DeepSeek's rapid rise, and the subsequent restrictions imposed by countries like Italy, Canada, Australia, and Taiwan over data privacy and national security concerns, highlight a growing "AI Cold War." The situation reflects China's accelerating AI capabilities despite US efforts to limit its access to technology.
The company's model, reportedly built in two months for under US$6 million, reached 11.8 million visits in China in December 2024, with a 164% growth rate.
DeepSeek's open-source initiative, starting with FlashMLA (achieving up to 3000 GB/s memory bandwidth and 580 TFLOPS), can be seen in this geopolitical context: while facing international restrictions, the company is simultaneously promoting transparency and collaboration. This underscores a widening divide in the global AI landscape, forcing nations like Canada to consider their alliances and technological strategies.
📝 Watch for the new releases here
📰 Article by Policy Options on the geopolitical implications of DeepSeek
📁 FlashMLA GitHub repo by DeepSeek
NEO Gamma: Your Next Roommate Could Be a Robot
Norwegian robotics company 1X has launched NEO Gamma, a next-generation humanoid robot designed specifically for home environments.
NEO Gamma features a softer, more approachable appearance than many industrial robots, with "Emotive Ear Rings" for communication, soft covers for safety, and a knitted nylon exterior.
It incorporates an in-house LLM for natural conversation, supported by four microphones and a three-speaker system.
Key hardware improvements include a 10x increase in reliability and a 10 dB noise reduction, bringing operational sound down to the level of a standard refrigerator. The robot also possesses whole-body control running at 100 Hz and a visual manipulation model for picking up objects.
Why it Matters
1X's emphasis on a non-intimidating design and enhanced communication capabilities, combined with significant improvements in reliability and noise reduction, is key for consumer adoption.
The company's focus on home testing, as highlighted by the CEO, underscores the importance of real-world environments for developing truly autonomous and useful household robots.
This approach could accelerate the integration of humanoids into homes, changing how people interact with technology and manage daily tasks.