Claude 3.7 Dominates 💪 Shifting AI Economics 💰 Grok 3 Hypocrisy? 🤔
PLUS: A new humanoid roommate, Claude plays Pokemon, and meet the new AI-powered Alexa+
👋 This week in AI
🎵 Podcast
Don’t feel like reading? Listen to two synthetic podcast hosts talk about it instead.
📰 Latest news
Anthropic Levels Up Claude: Hybrid Reasoning, Code Agent, and Pokemon
Anthropic has released Claude 3.7 Sonnet, the latest version of its AI model and a preferred choice among developers for coding tasks.
This update further enhances its coding capabilities, achieving 70.3% on the SWE-bench Verified benchmark with scaffolding (63.7% without) and 81.2% on TAU-bench.
The model introduces "hybrid reasoning," letting users switch between fast responses and a deeper "extended thinking" mode that shows the AI's reasoning.
Users also control how long the model "thinks" (up to 128K tokens) via the API.
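As a sketch of what that control looks like, here is how a request body enabling extended thinking might be assembled. The field names follow Anthropic's documented Messages API, but the model string and token budgets are illustrative assumptions, not values from this article.

```python
# Sketch of an extended-thinking request for the Anthropic Messages API.
# The "thinking" block and "budget_tokens" field follow Anthropic's public
# docs; the model string and budget values here are illustrative only.

def build_thinking_request(prompt: str, budget_tokens: int = 16000) -> dict:
    """Assemble a request body with a cap on 'thinking' tokens."""
    return {
        "model": "claude-3-7-sonnet-20250219",
        # max_tokens must exceed the thinking budget, since thinking
        # tokens count toward the overall output limit
        "max_tokens": budget_tokens + 4096,
        "thinking": {
            "type": "enabled",
            "budget_tokens": budget_tokens,
        },
        "messages": [{"role": "user", "content": prompt}],
    }

payload = build_thinking_request("Refactor this function for clarity.")
```

Raising `budget_tokens` trades latency for deeper reasoning; the API caps it at the model's context limits.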
Alongside the model, Anthropic launched Claude Code, a command-line coding agent (limited preview).
Why it Matters
Claude 3.7 Sonnet solidifies Claude's position as a leading AI for developers.
The hybrid reasoning and the Claude Code tool provide more control and transparency, addressing key needs in software development.
The model's ability to balance speed and in-depth analysis is key for practical application in complex coding scenarios. The 45% reduction in unnecessary refusals also helps.
Claude Plays Pokemon
Anthropic demonstrated Claude 3.7 Sonnet playing Pokemon Red on Twitch. This showcased the model's reasoning abilities, as it navigated the game, solved puzzles (with some struggles), and even displayed its "thought process."
While sometimes slow and imperfect, the demonstration highlighted both the progress and current limitations of AI in handling tasks that are relatively simple for humans.
OpenAI's SoftBank Pivot: Why Microsoft Took a Back Seat
OpenAI will shift 75% of its compute to SoftBank's Stargate project by 2030, reducing reliance on Microsoft. This is powered by SoftBank's $40 billion investment (part of a $260 billion valuation), with Stargate aiming for 8GW capacity.
OpenAI's spending on Microsoft data centres will still reach $28 billion by 2028.
Why it Matters
This shift underscores the massive and growing costs of advanced AI.
OpenAI's projected $20 billion cash burn by 2027, with inference costs exceeding training costs by 2030, signals a changing economic landscape. While Microsoft had already invested over $13 billion since 2019, SoftBank's more aggressive $40 billion commitment, aligned with CEO Masayoshi Son's high-risk, high-reward investment style, better suited OpenAI's escalating needs.
Microsoft, facing its own $80 billion annual capital expenditures and emerging tensions with OpenAI over compute demands and governance, loosened its exclusive grip, allowing partnerships with others like Oracle and SoftBank, while retaining key rights and revenue sharing.
SoftBank’s willingness to fund massive projects, like the potentially $500 billion Stargate, likely outstripped what Microsoft was prepared to commit, making SoftBank a more fitting lead partner for this phase of OpenAI's growth.
This highlights a shift from sole-source funding to diversified partnerships as a key strategy for managing the extreme financial demands of cutting-edge AI.
Elon's 'Truth-Seeking' AI Censored: Grok 3 Hypocrisy?
xAI's Grok 3, the latest version of their AI model, was initially promoted as "maximally truth-seeking" and "anti-woke." However, users discovered that the model was briefly censoring information about Donald Trump and Elon Musk, specifically excluding sources linking them to misinformation in its system prompt.
xAI engineering lead, Igor Babuschkin, confirmed the censorship, attributed it to a former OpenAI employee, and stated that the change was quickly reverted after user reports.
Why it Matters
The brief censorship within Grok 3 directly contradicts Elon Musk's publicly stated commitment to free speech, a justification used for his acquisition of Twitter.
Given Musk's connections to the Trump administration and his contentious activity with DOGE, censoring information about these figures carries significant political implications.
The incident underlines a challenge: balancing the ideal of a "truth-seeking" AI with the potential for that AI to highlight uncomfortable truths about its creators or their associates.
Amazon's Alexa+: A More Conversational AI
Amazon's Alexa+ upgrades the Alexa platform, introducing a "model-agnostic system" that incorporates multiple large language models (LLMs).
A key innovation is the "experts" architecture: interconnected APIs, instructions, and capabilities designed for interactions across "tens of thousands" of services and devices.
This allows Alexa+ to act across platforms like smart home control (Philips Hue, Roborock) and service booking (OpenTable, Thumbtack), going beyond traditional voice assistant functionality. "Agentic capabilities" enable autonomous web navigation for task completion. Personalisation uses persistent memory of user preferences and data.
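The "experts" idea above can be pictured as a dispatcher that routes a request to whichever handler owns its domain. This is purely a conceptual sketch: Amazon has not published Alexa+'s internal interfaces, so every name below is hypothetical.

```python
# Toy "experts"-style dispatcher: each domain registers a handler, and a
# router forwards requests to the matching one. Conceptual only; all
# domain names and request shapes here are invented for illustration.

experts = {}

def expert(domain):
    """Decorator that registers a handler for a given domain."""
    def register(fn):
        experts[domain] = fn
        return fn
    return register

@expert("lighting")
def lighting_expert(request):
    # e.g. a smart-home integration like Philips Hue
    return f"set {request['target']} to {request['level']}%"

@expert("booking")
def booking_expert(request):
    # e.g. a service integration like OpenTable
    return f"booked {request['target']} for {request['time']}"

def route(domain, request):
    """Dispatch a request to the expert registered for its domain."""
    return experts[domain](request)

result = route("lighting", {"target": "living room", "level": 40})
```

The appeal of this shape is that adding a new service means registering one more handler, rather than retraining or rewiring the core model.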
Deployment is cross-platform: Echo devices, a mobile app, and a web interface.
Why it Matters
Alexa+ demonstrates large-scale, real-world AI integration. The "experts" architecture and number of integrated services (with a potential user base of 600 million existing Alexa devices) are a considerable engineering feat.
The move to agentic capabilities aligns with AI research trends, pushing towards proactive task completion.
Free availability to Amazon Prime members suggests a strategy for rapid user adoption, potentially making complex AI interactions commonplace.
While long-term performance remains to be seen, Alexa+ offers a case study in practical, large-scale AI application, and a possible path for other AI services.
The "model-agnostic system" ensures that Amazon can seamlessly integrate and leverage the latest, best-performing models as they become available.
DeepSeek's Open Source Gambit: Open Source Week
Chinese AI startup DeepSeek AI has launched "Open Source Week," promising to release five pieces of their AI software code.
The first release is FlashMLA, a performance-boosting tool designed specifically for NVIDIA GPUs, useful for natural language processing and large language models. It achieves this by using BF16 precision and a system called paged KV cache.
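To make the paged KV cache idea concrete, here is a toy sketch: instead of one contiguous buffer per sequence, keys and values live in fixed-size pages allocated on demand, reducing memory fragmentation. This illustrates the general technique only, not DeepSeek's FlashMLA implementation, and the page size is an arbitrary choice for the example.

```python
# Toy paged KV cache: (key, value) pairs are stored in fixed-size pages
# allocated lazily, rather than one large pre-sized buffer per sequence.
# Conceptual sketch only; real kernels like FlashMLA operate on GPU
# tensors with much larger pages.

PAGE_SIZE = 4  # tokens per page; arbitrary for illustration

class PagedKVCache:
    def __init__(self):
        self.pages = []   # each page holds up to PAGE_SIZE (k, v) pairs
        self.length = 0   # total tokens cached

    def append(self, k, v):
        # Allocate a fresh page only when the current one is full
        if self.length % PAGE_SIZE == 0:
            self.pages.append([])
        self.pages[-1].append((k, v))
        self.length += 1

    def get(self, i):
        # Translate a flat token index into (page, offset)
        return self.pages[i // PAGE_SIZE][i % PAGE_SIZE]

cache = PagedKVCache()
for t in range(10):
    cache.append(f"k{t}", f"v{t}")
```

Because pages are allocated only as tokens arrive, memory grows with actual sequence length rather than a worst-case maximum, which is the core efficiency win of the approach.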
This announcement comes amid growing international scrutiny of DeepSeek, a previously little-known company that briefly surpassed ChatGPT in popularity, triggering a nearly US$1 trillion US stock market drop and a US$600 billion fall in Nvidia's market capitalisation.
Why it Matters
DeepSeek's rapid rise, and the subsequent restrictions imposed by countries like Italy, Canada, Australia, and Taiwan over data privacy and national security concerns, highlight a growing "AI Cold War." The situation reflects China's accelerating AI capabilities despite US efforts to limit its access to technology.
The company's model, reportedly built in two months for under US$6 million, reached 11.8 million visits in China in December 2024, with a 164% growth rate.
DeepSeek's open-source initiative, starting with FlashMLA (achieving up to 3000 GB/s memory bandwidth and 580 TFLOPS), can be seen in this geopolitical context: while facing international restrictions, the company is simultaneously promoting transparency and collaboration. This underscores a widening divide in the global AI landscape, forcing nations like Canada to consider their alliances and technological strategies.
📝 Watch for the new releases here
📰 Article by Policy Options on the geopolitical implications of DeepSeek
📁 FlashMLA GitHub repo by DeepSeek
NEO Gamma: Your Next Roommate Could Be a Robot
Norwegian robotics company 1X has launched NEO Gamma, a next-generation humanoid robot designed specifically for home environments.
NEO Gamma features a softer, more approachable appearance than many industrial robots, with "Emotive Ear Rings" for communication, soft covers for safety, and a knitted nylon exterior.
It incorporates an in-house LLM for natural conversation, supported by four microphones and a three-speaker system.
Key hardware improvements include a 10x increase in reliability and a 10 dB noise reduction, bringing operational sound down to the level of a standard refrigerator. The robot also possesses whole-body control running at 100 Hz and a visual manipulation model for picking up objects.
Why it Matters
1X's emphasis on a non-intimidating design and enhanced communication capabilities, combined with significant improvements in reliability and noise reduction, is key for consumer adoption.
The company's focus on home testing, as highlighted by the CEO, underscores the importance of real-world environments for developing truly autonomous and useful household robots.
This approach could accelerate the integration of humanoids into homes, changing how people interact with technology and manage daily tasks.