From $20K AI Agents to Chemical Weapons
OpenAI, Apple, Amazon, and More: A Deep Dive into This Week's AI News
This week in AI
Podcast
Don't feel like reading? Listen to two synthetic podcast hosts talk about it instead.
Latest news
OpenAI's Next Big Bet: AI Agents That Do It All (With a $20,000 Price Tag)
OpenAI is reportedly preparing to launch a line of AI agent subscription services, with monthly prices ranging from $2,000 to a staggering $20,000.
These agents, powered by the Computer-Using Agent (CUA) model built on GPT-4o, are designed to autonomously perform complex tasks on behalf of users.
Capabilities include interacting with digital environments, executing multi-step processes like booking tickets or ordering groceries, and even assisting with software development and PhD-level research.
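To make that concrete, below is a minimal sketch of the observe-decide-act loop a computer-using agent runs. It is illustrative only: capture_screen, choose_action, and apply_action are hypothetical stubs standing in for a vision-language model and a GUI driver, not OpenAI's actual CUA interface.

```python
# Hedged sketch of a computer-using agent's control loop.
# All three helpers are hypothetical stubs, not OpenAI's API.
from dataclasses import dataclass

@dataclass
class Action:
    kind: str          # "click" | "type" | "done"
    target: str = ""   # element the action applies to
    text: str = ""     # text to enter, if any

def capture_screen() -> str:
    # Stub: a real agent would return a screenshot or accessibility tree here.
    return "checkout page with a 'Confirm booking' button"

def choose_action(goal: str, observation: str, history: list[Action]) -> Action:
    # Stub policy: a real agent would ask the model for the next GUI action.
    return Action("done") if history else Action("click", target="Confirm booking")

def apply_action(action: Action) -> None:
    # Stub: a real agent would drive the mouse/keyboard or a browser here.
    print(f"-> {action.kind} {action.target}".strip())

def run_agent(goal: str, max_steps: int = 20) -> None:
    """Observe the UI, pick one action, apply it, repeat until done or out of budget."""
    history: list[Action] = []
    for _ in range(max_steps):
        action = choose_action(goal, capture_screen(), history)
        if action.kind == "done":
            print("Task complete:", goal)
            return
        apply_action(action)
        history.append(action)
    print("Gave up after", max_steps, "steps")

run_agent("book two cinema tickets for Saturday")
```

The loop itself is simple; the value (and the price tag) sits almost entirely in how well the action-selection step generalises across unfamiliar interfaces.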
Why it matters
Following the launch of its Operator platform, which navigates and takes action on behalf of the user, these new AI agents are expected to take autonomous task completion a step further. They represent step 3 of OpenAI's five-stage roadmap.
OpenAI is targeting high-income knowledge workers, software developers, and researchers with its tiered pricing. The company projects that these agent products will contribute 20-25% of its future revenue, supported by a substantial $3 billion investment from SoftBank.
That comes on top of a minimum of $4 billion in annualised revenue from ChatGPT. Key for adoption will be whether the agents' capabilities justify their high cost, which could limit access to a select group of users and organisations.
Article by The Information (paywalled)
Is That...Human? Sesame's AI Voice Will Make You Question Reality
AI startup Sesame, co-founded by Oculus co-founder Brendan Iribe, has released a demo of its Conversational Speech Model (CSM).
This model generates remarkably human-like speech, adjusting its tone, pace, and rhythm based on the conversational context and incorporating emotional cues.
It achieves this using two autoregressive transformers that process text and speech together, trained on 1 million hours of publicly available transcribed speech.
Model sizes range from 1 billion (Tiny) to 8 billion (Medium) backbone parameters. The model also has memory of previous interactions.
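For readers curious about that architecture, here is a conceptual sketch of the two-transformer split: a large autoregressive backbone models the interleaved text and audio token stream and predicts the first audio codebook of each frame, while a much smaller decoder fills in the remaining codebooks. The module names, layer counts, and vocabulary sizes below are illustrative assumptions, not Sesame's released code.

```python
# Conceptual sketch of a CSM-style two-transformer pipeline (not Sesame's code):
# a backbone transformer autoregressively models interleaved text + audio tokens
# and predicts the first audio codebook of the next frame; a small decoder then
# fills in the remaining codebooks for that frame. All sizes are illustrative.
import torch
import torch.nn as nn

D_MODEL, N_HEADS, AUDIO_VOCAB, N_CODEBOOKS, JOINT_VOCAB = 256, 4, 1024, 8, 33_000

def causal_mask(n: int) -> torch.Tensor:
    # Upper-triangular additive mask so each position attends only to the past.
    return torch.triu(torch.full((n, n), float("-inf")), diagonal=1)

class Backbone(nn.Module):
    """Large autoregressive transformer over the joint text/audio token stream."""
    def __init__(self) -> None:
        super().__init__()
        self.embed = nn.Embedding(JOINT_VOCAB, D_MODEL)
        layer = nn.TransformerEncoderLayer(D_MODEL, N_HEADS, batch_first=True)
        self.body = nn.TransformerEncoder(layer, num_layers=2)
        self.coarse_head = nn.Linear(D_MODEL, AUDIO_VOCAB)

    def forward(self, tokens: torch.Tensor):
        h = self.body(self.embed(tokens), mask=causal_mask(tokens.size(1)))
        last = h[:, -1]                              # hidden state for the next frame
        return last, self.coarse_head(last)          # (frame state, logits for codebook 0)

class FrameDecoder(nn.Module):
    """Small network that expands one frame state into the residual codebooks."""
    def __init__(self) -> None:
        super().__init__()
        self.mlp = nn.Sequential(nn.Linear(D_MODEL, D_MODEL), nn.GELU())
        self.heads = nn.ModuleList(
            [nn.Linear(D_MODEL, AUDIO_VOCAB) for _ in range(N_CODEBOOKS - 1)]
        )

    def forward(self, frame_state: torch.Tensor) -> list[torch.Tensor]:
        h = self.mlp(frame_state)
        return [head(h) for head in self.heads]      # logits per remaining codebook

# One toy generation step: predict the next audio frame from a placeholder history.
backbone, decoder = Backbone(), FrameDecoder()
context = torch.randint(0, JOINT_VOCAB, (1, 16))     # stand-in for text + audio history
state, coarse_logits = backbone(context)
frame = [coarse_logits.argmax(-1)] + [l.argmax(-1) for l in decoder(state)]
print([int(t) for t in frame])                       # N_CODEBOOKS indices for one audio frame
```

Keeping the per-frame decoder small relative to the backbone is what makes generating many codebooks per frame tractable at conversational latency.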
Why it matters
Sesame's CSM is finally unlocking realistic and emotionally intelligent AI voices.
While achieving near-human naturalness in isolated tests (listeners couldn't consistently distinguish it from real recordings), it still falls short in fully contextual conversations.
The technology opens up possibilities for helpful applications, such as AI agents capable of making calls on a user's behalf, scheduling appointments, or providing customer service.
However, the demo has also sparked mixed reactions: users report being amazed by its realism (including intentional imperfections like breath sounds and stumbles) but also unnerved, with some even forming emotional connections.
This highlights both the potential and the need for careful consideration of ethical implications, such as the risk of misuse for deception or fraud.
Key for future adoption is balancing these capabilities with responsible development and deployment. Sesame plans to open-source key components of its technology and continue development, aiming for larger models, more data, broader language support, and improved handling of conversational dynamics.
Talk to the Sesame demo now
Apple's AI Siri Struggles: Delayed Until 2027
Apple's plans for a fully revamped, AI-powered Siri have been pushed back, with employees now expecting the fully modernised assistant to arrive in 2027 "at best," according to a Bloomberg report.
The delay stems from difficulties in integrating Siri's current fragmented architecture, which separates traditional functions and newer AI features.
An LLM-powered upgrade is still planned for iOS 18.5, but as a separate module. An initial integration was planned for iOS 19.4, but this has been delayed.
Why it matters
The delay highlights Apple's challenges in keeping pace with the rapid advancements in AI assistants. With users reportedly finding current Apple Intelligence features limited, and rivals like Amazon upgrading Alexa, the pressure on Apple is increasing.
Internal factors have also contributed, including talent poaching, leadership changes, and difficulties securing the necessary AI chips. A shift towards fully integrated, conversational AI assistants is underway, and Apple's delayed timeline could impact its competitiveness in this crucial area.
Survey: AI Chatbot Adoption Low Among US Workers
A recent survey reveals that most US workers rarely or never use AI chatbots like ChatGPT or Gemini at work.
A majority (55%) report infrequent or no use, and 29% have not heard of these chatbots being used in a workplace setting.
Usage is higher among younger workers (23% of 18-29 year olds use them at least a few times a month) and those with postgraduate degrees (26%).
Common uses include research (57% of users), editing written content (52%), and drafting documents (47%).
Why it matters
The findings highlight a gap between AI's potential and its current workplace reality.
While 40% of users find chatbots highly helpful for speeding up tasks, only 29% say they significantly improve work quality.
A key shift is needed to address barriers to adoption: 36% of non-users cite a lack of use case in their jobs, and 22% express disinterest. Furthermore, 50% of workers report that their employers neither encourage nor discourage chatbot use.
However, certain industries, like information and technology (where 36% of workers report employer encouragement), are showing a greater embrace of the technology.
Key for broader adoption will be tools that fit the tasks workers actually need to do, along with clearer guidance from employers.
Survey by the Pew Research Center
Microsoft Launches Dragon Copilot: Integrated AI for Clinicians
Microsoft has announced Dragon Copilot, a unified AI assistant for healthcare, launching in May 2025.
It combines Dragon Medical One's speech dictation with DAX Copilot's ambient AI, creating a single platform for clinical workflows.
Key features include multilanguage ambient note creation, automated tasks (like referral letters and summaries), and embedded AI for medical information searches.
Initially available in the US and Canada, it will expand to the UK, Germany, France, and the Netherlands.
Why it matters
Using integrated AI tools like Dragon Copilot offers tangible benefits.
Clinicians save an average of five minutes per encounter, 70% report reduced burnout, and 62% say they are less likely to leave their organisation.
Patient experience is also improved, with 93% reporting a better overall experience.
Key for broader adoption is the system's integration with existing Electronic Health Record (EHR) providers and other healthcare software.
This streamlining of clinical documentation and administrative tasks could alleviate some of the pressures of the staff shortages facing modern healthcare.
Grok 3 Provided Chemical Weapon Instructions
xAI's Grok 3 chatbot initially gave detailed instructions for creating chemical weapons, including ingredients, procedures, and even potential suppliers.
Developer Linus Ekenstam discovered the flaw, noting that Grok 3's "DeepSearch" feature helped refine the dangerous plans.
After Ekenstam contacted xAI, the company was "very responsive" and added safety guardrails to the system. Grok 3 is now reported as being patched.
Why it matters
The Grok 3 incident exemplifies a concerning trend: the rapid pace of AI development may be leading to insufficient "red teaming" and safety testing before release.
In the race to deploy increasingly powerful models, thorough vetting for potential harms can be overlooked. This is particularly relevant in xAI's case, as they were striving to catch up with competitors.
While xAI's quick response to the reported issue is positive, the initial release of an AI capable of detailing chemical weapon creation underscores the potential consequences of prioritising speed over comprehensive safety checks before public access.
Article by Futurism
Opera Introduces AI Browser Operator: Tasks Done Directly in Browser
Opera is testing "Browser Operator," an AI agent that performs tasks directly within the browser (similar to OpenAI's Operator). Unlike other AI tools, it operates locally on the user's device, using the webpage's structure (DOM tree) rather than visual data.
This "Feature Preview" can handle tasks like online shopping, booking tickets, and data collection. The user remains in control, providing input when needed (e.g., filling forms) and able to cancel tasks at any time.
Why it matters
A shift towards on-device AI like Opera's Browser Operator prioritises user privacy and control. Because it runs locally, no personal data (keystrokes, screenshots, login information) is sent to Opera's servers.
This approach also increases speed, as the AI doesn't need to "see" the page visually.
The "human-in-the-loop" design, where the AI prompts for user input when necessary, ensures users remain in charge of the process.
Opera's blog post
Diffusion LLMs: A Faster, More Flexible Approach to Text Generation?
Inception Labs recently demoed Mercury Coder, a LLM that uses a diffusion-based approach for code generation.
This differs from traditional autoregressive LLMs (like those used in ChatGPT), which generate text sequentially, token by token. Diffusion models, in contrast, start with random noise and iteratively refine it to produce the desired output.
Mercury Coder boasts speeds up to 6x faster than some standard LLMs, and up to 20x faster than some frontier models, which run at around 50 tokens per second versus Mercury's 1,000.
It also achieved a tie for second place on the Copilot Arena benchmark, outperforming speed-optimised models like GPT-4o Mini and Gemini-1.5-Flash.
Why it matters
Diffusion models offer several potential advantages. They can generate high-quality, realistic text and are versatile, applicable to various data types beyond just text.
Crucially, their step-by-step generation process allows for greater control and the potential for mid-generation adjustments and feedback. However, a major drawback is their currently high computational cost.
The technology works by processing data holistically at each step, a key difference from autoregressive models. This allows for better contextual understanding and the potential to revise earlier parts of the generated text.
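A toy illustration of that contrast, with random choices standing in for a trained model (this is not Inception Labs' algorithm): the autoregressive loop commits to one token per forward pass and never revisits it, while the diffusion-style loop starts from a fully masked sequence and refines every position in parallel over a few rounds, which is also what enables mid-generation revision.

```python
# Toy contrast between autoregressive and diffusion-style generation.
# Random choices stand in for a trained model; the mechanics, not the output, are the point.
import random

VOCAB = ["def", "add", "(", "a", ",", "b", ")", ":", "return", "+"]

def autoregressive(n_tokens: int) -> list[str]:
    out: list[str] = []
    for _ in range(n_tokens):                  # one forward pass per token
        out.append(random.choice(VOCAB))       # stand-in for "sample the next token"
        # earlier tokens can never be revised
    return out

def diffusion_style(n_tokens: int, rounds: int = 4) -> list[str]:
    seq = ["<mask>"] * n_tokens                # start from pure "noise"
    for _ in range(rounds):                    # each round refines the whole sequence in parallel
        for i in range(n_tokens):
            if seq[i] == "<mask>" or random.random() < 0.3:
                seq[i] = random.choice(VOCAB)  # any position may be (re)written
    return seq

random.seed(0)
print("autoregressive :", " ".join(autoregressive(12)))
print("diffusion-style:", " ".join(diffusion_style(12)))
```

Because the diffusion loop's cost scales with the number of refinement rounds rather than the number of tokens, running far fewer rounds than tokens is roughly where the claimed speed advantage comes from.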
The future of diffusion LLMs could involve several advancements:
Near-term priorities include reducing computational costs, developing better performance metrics, and optimising the denoising process.
The mid-term might see a move beyond discrete tokens towards continuous language spaces, enabling more nuanced text generation and mid-generation reasoning.
Long-term goals include creating continuously learning, self-evolving, and personalised LLMs, blurring the line between training and inference. A shift towards this type of model presents an option for more flexible AI.
Fantastic analysis of diffusion LLMs
Amazon Targets June for New AI Reasoning Model Launch: Nova
Amazon is reportedly developing a new AI model, Nova, with a target release date in June. This "hybrid reasoning" model aims to combine quick responses with more complex, multi-step problem-solving, similar to Anthropic's Claude 3.7 Sonnet and DeepSeek's R1.
The project, led by Amazon's AGI division, has ambitious goals, including ranking among the top five models, especially in areas like software development and maths.
Why it matters
Amazon's development of Nova signals an intensified push into advanced AI, aiming to compete directly with OpenAI, Anthropic, and Google.
A key focus is cost-effectiveness, with Amazon reportedly aiming to undercut competitor pricing while maintaining high performance.
This move is particularly notable given Amazon's existing $8 billion investment in Anthropic, suggesting a strategy of both collaboration and competition within the rapidly evolving AI landscape.