Welcome to This Week in AI
Two major events have captivated us this week: Google’s I/O event and OpenAI’s new GPT-4o model.
Let’s get into it!
“AI is like discovering a new continent with 100 billion people who will work for free (or a few watts) - and by the way, be much better than your existing human workforce” - @natfriedman
First Up - Google I/O 2024
This is Google's premier tech showcase; it's very telling that it was almost entirely an AI-focused event.
Watch the recap here.
🔗 TechCrunch mocked the omnipresent AI references
Project Astra
Project Astra is a multimodal AI assistant that can understand and respond to various inputs, like text, speech, images, and videos.
Google's vision for AI is shifting towards "agents": AI bots that can not only converse but also act on your behalf, ranging from simple task-oriented tools to more collaborative companions.
Astra is considered a significant step towards this vision, offering a more natural and intuitive interaction with AI.
Must watch ↑↑↑
Google Search Enhancements
Gemini integration will enable faster answers, advanced planning capabilities (like trip itineraries), and the ability to use video for problem-solving.
This is an interesting one: Search is Google’s core business and cash cow, and until now the company has refrained from touching it. But companies like OpenAI and Perplexity are starting to eat into its market share.
Enhanced Google Workspace Integration
Gemini will be integrated into Workspace apps like Gmail, enabling features like email summarisation, smart replies, and receipt organisation.
AI Advancements in Android
Google announced a wider rollout of Circle to Search and Gemini Nano-powered upgrades to TalkBack, Android's existing screen reader for blind and low-vision users. Also included is scam detection for phone calls.
New Generative AI Tools: Veo and Imagen 3
Google has introduced Veo, its most powerful video generation model, and Imagen 3, its highest quality text-to-image model. These models offer impressive levels of creative control and realism.
Gemini 1.5 Pro and Flash
Google has introduced a suite of upgrades to Gemini, its AI assistant, headlined by the incorporation of Gemini 1.5 Pro into Gemini Advanced.
This integration brings a range of enhancements, including the largest context window of any consumer chatbot (1 million tokens), facilitating interaction with extensive documents and data sets.
Another feature is "Gems", customisable versions of Gemini (similar to OpenAI’s GPTs).
Loop Daddy AKA Marc Rebillet Pre-Show
This year’s I/O event was kicked off by Marc Rebillet. Well known for his improvised (and often unfiltered) electronic loops, he isn’t exactly who most developers expected to open the event.
Next Up - OpenAI’s New GPT-4o Model
This is a watershed moment for AI: free access to GPT-4o for all ChatGPT users. On top of that, the model is much faster than GPT-4 Turbo and half the price in the API.
The demos below are genuinely incredible; the low latency of its speech responses makes interacting with it seamless.
🔗 OpenAI's Spring Announcement
🎬 Examples on OpenAI’s YouTube channel
Multimodal Capabilities
GPT-4o is OpenAI's newest flagship model, capable of processing and understanding text, audio, images, and video in real time.
It can respond to audio inputs with near-human speed and generate diverse outputs including text, audio, and images.
Enhanced Performance and Accessibility
GPT-4o matches GPT-4 Turbo's text and code performance in English while significantly improving performance for non-English languages.
Applications and Features
OpenAI is rolling out GPT-4o's capabilities iteratively, starting with text and image features in ChatGPT.
A new Voice Mode with GPT-4o's audio and video capabilities is expected to be available soon, enhancing the naturalness of voice conversations.
New ChatGPT Desktop App
A new ChatGPT desktop app for macOS is being launched, allowing users to seamlessly integrate ChatGPT into their workflow.
The app supports voice conversations and features the ability to discuss screenshots directly. A Windows version is expected to be released later in the year.
Expanded Language Support and Access
ChatGPT now supports over 50 languages for sign-up, login, and user settings. GPT-4o is being rolled out to ChatGPT Plus and Team users, with availability for Enterprise users coming soon.
Free ChatGPT users will also have access to GPT-4o with usage limits.