Google vs OpenAI: The AI Announcement Showdown 📢

Your weekly AI wrap

May 16, 2024

Welcome to This Week in AI

Two major events have captivated us this week: Google’s I/O event and OpenAI’s new GPT-4o model.

Let’s get into it!

"AI is like discovering a new continent with 100 billion people who will work for free (or a few watts) - and by the way, be much better than your existing human workforce" - @natfriedman

First Up - Google I/O 2024

This is Google's premier tech showcase; it's very telling that it was almost entirely an AI-focused event.

Watch the recap here.

🔗 TechCrunch mocked the omnipresent AI references

Project Astra

Project Astra is a multimodal AI assistant that can understand and respond to various inputs, like text, speech, images, and videos.

Google's vision for AI is shifting towards "agents": AI bots that can not only converse but also act on your behalf, ranging from simple task-oriented tools to more collaborative companions.

Astra is considered a significant step towards this vision, offering a more natural and intuitive interaction with AI.

🔗 Project Astra

Must watch ↑↑↑

Google Search Enhancements

Gemini integration will enable faster answers, advanced planning capabilities (like trip itineraries), and the ability to use video for problem-solving.

This is an interesting one. Search is Google’s core business and cash cow, and until now the company has refrained from touching it, but competitors like OpenAI and Perplexity are starting to eat into its market share.

🔗 Generative AI in search

Enhanced Google Workspace Integration

Gemini will be integrated into Workspace apps like Gmail, enabling features like email summarisation, smart replies, and receipt organisation.

🔗 See Workspace features

AI Advancements in Android

Android gets a wider rollout of Circle to Search and AI-powered upgrades to TalkBack, its screen reader for blind and low-vision users. Also included is scam detection across communication channels.

🔗 More features here

New Generative AI Tools: Veo and Imagen 3

Google has introduced Veo, its most powerful video generation model, and Imagen 3, its highest quality text-to-image model. These models offer impressive levels of creative control and realism.

🔗 See more here

Gemini 1.5 Pro and Flash

Google has introduced a suite of upgrades to Gemini, its AI assistant, headlined by the incorporation of Gemini 1.5 Pro into Gemini Advanced.

This integration brings a range of enhancements, including the largest context window of any consumer chatbot (1 million tokens), facilitating interaction with extensive documents and data sets.

Another feature is "Gems", customisable versions of Gemini (similar to OpenAI’s GPTs).

🔗 Read up on them here

Loop Daddy AKA Marc Rebillet Pre-Show

This year’s I/O event was kicked off by Marc Rebillet. He’s well known for his improvised (and often unfiltered) electronic loops, so he isn’t exactly who most developers expected to open the event.

Next Up - OpenAI’s New GPT-4o Model

This is a watershed moment for AI: free access to GPT-4o for all ChatGPT users. On top of that, the model is much faster than GPT-4 Turbo and half the cost in the API.

The demos below are genuinely incredible; the low latency of its spoken responses makes interacting with it feel seamless.

🔗 OpenAI's Spring Announcement

🎬 Examples on OpenAI’s YouTube channel

Multimodal Capabilities

GPT-4o is OpenAI's newest flagship model, capable of processing and understanding text, audio, images, and video in real time.

It can respond to audio inputs with near-human speed and generate diverse outputs including text, audio, and images.

Enhanced Performance and Accessibility

GPT-4o matches GPT-4 Turbo's text and code performance in English while significantly improving performance for non-English languages.

Applications and Features

OpenAI is rolling out GPT-4o's capabilities iteratively, starting with text and image features in ChatGPT.

A new Voice Mode with GPT-4o's audio and video capabilities is expected to be available soon, enhancing the naturalness of voice conversations.

New ChatGPT Desktop App

A new ChatGPT desktop app for macOS is being launched, allowing users to seamlessly integrate ChatGPT into their workflow.

The app supports voice conversations and features the ability to discuss screenshots directly. A Windows version is expected to be released later in the year.

Expanded Language Support and Access

ChatGPT now supports over 50 languages for sign-up, login, and user settings. GPT-4o is being rolled out to ChatGPT Plus and Team users, with availability for Enterprise users coming soon.

Free ChatGPT users will also have access to GPT-4o with usage limits.

Love This Week in AI? I’d appreciate a share.
