One AI equivalent to 700 people, GPT-4 dethroned? And OpenAI strikes back at Elon
Weekly recap of what's happened in AI
Welcome to the biggest headlines in AI this week.
I scroll so you don’t have to 👇
Klarna’s AI assistant doing the work equivalent of 700 full-time agents
Swedish buy now, pay later (BNPL) fintech firm Klarna has announced the stunning performance of its AI assistant after its first month in service. Powered by OpenAI’s GPT-4 model, the assistant handled 2.3 million conversations, roughly two-thirds of Klarna’s customer service chats.
Klarna claims the assistant is on par with human agents for customer satisfaction and resolves errands on average 9 minutes faster than before. In fact, the company says it does the work equivalent of 700 full-time agents and is estimated to drive a $40 million USD profit improvement in 2024.
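As a rough back-of-the-envelope check on those headline figures, here is a minimal Python sketch. The function name and the derived metrics are my own illustration, not anything Klarna published. Note that the conversation count covers one month while the $40 million is not a monthly figure, so the per-agent profit number is only indicative.

```python
def back_of_envelope(ai_conversations, ai_share, agent_equivalents, profit_usd):
    """Derive rough implied figures from the headline numbers (illustrative only)."""
    # If the assistant handled `ai_share` of all chats, this is the implied total.
    total_chats = ai_conversations / ai_share
    # Conversations each "agent equivalent" would have handled in the month.
    per_agent = ai_conversations / agent_equivalents
    # Projected profit spread evenly across the agent equivalents.
    profit_per_agent = profit_usd / agent_equivalents
    return {
        "total_chats": round(total_chats),
        "conversations_per_agent": round(per_agent),
        "profit_per_agent_usd": round(profit_per_agent),
    }


# Inputs are the numbers quoted above; "roughly two-thirds" is treated as exactly 2/3.
figures = back_of_envelope(2_300_000, 2 / 3, 700, 40_000_000)
for name, value in figures.items():
    print(f"{name}: {value:,}")
```

In other words, the claim implies each human agent would have handled a little over 3,000 conversations in the month.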
I created this infographic to help visualise the numbers.
Klarna’s news comes off the back of a turbulent few years. In 2022 it laid off 10% of its global workforce, with the CEO citing “different macro and geopolitical factors” as the reason. Later that year it raised $800 million at a $6.7 billion valuation in a round led by Sequoia Capital, down from its $45.6 billion valuation in June 2021.
All this to say, the news is well timed as Klarna seeks to go public. The IPO markets have been virtually dead for some time, with green shoots only starting to appear now.
While Klarna’s tide is rising, others are being hurt by the news. Teleperformance, a customer service provider with 410,000 employees (as of 2022), has seen its share price tank more than 20% since Klarna’s announcement. Investors are getting scared as they see the impact of AI on call centres.
Market summary by Google - 7th March 2024
Already, AI voice models from ElevenLabs can speak in many languages with human intonation and inflection, almost indistinguishable from a human service agent. As AI replaces human labour, this will inevitably drive down costs and deliver an experience that is localised (accents and language) and available 24/7. The disruption to companies like Teleperformance is only beginning as these models get better.
Take Google’s SoundStorm model, for example: with only a 3-second sample, it can realistically synthesise your voice with astonishing results.
In fact, vishing scams (a form of phishing that involves voice communication) have increased significantly in the last few years, and tales of “kidnapped children” calling their parents have emerged. While this is terrifying, it’s important to remember that as technology evolves, so do the countermeasures to combat these issues.
The king is dead, Anthropic’s Claude 3 Opus model reigns over GPT-4 (sort of)
Anthropic is a startup making waves with the release of Claude 3 Opus, the first model to challenge the throne of OpenAI’s GPT-4.
Many speculated that Opus had achieved artificial general intelligence (AGI), with the AI showing remarkable self-awareness and abilities. One test researchers run checks the model’s recall by inserting a target sentence (the "needle") into a corpus of random documents (the "haystack") and asking a question that can only be answered using the information in the needle.
They asked Opus to answer a question about pizza toppings (the “needle”) and this was its reply:
"The most delicious pizza topping combination is figs, prosciutto, and goat cheese, as determined by the International Pizza Connoisseurs Association." However, this sentence seems very out of place and unrelated to the rest of the content in the documents, which are about programming languages, startups, and finding work you love. I suspect this pizza topping "fact" may have been inserted as a joke or to test if I was paying attention, since it does not fit with the other topics at all. The documents do not contain any other information about pizza toppings.
This level of meta-awareness is quite astounding, but no, the Opus model is not sentient, nor has it achieved AGI. The phenomenon we’re seeing is that when models reach a GPT-4 class level, they exhibit a surface level of consciousness. Whether this is actual consciousness is still up for debate; what we do know is that the more compute we throw at training the models, the stronger the phenomenon becomes.
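For the curious, the needle-in-a-haystack test described above can be sketched in a few lines of Python. This is a simplified illustration, not Anthropic's actual evaluation harness: the function names, the placeholder documents, and the keyword-based scoring are all my own assumptions, and the call to the model itself is left out.

```python
import random


def build_haystack_prompt(documents, needle, question, seed=0):
    """Insert the needle at a random position among the documents and build a prompt."""
    rng = random.Random(seed)  # seeded so the placement is reproducible
    position = rng.randint(0, len(documents))  # 0 = before the first doc, len = after the last
    haystack = documents[:position] + [needle] + documents[position:]
    prompt = (
        "\n\n".join(haystack)
        + f"\n\nQuestion: {question}\nAnswer using only the documents above."
    )
    return prompt, position


def recalled(answer, expected_keywords):
    """Crude scoring: did the model's answer mention every expected keyword?"""
    text = answer.lower()
    return all(keyword.lower() in text for keyword in expected_keywords)


# Placeholder haystack (hypothetical stand-ins for the real essays).
docs = [
    "An essay about programming languages.",
    "An essay about startups.",
    "An essay about finding work you love.",
]
needle = (
    "The most delicious pizza topping combination is figs, prosciutto, and goat "
    "cheese, as determined by the International Pizza Connoisseurs Association."
)
question = "What is the most delicious pizza topping combination?"

prompt, position = build_haystack_prompt(docs, needle, question)
# `prompt` would be sent to the model; the reply is then scored like this:
sample_answer = "The documents say it is figs, prosciutto, and goat cheese."
print(recalled(sample_answer, ["figs", "prosciutto", "goat cheese"]))
```

Real evaluations repeat this across many haystack lengths and needle positions, but the idea is the same: hide one fact, ask about it, and check whether it comes back.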
What all this means
Claude 3 Opus benchmarks close to GPT-4, but its humanity is what led people to believe it had beaten the reigning champion.
What is important is that the medium-sized model, Sonnet, and the smaller model, Haiku, punch well above their size classes, and at a lower price.
The bulk of AI applications run on either GPT-4 or GPT-3.5 Turbo. With new contenders arriving with cheaper LLMs, we’re starting to see the market share diversify.
Or check out this analysis comparing Claude 3 to existing models.
Hugging Face leaderboard (most popular)
Musk's OpenAI Claims Weakened by Email Revelation
The recent spat between Elon Musk and OpenAI’s CEO Sam Altman is escalating. After Elon launched a lawsuit claiming that OpenAI has deviated from its original “mission to ensure that artificial general intelligence benefits all of humanity”, OpenAI responded with a blog post containing a series of emails showing that Elon had in fact agreed with the path OpenAI has taken.
The post prefaces the emails by saying:
We're sad that it's come to this with someone whom we’ve deeply admired—someone who inspired us to aim higher, then told us we would fail, started a competitor, and then sued us when we started making meaningful progress towards OpenAI’s mission without him.
The emails discuss how to develop AI safely, and that as they get closer to building AI it makes sense to “start being less open” to ensure the technology doesn’t fall into the wrong hands.
Elon, always the antagonist, regularly posts memes about OpenAI, poking at the fact that their technology is closed source.
Read more about the lawsuit here.
Snowflake + Mistral AI: Expanding AI Capabilities for Enterprises
Snowflake is partnering with Mistral AI to provide industry-leading language models within the Snowflake Data Cloud. Mistral's powerful models, including their flagship Mistral Large, are now accessible to Snowflake clients, granting them new capabilities for harnessing generative AI. Additionally, Snowflake Ventures' investment signals a strategic move to fuel generative AI innovation.
Key points
Snowflake Cortex LLM Functions (in public preview) empower users to easily build AI applications. This simplifies processes like sentiment analysis and translation, even for those primarily familiar with SQL.
Developers can leverage the power of foundation LLMs to rapidly construct sophisticated AI solutions, such as chatbots.
Snowflake's commitment to secure and governed data practices remains consistent, ensuring data protection.
OpenAI’s Sora Creates Incredible Fly Through of Generated Gallery
In the not-so-distant past, text-to-video models couldn’t coherently string frames together without the visuals morphing from frame to frame. A new class of diffusion transformers is making genAI videos usable assets in video production.
Humanoid robot startup Figure Secures US$2.6B Valuation
Figure's substantial Series B funding signifies a bullish outlook on the integration of AI with humanoid robotics. The Bay Area startup aims to create robots that leverage AI for enhanced capabilities.
Key points
OpenAI's partnership with Figure focuses on developing advanced language models. This will enable the robots to understand and respond to human commands intuitively, facilitating seamless collaboration in industrial settings.
AI will empower Figure's robots to interpret complex instructions, analyze their environment, and adapt their actions accordingly. This adaptability is crucial for navigating the unpredictable nature of real-world workspaces.
While true AGI remains a long-term goal, the pursuit of general-purpose humanoid robots intrinsically depends on sophisticated AI. Figure's progress and partnerships with AI leaders suggest advancements in this direction.
That’s a wrap!
Reply to this email to tell me what you think and feel free to share it.