🍿AI highlights from this week (2/24/23)
Theory of Mind tests, Meta releases a new LLM, Spotify launches an AI DJ, OpenAI partners with Bain and more…
Hi readers,
Welcome to all the new subscribers! Here are my highlights from the last week in AI, in which GPT passed more Theory of Mind tests, Meta released a new “open-sourced” language model, and Spotify launched an AI DJ!
P.S. Don’t forget to hit subscribe if you’re new to AI and want to learn more about the space.
The Best
1/ GPT and Theory of Mind
Over the last few weeks, I’ve been following research being published about GPT-3.5’s ability to pass Theory of Mind tests. In the most recent example, GPT was able to pass a Faux Pas Recognition test, understanding subtle interactions between two fictional characters in a role play:
But what exactly is Theory of Mind and why does it matter? Here is Wikipedia’s definition of Theory of Mind:
In psychology, theory of mind refers to the capacity to understand other people by ascribing mental states to them (that is, surmising what is happening in their mind). This includes the knowledge that others' mental states may be different from one's own states and include beliefs, desires, intentions, emotions, and thoughts. Possessing a functional theory of mind is considered crucial for success in everyday human social interactions. People use such a theory when analyzing, judging, and inferring others' behaviors.
That last bit is why we care about Theory of Mind in AI: the ability to navigate social interactions and, crucially, to infer others’ behavior is a critical part of creating an AI that can effectively interact with humans.
It seems then that although GPT-3.5 wasn’t trained specifically to be good at Theory of Mind, it can interpret nuances in conversation when prompted. What’s even more impressive is that we’ve seen from tests of Bing’s AI, which is also powered by GPT, but optimized for conversations, that it can reason about its answers when solving these “Theory of Mind” questions:
When models perform in ways they weren’t explicitly trained for, we call it emergent behavior. What’s exciting about these findings is that we might be underestimating the capabilities of large-scale language models. For example, a research team at Meta recently showed that they could train a language model to use external APIs to complete tasks that require mathematical calculations, something LLMs are notoriously bad at:
All this suggests there will likely be more emergent behavior in language models that we are yet to discover!
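The core idea behind Meta’s result is that the model emits an API call inside its own text, and a wrapper executes the call and splices the answer back in. Here’s a minimal sketch of that loop; the `[Calculator(...)]` marker format and both function names are my own illustrative choices, not Meta’s actual code:

```python
import re

def run_calculator(expr: str) -> str:
    # Evaluate a simple arithmetic expression for the demo.
    # Only allow digits, whitespace, and basic operators before eval-ing.
    if not re.fullmatch(r"[\d\s+\-*/().]+", expr):
        raise ValueError("unsupported expression")
    return str(eval(expr))

def fill_tool_calls(text: str) -> str:
    # Replace every [Calculator(...)] marker in the model's output
    # with the computed result of the expression inside it.
    return re.sub(
        r"\[Calculator\(([^)]+)\)\]",
        lambda m: run_calculator(m.group(1)),
        text,
    )

model_output = "The invoice total is [Calculator(3*19.99 + 4.50)] dollars."
print(fill_tool_calls(model_output))  # → The invoice total is 64.47 dollars.
```

The interesting part in the real research is the training: the model learns *when* to insert these calls on its own, rather than being hard-coded to do so.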
2/ Meta enters the race to build the best large-scale language model
Meta’s AI Research lab announced this week that they are “releasing” a set of large-scale language models called LLaMA that they claim outperform OpenAI’s GPT-3, Google’s PaLM1, and DeepMind’s Chinchilla models:
Meta’s researchers highlighted that LLaMA outperforms state-of-the-art models despite only being trained on publicly available data, unlike GPT and PaLM, which presumably are trained on private datasets too:
Since the models were announced just a few days ago, however, there has been a lot of debate in the AI community about whether LLaMA is truly “open-source”. The model weights are only available upon request, and it is apparently not licensed for commercial use. On the other hand, the code used to build the model is licensed for commercial use. This distinction is critical because training the model required thousands of GPUs and likely cost millions of dollars, making it difficult for the average startup to reproduce the same model from the code alone.
So why is Meta calling LLaMA open-source? My guess is that it’s an intentional move to position Meta as a competitor in the AI war. Zuck’s message to the AI community is “Don’t sleep on Meta.”
3/ Spotify launches personalized AI-powered music DJ
As I’ve said before, an area I’m really excited about is how AI will impact music creation and the music industry. Relatedly, I’m also interested in how our consumption of music will change as AI becomes part of the experience, and that’s exactly what Spotify is exploring with its new personalized DJ feature:
I’m very excited to try this new feature out, but it isn’t available on my account yet. Behind the scenes, Spotify uses a combination of its existing recommendation algorithms, generative AI to write scripts for the DJ, and an AI text-to-voice solution to synthesize the DJ’s voice.
I predict that this combination of generative personalized content combined with text-to-voice synthesis will be a growing area of innovation. Here’s another example of a company that created AI-generated podcasts of completely fictional interviews between famous people:
I found the Steve Jobs one to be really convincing, and about 5 minutes into the interview, I completely forgot that I wasn’t listening to the real Steve Jobs. Check out the episode below, and let me know what you think!
4/ OpenAI partners with Bain & Company to provide enterprise AI solutions
Last week OpenAI announced that they are partnering with the consulting firm Bain to help their clients make better use of Generative AI:
This announcement is important for a few obvious reasons:
- Bain is one of the top consulting firms in the world, so putting their brand behind OpenAI is a strong signal to the broader market that Generative AI should be taken seriously as a technology and is not just a toy.
- New technology adoption in Enterprise often takes time, on the order of many years, but with this partnership and the announcement of Coca-Cola as the first client, it’s possible that AI will be adopted by Enterprise much faster!
- OpenAI partnering with Bain shows that they are not just a foundation model provider and want to expand into the application layer, even though they aren’t building the solutions themselves. This, combined with OpenAI’s partnership with Microsoft, should make B2B AI startups building on OpenAI’s platform wary of their long-term ability to build a moat!
Another, less obvious reason this partnership is likely to be valuable to OpenAI is access to private datasets. I wouldn’t be surprised if OpenAI is pursuing the Bain partnership to build relationships with enterprise customers whose datasets it can then acquire to train the next version of GPT. This may end up being the most strategically important reason of all, given the competition heating up around building AI models and OpenAI’s need to differentiate its model from Google’s or open-source alternatives.
Smart move, OpenAI!
5/ Presidents’ Day gets the AI treatment
To celebrate Presidents’ Day on Monday, the AI community shared a number of fun AI projects showcasing the creative things you can do with AI today:
Ammaar Reshi, whom I recently interviewed on my podcast, created an interview with Ronald Reagan:
Then Kevin Fischer, whom I also recently interviewed, created a chatbot for every president:
And finally, Dan Szymborski re-imagined all of our past presidents as Pixar characters!
What did y’all think of these AI creations? Let me know in the comments!
The Rest…
A few other updates in the world of AI from this week:
Research shows that while ChatGPT is generally capable of solving a diverse set of NLP2 tasks, it is not particularly good at any of them compared to specialized state-of-the-art models:
You can ask questions of your favorite book by embedding3 it and feeding the relevant passages into GPT-3:
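The “embed your book” trick generally works by retrieval: split the book into chunks, embed each chunk, find the chunk closest to your question, and paste it into the GPT prompt as context. Here’s a toy sketch using a bag-of-words stand-in for a real embedding model; the function names and example chunks are my own, and in practice you’d call an actual embeddings API:

```python
import math
import re
from collections import Counter

def embed(text: str) -> Counter:
    # Hypothetical stand-in for a real embedding model: a word-count vector.
    return Counter(re.findall(r"[a-z]+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two sparse word-count vectors.
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def most_relevant(chunks: list[str], question: str) -> str:
    # Pick the chunk whose vector is closest to the question's vector;
    # that chunk would then go into the GPT prompt as context.
    q = embed(question)
    return max(chunks, key=lambda c: cosine(embed(c), q))

chunks = [
    "Ishmael signs on to the whaling ship Pequod.",
    "Captain Ahab is obsessed with the white whale.",
]
print(most_relevant(chunks, "Who is obsessed with the whale?"))
```

The point of the embedding step is that you never send the whole book to the model, only the few passages most relevant to the question.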
Some useful tips for making ChatGPT better at answering your questions:
Turn drawings into art with these two fun tools:
And that’s a wrap for this week folks!
Thanks for reading The Hitchhiker’s Guide to AI! Subscribe for free to receive new posts and support my work.
To learn more about Google’s PaLM model and see how it did in a side-by-side test I did against ChatGPT, check out this article I recently wrote: ↩
NLP is a machine learning technology that gives computers the ability to interpret, manipulate, and comprehend human language. It is used to reduce cumbersome manual processes, address citizen demands for transparency and responsiveness, solve workforce challenges, and unleash new insights from data. ↩
In machine learning, an embedding is a way to represent an object or feature as a vector of numbers that captures its essential characteristics. This vector representation is often lower-dimensional than the original feature space and is designed to capture semantic relationships between features.
For example, in natural language processing, word embeddings are commonly used to represent words as vectors of numbers. These embeddings are learned from large text corpora and are designed to capture semantic relationships between words. For instance, similar words, such as "dog" and "cat," are represented by vectors that are close together in the embedding space, while dissimilar words, such as "dog" and "car," are represented by vectors that are far apart.
Similarly, in image processing, feature embeddings can be used to represent images as vectors of numbers. These embeddings are learned from large collections of images and are designed to capture visual relationships between images.
Overall, embeddings are a powerful way to represent complex objects or features in a way that can be easily used by machine learning algorithms. ↩
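To make the dog/cat/car example concrete, here’s a tiny sketch with hand-picked 2-D vectors. These numbers are illustrative only; real embeddings are learned from data and typically have hundreds or thousands of dimensions:

```python
import math

# Hand-picked 2-D "embeddings" mirroring the dog/cat/car example above.
vectors = {
    "dog": (0.9, 0.1),
    "cat": (0.85, 0.2),
    "car": (0.1, 0.95),
}

def cosine(a, b):
    # Cosine similarity: 1.0 means same direction, 0.0 means unrelated.
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))

print(cosine(vectors["dog"], vectors["cat"]))  # near 1: similar words
print(cosine(vectors["dog"], vectors["car"]))  # much lower: dissimilar words
```

The same distance computation works regardless of whether the vectors represent words, images, or whole documents.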