Parcha's Guide to AI

🍿AI highlights from this week (2/24/23)

Theory of Mind tests, Meta releases a new LLM, Spotify launches an AI DJ, OpenAI partners with Bain and more…

The Guide to AI by Parcha

26 Feb 2023 • 9 min read

Hi readers,

Welcome to all the new subscribers! Here are my highlights from the last week in AI in which GPT passed more Theory of Mind tests, Meta released a new “open-sourced” language model and Spotify launches an AI DJ!

P.S. Don’t forget to hit subscribe if you’re new to AI and want to learn more about the space.

Subscribe now

The Best

1/ GPT and Theory of Mind

Over the last few weeks, I’ve been following research being published about GPT3.5’s ability to pass Theory of Mind tests. In the most recent example, GPT was able to pass a Faux Pas Recognition test, understanding subtle interactions between two fictional characters in a role play:

GPT-3.5 passed yet another Theory of Mind test: @sbaroncohen's Faux Pas Recognition test (we use bespoke items to ascertain that it didn't see them before). It detected the faux pas (Q1-2); the broken social norm (Q3), the lack of intention (Q4) and its emotional impact (Q6)
— Michal Kosinski (@michalkosinski) 1:30 AM ∙ Feb 20, 2023

But what exactly is Theory of Mind and why does it matter? Here is Wikipedia’s definition of Theory of Mind:

In psychology, theory of mind refers to the capacity to understand other people by ascribing mental statesto them (that is, surmising what is happening in their mind). This includes the knowledge that others' mental states may be different from one's own states and include beliefs, desires, intentions, emotions, and thoughts. Possessing a functional theory of mind is considered crucial for success in everyday human social interactions. People use such a theory when analyzing, judging, and inferring others' behaviors.

That last bit is the important reason why we care about Theory of Mind in AI: The ability to interact with other humans in social interactions and crucially, inferring other’s behavior is a critical part of creating an AI that can effectively interact with humans.

It seems then that although GPT-3.5 wasn’t trained specifically to be good at Theory of Mind, it can interpret nuances in conversation when prompted. What’s even more impressive is that we’ve seen from tests of Bing’s AI, which is also powered by GPT, but optimized for conversations, that it can reason about its answers when solving these “Theory of Mind” questions:

This is a theory of mind puzzle I just tried from Gary Marcus's blog that ChatGPT consistently fails. And as I suspected, Bing's model is better at modeling this kind of stuff.

They still keep getting better. It's why I don't dismiss LLMs
— Deen Kun A. (@sir_deenicus) 11:58 PM ∙ Feb 17, 2023

When models can perform in unexpected ways, we call it emergent behavior. What’s exciting about these findings is that we might be underestimating the capabilities of large-scale language models. For example, a research team at Meta recently showed that they could train a language model to use external APIs to complete tasks that require mathematical calculations, something LLMs are known for being bad at:

Toolformer: Language Models Can Teach Themselves to Use Tools

introduce Toolformer, a model trained to decide which APIs to call, when to call them, what arguments to pass, and how to best incorporate the results into future token prediction

abs: arxiv.org/abs/2302.04761
— AK (@_akhaliq) 1:39 AM ∙ Feb 10, 2023

All this suggests there will likely be more emergent behavior in language models that we are yet to discover!

2/ Meta enters the race to build the best large-scale language model

Meta’s AI Research lab announced this week that they are “releasing” a set of large-scale language models called LLaMA that they claim outperforms OpenAI’s GPT 3 and Google’s PaLM1 and Chinchilla models:

Today we release LLaMA, 4 foundation models ranging from 7B to 65B parameters.
LLaMA-13B outperforms OPT and GPT-3 175B on most benchmarks. LLaMA-65B is competitive with Chinchilla 70B and PaLM 540B.
The weights for all models are open and available at research.facebook.com/publications/l…
1/n
— Guillaume Lample (@GuillaumeLample) 4:08 PM ∙ Feb 24, 2023

Meta’s researchers highlighted that LLaMA outperforms state-of-the-art models despite only being trained on publicly available data, unlike GPT and PaLM, which presumably are trained on private datasets too:

Unlike Chinchilla, PaLM, or GPT-3, we only use datasets publicly available, making our work compatible with open-sourcing and reproducible, while most existing models rely on data which is either not publicly available or undocumented.
2/n
— Guillaume Lample (@GuillaumeLample) 4:08 PM ∙ Feb 24, 2023

Since the models were announced just a few days ago, however, there has been a lot of debate in the AI community about whether LLaMA is truly “open-source”. The model weights are only available upon request, and it is apparently not licensed for commercial use. On the other hand, the code used to build the model is licensed for commercial use. This distinction is critical because training this model required thousands of GPUs and likely cost millions of dollars, making it difficult for the average startup to reproduce the same model just with the code.

@GuillaumeLample "release"
— Rick Barber (@rbrbr5) 7:27 PM ∙ Feb 24, 2023

So why is Facebook calling LLaMA open-source? My guess is that it is to intentionally position Meta as a competitor in the AI war. Zuck’s message to the AI community is “Don’t sleep on Meta.”

3/ Spotify launches personalized AI powered music DJ

As I’ve said before, an area I’m really excited about is how AI will impact music creation and the music industry. Related to this, I’m also interested in how our consumption of music will change as AI becomes a part of the experience and that’s exactly what Spotify is experimenting with their new personalized DJ feature:

Introducing DJ ✨ An entirely new way to play your @Spotify, powered by AI. Now rolling out to Premium users in the U.S. and Canada. Head to Spotify now to meet your DJ.
— Spotify News (@SpotifyNews) 7:00 PM ∙ Feb 22, 2023

I’m very excited to try this new feature out but it isn’t available on my account yet. Behind the scenes, Spotify uses a combination of their existing recommendations algorithms, generative AI to generate scripts for the DJ, and an AI text-to-voice solution to synthesize the DJ’s voice.

I predict that this combination of generative personalized content combined with text-to-voice synthesis will be a growing area of innovation. Here’s another example of a company that created AI-generated podcasts of completely fictional interviews between famous people:

On the last episode of podcast.ai we asked if AI can make you laugh. We think it did!

This week we're asking if #AI can help you stay calm. In the episode below, AI #Oprah shares some #stressmanagement techniques she's learned during her life share.transistor.fm/s/deba2ffa… https://t.co/bQJeAfFMA9
— Play.ht (@play_ht) 9:31 PM ∙ Feb 21, 2023

I found the Steve Jobs one to be really convincing, and about 5 minutes into the interview, I completely forgot that I wasn’t listening to the real Steve Jobs. Check out the episode below, and let me know what you think!

4/ OpenAI partners with Bain & Company to provide enterprise AI solutions

Last week OpenAI announced that they are partnering with the consulting firm Bain to help their clients make better use of Generative AI:

Announcing our partnership with Bain, with Coca-Cola Company as the first mutual client:
— Greg Brockman (@gdb) 8:01 PM ∙ Feb 21, 2023

This announcement is important for a few obvious reasons:

Bain is one of the top consulting firms in the world, so putting their brand behind OpenAI is a strong signal to the broader market that Generative AI should be taken seriously as a technology and is not just a toy.
New technology adoption in Enterprise often takes time - in the order of many years, but with this partnership and announcing Coca-cola as the first client, it’s possible that AI will be adopted by Enterprise much faster!
OpenAI partnering with Bain shows that they are not just a foundational model provider and want to expand into the application layer, even though they aren’t building the solutions themselves. This, combined with OpenAI’s partnership with Mircrosoft should make B2B AI startups building on OpenAI’s platform weary of their long-term ability to build a moat!

Another less obvious reason that this partnership is likely to be valuable to OpenAI is access to private datasets. I wouldn’t be surprised if OpenAI is doing this partnership with Bain to build relationships with enterprise customers whose datasets they can then acquire to train the next version of GPT. This may end up being the most strategically important reason of all, given the competition heating around building AI models and the need for OpenAI to differentiate their model from Google’s or open-source alternatives.

Smart move, OpenAI!

5/ Presidents’ Day gets the AI treatment

To celebrate Presidents’ Day on Monday the AI community shared a number of fun AI projects that showcased the creative things you can do with AI today:

Ammaar Reshi who I recently interviewed on my podcast created an interview with Ronald Reagan:

If you could have dinner with anyone in the world, who would it be?

Well, I used ChatGPT, Bing Chat, ElevenLabs, and a couple of other AI tools to interview former President Ronald Reagan for Presidents’ Day.

Watch below and read for the process! 🧵
— Ammaar Reshi (@ammaar) 5:59 PM ∙ Feb 20, 2023

Then Kevin Fischer who I also recently interviewed created a chatbot for every president:

⚡ Presidents Day + AI website drop ⚡

Honor and learn about your presidents by talking to them with GPT!

We brought 20 of the most iconic presidents to life so you can have a totally new Presidents dAI journey.

Visit presidentsdai.com, chat now, and share what you learn
— Kevin Fischer (@KevinAFischer) 2:05 AM ∙ Feb 20, 2023

And finally, Dan Szymborski re-imagined all of our past presidents as Pixar characters!

My Saturday fun project: using AI, every US president as a Pixar character.
— Dan Szymborski (@DSzymborski) 8:14 PM ∙ Feb 18, 2023

What did y’all think of these AI creations? Let me know in the comments!

The Rest…

A few other updates in the world of AI from this week:

Research shows that while ChatGPT is generally capable of solving a diverse set of NLP2 but is not particularly good at any of them compared to specialized state-of-the-art models:
ChatGPT: Jack of all trades, master of none

abs: arxiv.org/abs/2302.10724
— AK (@_akhaliq) 2:22 AM ∙ Feb 22, 2023
You can ask questions to your favorite book by embedding3 it into GPT3:
1. Embed a book into GPT-3 / ChatGPT.
2. Talk directly to the book to learn.

Learn from a book through conversation, questions, and in the order you want.
— Aaron Ng (@localghost) 9:02 PM ∙ Feb 21, 2023
Some useful tips for making ChatGPT better at answering your questions:
Let's start with the prompt 💬

When you start a new conversation on ChatGPT this should be your initial prompt, click the "ALT" tag on the image to see and copy the prompt
— Linus (●ᴗ●) (@LinusEkenstam) 1:00 AM ∙ Feb 8, 2023
Turn drawings into art with these two fun tools:
This is a cool application of AI: MiniStudio.ai uses AI to turn kids' drawings into characters. Parents and kids can then tell stories featuring their own characters.

Cool example of tech unlocking new forms of creativity.
— Rex Woodbury (@rex_woodbury) 8:03 PM ∙ Feb 20, 2023

🖍️ Announcing Scribble Diffusion!

A free and open-source AI-powered web app that lets you draw a rough sketch, add a text prompt, and generate images based on your sketch.

scribblediffusion.com
— Zeke Sikelianos (@zeke) 6:40 PM ∙ Feb 16, 2023

And that’s a wrap for this week folks!

Thanks for reading The Hitchhikers Guide to AI! Subscribe for free to receive new posts and support my work.

To learn more about Google’s PaLM model and see how it did in a side-by-side test I did against ChatGPT, check out this article I recently wrote: ↩
Is Google still the leader in AI?
Hi Readers, Ever since Google shared its latest AI research update a few weeks ago, a question has been on my mind that no one has answered yet: How do Google’s latest AI models stack up against the competition? In this post, I will try to answer that question by reviewing Google’s most recent AI advancements across lang…
The Hitchhiker's Guide to AI
NLP is a machine learning technology that gives computers the ability to interpret, manipulate, and comprehend human language , and is used to reduce cumbersome manual processes, address citizen demands for transparency and responsiveness, solve workforce challenges, and unleash new insights from data. ↩
In machine learning, an embedding is a way to represent an object or feature as a vector of numbers that captures its essential characteristics. This vector representation is often lower-dimensional than the original feature space and is designed to capture semantic relationships between features.
For example, in natural language processing, word embeddings are commonly used to represent words as vectors of numbers. These embeddings are learned from large text corpora and are designed to capture semantic relationships between words. For instance, similar words, such as "dog" and "cat," are represented by vectors that are close together in the embedding space, while dissimilar words, such as "dog" and "car," are represented by vectors that are far apart.
Similarly, in image processing, feature embeddings can be used to represent images as vectors of numbers. These embeddings are learned from large collections of images and are designed to capture visual relationships between images.
Overall, embeddings are a powerful way to represent complex objects or features in a way that can be easily used by machine learning algorithms. ↩

The Best

The Rest…

Sign up for more like this.