Machine Learning: The Art of Explore vs. Exploit – SXSW Recap

Session by Oscar Celma (Pandora)
March 11, 2017

Pandora’s mission: be the effortless source of personalized music enjoyment and discovery

Some mind-blowing statistics at Pandora

75M monthly active users
24 hours of listening per user per month
75B thumbs-up
12B stations created
98% of artists spinning every month

How does Pandora decide what to play next?

Content based algorithm: music genome data
Collective intelligence: mining user behavior
Personalized filtering: your thumbs up and skips
Ensemble recommender: piece together output from 75 different algorithms
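
The talk did not go into how the ensemble combines its inputs. As a rough illustration only, one common way to blend several recommenders is a weighted sum of their per-track scores. The recommender names, scores, and weights below are made up for the example and are not Pandora's.

```python
from collections import defaultdict


def blend(recommender_scores, weights):
    """Rank tracks by a weighted sum of the scores each recommender assigns.

    recommender_scores: {recommender_name: {track_id: score}}
    weights:            {recommender_name: weight}  (hypothetical values)
    """
    combined = defaultdict(float)
    for name, scores in recommender_scores.items():
        weight = weights.get(name, 0.0)  # unknown recommenders contribute nothing
        for track, score in scores.items():
            combined[track] += weight * score
    # Highest combined score first; ties broken by track id for determinism.
    return sorted(combined, key=lambda track: (-combined[track], track))


# Hypothetical outputs of three of the strategies listed above.
scores = {
    "content_based": {"track_a": 0.9, "track_b": 0.4},
    "collective": {"track_b": 0.8, "track_c": 0.6},
    "personalized": {"track_a": 0.2, "track_c": 0.9},
}
weights = {"content_based": 0.5, "collective": 0.3, "personalized": 0.2}
ranking = blend(scores, weights)
print(ranking)
```

A real ensemble would more likely learn the weights from listener feedback than fix them by hand.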

Challenges: balance familiar with unfamiliar

Exploit: play awesome music now. Tomorrow? Who cares. Don’t play music I don’t like.
Explore: play something risky. Learn what to play. Don’t play too many WTFs (“what the freakommendation”, per Paul Lamere).
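
For readers new to the trade-off, a textbook way to balance the two is epsilon-greedy: exploit the best-known option most of the time, and explore a random one with a small probability. The sketch below is generic and is not something the talk attributed to Pandora. The track names, the reward definition (1 for a thumbs-up, 0 for a skip), and the value of epsilon are all assumptions.

```python
import random


def pick_track(avg_reward, epsilon, rng):
    """Epsilon-greedy choice over tracks keyed by their average reward so far."""
    if rng.random() < epsilon:
        # Explore: pick any track uniformly at random, even an unproven one.
        return rng.choice(sorted(avg_reward))
    # Exploit: pick the track with the best average reward so far.
    return max(avg_reward, key=avg_reward.get)


def update(avg_reward, counts, track, reward):
    """Fold one observed reward into the running mean for a track."""
    counts[track] += 1
    avg_reward[track] += (reward - avg_reward[track]) / counts[track]


rng = random.Random(0)
avg_reward = {"hit_song": 0.0, "deep_cut": 0.0}  # hypothetical tracks
counts = {"hit_song": 0, "deep_cut": 0}

update(avg_reward, counts, "hit_song", 1)  # thumbs-up
update(avg_reward, counts, "hit_song", 0)  # skip
next_track = pick_track(avg_reward, epsilon=0.1, rng=rng)
```

A larger epsilon means more discovery and more risk of a WTF; a smaller one means safer but staler stations.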

Novelty versus relevance

Exploit: low novelty, high relevance
Explore: high novelty, high relevance
Popular: low novelty, low relevance
Risky: high novelty, low relevance
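
The four quadrants above can be written as a small lookup. This is only a sketch: it assumes novelty and relevance are scores between 0 and 1 and that 0.5 separates "low" from "high". The talk gave neither the scale nor a cutoff.

```python
def quadrant(novelty, relevance, threshold=0.5):
    """Map a (novelty, relevance) pair onto the four labels from the talk.

    The 0..1 scale and the 0.5 threshold are assumptions for illustration.
    """
    novel = novelty >= threshold
    relevant = relevance >= threshold
    if relevant:
        return "explore" if novel else "exploit"
    return "risky" if novel else "popular"


print(quadrant(novelty=0.9, relevance=0.8))
print(quadrant(novelty=0.9, relevance=0.1))
```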

How does Pandora test new ideas?

  1. Dream idea
  2. Experiment in small group (1% of users)
  3. If successful, roll out 6-12 months later
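
The talk did not say how the 1% group is chosen. A common approach is to hash the user ID together with the experiment name, so that each user lands in the same group every time. The function name, ID format, and experiment name below are hypothetical.

```python
import hashlib


def in_experiment(user_id, experiment, percent=1.0):
    """Deterministically place roughly `percent`% of users in an experiment."""
    key = f"{experiment}:{user_id}".encode("utf-8")
    digest = hashlib.sha256(key).hexdigest()
    bucket = int(digest[:8], 16) % 10000  # stable bucket in 0..9999
    return bucket < percent * 100  # 1.0% -> buckets 0..99


# Rough check of the share on made-up user IDs.
users = [f"user-{i}" for i in range(20000)]
share = sum(in_experiment(u, "new-station-mix") for u in users) / len(users)
print(f"{share:.2%} of users in the experiment")
```

Including the experiment name in the hash keeps the same users from landing in every experiment.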

Metrics: did it bring new listeners? Did it avoid churn? Did they listen for longer?

Retention: time spent listening, active days

Activity: thumbs, skips, create new stations
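
As a sketch of how the retention numbers might be computed from a listening log, the function below totals hours listened and counts distinct active days per user. The log format (user, day, seconds listened) is an assumption, not Pandora's schema.

```python
from collections import defaultdict
from datetime import date


def retention_metrics(sessions):
    """Summarize (user_id, day, seconds_listened) rows per user."""
    hours = defaultdict(float)
    days = defaultdict(set)
    for user, day, seconds in sessions:
        hours[user] += seconds / 3600
        days[user].add(day)  # a set, so repeat sessions on one day count once
    return {
        user: {"hours": hours[user], "active_days": len(days[user])}
        for user in hours
    }


# Made-up sessions for two users.
sessions = [
    ("u1", date(2017, 3, 1), 3600),
    ("u1", date(2017, 3, 1), 1800),
    ("u1", date(2017, 3, 2), 1800),
    ("u2", date(2017, 3, 1), 7200),
]
metrics = retention_metrics(sessions)
print(metrics)
```

Comparing these numbers between the experiment group and everyone else would show whether an idea kept people listening longer.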

Pandora’s Tech Stack (some of it)

Memcached, Redis, Python, Java, Scala, Hive, Spark, PostgreSQL, Hadoop (HDFS)


