Duration: 33:04
These Numbers Can Turn AI Dangerous [Subliminal Learning]

2025-09-04

[public] 20.2K views, 1.68K likes

Welch Labs

Check out RunPod’s AI infrastructure platform: https://get.runpod.io/welchlabs

Discount code at checkout: WELCH10

Note that you need to buy $15 or more in RunPod credits for the discount code to apply; $10 will then be deducted from your total. See the screen recording at 3:31.

Subliminal Learning Poster at 31:09: https://www.welchlabs.com/resources/subliminal-learning-poster-17x22

Subliminal Learning Bundle: https://www.welchlabs.com/resources/subliminal-learning-poster-book-bundle

Subliminal Learning Poster - Digital Download: https://www.welchlabs.com/resources/subliminal-learning-poster-digital-download

Sections

0:00 - Intro

1:47 - Why Welch Labs uses runpod for AI infrastructure - sponsored ad

3:49 - The subliminal learning phenomenon

5:44 - In context learning

6:56 - Why can’t we just train a classifier?

7:45 - Other clues

9:28 - Small scale replication on MNIST

12:47 - Mathematical proof

23:01 - Proof Take-aways

25:38 - Solving the GPT 4.1/4o mystery

26:14 - My take on what’s going on

27:55 - The token entanglement hypothesis

29:11 - Final thoughts & take-aways

31:09 - Subliminal Learning Poster!

References

Subliminal Learning Paper and code: https://alignment.anthropic.com/2025/subliminal-learning/

Generate Your Own Numbers: https://subliminaldata.streamlit.app/

Token Entanglement: https://www.lesswrong.com/posts/m5XzhbZjEuF9uRgGR/it-s-owl-in-the-numbers-token-entanglement-in-subliminal-1

Hinton et al. 2015. Distilling the Knowledge in a Neural Network. https://arxiv.org/pdf/1503.02531

Full Video on Backpropagation: /youtube/video/VkHfRKewkWw

Softmax Basics: /youtube/video/VkHfRKewkWw

Softmax Gradient: /youtube/video/VkHfRKewkWw

Softmax Visualized: /youtube/video/VkHfRKewkWw

Big thanks to Alex Cloud, Minh Le, Jacob Hilton, and Owain Evans for graciously answering my questions as I worked on the script.

Special Thanks to Patrons https://www.patreon.com/welchlabs

Juan Benet, Ross Hanson, Yan Babitski, AJ Englehardt, Alvin Khaled, Eduardo Barraza, Hitoshi Yamauchi, Jaewon Jung, Mrgoodlight, Shinichi Hayashi, Sid Sarasvati, Dominic Beaumont, Shannon Prater, Ubiquity Ventures, Matias Forti, Brian Henry, Tim Palade, Petar Vecutin, Nicolas baumann, Jason Singh, Robert Riley, vornska, Barry Silverman, Jake Ehrlich, Mitch Jacobs, Lauren Steely, Jeff Eastman, Rodolfo Ibarra, Clark Barrus, Rob Napier, Andrew White, Richard B Johnston, abhiteja mandava, Burt Humburg, Kevin Mitchell, Daniel Sanchez, Ferdie Wang, Tripp Hill, Richard Harbaugh Jr, Prasad Raje, Kalle Aaltonen, Midori Switch Hound, Zach Wilson, Chris Seltzer, Ven Popov, Hunter Nelson, Amit Bueno, Scott Olsen, Johan Rimez, Shehryar Saroya, Tyler Christensen, Beckett Madden-Woods, Darrell Thomas, Javier Soto, U007D, Caleb Begly, Rick Rubenstein, Brent Hunsaker, Dan Patterson, Tchsurvives, Alex Adai, Walter Reade, Zyansheep, Walter Reade, Duncan Stannett, Reginald Carey, Jean-Manuel Izaret, dh71633, Adrian Rodriguez, Dimitar Stojanovski, Michael Harder, Peter Maldonado, Emily Pesce, David Johnston, Insang Song, FaeTheWolf, Stephen Taylor, KittenKaboodle, EMatter, PATRICKMCCORMACK, John Beahan, Cameron, Cole Jones, Garrett Thornburg, Jeroen W, Rohit Sharma, GlennB, Emmanuel Cortes, Katie Quinn, Karina C, Cakra WW, Mike Ton, Eric Gometz, MacCallister Higgins, Niko Drossos, David Eraso, Tom Zehle, Steve, Brian Lineburg, rjbl, Michael Loh, Perry Vais, Bengal0, Farhad Manjoo, Sara Chipps

Special thank you to these readers for helping improve the Imaginary Numbers Book!

Marwan Daar, Matt Ellis, Nico Weber, Rafa Barroso, Jacob Sorensen, Bob Hall, Evan Van Peursem, Phillipe Loher, Attila Medl, Abdul Wahid Tanner, A friendly critic, NuttySwiss, Dean Burdick, Paul Du Bois, Włodzimierz Bzyl

Code for Welch Labs Videos: https://github.com/stephencwelch/manim_videos

Written by: Stephen Welch

Produced by: Stephen Welch, Sam Baskin, and Pranav Gundu

Premium Beat IDs

EEDYZ3FP44YX8OWT

MWROXNAY0SPXCMBS

