Live with Jay Alammar, Josh Starmer, and Luis Serrano
Direct Preference Optimization (DPO) - How to fine-tune LLMs directly without reinforcement learning
KL Divergence - How to tell how different two distributions are

Keys, Queries, and Values: The celestial mechanics of attention 3d | Louis Serano
Why is ChatGPT so bad at telling jokes (yet so good at writing poems?) 5d | Louis Serano
Why is DeepSeek so good? 13d | Louis Serano
Universal Approximation Theorem - The Fundamental Building Block of Deep Learning 29d | Louis Serano
Happy 2025, and thank you for your support! 2mo | Louis Serano
The Kolmogorov-Arnold Theorem 2mo | Louis Serano
Kolmogorov-Arnold Networks (KANs) - What are they and how do they work? 3mo | Louis Serano
Live with Jay Alammar, Josh Starmer, and Luis Serrano 3mo | Louis Serano
What is AdaBoost? Friendly explanation with code! 3mo | Louis Serano
What is Positional Encoding in Transformer Models? 4mo | Louis Serano