Luis Serrano

Live with Jay Alammar, Josh Starmer, and Luis Serrano

Direct Preference Optimization (DPO) - How to fine-tune LLMs directly without reinforcement learning

KL Divergence - How to tell how different two distributions are

A friendly introduction to deep reinforcement learning, Q-networks and policy gradients

3y | Luis Serrano

Thompson sampling, one armed bandits, and the Beta distribution

3y | Luis Serrano

Eigenvectors and Generalized Eigenspaces

3y | Luis Serrano
