Louis Serano

Live with Jay Alammar, Josh Starmer, and Luis Serrano

Direct Preference Optimization (DPO) - How to fine-tune LLMs directly without reinforcement learning

KL Divergence - How to tell how different two distributions are

The covariance matrix

4y | Louis Serano

The Beta distribution in 12 minutes!

4y | Louis Serano

The Gini Impurity Index explained in 8 minutes!

4y | Louis Serano

A friendly introduction to deep reinforcement learning, Q-networks and policy gradients

4y | Louis Serano

Thompson sampling, one armed bandits, and the Beta distribution

4y | Louis Serano

Eigenvectors and Generalized Eigenspaces

4y | Louis Serano
