Direct Preference Optimization (DPO) - How to fine-tune LLMs directly without reinforcement learning

Created 10 months ago | 21 Jun 2024, 18:10:03

Other posts in this group

Discrete Dynamical Systems - Eigenvalues and Eigenvectors

31 Mar 2025, 15:10:09 | Louis Serano

Mean, Variance, Skewness, and Kurtosis - Math for ML with Deeplearning.ai

12 Mar 2025, 19:10:11 | Louis Serano

The three steps to make a reliable chatbot: Preamble, Fine-tuning, and RAG

11 Mar 2025, 17:30:02 | Louis Serano

Newton's method for approximating zeros of polynomials - Math for ML with Deeplearning.ai

5 Mar 2025, 15:20:03 | Louis Serano

The Stone-Weierstrass Theorem - How to approximate functions

25 Feb 2025, 17:10:03 | Louis Serano

Keys, Queries, and Values: The celestial mechanics of attention

18 Feb 2025, 15:50:09 | Louis Serano

Why is ChatGPT so bad at telling jokes (yet so good at writing poems?)

17 Feb 2025, 00:30:07 | Louis Serano
