supervised-learning

May 01
ORPO: Monolithic Preference Optimization without Reference Model (Paper Explained)

🆕 from Yannic Kilcher! Discover how ORPO simplifies preference optimization by integrating supervised fine-tuning and alignment steps into a unified procedure.
3 min read