Direct Preference Optimization (DPO)
Phi-4 Beats Models Five Times Its Size
Microsoft’s Phi-4 blends synthetic and organic data to surpass larger models in math and reasoning benchmarks
Microsoft updated its smallest model family with a single, surprisingly high-performance model.