OpenAI: support for Reinforcement Fine-tuning available to verified orgs

1 point by justanotheratom 1 month ago | 1 comment

justanotheratom 1 month ago
My question for anyone who knows:
Between SFT, DPO, and RFT:
- when to use which?
- can we mix and match? e.g., first SFT, then DPO.