OpenAI: support for Reinforcement Fine-tuning available to verified orgs

1 point by justanotheratom 1 month ago | 1 comment
  • justanotheratom 1 month ago
    my question for anyone who knows:

    Between SFT, DPO, and RFT:
    - when to use which?
    - can we mix and match? e.g., first SFT, then DPO.