OpenAI's Reinforcement Finetuning: RL + Science — A New God or Thanos?

Discover OpenAI's Reinforcement Finetuning (RFT), combining RLHF and expert data for breakthroughs in medical diagnosis, decision-making, and scientific challenges.

Meng Li
Dec 08, 2024

12 Days of OpenAI | Day 2: Reinforcement Fine-Tuning.

On December 6, 2024, at 11 a.m. California time, OpenAI released Reinforcement Finetuning (RFT), a new method for building expert models. This approach allows users to solve decision-making problems in specialized domains, such as medical diagnosis or rare-disease detection, by fine-tuning with as few as a few dozen to a few thousand training cases.

Related: OpenAI Series #2: Enhanced Fine-Tuning – Train Your Expert Model with Minimal Samples (Meng Li, December 7, 2024).

The training data is formatted similarly to common instruction-tuning datasets: each case consists of a prompt, multiple options, and a correct answer. At the same time, OpenAI launched a Reinforcement Finetuning research program, encouraging scholars and experts to upload unique datasets from their fields to test this fine-tuning method.
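
To make that data format concrete, here is a minimal sketch of what a single multiple-choice RFT training case might look like when serialized as JSONL. The field names (`prompt`, `options`, `correct_answer`) are illustrative assumptions, not OpenAI's published schema.

```python
# Illustrative RFT-style training case (field names are assumptions,
# not OpenAI's official schema).
import json

example_case = {
    "prompt": (
        "A patient presents with symptom X and family history Y. "
        "Which gene is the most likely cause?"
    ),
    "options": ["GENE_A", "GENE_B", "GENE_C", "GENE_D"],
    "correct_answer": "GENE_B",
}

# Datasets of this kind are typically uploaded as JSONL: one case per line.
print(json.dumps(example_case))
```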

This method builds upon techniques already widely used in alignment, mathematics, and coding. Its foundation is Reinforcement Learning from Human Feedback (RLHF), which aligns large models with human-preference data. In RLHF, each training example consists of a question, two candidate answers, and a label indicating which answer the user preferred. This preference data is used to train a reward model. Once the reward model is established, a reinforcement learning algorithm such as PPO fine-tunes the model parameters so that it produces content more aligned with user preferences; DPO reaches a similar result by optimizing directly on the preference data, without a separate reward model.
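
As a rough illustration of the reward-model step, the sketch below trains a scalar scorer on preference pairs with a pairwise (Bradley-Terry) loss in PyTorch. It assumes each (question, answer) pair is already encoded as a small feature vector; in practice the reward model is a full language model with a scalar head, and the trained scorer would then drive a PPO-style policy update.

```python
# Minimal reward-model sketch for the RLHF step described above
# (illustrative assumption, not OpenAI's implementation).
import torch
import torch.nn as nn

torch.manual_seed(0)

DIM = 16       # toy feature dimension (assumption for illustration)
N_PAIRS = 64   # number of preference pairs

# Dummy feature vectors standing in for encoded (question, answer) pairs:
# one tensor for the answers users preferred, one for the answers they rejected.
preferred = torch.randn(N_PAIRS, DIM)
rejected = torch.randn(N_PAIRS, DIM)

# Reward model: maps a (question, answer) representation to a scalar score.
reward_model = nn.Sequential(nn.Linear(DIM, 32), nn.ReLU(), nn.Linear(32, 1))
optimizer = torch.optim.Adam(reward_model.parameters(), lr=1e-3)

for step in range(200):
    r_pref = reward_model(preferred)  # scores for preferred answers
    r_rej = reward_model(rejected)    # scores for rejected answers
    # Pairwise (Bradley-Terry) loss: push preferred scores above rejected ones.
    loss = -torch.nn.functional.logsigmoid(r_pref - r_rej).mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

print(f"final pairwise loss: {loss.item():.4f}")
```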
