
Reinforcement Learning from Human Feedback

Reinforcement Learning from Human Feedback (RLHF) is a technique that fine-tunes language models using human-generated preference data to align model outputs with desired behaviors.

vLLM can be used to generate the completions for RLHF. One common approach is to use a library that integrates vLLM for generation, such as TRL, OpenRLHF, verl, or unsloth.
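As a rough illustration of the generation side, the sketch below uses vLLM's offline `LLM` API to sample several candidate completions per prompt, which a reward model could then score. The model name, prompts, and sampling settings are placeholder assumptions, and the scoring and weight-update steps are left to the training framework.

```python
# Minimal sketch: using vLLM to generate RLHF rollout completions.
# Model name, prompts, and sampling settings are placeholders.
from vllm import LLM, SamplingParams

prompts = [
    "Explain the rules of chess in one paragraph.",
    "Write a short poem about the ocean.",
]

# Sample several completions per prompt so a reward model can rank them.
sampling_params = SamplingParams(n=4, temperature=0.8, max_tokens=256)

llm = LLM(model="Qwen/Qwen2.5-1.5B-Instruct")  # any model supported by vLLM
outputs = llm.generate(prompts, sampling_params)

for output in outputs:
    for completion in output.outputs:
        # Each completion would next be scored by a reward model and used
        # to update the policy with the RLHF algorithm of your choice.
        print(completion.text)
```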

If you don't want to use an existing library, see the following basic examples to get started:

See the following notebooks showing how to use vLLM for GRPO:
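For context, GRPO (Group Relative Policy Optimization) computes advantages by normalizing the rewards of the completions sampled for the same prompt against their group mean and standard deviation. The sketch below shows only that group-relative normalization, using made-up reward values; the policy-update step is omitted.

```python
# Sketch of GRPO's group-relative advantage: rewards for completions sampled
# from the same prompt are normalized within the group.
import numpy as np

# Hypothetical rewards for 4 completions sampled for one prompt.
group_rewards = np.array([0.2, 0.9, 0.5, 0.4])

mean, std = group_rewards.mean(), group_rewards.std()
advantages = (group_rewards - mean) / (std + 1e-8)

# Completions with above-average reward get positive advantages and are
# reinforced; below-average completions are discouraged.
print(advantages)
```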