Reinforcement Learning for Large Language Model Alignment and Inference

Pascal Poupart (Professor at University of Waterloo and Canada CIFAR AI Chair at the Vector Institute)

Reinforcement learning (RL) has become a key tool for training large language models (LLMs). In this lecture, I will explain how RL from human feedback (RLHF) can improve the alignment of LLMs. I will also discuss recent advances in reward-guided text generation. Finally, I will explain how to leverage process reward models to improve inference-time reasoning.
