Reinforcement Learning for Large Language Model Alignment and Inference
Pascal Poupart (Professor at University of Waterloo and Canada CIFAR AI Chair at the Vector Institute)
Reinforcement learning (RL) has become a key tool for training large language models (LLMs). In this lecture, I will explain how RL from human feedback can improve the alignment of LLMs. I will also discuss recent advances in reward-guided text generation. Finally, I will explain how to leverage process reward models to improve inference-time reasoning.