Learning Dynamics of LLM Finetuning

Title: Learning Dynamics of LLM Finetuning
Presenter: Linli Zhang
Abstract: Learning dynamics, which describes how learning on specific training examples influences the model's predictions on other examples, gives us a powerful tool for understanding the behavior of deep learning systems. The authors study the learning dynamics of large language models during different types of finetuning by analyzing the step-wise decomposition of how influence accumulates among different potential responses. Their framework allows a uniform interpretation of many interesting observations about the training of popular algorithms for both instruction tuning and preference tuning. In particular, they propose a hypothetical explanation of why specific types of hallucination are strengthened after finetuning, e.g., the model might use phrases or facts from the response to question B to answer question A, or the model might keep repeating similar simple phrases when generating responses. They also extend their framework and highlight a unique "squeezing effect" to explain a previously observed phenomenon in off-policy direct preference optimization (DPO), where running DPO for too long makes even the desired outputs less likely. The framework also provides insights into where the benefits of on-policy DPO and other variants come from. The analysis not only offers a novel perspective for understanding LLM finetuning but also inspires a simple, effective method to improve alignment performance.
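To make the "step-wise decomposition" above more concrete, the influence of a single gradient step can be sketched schematically as below; the symbols A, K, and G are this announcement's illustrative shorthand and not necessarily the paper's exact notation.

\Delta \log \pi^{t}(\mathbf{y} \mid \mathbf{x}_o) \;\approx\; -\eta \, \mathcal{A}^{t}(\mathbf{x}_o)\, \mathcal{K}^{t}(\mathbf{x}_o, \mathbf{x}_u)\, \mathcal{G}^{t}(\mathbf{x}_u, \mathbf{y}_u)

Here (\mathbf{x}_u, \mathbf{y}_u) is the example used for the update at step t, \mathbf{x}_o is any other prompt being observed, \mathcal{G}^{t} is the loss gradient for the updated example, \mathcal{K}^{t} is an empirical kernel term measuring how similarly the model represents the two prompts, and \mathcal{A}^{t} maps the resulting change in logits back to log-probabilities. Read this way, one update can raise or lower the probability of responses to prompts the model was never trained on, which is the lens behind the hallucination and "squeezing effect" observations in the abstract.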
Paper link:
Disclaimer: The presenter is not one of the authors!
LLM seminar
Seminar on Large Language Models in the CS Department