1 code implementation • 29 Feb 2024 • Yiju Guo, Ganqu Cui, Lifan Yuan, Ning Ding, Jiexin Wang, Huimin Chen, Bowen Sun, Ruobing Xie, Jie zhou, Yankai Lin, Zhiyuan Liu, Maosong Sun
In practice, the multifaceted nature of human preferences inadvertently introduces what is known as the "alignment tax" -a compromise where enhancements in alignment within one objective (e. g., harmlessness) can diminish performance in others (e. g., helpfulness).