1 code implementation • 26 May 2024 • Amir Saeidi, Shivanshu Verma, Aswin RRV, Chitta Baral
However, while RL-free methods deliver satisfactory performance, they require a substantial amount of data to train a robust Supervised Fine-Tuned (SFT) model, plus an additional step of fine-tuning that model on a preference dataset, which constrains their utility and scalability.
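The second stage of this pipeline can be sketched with the loss of Direct Preference Optimization (DPO), a representative RL-free method: the SFT model is frozen as a reference, and the policy is trained to prefer the chosen response over the rejected one in each preference pair. The function name and inputs below are illustrative, not taken from the paper's code.

```python
import math

def dpo_loss(logp_chosen, logp_rejected,
             ref_logp_chosen, ref_logp_rejected, beta=0.1):
    """DPO loss for one preference pair (illustrative sketch).

    Inputs are the summed log-probabilities of the chosen and rejected
    responses under the trained policy and under the frozen reference
    (SFT) model; beta controls how far the policy may drift from the
    reference.
    """
    chosen_ratio = logp_chosen - ref_logp_chosen
    rejected_ratio = logp_rejected - ref_logp_rejected
    margin = beta * (chosen_ratio - rejected_ratio)
    # -log(sigmoid(margin)): small when the policy assigns relatively
    # more probability to the chosen response than the reference does
    return -math.log(1.0 / (1.0 + math.exp(-margin)))
```

Because the reference log-probabilities come from the SFT model, this objective cannot be computed without first completing the SFT stage, which is exactly the data and pipeline cost the abstract describes.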
no code implementations • 23 Apr 2024 • Amir Saeidi, Shivanshu Verma, Chitta Baral
Key observations reveal that alignment methods achieve optimal performance with smaller training data subsets, that they show limited effectiveness on reasoning tasks yet substantially improve mathematical problem-solving, and that using an instruction-tuned model notably influences truthfulness.