Search Results for author: Amir Saeidi

Found 2 papers, 1 paper with code

Triple Preference Optimization: Achieving Better Alignment with Less Data in a Single Step Optimization

1 code implementation · 26 May 2024 · Amir Saeidi, Shivanshu Verma, Aswin RRV, Chitta Baral

However, while RL-free methods deliver satisfactory performance, they require significant data to develop a robust Supervised Fine-Tuned (SFT) model and an additional step to fine-tune this model on a preference dataset, which constrains their utility and scalability.

Insights into Alignment: Evaluating DPO and its Variants Across Multiple Tasks

No code implementations · 23 Apr 2024 · Amir Saeidi, Shivanshu Verma, Chitta Baral

Key observations reveal that alignment methods achieve optimal performance with smaller training data subsets, exhibit limited effectiveness in reasoning tasks yet significantly impact mathematical problem-solving, and that employing an instruction-tuned model notably influences truthfulness.

Question Answering