no code implementations • 29 Apr 2024 • Scott Viteri, Max Lamparth, Peter Chatain, Clark Barrett
We derive a "Markovian training" procedure by applying our definition of informativeness to a Markovian LM and optimizing via policy gradient and Proximal Policy Optimization (PPO).
1 code implementation • 31 Mar 2020 • Scott Viteri, Simon DeDeo
Mathematical proofs are both paradigms of certainty and some of the most explicitly justified arguments that we have in the cultural record.