Search Results for author: Bobak Shahriari

Found 12 papers, 4 papers with code

Gemma: Open Models Based on Gemini Research and Technology

no code implementations • 13 Mar 2024 • Gemma Team, Thomas Mesnard, Cassidy Hardin, Robert Dadashi, Surya Bhupatiraju, Shreya Pathak, Laurent SIfre, Morgane Rivière, Mihir Sanjay Kale, Juliette Love, Pouya Tafti, Léonard Hussenot, Pier Giuseppe Sessa, Aakanksha Chowdhery, Adam Roberts, Aditya Barua, Alex Botev, Alex Castro-Ros, Ambrose Slone, Amélie Héliou, Andrea Tacchetti, Anna Bulanova, Antonia Paterson, Beth Tsai, Bobak Shahriari, Charline Le Lan, Christopher A. Choquette-Choo, Clément Crepy, Daniel Cer, Daphne Ippolito, David Reid, Elena Buchatskaya, Eric Ni, Eric Noland, Geng Yan, George Tucker, George-Christian Muraru, Grigory Rozhdestvenskiy, Henryk Michalewski, Ian Tenney, Ivan Grishchenko, Jacob Austin, James Keeling, Jane Labanowski, Jean-Baptiste Lespiau, Jeff Stanway, Jenny Brennan, Jeremy Chen, Johan Ferret, Justin Chiu, Justin Mao-Jones, Katherine Lee, Kathy Yu, Katie Millican, Lars Lowe Sjoesund, Lisa Lee, Lucas Dixon, Machel Reid, Maciej Mikuła, Mateo Wirth, Michael Sharman, Nikolai Chinaev, Nithum Thain, Olivier Bachem, Oscar Chang, Oscar Wahltinez, Paige Bailey, Paul Michel, Petko Yotov, Rahma Chaabouni, Ramona Comanescu, Reena Jana, Rohan Anil, Ross Mcilroy, Ruibo Liu, Ryan Mullins, Samuel L Smith, Sebastian Borgeaud, Sertan Girgin, Sholto Douglas, Shree Pandya, Siamak Shakeri, Soham De, Ted Klimenko, Tom Hennigan, Vlad Feinberg, Wojciech Stokowiec, Yu-Hui Chen, Zafarali Ahmed, Zhitao Gong, Tris Warkentin, Ludovic Peran, Minh Giang, Clément Farabet, Oriol Vinyals, Jeff Dean, Koray Kavukcuoglu, Demis Hassabis, Zoubin Ghahramani, Douglas Eck, Joelle Barral, Fernando Pereira, Eli Collins, Armand Joulin, Noah Fiedel, Evan Senter, Alek Andreev, Kathleen Kenealy

This work introduces Gemma, a family of lightweight, state-of-the art open models built from the research and technology used to create Gemini models.

Paper
Add Code

Knowledge Transfer from Teachers to Learners in Growing-Batch Reinforcement Learning

no code implementations • 5 May 2023 • Patrick Emedom-Nnamdi, Abram L. Friesen, Bobak Shahriari, Nando de Freitas, Matt W. Hoffman

However, due to safety, ethical, and practicality constraints, this type of trial-and-error experimentation is often infeasible in many real-world domains such as healthcare and robotics.

Decision Making reinforcement-learning +1

Paper
Add Code

Revisiting Gaussian mixture critics in off-policy reinforcement learning: a sample-based approach

1 code implementation • 21 Apr 2022 • Bobak Shahriari, Abbas Abdolmaleki, Arunkumar Byravan, Abe Friesen, SiQi Liu, Jost Tobias Springenberg, Nicolas Heess, Matt Hoffman, Martin Riedmiller

Actor-critic algorithms that make use of distributional policy evaluation have frequently been shown to outperform their non-distributional counterparts on many challenging control tasks.

Continuous Control reinforcement-learning +1

Paper
Code

On Multi-objective Policy Optimization as a Tool for Reinforcement Learning: Case Studies in Offline RL and Finetuning

no code implementations • 15 Jun 2021 • Abbas Abdolmaleki, Sandy H. Huang, Giulia Vezzani, Bobak Shahriari, Jost Tobias Springenberg, Shruti Mishra, Dhruva TB, Arunkumar Byravan, Konstantinos Bousmalis, Andras Gyorgy, Csaba Szepesvari, Raia Hadsell, Nicolas Heess, Martin Riedmiller

Many advances that have improved the robustness and efficiency of deep reinforcement learning (RL) algorithms can, in one way or another, be understood as introducing additional objectives or constraints in the policy optimization step.

Offline RL reinforcement-learning +1

Paper
Add Code

Critic Regularized Regression

5 code implementations • NeurIPS 2020 • Ziyu Wang, Alexander Novikov, Konrad Zolna, Jost Tobias Springenberg, Scott Reed, Bobak Shahriari, Noah Siegel, Josh Merel, Caglar Gulcehre, Nicolas Heess, Nando de Freitas

Offline reinforcement learning (RL), also known as batch RL, offers the prospect of policy optimization from large pre-recorded datasets without online environment interaction.

Offline RL regression +1

31,275

Paper
Code

Acme: A Research Framework for Distributed Reinforcement Learning

5 code implementations • 1 Jun 2020 • Matthew W. Hoffman, Bobak Shahriari, John Aslanides, Gabriel Barth-Maron, Nikola Momchev, Danila Sinopalnikov, Piotr Stańczyk, Sabela Ramos, Anton Raichuk, Damien Vincent, Léonard Hussenot, Robert Dadashi, Gabriel Dulac-Arnold, Manu Orsini, Alexis Jacq, Johan Ferret, Nino Vieillard, Seyed Kamyar Seyed Ghasemipour, Sertan Girgin, Olivier Pietquin, Feryal Behbahani, Tamara Norman, Abbas Abdolmaleki, Albin Cassirer, Fan Yang, Kate Baumli, Sarah Henderson, Abe Friesen, Ruba Haroun, Alex Novikov, Sergio Gómez Colmenarejo, Serkan Cabi, Caglar Gulcehre, Tom Le Paine, Srivatsan Srinivasan, Andrew Cowie, Ziyu Wang, Bilal Piot, Nando de Freitas

These implementations serve both as a validation of our design decisions as well as an important contribution to reproducibility in RL research.

DQN Replay Dataset reinforcement-learning +1

3,385

Paper
Code

Making Efficient Use of Demonstrations to Solve Hard Exploration Problems

1 code implementation • ICLR 2020 • Tom Le Paine, Caglar Gulcehre, Bobak Shahriari, Misha Denil, Matt Hoffman, Hubert Soyer, Richard Tanburn, Steven Kapturowski, Neil Rabinowitz, Duncan Williams, Gabriel Barth-Maron, Ziyu Wang, Nando de Freitas, Worlds Team

This paper introduces R2D3, an agent that makes efficient use of demonstrations to solve hard exploration problems in partially observable environments with highly variable initial conditions.

2,597

Paper
Code

Which Learning Algorithms Can Generalize Identity-Based Rules to Novel Inputs?

no code implementations • 12 May 2016 • Paul Tupper, Bobak Shahriari

We propose a novel framework for the analysis of learning algorithms that allows us to say when such algorithms can and cannot generalize certain patterns from training data to test data.

Paper
Add Code

Unbounded Bayesian Optimization via Regularization

no code implementations • 14 Aug 2015 • Bobak Shahriari, Alexandre Bouchard-Côté, Nando de Freitas

Bayesian optimization has recently emerged as a popular and efficient tool for global optimization and hyperparameter tuning.

Bayesian Optimization Benchmarking

Paper
Add Code

Heteroscedastic Treed Bayesian Optimisation

no code implementations • 27 Oct 2014 • John-Alexander M. Assael, Ziyu Wang, Bobak Shahriari, Nando de Freitas

At the core of this approach is a Gaussian process prior that captures our belief about the distribution over functions.

Bayesian Optimisation BIG-bench Machine Learning

Paper
Add Code

An Entropy Search Portfolio for Bayesian Optimization

no code implementations • 18 Jun 2014 • Bobak Shahriari, Ziyu Wang, Matthew W. Hoffman, Alexandre Bouchard-Côté, Nando de Freitas

How- ever, the performance of a Bayesian optimization method very much depends on its exploration strategy, i. e. the choice of acquisition function, and it is not clear a priori which choice will result in superior performance.

Bayesian Optimization

Paper
Add Code

Exploiting correlation and budget constraints in Bayesian multi-armed bandit optimization

no code implementations • 27 Mar 2013 • Matthew W. Hoffman, Bobak Shahriari, Nando de Freitas

This problem is also known as fixed-budget best arm identification in the multi-armed bandit literature.

Bayesian Optimization Thompson Sampling

Paper
Add Code

Cannot find the paper you are looking for? You can Submit a new open access paper.