Search Results for author: Andres Carranza

Found 3 papers, 0 papers with code

Bridging Associative Memory and Probabilistic Modeling

no code implementations15 Feb 2024 Rylan Schaeffer, Nika Zahedi, Mikail Khona, Dhruv Pai, Sang Truong, Yilun Du, Mitchell Ostrow, Sarthak Chandra, Andres Carranza, Ila Rani Fiete, Andrey Gromov, Sanmi Koyejo

Based on the observation that associative memory's energy functions can be seen as probabilistic modeling's negative log likelihoods, we build a bridge between the two that enables useful flow of ideas in both directions.

In-Context Learning

Deceptive Alignment Monitoring

no code implementations20 Jul 2023 Andres Carranza, Dhruv Pai, Rylan Schaeffer, Arnuv Tandon, Sanmi Koyejo

As the capabilities of large machine learning models continue to grow, and as the autonomy afforded to such models continues to expand, the spectre of a new adversary looms: the models themselves.

FACADE: A Framework for Adversarial Circuit Anomaly Detection and Evaluation

no code implementations20 Jul 2023 Dhruv Pai, Andres Carranza, Rylan Schaeffer, Arnuv Tandon, Sanmi Koyejo

We present FACADE, a novel probabilistic and geometric framework designed for unsupervised mechanistic anomaly detection in deep neural networks.

Anomaly Detection

Cannot find the paper you are looking for? You can Submit a new open access paper.