1 code implementation • 18 Jan 2024 • Kazuhiro Takemoto
Large Language Models (LLMs), such as ChatGPT, encounter 'jailbreak' challenges, wherein safeguards are circumvented to generate ethically harmful prompts.
1 code implementation • 12 Sep 2023 • Kazuhiro Takemoto
As large language models (LLMs) become more deeply integrated into various sectors, understanding how they make moral judgments has become crucial, particularly in the realm of autonomous driving.
no code implementations • 11 Aug 2021 • Kazuki Koga, Kazuhiro Takemoto
In particular, we propose a method for generating universal adversarial perturbations (UAPs) using a simple hill-climbing search based only on deep neural network (DNN) outputs, and demonstrate the validity of the proposed method using representative DNN-based medical image classifications.
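The hill-climbing idea in this excerpt can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `hill_climb_uap`, the `predict_proba` callback, the coordinate-wise search directions, the L-infinity bound `xi`, and the step size are all assumptions chosen for illustration. The only property it shares with the description above is that it queries model outputs alone (no gradients) and keeps a candidate perturbation only when it lowers the mean confidence in the true labels.

```python
import numpy as np

def hill_climb_uap(predict_proba, X, y, xi=1.0, step=0.2, iters=200, seed=0):
    """Black-box sketch: search for one perturbation shared by all inputs.

    predict_proba: hypothetical callback mapping an (n, d) array to (n, k)
                   class probabilities; only its outputs are used.
    xi:            assumed L-infinity bound on the perturbation.
    """
    rng = np.random.default_rng(seed)
    rho = np.zeros(X.shape[1])

    def score(r):
        # Mean probability assigned to the true labels (lower favors the attack).
        p = predict_proba(X + r)
        return p[np.arange(len(y)), y].mean()

    best = score(rho)
    for _ in range(iters):
        # Pick a random coordinate direction and try a step each way.
        q = np.zeros_like(rho)
        q[rng.integers(len(rho))] = 1.0
        for sign in (1.0, -1.0):
            cand = np.clip(rho + sign * step * q, -xi, xi)
            s = score(cand)
            if s < best:  # hill climbing: accept only improvements
                rho, best = cand, s
                break
    return rho, best
```

Because a candidate is accepted only when the score strictly improves, the returned score never exceeds the unperturbed baseline, and the clipping keeps the perturbation inside the assumed bound.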
1 code implementation • 22 May 2020 • Hokuto Hirano, Kazuki Koga, Kazuhiro Takemoto
As an example, we show that iterative fine-tuning of the DNN models using UAPs improves the robustness of the DNN models against UAPs.
1 code implementation • 15 Nov 2019 • Hokuto Hirano, Kazuhiro Takemoto
Our method combines the simple iterative method for generating non-targeted UAPs and the fast gradient sign method for generating a targeted adversarial perturbation for an input.
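As a rough illustration of how the two ingredients named in this excerpt might fit together, the sketch below loops over inputs and, whenever the perturbed input is not yet classified as the target class, adds a targeted fast-gradient-sign step to one shared perturbation. It assumes a toy linear softmax classifier (`W`, `b`) so the input gradient has a closed form; the function name `targeted_uap`, the L-infinity projection radius `xi`, the step `eps`, and the stopping threshold `goal` are hypothetical choices, not values from the paper.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def targeted_uap(X, W, b, target, xi=2.0, eps=0.5, max_epochs=20, goal=0.9):
    """Sketch: accumulate targeted FGSM steps into a single perturbation."""
    rho = np.zeros(X.shape[1])
    onehot = np.eye(W.shape[1])[target]
    for _ in range(max_epochs):
        for x in X:
            logits = (x + rho) @ W + b
            if logits.argmax() == target:
                continue  # this input is already pushed to the target class
            # Gradient of the cross-entropy toward the target w.r.t. the input
            # (closed form for a linear softmax model).
            grad = W @ (softmax(logits) - onehot)
            # Targeted FGSM step: descend the target-class loss.
            rho = rho - eps * np.sign(grad)
            # Project back onto the assumed L-infinity ball of radius xi.
            rho = np.clip(rho, -xi, xi)
        success = np.mean(((X + rho) @ W + b).argmax(axis=1) == target)
        if success >= goal:
            break
    return rho
```

For a real DNN the closed-form gradient would be replaced by automatic differentiation, and the projection could use a different norm; both are left out here to keep the sketch self-contained.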