no code implementations • 20 Dec 2022 • Steffen Illium, Robert Müller, Andreas Sedlmeier, Claudia Linnhoff-Popien
We apply the vision transformer, a deep machine learning model built around the attention mechanism, to mel-spectrogram representations of raw audio recordings.
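As a rough illustration of the input side of such a pipeline, the sketch below turns a raw waveform into a log-mel-spectrogram and cuts it into flattened square patches, the token format a vision transformer typically consumes. It is a minimal NumPy sketch, not the authors' preprocessing: the sample rate, FFT size, hop length, number of mel bands, and patch size are all assumed values, and the helper names (`mel_filterbank`, `mel_spectrogram`, `patchify`) are hypothetical.

```python
import numpy as np

def hz_to_mel(f):
    # Common HTK-style mel formula (an assumption; other variants exist).
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_mels, n_fft, sr):
    # Triangular filters with centres spaced evenly on the mel scale.
    mel_pts = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2)
    hz_pts = mel_to_hz(mel_pts)
    fft_freqs = np.linspace(0.0, sr / 2.0, n_fft // 2 + 1)
    fb = np.zeros((n_mels, n_fft // 2 + 1))
    for i in range(n_mels):
        lo, mid, hi = hz_pts[i], hz_pts[i + 1], hz_pts[i + 2]
        rising = (fft_freqs - lo) / (mid - lo)
        falling = (hi - fft_freqs) / (hi - mid)
        fb[i] = np.maximum(0.0, np.minimum(rising, falling))
    return fb

def mel_spectrogram(wave, sr=16000, n_fft=512, hop=256, n_mels=64):
    # Assumed parameters; the source does not state any of them.
    window = np.hanning(n_fft)
    n_frames = 1 + (len(wave) - n_fft) // hop
    frames = np.stack(
        [wave[t * hop : t * hop + n_fft] * window for t in range(n_frames)]
    )
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2   # (n_frames, n_fft//2 + 1)
    mel = mel_filterbank(n_mels, n_fft, sr) @ power.T  # (n_mels, n_frames)
    return np.log(mel + 1e-10)                         # log compression

def patchify(spec, patch=16):
    # Crop to a multiple of the patch size, then cut into flattened
    # non-overlapping patch x patch tiles (row-major over the grid).
    h, w = spec.shape
    h, w = h - h % patch, w - w % patch
    tiles = spec[:h, :w].reshape(h // patch, patch, w // patch, patch)
    return tiles.transpose(0, 2, 1, 3).reshape(-1, patch * patch)

# One second of synthetic audio (a 440 Hz tone) stands in for a recording.
sr = 16000
wave = np.sin(2 * np.pi * 440.0 * np.arange(sr) / sr)
spec = mel_spectrogram(wave, sr=sr)
tokens = patchify(spec)
```

Each row of `tokens` would then be linearly projected and given a position embedding before entering the transformer encoder; those steps are omitted here.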