A synthetic sound mixture specification dataset for the Target Sound Extraction (TSE) task. Dataset samples consist of a .jams file specifying the mixture components, and a metadata file with target labels. Mixtures are 6 seconds long and contain 3-5 unique foreground sounds over a 6 second long background sound. Each sample is provided with 3 target labels, and sounds corresponding to all target labels are guaranteed to be present in the mixture. FSDKaggle2018 is used as the source for foreground sounds and TAU Urban Acoustic Scenes 2019 is used as the source for background sounds.
Train: 50K
Val: 5K
Test: 10K
Paper | Code | Results | Date | Stars |
---|