Search Results for author: Xiang Hao

Found 17 papers, 3 papers with code

Detect Profane Language in Streaming Services to Protect Young Audiences

no code implementations • ACL (ECNLP) 2021 • Jingxiang Chen, Kai Wei, Xiang Hao

With the rapid growth of online video streaming, recent years have seen increasing concerns about profane language in streamed content.

Typing to Listen at the Cocktail Party: Text-Guided Target Speaker Extraction

1 code implementation • 11 Oct 2023 • Xiang Hao, Jibin Wu, Jianwei Yu, Chenglin Xu, Kay Chen Tan

However, the effectiveness of these models is hindered in real-world scenarios by the unreliability, or even absence, of pre-registered cues.

Language Modelling, Large Language Model, +1

Pink-Eggs Dataset V1: A Step Toward Invasive Species Management Using Deep Learning Embedded Solutions

no code implementations • 16 May 2023 • Di Xu, Yang Zhao, Xiang Hao, Xin Meng

We introduce a novel dataset consisting of images depicting pink eggs that have been identified as Pomacea canaliculata eggs, accompanied by corresponding bounding box annotations.

Management

Fast FullSubNet: Accelerate Full-band and Sub-band Fusion Model for Single-channel Speech Enhancement

2 code implementations • 18 Dec 2022 • Xiang Hao, Xiaofei Li

FullSubNet is our recently proposed real-time single-channel speech enhancement network that achieves outstanding performance on the Deep Noise Suppression (DNS) Challenge dataset.

Computational Efficiency, Speech Enhancement

AVT: Audio-Video Transformer for Multimodal Action Recognition

no code implementations • Submitted to ICLR 2022 • Wentao Zhu, Jingru Yi, Kevin Hsu, Xiaohang Sun, Xiang Hao, Linda Liu, Mohamed Omar

AVT uses a combination of video and audio signals to improve action recognition accuracy, leveraging the effective spatio-temporal representation by the video Transformer.

Action Recognition, Audio Classification, +3

Scalable Temporal Localization of Sensitive Activities in Movies and TV Episodes

no code implementations • 16 Jun 2022 • Xiang Hao, Jingxiang Chen, Shixing Chen, Ahmed Saad, Raffay Hamid

To help customers make better-informed viewing choices, video-streaming services try to moderate their content and provide more visibility into which portions of their movies and TV episodes contain age-appropriate material (e.g., nudity, sex, violence, or drug use).

Temporal Localization

Coarse-to-Fine Recursive Speech Separation for Unknown Number of Speakers

no code implementations • 30 Mar 2022 • Zhenhao Jin, Xiang Hao, Xiangdong Su

This paper formulates speech separation with an unknown number of speakers as a multi-pass source extraction problem and proposes a coarse-to-fine recursive speech separation method.

Speech Separation, Target Speaker Extraction

Deep-Learned Broadband Encoding Stochastic Filters for Computational Spectroscopic Instruments

no code implementations • 17 Dec 2020 • Hongya Song, Yaoguang Ma, Yubing Han, Weidong Shen, Wenyi Zhang, Yanghui Li, Xu Liu, Yifan Peng, Xiang Hao

Computational spectroscopic instruments with Broadband Encoding Stochastic (BEST) filters allow the reconstruction of the spectrum at high precision with only a few filters.

Instrumentation and Detectors

FullSubNet: A Full-Band and Sub-Band Fusion Model for Real-Time Single-Channel Speech Enhancement

6 code implementations • 29 Oct 2020 • Xiang Hao, Xiangdong Su, Radu Horaud, Xiaofei Li

In our proposed FullSubNet, we connect a pure full-band model and a pure sub-band model sequentially and use practical joint training to integrate the advantages of these two types of models.

Speech Enhancement

UNetGAN: A Robust Speech Enhancement Approach in Time Domain for Extremely Low Signal-to-noise Ratio Condition

no code implementations • 29 Oct 2020 • Xiang Hao, Xiangdong Su, Zhiyu Wang, Hui Zhang, Batushiren

This approach consists of a generator network and a discriminator network, which operate directly in the time domain.

Speech Enhancement

An Edge Information and Mask Shrinking Based Image Inpainting Approach

no code implementations • 11 Jun 2020 • Huali Xu, Xiangdong Su, Meng Wang, Xiang Hao, Guanglai Gao

The mask shrinking strategy is employed in the image completion model to track the areas to be repaired.

Image Inpainting valid

Sub-Band Knowledge Distillation Framework for Speech Enhancement

no code implementations • 29 May 2020 • Xiang Hao, Shixue Wen, Xiangdong Su, Yun Liu, Guanglai Gao, Xiaofei Li

In single-channel speech enhancement, methods based on full-band spectral features have been widely studied.

Knowledge Distillation, Speech Enhancement
