Search Results for author: Xiang Hao

Found 17 papers, 3 papers with code

Detect Profane Language in Streaming Services to Protect Young Audiences

no code implementations • ACL (ECNLP) 2021 • Jingxiang Chen, Kai Wei, Xiang Hao

With the rapid growth of online video streaming, recent years have seen increasing concerns about profane language in streamed content.

AI-Generated Content Enhanced Computer-Aided Diagnosis Model for Thyroid Nodules: A ChatGPT-Style Assistant

no code implementations • 4 Feb 2024 • Jincao Yao, Yunpeng Wang, Zhikai Lei, Kai Wang, Xiaoxian Li, Jianhua Zhou, Xiang Hao, Jiafei Shen, Zhenping Wang, Rongrong Ru, Yaqing Chen, Yahan Zhou, Chen Chen, YanMing Zhang, Ping Liang, Dong Xu

After training, ThyGPT could automatically evaluate thyroid nodules and engage in effective communication with physicians through human-computer interaction.

Specificity

Typing to Listen at the Cocktail Party: Text-Guided Target Speaker Extraction

1 code implementation • 11 Oct 2023 • Xiang Hao, Jibin Wu, Jianwei Yu, Chenglin Xu, Kay Chen Tan

However, the effectiveness of these models is hindered in real-world scenarios by the unreliability or even absence of pre-registered cues.

Language Modelling • Large Language Model • +1

Pink-Eggs Dataset V1: A Step Toward Invasive Species Management Using Deep Learning Embedded Solutions

no code implementations • 16 May 2023 • Di Xu, Yang Zhao, Xiang Hao, Xin Meng

We introduce a novel dataset consisting of images depicting pink eggs that have been identified as Pomacea canaliculata eggs, accompanied by corresponding bounding box annotations.

Management

Two-stage Neural Network for ICASSP 2023 Speech Signal Improvement Challenge

no code implementations • 14 Mar 2023 • Mingshuai Liu, Shubo Lv, Zihan Zhang, Runduo Han, Xiang Hao, Xianjun Xia, Li Chen, Yijian Xiao, Lei Xie

Achieving 0.446 in the final score and 0.517 in the P.835 score, our system ranks 4th in the non-real-time track.

Vocal Bursts Valence Prediction

Fast FullSubNet: Accelerate Full-band and Sub-band Fusion Model for Single-channel Speech Enhancement

2 code implementations • 18 Dec 2022 • Xiang Hao, Xiaofei Li

FullSubNet is our recently proposed real-time single-channel speech enhancement network that achieves outstanding performance on the Deep Noise Suppression (DNS) Challenge dataset.

Computational Efficiency • Speech Enhancement

AVT: Audio-Video Transformer for Multimodal Action Recognition

no code implementations • Submitted to ICLR 2022 • Wentao Zhu, Jingru Yi, Kevin Hsu, Xiaohang Sun, Xiang Hao, Linda Liu, Mohamed Omar

AVT uses a combination of video and audio signals to improve action recognition accuracy, leveraging the effective spatio-temporal representation by the video Transformer.

Ranked #4 on Multi-modal Classification on VGG-Sound

Action Recognition • Audio Classification • +3

Multiscale Multimodal Transformer for Multimodal Action Recognition

no code implementations • Submitted to ICLR 2022 • Wentao Zhu, Jingru Yi, Xiaohang Sun, Xiang Hao, Linda Liu, Mohamed Omar

In this work, we develop a multiscale multimodal Transformer (MMT) that employs hierarchical representation learning.

Ranked #1 on Multi-modal Classification on VGG-Sound

Action Recognition • Audio Classification • +2

Scalable Temporal Localization of Sensitive Activities in Movies and TV Episodes

no code implementations • 16 Jun 2022 • Xiang Hao, Jingxiang Chen, Shixing Chen, Ahmed Saad, Raffay Hamid

To help customers make better-informed viewing choices, video-streaming services try to moderate their content and provide more visibility into which portions of their movies and TV episodes contain age-appropriate material (e.g., nudity, sex, violence, or drug-use).

Temporal Localization

Coarse-to-Fine Recursive Speech Separation for Unknown Number of Speakers

no code implementations • 30 Mar 2022 • Zhenhao Jin, Xiang Hao, Xiangdong Su

This paper formulates speech separation with an unknown number of speakers as a multi-pass source extraction problem and proposes a coarse-to-fine recursive speech separation method.

Speech Separation • Target Speaker Extraction

Movies2Scenes: Using Movie Metadata to Learn Scene Representation

no code implementations • CVPR 2023 • Shixing Chen, Chun-Hao Liu, Xiang Hao, Xiaohan Nie, Maxim Arap, Raffay Hamid

However, labeling individual scenes is a time-consuming process.

Contrastive Learning • Scene Understanding

Deep-Learned Broadband Encoding Stochastic Filters for Computational Spectroscopic Instruments

no code implementations • 17 Dec 2020 • Hongya Song, Yaoguang Ma, Yubing Han, Weidong Shen, Wenyi Zhang, Yanghui Li, Xu Liu, Yifan Peng, Xiang Hao

Computational spectroscopic instruments with Broadband Encoding Stochastic (BEST) filters allow the reconstruction of the spectrum at high precision with only a few filters.

Instrumentation and Detectors

FullSubNet: A Full-Band and Sub-Band Fusion Model for Real-Time Single-Channel Speech Enhancement

6 code implementations • 29 Oct 2020 • Xiang Hao, Xiangdong Su, Radu Horaud, Xiaofei Li

In our proposed FullSubNet, we connect a pure full-band model and a pure sub-band model sequentially and use practical joint training to integrate these two types of models' advantages.

Ranked #9 on Speech Enhancement on Deep Noise Suppression (DNS) Challenge

Speech Enhancement

UNetGAN: A Robust Speech Enhancement Approach in Time Domain for Extremely Low Signal-to-noise Ratio Condition

no code implementations • 29 Oct 2020 • Xiang Hao, Xiangdong Su, Zhiyu Wang, Hui Zhang, Batushiren

This approach consists of a generator network and a discriminator network, which operate directly in the time domain.

Speech Enhancement

An Edge Information and Mask Shrinking Based Image Inpainting Approach

no code implementations • 11 Jun 2020 • Huali Xu, Xiangdong Su, Meng Wang, Xiang Hao, Guanglai Gao

The mask shrinking strategy is employed in the image completion model to track the areas to be repaired.

Image Inpainting

SNR-Based Teachers-Student Technique for Speech Enhancement

no code implementations • 29 May 2020 • Xiang Hao, Xiangdong Su, Zhiyu Wang, Qiang Zhang, Huali Xu, Guanglai Gao

Specifically, this method consists of multiple teacher models and a student model.

Speech Enhancement

Sub-Band Knowledge Distillation Framework for Speech Enhancement

no code implementations • 29 May 2020 • Xiang Hao, Shixue Wen, Xiangdong Su, Yun Liu, Guanglai Gao, Xiaofei Li

In single-channel speech enhancement, methods based on full-band spectral features have been widely studied.

Knowledge Distillation • Speech Enhancement
