no code implementations • 28 Mar 2024 • Peng Tang, Tobias Lasser
First, unlike current methods that usually employ two separate models for the clinical and dermoscopy modalities, we verified that multimodal features can be learned by sharing the encoder parameters while keeping individual modality-specific classifiers.
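A minimal sketch of the shared-encoder idea: one encoder serves both modalities while each modality keeps its own classifier head. All dimensions, weights, and the modality names below are made-up placeholders, not the paper's actual architecture.

```python
import random

random.seed(0)

D_IN, D_HID, N_CLS = 8, 4, 3  # hypothetical feature / class sizes

def linear(x, W):  # W is an out x in matrix stored as nested lists
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in W]

def relu(v):
    return [max(0.0, u) for u in v]

def rand_mat(rows, cols):
    return [[random.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

# One encoder shared by both modalities; only the classifier heads differ.
W_shared = rand_mat(D_HID, D_IN)
W_clinical = rand_mat(N_CLS, D_HID)    # clinical-image head
W_dermoscopy = rand_mat(N_CLS, D_HID)  # dermoscopy-image head

def predict(x, modality):
    h = relu(linear(x, W_shared))  # shared multimodal feature
    head = W_clinical if modality == "clinical" else W_dermoscopy
    return linear(h, head)         # modality-specific logits

x = [random.random() for _ in range(D_IN)]
logits_c = predict(x, "clinical")
logits_d = predict(x, "dermoscopy")
```

Both branches reuse `W_shared`, so gradients from either modality would update the same encoder during training.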
no code implementations • 25 Mar 2024 • Zhuowan Li, Bhavan Jasani, Peng Tang, Shabnam Ghadar
In particular, our approach improves the accuracy of the previous state-of-the-art approach from 38% to 54% on the human-written questions in the ChartQA dataset, which require strong reasoning.
no code implementations • 19 Mar 2024 • Yubin Zheng, Peng Tang, Tianjie Ju, Weidong Qiu, Bo Yan
Intra-client and inter-client consistency learning are introduced to smooth predictions at the data level and to avoid confirmation bias in local models.
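One common way to implement such consistency learning is a mean-squared-error penalty between probability vectors; the sketch below illustrates that idea with made-up logits (the paper's exact loss and augmentation scheme may differ).

```python
import math

def softmax(logits):
    m = max(logits)
    e = [math.exp(l - m) for l in logits]
    s = sum(e)
    return [x / s for x in e]

def consistency_loss(p, q):
    # Mean squared error between two probability vectors.
    return sum((a - b) ** 2 for a, b in zip(p, q)) / len(p)

# Intra-client: same model, two augmented views of one image (values made up).
p_view1 = softmax([2.0, 0.5, 0.1])
p_view2 = softmax([1.8, 0.7, 0.2])
intra = consistency_loss(p_view1, p_view2)

# Inter-client: local model vs. a prediction from another client's model.
p_local = softmax([2.0, 0.5, 0.1])
p_peer = softmax([0.3, 1.9, 0.4])
inter = consistency_loss(p_local, p_peer)
```

Minimizing both terms pushes predictions to agree across augmentations and across clients, which is what "smoothing at the data level" amounts to here.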
1 code implementation • 7 Mar 2024 • Chi Zhang, Qilong Han, Rui Chen, Xiangyu Zhao, Peng Tang, Hongtao Song
In the second stage, we devise a self-augmentation module to augment sequences to alleviate OUPs.
1 code implementation • 12 Dec 2023 • Ke Hu, Weidong Qiu, Peng Tang
Our comprehensive analysis reveals that FNR-FL not only accelerates convergence but also significantly surpasses other contemporary federated learning algorithms in test accuracy, particularly under feature distribution skew scenarios.
no code implementations • 7 Dec 2023 • Peng Tang, Xintong Yan, Yang Nan, Xiaobin Hu, Bjoern H. Menze, Sebastian Krammer, Tobias Lasser
Most convolutional neural network (CNN) based methods for skin cancer classification obtain their results using only dermatological images.
no code implementations • 15 Nov 2023 • Peng Tang, Srikar Appalaraju, R. Manmatha, Yusheng Xie, Vijay Mahadevan
We present Multiple-Question Multiple-Answer (MQMA), a novel approach to text-VQA in encoder-decoder transformer models.
no code implementations • 15 Nov 2023 • Peng Tang, Pengkai Zhu, Tian Li, Srikar Appalaraju, Vijay Mahadevan, R. Manmatha
Based on the multi-exit model, we perform step-level dynamic early exit during inference, where the model may decide to use fewer decoder layers based on its confidence of the current layer at each individual decoding step.
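The confidence-gated exit described above can be sketched as follows: at each decoding step, stop at the first decoder layer whose top softmax probability clears a threshold. The per-layer logits and the threshold values are illustrative placeholders, not the paper's numbers.

```python
import math

def softmax(logits):
    m = max(logits)
    e = [math.exp(l - m) for l in logits]
    s = sum(e)
    return [x / s for x in e]

# Hypothetical logits from each decoder layer's prediction head for one
# decoding step in a multi-exit model.
layer_logits = [
    [0.2, 0.3, 0.1],  # layer 1: uncertain
    [1.5, 0.2, 0.1],  # layer 2: fairly confident
    [4.0, 0.1, 0.0],  # layer 3: very confident
]

def early_exit(layer_logits, threshold=0.8):
    """Return (prediction, layers_used): stop at the first layer whose
    top softmax probability reaches the confidence threshold."""
    for depth, logits in enumerate(layer_logits, start=1):
        probs = softmax(logits)
        conf = max(probs)
        if conf >= threshold or depth == len(layer_logits):
            return probs.index(conf), depth
```

Lowering the threshold trades accuracy for speed: easy steps exit after few layers, while hard steps still use the full decoder.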
no code implementations • 30 Jul 2023 • Peng Tang, Zhiqiang Xu, Pengfei Wei, Xiaobin Hu, Peilin Zhao, Xin Cao, Chunlai Zhou, Tobias Lasser
To further alleviate the contingent effect of recursive stacking, i.e., ringing artifacts, we add identity shortcuts between atrous convolutions to simulate residual deconvolutions.
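In 1-D, an identity shortcut around an atrous (dilated) convolution looks like the sketch below: the block output is the input plus the convolution result, so the convolution only has to model a residual. The kernel and signal are toy values, not the paper's filters.

```python
def atrous_conv1d(x, kernel, dilation):
    # 'Same'-padded 1-D dilated (atrous) convolution over a plain list.
    n, k = len(x), len(kernel)
    pad = dilation * (k - 1) // 2
    out = []
    for i in range(n):
        s = 0.0
        for j in range(k):
            idx = i - pad + j * dilation
            if 0 <= idx < n:
                s += kernel[j] * x[idx]
        out.append(s)
    return out

def residual_atrous_block(x, kernel, dilation):
    # Identity shortcut around the atrous convolution: y = x + conv(x).
    y = atrous_conv1d(x, kernel, dilation)
    return [xi + yi for xi, yi in zip(x, y)]

signal = [0.0, 0.0, 1.0, 0.0, 0.0]
out = residual_atrous_block(signal, kernel=[0.25, 0.5, 0.25], dilation=2)
```

Because the shortcut passes the input through unchanged, stacking such blocks cannot zero out the signal, which is one way shortcuts suppress ringing from repeated filtering.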
no code implementations • 4 Jul 2023 • Peng Tang, Yang Nan, Tobias Lasser
However, most methods only focus on designing a better module for multi-modal data fusion; few methods explore utilizing the label correlation between SPC and skin disease for performance improvement.
1 code implementation • 2 Jun 2023 • Srikar Appalaraju, Peng Tang, Qi Dong, Nishant Sankaran, Yichu Zhou, R. Manmatha
We propose DocFormerv2, a multi-modal transformer for Visual Document Understanding (VDU).
Ranked #9 on Visual Question Answering (VQA) on DocVQA test (using extra training data)
no code implementations • 17 Apr 2023 • Yunruo Zhang, Tianyu Du, Shouling Ji, Peng Tang, Shanqing Guo
In this paper, we propose the first certified defense against multi-frame attacks for RNNs called RNN-Guard.
no code implementations • 5 Sep 2022 • Yang Nan, Javier Del Ser, Zeyu Tang, Peng Tang, Xiaodan Xing, Yingying Fang, Francisco Herrera, Witold Pedrycz, Simon Walsh, Guang Yang
especially for cohorts with different lung diseases.
no code implementations • 4 Aug 2022 • Yang Nan, Peng Tang, Guyue Zhang, Caihong Zeng, Zhihong Liu, Zhifan Gao, Heye Zhang, Guang Yang
However, most machine learning and deep learning based approaches are supervised and developed using large numbers of training samples, for which pixel-wise annotations are expensive and sometimes impossible to obtain.
no code implementations • 11 Mar 2022 • Yang Nan, Fengyi Li, Peng Tang, Guyue Zhang, Caihong Zeng, Guotong Xie, Zhihong Liu, Guang Yang
Recognition of glomeruli lesions is the key for diagnosis and treatment planning in kidney pathology; however, the coexisting glomerular structures such as mesangial regions exacerbate the difficulties of this task.
no code implementations • 31 May 2021 • Yan Wang, Peng Tang, Yuyin Zhou, Wei Shen, Elliot K. Fishman, Alan L. Yuille
We instantiate both the global and the local classifiers with multiple instance learning (MIL), where the attention guidance, indicating roughly where the PDAC regions are, is the key to bridging them. For global MIL-based normal/PDAC classification, attention serves as a weight for each instance (voxel) during MIL pooling, which eliminates distraction from the background. For local MIL-based semi-supervised PDAC segmentation, the attention guidance is inductive: it not only provides bag-level pseudo-labels to training data without per-voxel annotations for MIL training, but also acts as a proxy for an instance-level classifier.
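Attention-weighted MIL pooling can be sketched in a few lines: softmax-normalized attention weights down-weight background instances before the bag score is computed. All scores and attention logits below are made-up illustrative values, not model outputs.

```python
import math

def softmax(v):
    m = max(v)
    e = [math.exp(x - m) for x in v]
    s = sum(e)
    return [x / s for x in e]

# Hypothetical per-voxel (instance) PDAC scores and attention logits for one
# CT scan (the bag). High attention roughly marks "where the PDAC is".
instance_scores = [0.9, 0.8, 0.1, 0.05]    # instance-level PDAC evidence
attention_logits = [2.0, 1.5, -1.0, -2.0]  # foreground vs. background

def attention_mil_pool(scores, att_logits):
    weights = softmax(att_logits)
    return sum(w * s for w, s in zip(weights, scores))

bag_score = attention_mil_pool(instance_scores, attention_logits)
```

Compared with plain mean pooling, the attention weights concentrate the bag score on the suspected-tumor voxels, which is the "eliminates distraction from the background" effect described above.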
1 code implementation • ICLR 2021 • Yingwei Li, Qihang Yu, Mingxing Tan, Jieru Mei, Peng Tang, Wei Shen, Alan Yuille, Cihang Xie
To prevent models from exclusively attending to a single cue in representation learning, we augment the training data with images containing conflicting shape and texture information (e.g., an image with chimpanzee shape but lemon texture) and, most importantly, provide the corresponding supervision from both shape and texture simultaneously.
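The dual supervision on a cue-conflict image amounts to a loss with both labels; a minimal sketch, with made-up class indices, logits, and an assumed equal weighting:

```python
import math

def cross_entropy(logits, label):
    # Numerically stable log-sum-exp cross-entropy for one example.
    m = max(logits)
    log_z = m + math.log(sum(math.exp(l - m) for l in logits))
    return log_z - logits[label]

# A cue-conflict image: chimpanzee shape rendered with lemon texture.
# The same prediction is supervised with BOTH labels, so the model cannot
# satisfy the loss by relying on a single cue.
CHIMP, LEMON = 0, 1
logits = [1.2, 0.8, -0.3]  # hypothetical model output over 3 classes

loss = 0.5 * cross_entropy(logits, CHIMP) + 0.5 * cross_entropy(logits, LEMON)
```

Driving this loss down requires probability mass on both the shape class and the texture class, which is the debiasing pressure the paper describes.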
Ranked #601 on Image Classification on ImageNet
no code implementations • 25 Jan 2020 • Zhenfang Chen, Lin Ma, Wenhan Luo, Peng Tang, Kwan-Yee K. Wong
In this paper, we study the problem of weakly supervised temporal grounding of sentences in videos.
no code implementations • 15 Jan 2020 • Peng Tang, Chetan Ramaiah, Yan Wang, Ran Xu, Caiming Xiong
two-stage object detectors) by training on both labeled and unlabeled data.
1 code implementation • 11 May 2019 • Hongru Zhu, Peng Tang, Jeongho Park, Soojin Park, Alan Yuille
We test both humans and the above-mentioned computational models in a challenging task of object recognition under extreme occlusion, where target objects are heavily occluded by irrelevant real objects in real backgrounds.
no code implementations • ECCV 2018 • Peng Tang, Xinggang Wang, Angtian Wang, Yongluan Yan, Wenyu Liu, Junzhou Huang, Alan Yuille
The Convolutional Neural Network (CNN) based region proposal generation method (i.e., region proposal network), trained using bounding box annotations, is an essential component in modern fully supervised object detectors.
4 code implementations • 9 Jul 2018 • Peng Tang, Xinggang Wang, Song Bai, Wei Shen, Xiang Bai, Wenyu Liu, Alan Yuille
The iterative instance classifier refinement is implemented online using multiple streams in convolutional neural networks, where the first stream is an MIL network and each subsequent stream refines the instance classifier under supervision from the preceding one.
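The "supervised by the preceding stream" step can be sketched as pseudo-label propagation: proposals overlapping the previous stream's top-scoring proposal become positives for the next stream. The proposal scores, IoU matrix, and threshold below are illustrative, not the paper's values.

```python
def pseudo_labels(prev_scores, iou, iou_thresh=0.5):
    """Label proposals that overlap the previous stream's top-scoring
    proposal as positive (1) and the rest as background (0)."""
    top = max(range(len(prev_scores)), key=lambda i: prev_scores[i])
    return [1 if iou[top][i] >= iou_thresh else 0 for i in range(len(prev_scores))]

# 4 proposals for an image with an image-level label only; iou[i][j] is the
# overlap between proposals i and j (made-up values).
iou = [
    [1.0, 0.7, 0.1, 0.0],
    [0.7, 1.0, 0.2, 0.0],
    [0.1, 0.2, 1.0, 0.3],
    [0.0, 0.0, 0.3, 1.0],
]
mil_scores = [0.9, 0.6, 0.3, 0.1]  # stream 0: the MIL network's scores
labels_for_stream1 = pseudo_labels(mil_scores, iou)
```

Repeating this across streams lets each refinement step tighten the detector using only image-level labels, since no ground-truth boxes ever enter the loop.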
Ranked #1 on Weakly Supervised Object Detection on ImageNet
no code implementations • 7 Apr 2018 • Yan Wang, Yuyin Zhou, Peng Tang, Wei Shen, Elliot K. Fishman, Alan L. Yuille
Based on the fact that very hard samples might have annotation errors, we propose a new sample selection policy, named Relaxed Upper Confident Bound (RUCB).
no code implementations • 7 Apr 2018 • Yuyin Zhou, Yan Wang, Peng Tang, Song Bai, Wei Shen, Elliot K. Fishman, Alan L. Yuille
In multi-organ segmentation of abdominal CT scans, most existing fully supervised deep learning algorithms require large amounts of voxel-wise annotations, which are usually difficult, expensive, and slow to obtain.
no code implementations • 30 Jan 2018 • Peng Tang, Chunyu Wang, Xinggang Wang, Wenyu Liu, Wen-Jun Zeng, Jingdong Wang
In particular, our method improves results by 8.8% over the static image detector for fast-moving objects.
no code implementations • 19 Sep 2017 • Gangming Zhao, Zhao-Xiang Zhang, He Guan, Peng Tang, Jingdong Wang
Most convolutional neural networks share the same characteristic: each convolutional layer is followed by a nonlinear activation layer, where the Rectified Linear Unit (ReLU) is the most widely used.
1 code implementation • 6 May 2017 • Peng Tang, Xinggang Wang, Zilong Huang, Xiang Bai, Wenyu Liu
Patch-level image representation is very important for object classification and detection, since it is robust to spatial transformation, scale variation, and cluttered background.
4 code implementations • CVPR 2017 • Peng Tang, Xinggang Wang, Xiang Bai, Wenyu Liu
We propose a novel online instance classifier refinement algorithm to integrate MIL and the instance classifier refinement procedure into a single deep network, and train the network end-to-end with only image-level supervision, i.e., without object location information.
Ranked #4 on Weakly Supervised Object Detection on ImageNet
no code implementations • 8 Oct 2016 • Xinggang Wang, Yongluan Yan, Peng Tang, Xiang Bai, Wenyu Liu
We propose a new multiple instance neural network to learn bag representations, which is different from existing multiple instance neural networks that focus on estimating instance labels.
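The distinction above is where pooling happens: instead of predicting a label per instance and pooling labels, the network pools instance embeddings into a bag representation and classifies that. A toy sketch with max pooling and made-up weights and embeddings:

```python
def max_pool(vectors):
    # Element-wise max over instance embeddings -> one bag representation.
    return [max(col) for col in zip(*vectors)]

def bag_predict(instance_embeddings, w, b):
    bag_repr = max_pool(instance_embeddings)  # pool embeddings, not labels
    score = sum(wi * xi for wi, xi in zip(w, bag_repr)) + b
    return 1 if score > 0 else 0              # bag-level label

# A bag of 3 instances with 2-D embeddings (illustrative values).
bag = [[0.1, 0.9], [0.8, 0.2], [0.0, 0.1]]
label = bag_predict(bag, w=[1.0, 1.0], b=-1.0)
```

Pooling embeddings first lets the classifier see evidence aggregated across instances, whereas pooling per-instance labels discards that shared structure.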
no code implementations • 31 Jul 2016 • Peng Tang, Xinggang Wang, Baoguang Shi, Xiang Bai, Wenyu Liu, Zhuowen Tu
Our proposed FisherNet combines convolutional neural network training and Fisher Vector encoding in a single end-to-end structure.