no code implementations • 26 Apr 2024 • Wonjae Kim, Sanghyuk Chun, Taekyung Kim, Dongyoon Han, Sangdoo Yun
In an era where the volume of data drives the effectiveness of self-supervised learning, the specificity and clarity of data semantics play a crucial role in model training.
no code implementations • 15 Apr 2024 • Minji Kim, Dongyoon Han, Taekyung Kim, Bohyung Han
We propose Temporal Contextualization (TC), a novel layer-wise temporal information infusion mechanism for video that extracts core information from each frame, interconnects relevant information across the video to summarize into context tokens, and ultimately leverages the context tokens during the feature encoding process.
2 code implementations • 28 Mar 2024 • Dong-Hwan Jang, Sangdoo Yun, Dongyoon Han
This paper introduces an efficient fine-tuning method for large pre-trained models, offering strong in-distribution (ID) and out-of-distribution (OOD) performance.
1 code implementation • 28 Mar 2024 • Donghyun Kim, Byeongho Heo, Dongyoon Han
This paper revives Densely Connected Convolutional Networks (DenseNets) and reveals their underrated effectiveness compared with the predominant ResNet-style architectures.
1 code implementation • 20 Mar 2024 • Byeongho Heo, Song Park, Dongyoon Han, Sangdoo Yun
This study provides a comprehensive analysis of RoPE when applied to ViTs, utilizing practical implementations of RoPE for 2D vision data.
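The axial extension of rotary position embedding (RoPE) to 2D coordinates can be sketched as follows; this is a minimal illustration in plain Python, not the paper's implementation, and the function name `rope_2d` is hypothetical. Half of the feature pairs are rotated by angles derived from the x coordinate and the other half from the y coordinate:

```python
import math

def rope_2d(feat, x, y, base=100.0):
    """Axial 2D RoPE sketch: rotate consecutive feature pairs by
    position-dependent angles; first half uses x, second half uses y.
    `feat` must have a length divisible by 4."""
    d = len(feat)
    assert d % 4 == 0
    out = list(feat)
    half = d // 2
    for pos, start in ((x, 0), (y, half)):
        for i in range(half // 2):
            theta = pos * base ** (-2 * i / half)
            a, b = out[start + 2 * i], out[start + 2 * i + 1]
            out[start + 2 * i] = a * math.cos(theta) - b * math.sin(theta)
            out[start + 2 * i + 1] = a * math.sin(theta) + b * math.cos(theta)
    return out
```

Because each step is a pure rotation, the embedding leaves feature norms unchanged and reduces to the identity at position (0, 0).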
no code implementations • 30 Dec 2023 • Taekyung Kim, Dongyoon Han, Byeongho Heo
Masked Image Modeling (MIM) has emerged as a promising option for Vision Transformers among various self-supervised learning (SSL) methods.
1 code implementation • 15 Dec 2023 • Minhyun Lee, Song Park, Byeongho Heo, Dongyoon Han, Hyunjung Shim
A recent breakthrough by SeiT proposed the use of Vector-Quantized (VQ) feature vectors (i.e., tokens) as network inputs for vision classification.
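The core of vector quantization is assigning each feature vector to its nearest codebook entry and using the entry's index as the token. A minimal sketch (the `quantize` helper and the nearest-neighbor rule are illustrative assumptions, not SeiT's exact tokenizer):

```python
def quantize(vectors, codebook):
    """Map each feature vector to the index of its nearest codebook
    entry (squared Euclidean distance); the index is the 'token'."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return [min(range(len(codebook)), key=lambda k: sq_dist(v, codebook[k]))
            for v in vectors]
```

Storing indices instead of raw features is what makes token datasets far smaller than pixel datasets.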
no code implementations • 30 Nov 2023 • Jiwon Kim, Byeongho Heo, Sangdoo Yun, Seungryong Kim, Dongyoon Han
Recent approaches for semantic correspondence have focused on obtaining high-quality correspondences using complicated networks that refine ambiguous or noisy matching points.
1 code implementation • ICCV 2023 • Jongbin Ryu, Dongyoon Han, Jongwoo Lim
We introduce a novel architecture design that enhances expressiveness by incorporating multiple head classifiers (i.e., classification heads) instead of relying on channel expansion or additional building blocks.
no code implementations • 20 Oct 2023 • Taekyung Kim, Sanghyuk Chun, Byeongho Heo, Dongyoon Han
MIM methods such as the Masked Autoencoder (MAE) learn strong representations by randomly masking input tokens so that the encoder processes only the visible ones, with the decoder reconstructing the masked tokens back to the input.
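The random token split described above can be sketched in a few lines; this is a simplified illustration of MAE-style masking (the `random_mask` helper is hypothetical), not the authors' code:

```python
import random

def random_mask(num_tokens, mask_ratio=0.75, seed=None):
    """Split token indices into a visible set (fed to the encoder)
    and a masked set (reconstructed by the decoder)."""
    rng = random.Random(seed)
    order = list(range(num_tokens))
    rng.shuffle(order)
    num_keep = int(num_tokens * (1 - mask_ratio))
    keep = sorted(order[:num_keep])
    masked = sorted(order[num_keep:])
    return keep, masked
```

With the typical 75% ratio, the encoder sees only a quarter of the tokens, which is a large part of MAE's pre-training efficiency.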
1 code implementation • 20 Jun 2023 • Byeongho Heo, Taekyung Kim, Sangdoo Yun, Dongyoon Han
In this paper, we propose a novel way to incorporate masking augmentations, dubbed Masked Sub-model (MaskSub).
1 code implementation • 15 May 2023 • JoonHyun Jeong, Joonsang Yu, Geondo Park, Dongyoon Han, Youngjoon Yoo
Recent neural architecture search (NAS) approaches rely on validation loss or accuracy to find the superior network for the target data.
3 code implementations • 30 Mar 2023 • Dongyoon Han, Junsuk Choe, Seonghyeok Chun, John Joon Young Chung, Minsuk Chang, Sangdoo Yun, Jean Y. Song, Seong Joon Oh
We refer to the new paradigm of training models with annotation byproducts as learning using annotation byproducts (LUAB).
1 code implementation • CVPR 2023 • Beomyoung Kim, JoonHyun Jeong, Dongyoon Han, Sung Ju Hwang
In this paper, we introduce a novel learning scheme named weakly semi-supervised instance segmentation (WSSIS) with point labels for budget-efficient and high-performance instance segmentation.
no code implementations • ICCV 2023 • Dahuin Jung, Dongyoon Han, Jihwan Bang, Hwanjun Song
However, we observe that the use of a prompt pool creates a domain scalability problem between pre-training and continual learning.
no code implementations • 16 Dec 2022 • Sangyeop Yeo, Yoojin Jang, Jy-yong Sohn, Dongyoon Han, Jaejun Yoo
To the best of our knowledge, we are the first to show the existence of strong lottery tickets in generative models and to provide an algorithm that finds them stably.
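A lottery ticket is a sparse binary mask selecting a subnetwork of the full model. As a minimal sketch of the subnetwork idea only (the paper's algorithm for strong tickets differs, and `top_k_mask` is a hypothetical helper), magnitude-based masking looks like this:

```python
def top_k_mask(weights, keep_ratio):
    """Binary mask keeping the largest-magnitude weights; the kept
    subnetwork is the candidate 'ticket'. Ties at the threshold may
    keep slightly more than the requested fraction."""
    k = max(1, int(len(weights) * keep_ratio))
    threshold = sorted((abs(w) for w in weights), reverse=True)[k - 1]
    return [1 if abs(w) >= threshold else 0 for w in weights]
```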
no code implementations • 20 Oct 2022 • Jaehui Hwang, Dongyoon Han, Byeongho Heo, Song Park, Sanghyuk Chun, Jong-Seok Lee
In recent years, many deep neural architectures have been developed for image classification.
1 code implementation • 19 Jul 2022 • Sukmin Yun, Jaehyung Kim, Dongyoon Han, Hwanjun Song, Jung-Woo Ha, Jinwoo Shin
Understanding temporal dynamics of video is an essential aspect of learning better video representations.
no code implementations • 25 Apr 2022 • Kyeongtak Han, Youngeun Kim, Dongyoon Han, Sungeun Hong
To address these issues, we fully utilize the pseudo labels of the unlabeled target domain by leveraging loss prediction.
1 code implementation • 17 Apr 2022 • Hwanjun Song, Deqing Sun, Sanghyuk Chun, Varun Jampani, Dongyoon Han, Byeongho Heo, Wonjae Kim, Ming-Hsuan Yang
Transformers have been widely used in numerous vision problems, especially for visual recognition and detection.
no code implementations • 8 Apr 2022 • Jinhyung Kim, Taeoh Kim, Minho Shim, Dongyoon Han, Dongyoon Wee, Junmo Kim
FreqAug stochastically removes specific frequency components from the video so that the learned representation captures essential features from the remaining information for various downstream tasks.
1 code implementation • CVPR 2022 • Jisoo Mok, Byunggook Na, Ji-Hoon Kim, Dongyoon Han, Sungroh Yoon
To take such non-linear characteristics into account, we introduce Label-Gradient Alignment (LGA), a novel NTK-based metric whose inherent formulation allows it to capture the large amount of non-linear advantage present in modern neural architectures.
1 code implementation • ICLR 2022 • Dongyoon Han, Youngjoon Yoo, Beomyoung Kim, Byeongho Heo
We aim to break the stereotype of organizing the spatial operations of building blocks into trainable layers.
4 code implementations • 30 Nov 2021 • Geewook Kim, Teakgyu Hong, Moonbin Yim, Jeongyeon Nam, Jinyoung Park, Jinyeong Yim, Wonseok Hwang, Sangdoo Yun, Dongyoon Han, Seunghyun Park
Current Visual Document Understanding (VDU) methods outsource the task of reading text to off-the-shelf Optical Character Recognition (OCR) engines and focus on the understanding task with the OCR outputs.
Ranked #10 on Document Image Classification on RVL-CDIP
1 code implementation • 26 Nov 2021 • Jaemin Na, Dongyoon Han, Hyung Jin Chang, Wonjun Hwang
In the contrastive space, inter-domain discrepancy is mitigated by constraining instances to have contrastive views and labels, and the consensus space reduces the confusion between intra-domain categories.
Ranked #1 on Unsupervised Domain Adaptation on PACS
1 code implementation • ICLR 2022 • Hwanjun Song, Deqing Sun, Sanghyuk Chun, Varun Jampani, Dongyoon Han, Byeongho Heo, Wonjae Kim, Ming-Hsuan Yang
Transformers are transforming the landscape of computer vision, especially for recognition tasks.
Ranked #12 on Object Detection on COCO 2017 val
10 code implementations • ICCV 2021 • Byeongho Heo, Sangdoo Yun, Dongyoon Han, Sanghyuk Chun, Junsuk Choe, Seong Joon Oh
We empirically show that such a spatial dimension reduction is beneficial to a transformer architecture as well, and propose a novel Pooling-based Vision Transformer (PiT) upon the original ViT model.
Ranked #336 on Image Classification on ImageNet
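The spatial reduction in PiT can be illustrated with a token grid that is pooled between transformer stages. The sketch below uses simple 2x2 average pooling for clarity; PiT itself pools with a depthwise strided convolution and also expands the channel dimension, and `pool_tokens` is a hypothetical name:

```python
def pool_tokens(tokens, h, w):
    """2x2 average pooling over a token grid: `tokens` is a length
    h*w list of feature lists; returns the (h//2)*(w//2) pooled grid."""
    assert h % 2 == 0 and w % 2 == 0
    d = len(tokens[0])
    out = []
    for i in range(0, h, 2):
        for j in range(0, w, 2):
            group = [tokens[(i + di) * w + (j + dj)]
                     for di in (0, 1) for dj in (0, 1)]
            out.append([sum(g[c] for g in group) / 4 for c in range(d)])
    return out
```

Halving the grid at each stage mirrors the spatial pyramid of CNN backbones such as ResNet.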
2 code implementations • CVPR 2021 • Sangdoo Yun, Seong Joon Oh, Byeongho Heo, Dongyoon Han, Junsuk Choe, Sanghyuk Chun
However, they have not fixed the training set, presumably because of a formidable annotation cost.
Ranked #20 on Image Classification on OmniBenchmark
2 code implementations • 7 Dec 2020 • Sangdoo Yun, Seong Joon Oh, Byeongho Heo, Dongyoon Han, Jinhyung Kim
Recent data augmentation strategies have been reported to address the overfitting problems in static image classifiers.
10 code implementations • CVPR 2021 • Dongyoon Han, Sangdoo Yun, Byeongho Heo, Youngjoon Yoo
We then investigate the channel configuration of a model by searching network architectures concerning the channel configuration under the computational cost restriction.
Ranked #293 on Image Classification on ImageNet (using extra training data)
4 code implementations • ICLR 2021 • Byeongho Heo, Sanghyuk Chun, Seong Joon Oh, Dongyoon Han, Sangdoo Yun, Gyuwan Kim, Youngjung Uh, Jung-Woo Ha
Because of the scale invariance, this modification only alters the effective step sizes without changing the effective update directions, thus enjoying the original convergence properties of GD optimizers.
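For a scale-invariant weight, only the tangential component of an update changes the function the network computes; the radial component merely rescales the weight. A minimal sketch of removing that radial component (an illustration of the idea, with a hypothetical `project_update` helper, not the optimizer's full code):

```python
def project_update(weight, update, eps=1e-8):
    """Remove the radial (norm-changing) part of an update:
    u - (u . w / |w|^2) w, keeping only the effective direction."""
    wn = sum(w * w for w in weight) + eps
    coef = sum(u * w for u, w in zip(update, weight)) / wn
    return [u - coef * w for u, w in zip(update, weight)]
```

The projected update is orthogonal to the weight, so the weight norm, and hence the effective step size, stops growing spuriously.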
no code implementations • 9 Mar 2020 • Sanghyuk Chun, Seong Joon Oh, Sangdoo Yun, Dongyoon Han, Junsuk Choe, Youngjoon Yoo
Despite apparent human-level performances of deep neural networks (DNN), they behave fundamentally differently from humans.
2 code implementations • 15 Jun 2019 • YoungJoon Yoo, Dongyoon Han, Sangdoo Yun
In this paper, we propose a new multi-scale face detector having an extremely tiny number of parameters (EXTD), less than 0.1 million, as well as achieving comparable performance to deep heavy detectors.
Ranked #23 on Face Detection on WIDER Face (Hard)
30 code implementations • ICCV 2019 • Sangdoo Yun, Dongyoon Han, Seong Joon Oh, Sanghyuk Chun, Junsuk Choe, Youngjoon Yoo
Regional dropout strategies have been proposed to enhance the performance of convolutional neural network classifiers.
Ranked #1 on Out-of-Distribution Generalization on ImageNet-W
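The box sampling at the heart of CutMix can be sketched as follows: a rectangle covering a (1 - lambda) fraction of the image area is cut from one image and pasted onto another, and the labels are mixed in proportion to the pasted area. The helper name `cutmix_bbox` is illustrative:

```python
import math
import random

def cutmix_bbox(height, width, lam, rng=random):
    """Sample a bounding box whose area fraction is roughly (1 - lam);
    the box may be clipped at the image border."""
    cut_ratio = math.sqrt(1.0 - lam)
    cut_h, cut_w = int(height * cut_ratio), int(width * cut_ratio)
    cy, cx = rng.randrange(height), rng.randrange(width)
    y1, y2 = max(cy - cut_h // 2, 0), min(cy + cut_h // 2, height)
    x1, x2 = max(cx - cut_w // 2, 0), min(cx + cut_w // 2, width)
    return y1, x1, y2, x2
```

After pasting, the mixing coefficient is recomputed from the actual (clipped) box area so the label weights match the pixels each image contributes.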
no code implementations • ICLR 2019 • Jisung Hwang, Younghoon Kim, Sanghyuk Chun, Jaejun Yoo, Ji-Hoon Kim, Dongyoon Han, Jung-Woo Ha
The checkerboard phenomenon is one of the well-known visual artifacts in the computer vision field.
13 code implementations • ICCV 2019 • Jeonghun Baek, Geewook Kim, Junyeop Lee, Sungrae Park, Dongyoon Han, Sangdoo Yun, Seong Joon Oh, Hwalsuk Lee
Many new proposals for scene text recognition (STR) models have been introduced in recent years.
Ranked #7 on Scene Text Recognition on ICDAR 2003
18 code implementations • CVPR 2019 • Youngmin Baek, Bado Lee, Dongyoon Han, Sangdoo Yun, Hwalsuk Lee
Scene text detection methods based on neural networks have emerged recently and have shown promising results.
Ranked #1 on Scene Text Detection on ICDAR 2013 (Precision metric)
2 code implementations • 12 Dec 2018 • Hyojin Park, Youngjoon Yoo, Geonseok Seo, Dongyoon Han, Sangdoo Yun, Nojun Kwak
To resolve this problem, we propose a new block called Concentrated-Comprehensive Convolution (C3) which applies the asymmetric convolutions before the depth-wise separable dilated convolution to compensate for the information loss due to dilated convolution.
9 code implementations • CVPR 2017 • Dongyoon Han, Jiwhan Kim, Junmo Kim
This design, which is discussed in depth together with our new insights, has proven to be an effective means of improving generalization ability.
Ranked #98 on Image Classification on CIFAR-10
no code implementations • CVPR 2015 • Dongyoon Han, Junmo Kim
Unlike recent unsupervised feature selection methods, SOCFS does not explicitly use pre-computed local structure information for data points as additional terms of its objective function; instead, it directly computes latent cluster information via the target matrix, which conducts orthogonal basis clustering in a single unified term of the proposed objective function.
no code implementations • CVPR 2014 • Jiwhan Kim, Dongyoon Han, Yu-Wing Tai, Junmo Kim
By mapping a low dimensional RGB color to a feature vector in a high-dimensional color space, we show that we can linearly separate the salient regions from the background by finding an optimal linear combination of color coefficients in the high-dimensional color space.