Cross-Modal Alternating Learning with Task-Aware Representations for Continual Learning
Continual learning is a research field that aims to give artificial neural networks a human-like ability to learn over a lifetime. Although a surge of investigations has achieved considerable performance, most rely only on the image modality for incremental image recognition tasks. In this paper, we propose a novel and effective framework, cross-modal Alternating Learning with Task-Aware representations (ALTA), which makes good use of both visual and linguistic information to achieve more effective continual learning. To do so, ALTA introduces a cross-modal joint learning mechanism that learns image and text representations simultaneously to provide more effective supervision. It also mitigates forgetting by endowing task-aware representations with continual learning capability. In addition, to address the stability-plasticity dilemma, ALTA adopts a cross-modal alternating learning strategy that alternately learns the task-aware cross-modal representations so that image-text pairs are matched better across tasks, further enhancing continual learning ability. We conduct extensive experiments on several popular image classification benchmarks to demonstrate that our approach achieves state-of-the-art performance. Systematic ablation studies and visualization analyses further validate the effectiveness and rationality of our method. Our code for ALTA is available at https://github.com/vijaylee/ALTA.
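The abstract does not spell out the loss used for cross-modal joint learning. As a rough illustration of what matching image-text pairs can look like, here is a minimal sketch of a generic symmetric contrastive loss of the kind used by CLIP-style models. This is an assumption chosen for illustration, not ALTA's actual objective; the function names, the `temperature` value, and the use of NumPy are all hypothetical.

```python
import numpy as np

def l2_normalize(x, axis=-1, eps=1e-12):
    """Scale each row to unit length so dot products become cosine similarities."""
    return x / np.maximum(np.linalg.norm(x, axis=axis, keepdims=True), eps)

def log_softmax(logits, axis):
    """Numerically stable log-softmax along the given axis."""
    shifted = logits - logits.max(axis=axis, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=axis, keepdims=True))

def image_text_matching_loss(image_emb, text_emb, temperature=0.07):
    """Symmetric contrastive loss over a batch of paired embeddings.

    Row i of image_emb is assumed to be paired with row i of text_emb.
    The temperature is a hypothetical default, not a value from the paper.
    """
    img = l2_normalize(np.asarray(image_emb, dtype=float))
    txt = l2_normalize(np.asarray(text_emb, dtype=float))
    logits = img @ txt.T / temperature          # (batch, batch) similarity matrix
    idx = np.arange(logits.shape[0])            # matching pairs lie on the diagonal
    loss_i2t = -log_softmax(logits, axis=1)[idx, idx].mean()  # image -> text
    loss_t2i = -log_softmax(logits, axis=0)[idx, idx].mean()  # text -> image
    return 0.5 * (loss_i2t + loss_t2i)

# Correctly paired embeddings should score a lower loss than mismatched ones.
emb = np.eye(4)
aligned_loss = image_text_matching_loss(emb, emb)
shuffled_loss = image_text_matching_loss(emb, np.roll(emb, 1, axis=0))
```

In this sketch the loss is small when each image embedding is closest to its own text embedding and grows when the pairs are misaligned, which is the kind of supervision signal a text branch can add on top of image-only training.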
Results from the Paper
Task | Dataset | Model | Metric Name | Metric Value (%) | Global Rank |
---|---|---|---|---|---|
Continual Learning | Cifar100 (10 tasks) | ALTA-ViTB/16 | Average Accuracy | 92.85 | # 1 |
Continual Learning | Cifar100 (10 tasks) | ALTA-RN50x4 | Average Accuracy | 84.91 | # 2 |
Continual Learning | Cifar100 (10 tasks) | ALTA-RN101 | Average Accuracy | 84.77 | # 4 |
Continual Learning | Cifar100 (10 tasks) | ALTA-RN50 | Average Accuracy | 83.87 | # 5 |
Continual Learning | Tiny-ImageNet (10 tasks) | ALTA-ViTB/16 | Average Accuracy | 89.80 | # 1 |
Continual Learning | Tiny-ImageNet (10 tasks) | ALTA-RN50x4 | Average Accuracy | 84.73 | # 2 |
Continual Learning | Tiny-ImageNet (10 tasks) | ALTA-RN101 | Average Accuracy | 83.35 | # 3 |
Continual Learning | Tiny-ImageNet (10 tasks) | ALTA-RN50 | Average Accuracy | 81.07 | # 4 |
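
This page does not define how Average Accuracy is computed. One common convention in continual learning is to average the accuracy on every task seen so far, measured after training on the final task. The sketch below assumes that convention; the paper may use a different variant, and the function name and the example numbers are hypothetical, not results from the paper.

```python
import numpy as np

def average_accuracy(acc_matrix):
    """Mean accuracy over all tasks after training on the last task.

    acc_matrix[i][j] is assumed to hold the accuracy (in percent) on task j
    measured after training on task i. Only the last row is used.
    """
    acc = np.asarray(acc_matrix, dtype=float)
    num_tasks = acc.shape[0]
    return float(acc[num_tasks - 1, :num_tasks].mean())

# Made-up accuracies for a 3-task run (illustrative only).
example = [
    [90.0,  0.0,  0.0],   # after task 1
    [85.0, 88.0,  0.0],   # after task 2
    [80.0, 84.0, 86.0],   # after task 3
]
result = average_accuracy(example)  # mean of 80, 84, 86
```

For the made-up matrix above, the result is the mean of the last row, about 83.33. With 10 tasks, as in the benchmarks listed here, the matrix would be 10 by 10 under the same assumption.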