Read-only Prompt Optimization for Vision-Language Few-shot Learning

In recent years, prompt tuning has proven effective in adapting pre-trained vision-language models to downstream tasks. These methods adapt the pre-trained models by introducing learnable prompts while keeping the pre-trained weights frozen. However, learnable prompts can alter the internal representations within the self-attention module, which may increase performance variance and harm generalization, especially in data-deficient settings. To address these issues, we propose a novel approach, Read-only Prompt Optimization (RPO). RPO leverages masked attention to prevent the internal representation shift in the pre-trained model. Further, to facilitate the optimization of RPO, the read-only prompts are initialized from special tokens of the pre-trained model. Our extensive experiments demonstrate that RPO outperforms CLIP and CoCoOp in base-to-new generalization and domain generalization while displaying better robustness. The proposed method also generalizes better in extremely data-deficient settings, while improving parameter efficiency and reducing computational overhead. Code is available at
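The masked-attention idea in the abstract can be illustrated with a small NumPy sketch. This is not the authors' implementation: the single-head attention, the mask layout (original tokens first, prompts appended at the end), and the names `read_only_mask`, `attention`, and `demo` are all assumptions made for illustration. The sketch blocks original tokens from attending to prompt tokens while letting prompts read from every position, so the original tokens' outputs should match those of the prompt-free model.

```python
import numpy as np


def read_only_mask(n_tokens: int, n_prompts: int) -> np.ndarray:
    """Additive mask for a sequence laid out as [original tokens | prompts].

    Rows index queries, columns index keys; 0.0 allows attention, -inf blocks it.
    (Hypothetical layout, chosen for this sketch.)
    """
    n = n_tokens + n_prompts
    mask = np.zeros((n, n))
    # Original tokens must not read from prompts; prompts may read everything.
    mask[:n_tokens, n_tokens:] = -np.inf
    return mask


def attention(x, w_q, w_k, w_v, mask):
    """Single-head scaled dot-product attention with an additive mask."""
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    scores = q @ k.T / np.sqrt(q.shape[-1]) + mask
    scores = scores - scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)  # exp(-inf) == 0, so blocked keys get zero weight
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ v


def demo(seed=0, n_tokens=4, n_prompts=2, dim=8):
    """Return (output without prompts, output with read-only prompts)."""
    rng = np.random.default_rng(seed)
    tokens = rng.normal(size=(n_tokens, dim))
    prompts = rng.normal(size=(n_prompts, dim))
    w_q, w_k, w_v = (rng.normal(size=(dim, dim)) for _ in range(3))

    no_mask = np.zeros((n_tokens, n_tokens))
    frozen = attention(tokens, w_q, w_k, w_v, no_mask)

    x = np.concatenate([tokens, prompts])
    out = attention(x, w_q, w_k, w_v, read_only_mask(n_tokens, n_prompts))
    return frozen, out


frozen, out = demo()
# The first n_tokens rows of `out` correspond to the original tokens.
print(np.allclose(frozen, out[:4]))
```

Under these assumptions, only the prompt rows of the output carry new information, which is one way to read the "read-only" property: prompts read from the frozen features without writing into them.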

ICCV 2023

Results from the Paper

Task                Dataset                  Model  Metric Name    Metric Value  Global Rank
------------------  -----------------------  -----  -------------  ------------  -----------
Prompt Engineering  Caltech-101              RPO    Harmonic mean  96.03         #7
Prompt Engineering  DTD                      RPO    Harmonic mean  68.61         #7
Prompt Engineering  EuroSAT                  RPO    Harmonic mean  76.79         #9
Prompt Engineering  FGVC-Aircraft            RPO    Harmonic mean  35.70         #9
Prompt Engineering  Food-101                 RPO    Harmonic mean  90.58         #10
Prompt Engineering  ImageNet                 RPO    Harmonic mean  74.00         #9
Prompt Engineering  Oxford 102 Flower        RPO    Harmonic mean  84.50         #8
Prompt Engineering  Oxford-IIIT Pet Dataset  RPO    Harmonic mean  96.05         #10
Prompt Engineering  Stanford Cars            RPO    Harmonic mean  74.69         #8
Prompt Engineering  SUN397                   RPO    Harmonic mean  79.18         #9
Prompt Engineering  UCF101                   RPO    Harmonic mean  79.34         #9
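
The metric above is a harmonic mean. Assuming it follows the usual base-to-new convention (the harmonic mean of accuracy on base classes and accuracy on new classes, which this page does not state explicitly), it could be computed as in the sketch below. The function name and the input values are hypothetical and are not results from the paper.

```python
def harmonic_mean(base_acc: float, new_acc: float) -> float:
    """Harmonic mean of two accuracies given in percent.

    Assumed convention: H = 2 * base * new / (base + new).
    Returns 0.0 if either accuracy is non-positive.
    """
    if base_acc <= 0 or new_acc <= 0:
        return 0.0
    return 2 * base_acc * new_acc / (base_acc + new_acc)


# Hypothetical accuracies, for illustration only.
h = harmonic_mean(80.0, 60.0)
print(round(h, 2))
```

Because the harmonic mean is pulled toward the smaller of its two inputs, a model scores well on it only if it does well on both base and new classes.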
