1 code implementation • 31 May 2023 • Rita Ramos, Bruno Martins, Desmond Elliott
Multilingual image captioning has recently been tackled by training with large-scale machine-translated data, which is an expensive, noisy, and time-consuming process.
1 code implementation • 16 Feb 2023 • Rita Ramos, Desmond Elliott, Bruno Martins
The encoder in our model jointly processes the image and retrieved captions using a pretrained vision-and-language (V&L) BERT, while the decoder attends to the multimodal encoder representations, benefiting from the extra textual evidence in the retrieved captions.
1 code implementation • CVPR 2023 • Rita Ramos, Bruno Martins, Desmond Elliott, Yova Kementchedjhieva
Recent advances in image captioning have focused on scaling the data and model size, substantially increasing the cost of pre-training and finetuning.