Learning Generalized Zero-Shot Learners for Open-Domain Image Geolocalization

1 Feb 2023 · Lukas Haas, Silas Alberti, Michal Skreta

Image geolocalization is the challenging task of predicting the geographic coordinates of origin for a given photo. It is an unsolved problem relying on the ability to combine visual clues with general knowledge about the world to make accurate predictions across geographies. We present StreetCLIP (https://huggingface.co/geolocal/StreetCLIP), a robust, publicly available foundation model not only achieving state-of-the-art performance on multiple open-domain image geolocalization benchmarks but also doing so in a zero-shot setting, outperforming supervised models trained on more than 4 million images. Our method introduces a meta-learning approach for generalized zero-shot learning by pretraining CLIP from synthetic captions, grounding CLIP in a domain of choice. We show that our method effectively transfers CLIP's generalized zero-shot capabilities to the domain of image geolocalization, improving in-domain generalized zero-shot performance without finetuning StreetCLIP on a fixed set of classes.
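To make the zero-shot setting concrete, the sketch below shows the generic CLIP-style decision rule: embed the image, embed one caption per candidate location, and pick the caption with the highest cosine similarity. It is an illustration only, not the authors' implementation. The caption wording, the country list, and the three-dimensional vectors are hypothetical stand-ins for real encoder outputs.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def zero_shot_label(image_emb, caption_embs):
    """Return the label whose caption embedding best matches the image.

    caption_embs maps each candidate label to the embedding of its caption.
    """
    scores = {label: cosine(image_emb, emb) for label, emb in caption_embs.items()}
    return max(scores, key=scores.get), scores


# Hypothetical caption embeddings, e.g. for captions such as
# "A photo taken in <country>." (assumed wording, toy values).
caption_embs = {
    "France": [1.0, 0.0, 0.0],
    "Japan": [0.0, 1.0, 0.0],
    "Brazil": [0.0, 0.0, 1.0],
}
image_emb = [0.2, 0.9, 0.1]  # toy image embedding, not a model output

best, scores = zero_shot_label(image_emb, caption_embs)
print(best)
```

Because the candidate labels enter only through their captions, the label set can be changed at inference time without retraining, which is what makes the setting zero-shot.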



Results from the Paper


 Ranked #1 on Photo geolocation estimation on Im2GPS (Training images metric)

Task: Photo geolocation estimation

Dataset    Model                    Metric Name                 Metric Value   Global Rank
Im2GPS     StreetCLIP (Zero-Shot)   City level (25 km)          28.3           #8
Im2GPS     StreetCLIP (Zero-Shot)   Region level (200 km)       45.1           #7
Im2GPS     StreetCLIP (Zero-Shot)   Country level (750 km)      74.7           #2
Im2GPS     StreetCLIP (Zero-Shot)   Continent level (2500 km)   88.2           #2
Im2GPS     StreetCLIP (Zero-Shot)   Training images             1.1M           #1
Im2GPS     StreetCLIP (Zero-Shot)   Reference images            0              #1
Im2GPS3k   StreetCLIP (Zero-Shot)   Street level (1 km)         -              #12
Im2GPS3k   StreetCLIP (Zero-Shot)   City level (25 km)          22.4           #9
Im2GPS3k   StreetCLIP (Zero-Shot)   Region level (200 km)       37.4           #5
Im2GPS3k   StreetCLIP (Zero-Shot)   Country level (750 km)      61.3           #3
Im2GPS3k   StreetCLIP (Zero-Shot)   Continent level (2500 km)   80.4           #3
Im2GPS3k   StreetCLIP (Zero-Shot)   Training images             1.1M           #12
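
The distance-based metrics above (for example, City level at 25 km) are read here as the percentage of predictions that fall within the stated radius of the true location. The sketch below computes such scores, assuming great-circle (haversine) distance on a spherical Earth. The function names and the two sample coordinate pairs are illustrative and not taken from the paper.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius (spherical approximation)

# Radii taken from the metric names in the results table.
THRESHOLDS_KM = {
    "Street": 1,
    "City": 25,
    "Region": 200,
    "Country": 750,
    "Continent": 2500,
}


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def accuracy_at_thresholds(preds, truths):
    """Percentage of predictions within each radius of the ground truth."""
    dists = [haversine_km(p[0], p[1], t[0], t[1]) for p, t in zip(preds, truths)]
    return {
        name: 100.0 * sum(d <= km for d in dists) / len(dists)
        for name, km in THRESHOLDS_KM.items()
    }


# Toy example: one exact hit (Paris) and one miss (London predicted for Paris).
truths = [(48.8566, 2.3522), (48.8566, 2.3522)]
preds = [(48.8566, 2.3522), (51.5074, -0.1278)]
acc = accuracy_at_thresholds(preds, truths)
print(acc)
```

In the toy example the second prediction is a few hundred kilometres off, so it counts as correct only at the Country and Continent radii.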

Methods