The dermatology differential diagnoses (ddx) dataset for skin condition classification includes expert annotations and model predictions for 1947 cases. No images or metadata are provided. The expert annotations come in the form of differential diagnoses, i.e., partial rankings of conditions. Disagreement among experts is high, which makes the dataset a useful benchmark for methods that have to deal with annotator disagreement. The data was introduced in [1] and [2].

[1] Stutz, D., Roy, A.G., Matejovicova, T., Strachan, P., Cemgil, A.T., & Doucet, A. (2023). [Conformal prediction under ambiguous ground truth](https://openreview.net/forum?id=CAd6V2qXxc). TMLR.

[2] Stutz, D., Cemgil, A.T., Roy, A.G., Matejovicova, T., Barsbey, M., Strachan, P., Schaekermann, M., Freyberg, J.V., Rikhye, R.V., Freeman, B., Matos, J.P., Telang, U., Webster, D.R., Liu, Y., Corrado, G.S., Matias, Y., Kohli, P., Liu, Y., Doucet, A., & Karthikesalingam, A. (2023). [Evaluating AI systems under uncertain ground truth: a case study in dermatology](https://arxiv.org/abs/2307.02191). ArXiv, abs/2307.02191.

The dataset is structured as follows:

* `data/dermatology_selectors.json`: The expert annotations as partial rankings. These partial rankings are encoded as so-called "selectors": for each case there are multiple partial rankings, and each partial ranking is a list of grouped classes (i.e., skin conditions). The following example from [1, 2] describes a partial ranking where "Hemangioma" is ranked first, followed by a group of three conditions: "Melanocytic Nevus", "Melanoma", and "O/E". In the JSON file, the conditions are encoded as numbers; the mapping of numbers to condition names can be found in `data/dermatology_conditions.txt`.

  `['Hemangioma'], ['Melanocytic Nevus', 'Melanoma', 'O/E']`
* `data/dermatology_predictions[0-4].json`: Predictions of models A to D in [1], stored as `1947 x 419` float arrays saved using `numpy.savetxt` with `fmt='%.3e'`.
* `data/dermatology_conditions.txt`: Condition names for each class.
* `data/dermatology_risks.txt`: Risk category for each condition, where 0 corresponds to low risk, 1 to medium risk, and 2 to high risk.
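
As a rough illustration of how these files fit together, the sketch below decodes one partial ranking into condition names and reads the top-scoring classes from one prediction row. It rests on several assumptions that are not confirmed by the description above: that the selectors JSON is a nested list (cases, then partial rankings, then groups, then integer class indices), that the indices are 0-based positions in the conditions file, and that the prediction files use the default whitespace-delimited layout of `numpy.savetxt`. The helper names and the toy data are hypothetical; the numeric scores are made up for the example.

```python
import json


def decode_ranking(ranking, condition_names):
    """Turn one partial ranking (a list of groups of class indices) into names."""
    return [[condition_names[i] for i in group] for group in ranking]


def parse_prediction_rows(text):
    """Parse whitespace-delimited rows of floats (assumed numpy.savetxt layout)."""
    return [[float(value) for value in line.split()]
            for line in text.splitlines() if line.strip()]


def top_k(row, k):
    """Indices of the k largest scores in one prediction row, best first."""
    return sorted(range(len(row)), key=lambda i: row[i], reverse=True)[:k]


# Toy stand-ins (assumptions): the real names come from
# data/dermatology_conditions.txt, and the real class indices differ.
condition_names = ["Hemangioma", "Melanocytic Nevus", "Melanoma", "O/E"]

# One case with one partial ranking: a single condition first,
# followed by a group of three conditions ranked together.
selectors = json.loads("[[[[0], [1, 2, 3]]]]")
decoded = decode_ranking(selectors[0][0], condition_names)
print(decoded)
# [['Hemangioma'], ['Melanocytic Nevus', 'Melanoma', 'O/E']]

# One made-up prediction row in '%.3e' format over the four toy classes.
rows = parse_prediction_rows("7.000e-01 1.500e-01 1.000e-01 5.000e-02\n")
best = top_k(rows[0], 2)
print([condition_names[i] for i in best])
# ['Hemangioma', 'Melanocytic Nevus']
```

With NumPy available, `numpy.loadtxt` is the natural counterpart to `numpy.savetxt` for reading the full `1947 x 419` arrays; the pure-Python parser here only keeps the sketch free of dependencies.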