RuWorldTree is a QA dataset of multiple-choice, elementary-level science questions that evaluate understanding of core science facts.
Motivation
The WorldTree dataset opens the triad of Reasoning and Knowledge tasks. The data includes a corpus of factoid statements of various kinds, complex factoid questions, and, for each question, a causal chain of facts from the corpus that leads to the correct answer.
The WorldTree design was originally proposed by Jansen et al. (2018).
An example in English for illustration purposes:
```
{
    'question': 'A bottle of water is placed in the freezer. What property of water will change when the water reaches the freezing point? (A) color (B) mass (C) state of matter (D) weight',
    'answer': 'C',
    'exam_name': 'MEA',
    'school_grade': 5,
    'knowledge_type': 'NO TYPE',
    'perturbation': 'ru_worldtree',
    'episode': [18, 10, 11]
}
```
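The example stores the answer options inside the `question` string and keeps only the option letter in `answer`. The Python sketch below splits a question into its stem and lettered options and then resolves the correct answer text. It assumes that options are always marked with Latin capital letters in parentheses, as in this English illustration. The Russian data may format them differently. `WorldTreeRecord`, `split_question`, and `correct_option` are hypothetical helper names and do not come from any released loader.

```python
import re
from dataclasses import dataclass


@dataclass
class WorldTreeRecord:
    """One dataset record; field names follow the example above."""
    question: str        # stem followed by lettered options
    answer: str          # letter of the correct option, e.g. "C"
    exam_name: str
    school_grade: int
    knowledge_type: str
    perturbation: str
    episode: list[int]


# Assumed option format: matches markers such as "(A) " and captures the letter.
OPTION_MARKER = re.compile(r"\(([A-Z])\)\s*")


def split_question(text: str) -> tuple[str, dict[str, str]]:
    """Split a question string into its stem and a {letter: option text} map."""
    # With one capture group, re.split returns [stem, letter, option, letter, option, ...].
    parts = OPTION_MARKER.split(text)
    stem = parts[0].strip()
    options = {parts[i]: parts[i + 1].strip() for i in range(1, len(parts) - 1, 2)}
    return stem, options


def correct_option(record: WorldTreeRecord) -> str:
    """Return the text of the option whose letter equals record.answer."""
    _, options = split_question(record.question)
    return options[record.answer]


example = WorldTreeRecord(
    question=(
        "A bottle of water is placed in the freezer. What property of water will "
        "change when the water reaches the freezing point? "
        "(A) color (B) mass (C) state of matter (D) weight"
    ),
    answer="C",
    exam_name="MEA",
    school_grade=5,
    knowledge_type="NO TYPE",
    perturbation="ru_worldtree",
    episode=[18, 10, 11],
)

print(correct_option(example))  # state of matter
```

The split pattern uses a capture group, so `re.split` keeps the letters in its output. As a result, the options are keyed with exactly the letters that `answer` refers to. If a question stem itself contained a parenthesized capital letter, this simple pattern would mis-split it.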
Data Fields
Data Splits
The dataset consists of a training set with labeled examples and a test set in two configurations:
We use the same data splits as the original English version.
Test Perturbations
Each training episode in the dataset corresponds to seven test variations: the original test data and six adversarial test sets, each obtained by modifying the original test set with a text perturbation.
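The card does not spell out how episodes map to records. The Python sketch below is therefore built on two assumptions:

- The `episode` list on a training record (e.g. `[18, 10, 11]`) names the few-shot episodes that the record belongs to.
- Test variations are told apart by the `perturbation` field, with `'ru_worldtree'` marking the unperturbed original, as in the example.

Under those assumptions, the sketch selects the training examples for one episode and reports accuracy separately for each test variation. The helper names and the perturbation name `perturbation_x` are hypothetical.

```python
from collections import defaultdict
from typing import Callable

Record = dict  # hypothetical: one dataset record as a plain dict


def episode_train_examples(train: list[Record], episode_id: int) -> list[Record]:
    """Training records whose 'episode' list contains episode_id (assumed semantics)."""
    return [r for r in train if episode_id in r["episode"]]


def accuracy_per_variation(test: list[Record],
                           predict: Callable[[Record], str]) -> dict[str, float]:
    """Accuracy computed separately for each value of the 'perturbation' field."""
    correct: dict[str, int] = defaultdict(int)
    total: dict[str, int] = defaultdict(int)
    for rec in test:
        name = rec["perturbation"]
        total[name] += 1
        correct[name] += int(predict(rec) == rec["answer"])
    return {name: correct[name] / total[name] for name in total}


# Toy data; "perturbation_x" is a hypothetical name, not an actual perturbation.
train = [
    {"answer": "C", "episode": [18, 10, 11], "perturbation": "ru_worldtree"},
    {"answer": "A", "episode": [3], "perturbation": "ru_worldtree"},
]
test = [
    {"answer": "C", "perturbation": "ru_worldtree"},
    {"answer": "B", "perturbation": "ru_worldtree"},
    {"answer": "C", "perturbation": "perturbation_x"},
]


def always_c(rec: Record) -> str:
    """Trivial baseline that always answers "C"."""
    return "C"


print(len(episode_train_examples(train, 10)))  # 1
print(accuracy_per_variation(test, always_c))  # {'ru_worldtree': 0.5, 'perturbation_x': 1.0}
```

Reporting one score per variation, rather than a single pooled accuracy, makes it visible how much each perturbation degrades a model relative to the original test data.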