All data was manually annotated by three annotators. The annotators worked independently, without consulting each other during the annotation process.
Each line consists of the following fields:
- topic: title of the debate
- Premise
- Conclusion
- Validity
  - 1 means: conclusion is valid
  - 0 means: conclusion is defeasibly valid (this is not a class of its own; you are free to exclude such samples)
  - -1 means: conclusion is not valid
- Validity-Confidence
  - "very confident": all three annotators agree on the validity judgment
  - "confident": two out of three annotators agree on the validity judgment while one annotator abstains
  - "majority": two out of three annotators agree on the validity judgment, one disagrees
  - "defeasible": there is no majority in the votes. Either all three annotators could not decide on a validity rating, or one annotator could not decide and the other two disagree with each other (probably a very subjective sample)
- Novelty
  - 1 means: conclusion is novel
  - 0 means: conclusion is somewhat novel/borderline case (this is not a class of its own; you are free to exclude such samples)
  - -1 means: conclusion is not novel
- Novelty-Confidence
  - same as Validity-Confidence, but for novelty
- the confidence of a validity/novelty judgment is just additional information; you are free to consider it or not. You do not have to predict the confidence
- the test set does not contain any defeasible validity/novelty instances. Hence there are no "0" ratings, just "1" and "-1", for both validity and novelty
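
The confidence rules and the option to exclude "0"-rated samples can be sketched as follows. This is a minimal illustration, not the organizers' aggregation script: the function names are hypothetical, abstentions are encoded as `None` (an assumption), and the case of a single decided vote with two abstentions, which the description above does not mention, is treated as "defeasible".

```python
from collections import Counter


def confidence_label(votes):
    """Map three annotator votes to a confidence label.

    Each vote is 1, -1, or None. None is a hypothetical encoding for
    "annotator abstained / could not decide".
    """
    if len(votes) != 3:
        raise ValueError("expected exactly three votes")
    decided = Counter(v for v in votes if v is not None)
    abstentions = sum(1 for v in votes if v is None)
    # Size of the largest group of annotators who gave the same rating.
    top = decided.most_common(1)[0][1] if decided else 0
    if top == 3:
        return "very confident"
    if top == 2 and abstentions == 1:
        return "confident"
    if top == 2:
        return "majority"
    # No two annotators agree. ASSUMPTION: a single decided vote with
    # two abstentions also ends up here.
    return "defeasible"


def drop_undecided(rows, field="Validity"):
    """Keep only rows rated 1 or -1 in the given field (drops "0" rows)."""
    return [row for row in rows if int(row[field]) != 0]


rows = [
    {"Validity": "1", "Novelty": "-1"},
    {"Validity": "0", "Novelty": "1"},
    {"Validity": "-1", "Novelty": "0"},
]
valid_only = drop_undecided(rows, "Validity")
fully_decided = drop_undecided(valid_only, "Novelty")
```

Here `rows` stands in for records read with `csv.DictReader`; filtering on both fields leaves only samples with a clear validity and novelty rating, which matches the label set of the test split.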
Each line consists of the following fields:
- topic: title of the debate
- Premise
- Conclusion 1
- Conclusion 2
- Validity
  - 1 means: Conclusion 2 is more valid
  - 0 means: both conclusions are equally valid
  - -1 means: Conclusion 1 is more valid
- Votes_Concl1IsMoreValid: number of votes voting for Conclusion 1 in terms of validity
- Votes_ValidTie: number of votes voting for a tie in terms of validity
- Votes_Concl2IsMoreValid: number of votes voting for Conclusion 2 in terms of validity
- Novelty
  - 1 means: Conclusion 2 is more novel
  - 0 means: both conclusions are equally novel
  - -1 means: Conclusion 1 is more novel
- Votes_Concl1IsMoreNovel: number of votes voting for Conclusion 1 in terms of novelty
- Votes_NovelTie: number of votes voting for a tie in terms of novelty
- Votes_Concl2IsMoreNovel: number of votes voting for Conclusion 2 in terms of novelty
- the vote distribution is just additional information; you are free to consider it or not. You do not have to predict the distribution
- the test set also contains ties (in terms of validity/novelty). Hence, there are three predictable classes each for validity and novelty (worse, tie, better)
- this is a symmetric task: if Conclusion A is better in Y than Conclusion B, then B is worse in Y than A (ties behave analogously). However, we include only one direction in the provided datasets, not both
- you are free to extend the provided datasets, for example with additional samples from Subtask A (we do not provide every conclusion combination in the datasets for this subtask). Please note three things if you do that:
  - if you extend the datasets, please state this later in the submission form for your results
  - please beware: if two conclusions received the same validity and/or novelty judgment (given the same premise, of course), you cannot infer that both conclusions are equally valid and/or novel. One conclusion can still be (but does not have to be) better in terms of validity and/or novelty
  - please do not mix the different splits. If you want to extend the training split, you are not allowed to use the dev or test split from Subtask A. If you want to extend the dev split, you are not allowed to use the train or test split from Subtask A
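
Because the task is symmetric and only one direction is provided, the reverse direction of each pair can be generated by swapping the two conclusions, negating both labels, and swapping the corresponding vote counts. The sketch below shows one way to do this. The helper name `mirror_pair` is hypothetical, and the novelty column is assumed to be named `Novelty`, so adjust the key if the file header spells it differently.

```python
def mirror_pair(row):
    """Return the same comparison with Conclusion 1 and Conclusion 2 swapped."""
    mirrored = dict(row)
    mirrored["Conclusion 1"] = row["Conclusion 2"]
    mirrored["Conclusion 2"] = row["Conclusion 1"]
    # ASSUMPTION: label columns are named "Validity" and "Novelty".
    for aspect, short in (("Validity", "Valid"), ("Novelty", "Novel")):
        # 1 (Conclusion 2 better) becomes -1 and vice versa; a tie stays 0.
        mirrored[aspect] = -int(row[aspect])
        key1 = f"Votes_Concl1IsMore{short}"
        key2 = f"Votes_Concl2IsMore{short}"
        mirrored[key1], mirrored[key2] = row[key2], row[key1]
    return mirrored


# Made-up example row, for illustration only.
example = {
    "topic": "Example topic",
    "Premise": "Example premise.",
    "Conclusion 1": "First conclusion.",
    "Conclusion 2": "Second conclusion.",
    "Validity": 1,
    "Votes_Concl1IsMoreValid": 0,
    "Votes_ValidTie": 1,
    "Votes_Concl2IsMoreValid": 2,
    "Novelty": 0,
    "Votes_Concl1IsMoreNovel": 1,
    "Votes_NovelTie": 2,
    "Votes_Concl2IsMoreNovel": 0,
}
flipped = mirror_pair(example)
```

Mirroring stays within a single split, so it does not conflict with the rule against mixing splits. It is still an extension of the provided data and would need to be reported in the submission form.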