Potential Error in Benchmark Data – Incorrect Answer for Question ID 6701cda0bb02136c067cb6eb #108

Open
Wangmerlyn opened this issue Feb 5, 2025 · 2 comments

Wangmerlyn commented Feb 5, 2025

Hello,

I believe there may be an error in the benchmark data for the question with _id: 6701cda0bb02136c067cb6eb. The question is as follows:

Question: Why did Kamala Harris push for a second debate with Donald Trump, and what reasons did Trump give for rejecting the invitation?
Choice A: Harris wanted to improve her polling numbers, while Trump was afraid that a second debate would not make him in an advantage position.
Choice B: Harris believed the first debate was too short, while Trump thought it's too late now.
Choice C: Harris wanted to improve her polling numbers, while Trump claimed early voting had already started.
Choice D: Harris wanted to improve her polling numbers, while Trump was concerned about scheduling conflicts with Elon Musk.

The dataset marks Choice A as the correct answer. The provided context does mention that Harris wanted to improve her polling numbers, but it contains no direct evidence that Trump was afraid a second debate would put him at a disadvantage. Instead, the context includes relevant supporting information from https://www.politico.com/news/2024/08/18/harris-trump-polls-dnc-00174532, which states:

"Trump rejects second TV debate as 'too late'"

This suggests that Choice C might be a more accurate answer or that the dataset requires revision. Could you please verify this and clarify whether this question and its correct answer need an update?

Thank you for your time and effort in maintaining this benchmark!

Member

bys0318 commented Feb 13, 2025

Thanks for pointing it out! We will soon have our annotator check the data and update the dataset.

@Wangmerlyn
Author

> Thanks for pointing it out! We will soon have our annotator check the data and update the dataset.

Thank you for your prompt response and for looking into the issue! I truly appreciate your dedication. If possible, could you kindly have the team review the other test data in the dataset as well? I’m not sure if there may be other potential errors, and a thorough check would be really helpful.
