Why are some files missing from the extracted data? #36
-
In doing a comparison with my own work, Proverbot9001, I noticed the number of proofs extracted from some projects by CoqGym is significantly less than the number my own infrastructure reported. Upon looking into it further, I found that some proof files seem to be missing. For instance, in the Similarly, I noticed Could someone explain why not all the proofs in the target projects are represented? |
Beta Was this translation helpful? Give feedback.
Replies: 3 comments 2 replies
-
I think there are two reasons. Firstly, CoqGym needs to extract all the environment information (this is very special compared to all other ATP research works in Coq enviroment) by serapi, however this step consumes a lot of time and it fails sometimes. Secondly, CoqGym only generate proof datas which have specific topology which is done by 'check_topology' . you can find all the details in extract_proof.py |
Beta Was this translation helpful? Give feedback.
-
@Desperatesonic's answer is correct. Extracting proofs from a file can fail for various reasons, e.g., a Coq command exceeds the time limit here. Line 160 in 37f93ce We also exclude certain proofs. Details are in extract_proof.py. For example, we exclude proofs with tactics whose effect goes beyond the first sub-goal (mentioned in the supplementary material of our paper). |
Beta Was this translation helpful? Give feedback.
-
Thanks for clarifying!
… On Jan 20, 2021, at 9:32 PM, Kaiyu Yang ***@***.***> wrote:
No, the proofs in CoqGym are still correct. It just means CoqGym does not contain all proofs in the original Coq projects.
We do not plan to try to include all proofs exhaustively. First, CoqGym is for training/evaluating machine learning models, and missing some proofs has very little impact on that purpose. Second, there are non-trivial technical challenges if we want to include more proofs.
Also, the comment you mentioned is actually a different problem. It basically says CoqGym could have included more Coq projects (rather than proofs in existing projects).
—
You are receiving this because you commented.
Reply to this email directly, view it on GitHub <#36 (reply in thread)>, or unsubscribe <https://github.com/notifications/unsubscribe-auth/AAOE6LXG7PGA6JQWPHNYWK3S26N4JANCNFSM4WESTHRA>.
|
Beta Was this translation helpful? Give feedback.
I think there are two reasons. Firstly, CoqGym needs to extract all the environment information (this is very special compared to all other ATP research works in Coq enviroment) by serapi, however this step consumes a lot of time and it fails sometimes. Secondly, CoqGym only generate proof datas which have specific topology which is done by 'check_topology' . you can find all the details in extract_proof.py