
Question about performance improvement over reported metrics when strictly adhering to the Pre-Train setting #10

Open
hou27 opened this issue Dec 17, 2024 · 0 comments


hou27 commented Dec 17, 2024

Hello, thank you for sharing your NP-FKGC work and code.

While experimenting with the provided settings, I encountered some confusion regarding the configuration. In the paper and README, you mention different data settings such as "Pre-Train" and "In-Train." However, I noticed that even when using the --data_form Pre-Train option, the code still loads files with the _in_train suffix (e.g., train_tasks_in_train.json). As a result, it appears that the background KG is included in the training tasks.

To verify this, I replaced all _in_train files with their original counterparts (e.g., using train_tasks.json instead of train_tasks_in_train.json), thus creating a "pure Pre-Train" setting without including the background KG in training. Surprisingly, under this strict Pre-Train setup, the performance I achieved was even higher than the metrics reported in the paper. I initially expected that incorporating the background KG (In-Train) would boost performance, so obtaining better results without the background KG was unexpected.

I would appreciate clarification on the following points:

  1. Why does the code load _in_train files even when --data_form Pre-Train is specified? Is this the intended behavior, or is it a discrepancy that arose during code updates?

  2. When adhering strictly to the Pre-Train setting as described in the paper (i.e., excluding the background KG), is it normal to observe better performance than the reported metrics, or is this an unforeseen outcome?
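
To make the first question concrete, here is a minimal sketch of the file selection I would have expected from the README. The helper name `resolve_task_file`, the `NELL` directory path, and the mapping from `--data_form` to file suffix are assumptions made for illustration; none of this is taken from the repository code.

```python
from pathlib import Path


def resolve_task_file(data_dir: str, base_name: str, data_form: str) -> Path:
    """Pick the task file for a given data setting.

    Hypothetical helper, not part of the NP-FKGC code base.
    Assumption: "In-Train" reads the `_in_train` variant of a file,
    while "Pre-Train" reads the original file without that suffix.
    """
    if data_form == "In-Train":
        file_name = f"{base_name}_in_train.json"
    elif data_form == "Pre-Train":
        file_name = f"{base_name}.json"
    else:
        raise ValueError(f"unknown data_form: {data_form!r}")
    return Path(data_dir) / file_name


# Example: the two settings should resolve to different files.
print(resolve_task_file("NELL", "train_tasks", "Pre-Train"))
print(resolve_task_file("NELL", "train_tasks", "In-Train"))
```

With a mapping like this, `--data_form Pre-Train` would never touch `train_tasks_in_train.json`, which is the behavior I reproduced manually by swapping the files.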

Here are the experimental results obtained by training exclusively on the data without the _in_train suffix, which I verified directly using your code.
[Image: experimental results]

So far, I have verified this only on the NELL dataset.

Thank you in advance for your response.
