Sampled data with foreign key that points to more than one primary key results in InvalidDataError #2372

Open
frances-h opened this issue Feb 14, 2025 · 0 comments
Labels
bug Something isn't working

Comments

@frances-h
Contributor

Environment Details

Please indicate the following details about the environment in which you found the bug:

  • SDV version:
  • Python version:
  • Operating System:

Error Description

Given metadata in which a foreign key in a child table points to primary keys in two distinct tables, the sampled data fails metadata validation. Such metadata should itself be considered invalid (a foreign key should not be able to reference two separate primary keys), yet I can fit and sample successfully. When I then validate the sampled data against the metadata, I get an InvalidDataError because the sampled child table contains entirely unknown references to one of the parents.

Suggested fix

To prevent this issue, we should update metadata validation to reject any foreign key that has relationships to more than one primary key.
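
A minimal sketch of what such a check could look like, not SDV's actual implementation. It assumes relationships are available as a list of dicts with the keys `parent_table_name`, `parent_primary_key`, `child_table_name`, and `child_foreign_key`. The function name `find_ambiguous_foreign_keys` and the example tables are hypothetical.

```python
from collections import defaultdict


def find_ambiguous_foreign_keys(relationships):
    """Return foreign keys that reference more than one parent primary key.

    The result maps (child_table_name, child_foreign_key) to the sorted list
    of distinct (parent_table_name, parent_primary_key) pairs it points to.
    """
    parents_by_fk = defaultdict(set)
    for rel in relationships:
        fk = (rel['child_table_name'], rel['child_foreign_key'])
        parents_by_fk[fk].add((rel['parent_table_name'], rel['parent_primary_key']))
    # Keep only the foreign keys with two or more distinct parents.
    return {fk: sorted(parents) for fk, parents in parents_by_fk.items() if len(parents) > 1}


# Hypothetical relationships, made up for illustration only.
example_relationships = [
    {'parent_table_name': 'parent_a', 'parent_primary_key': 'id',
     'child_table_name': 'child', 'child_foreign_key': 'ref'},
    {'parent_table_name': 'parent_b', 'parent_primary_key': 'id',
     'child_table_name': 'child', 'child_foreign_key': 'ref'},
    {'parent_table_name': 'parent_a', 'parent_primary_key': 'id',
     'child_table_name': 'other_child', 'child_foreign_key': 'a_id'},
]

ambiguous = find_ambiguous_foreign_keys(example_relationships)
print(ambiguous)
```

Metadata validation could raise an error whenever this mapping is non-empty, naming the offending child column and the parents it points to.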

Steps to reproduce

from sdv.datasets.demo import download_demo
from sdv.metadata import Metadata
from sdv.multi_table import HSASynthesizer


data, _ = download_demo('multi_table', 'Student_loan_v1')
metadata = Metadata.detect_from_dataframes(data)
# The column 'name' in the table 'enlist' now points to 3 primary keys across different tables
metadata.validate_data(data)

synthesizer = HSASynthesizer(metadata)
synthesizer.fit(data)
sampled = synthesizer.sample()

# Raises InvalidDataError: the sampled child has unknown references to one of the parents
metadata.validate_data(sampled)
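
To see which parent the sampled child fails to match, a small helper along these lines may be useful. This is an illustrative sketch only: `unknown_references` is a hypothetical name, the key values are made up, and it works on plain iterables of key values rather than on SDV objects.

```python
def unknown_references(child_fk_values, parent_pk_values):
    """Return the sorted distinct foreign key values missing from the parent keys.

    None values are skipped, since a missing foreign key is not an unknown reference.
    """
    parent_keys = set(parent_pk_values)
    return sorted({v for v in child_fk_values if v is not None and v not in parent_keys})


# Made-up key values: the child matches parent A but nothing in parent B.
child_fk = ['id-1', 'id-2', 'id-2', None]
parent_a_pk = ['id-1', 'id-2', 'id-3']
parent_b_pk = ['other-1', 'other-2']

missing_in_a = unknown_references(child_fk, parent_a_pk)
missing_in_b = unknown_references(child_fk, parent_b_pk)
print(missing_in_a, missing_in_b)
```

With the reproduction above, the child values would be the `name` column of the sampled `enlist` table, compared in turn against the primary key column of each parent table the metadata links it to. Missing values in a pandas column would need to be dropped first, since this helper only skips `None`.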
@frances-h frances-h added the bug Something isn't working label Feb 14, 2025