Negative number in the contingency table count #39

Open
Nolife999 opened this issue Mar 1, 2021 · 8 comments
Labels: bug, INSEE reuse case

Comments

@Nolife999
Collaborator

The files are quite large, so I will have to send you the test case some other way than GitHub.

Before that, I was wondering whether this is simply a problem with how the blocking variable is used. Could you tell me whether the files have to be sorted on the blocking key before processing with RELAIS?

Nolife999 added the bug and INSEE reuse case labels Mar 1, 2021
@luvalent
Collaborator

luvalent commented Mar 3, 2021

No, it is not required.

@luvalent
Collaborator

luvalent commented Mar 3, 2021

What do you mean by 'Negative number in the contingency table count'?

@Nolife999
Collaborator Author

I will give you an example, Luca; it just involves big files.

@Nolife999
Collaborator Author

SIRET_APP FREQUENCY
0 -1905011591
1 232457

1 REDUCTION METHOD {"REDUCTION-METHOD":"BlockingVariables","BLOCKING":{"BLOCKING A":["BDD_SIR_7_TYPEQU"],"BLOCKING B":["SRC_SIR_7_TYPEQU"]},"SORTED NEIGHBORHOOD":{"SORTING A":[],"SORTING B":[]},"SIMHASH":{"SHINGLING A":[],"SHINGLING B":[],"HDTHRESHOLD":"30","ROTATIONS":" 4"}}
2 MATCHING VARIABLES
MatchingVariable: SIRET_APP MatchingVariableA: BDD_SIR_7_SIRET MatchingVariableB: SRC_SIR_7_SIRET Method: Equality

@luvalent
Collaborator

luvalent commented Mar 5, 2021 via email

@Nolife999
Collaborator Author

Hi Luca, I haven't forgotten you. I just have a lot of work these days, but I should be able to answer you at the beginning of next week.
Manu

@luvalent
Collaborator

OK Manu, no problem :-)

@francescoamato
Contributor

Hi Manu,
if you have a large dataset, the negative frequency is probably an integer overflow problem.
I changed the frequency type to Long.
Can you try...

Francesco
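
For readers following the overflow explanation, here is a minimal Java sketch of how a 32-bit counter can produce the negative FREQUENCY shown above. It is not the RELAIS code: the class and method names are hypothetical, and it assumes the counter wrapped exactly once, which this thread does not confirm. Under that assumption, the reported -1905011591 would correspond to a true count of 2389955705 pairs.

```java
public class FrequencyOverflowSketch {

    // Narrowing a long to int keeps only the low 32 bits, which is the same
    // value an int counter holds after being incremented that many times.
    static int asIntCounter(long trueCount) {
        return (int) trueCount;
    }

    // Reinterprets a wrapped int as an unsigned 32-bit value.
    // ASSUMPTION: only meaningful if the counter wrapped exactly once.
    static long unwrapOnce(int wrapped) {
        return Integer.toUnsignedLong(wrapped);
    }

    public static void main(String[] args) {
        int reported = -1905011591; // FREQUENCY for SIRET_APP = 0 in this issue

        long candidate = unwrapOnce(reported);
        System.out.println(candidate);               // 2389955705
        System.out.println(asIntCounter(candidate)); // -1905011591

        // A long counter passes Integer.MAX_VALUE without wrapping.
        long frequency = Integer.MAX_VALUE;
        frequency += 1;
        System.out.println(frequency);               // 2147483648

        // Math.addExact throws instead of wrapping silently.
        try {
            Math.addExact(Integer.MAX_VALUE, 1);
        } catch (ArithmeticException e) {
            System.out.println("int overflow detected");
        }
    }
}
```

Switching the counter to a long, as described above, removes the wrap for any realistic number of candidate pairs. Where an int has to stay, `Math.addExact` at least turns the silent wrap into a visible exception.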
