Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Weird K2 implementation #1

Open
ThomasHoppe opened this issue Mar 17, 2022 · 2 comments
Open

Weird K2 implementation #1

ThomasHoppe opened this issue Mar 17, 2022 · 2 comments

Comments

@ThomasHoppe
Copy link

ThomasHoppe commented Mar 17, 2022

Hmmh, actually the implementation of the K2 function is a bit weird. Besides the data it takes a dictionary as input. In the supplied example the dictionary already contains the tree ! Thus, if in the for-loop of the K2 function for each node only the supplied parents of the node (tree_xi) are taken as input for the further computation, only the supplied tree can be reproduced.

The original paper of Cooper & Herskovitz says:
"Input: A set of n nodes, an ordering on the nodes, an upper bound u on the number of parents a node may have, and a database D containing m cases."

Thus, instead of a dictionary containing the ordering, an ordered set resp. a list needs to be used in order to avoid to inform the algorithm about the structure of the tree. Of course, then the upper bound on the number of parents needs to be supplied to.

@ruteee
Copy link
Owner

ruteee commented Mar 21, 2022

As you quoted we must provide an ordering of the nodes. The dictionary is, exactly, this ordering. Passing this ordering as a dictionary or a list does not have an influence on the result of the algorithm. It is just a swap of data structure.

If you pass the actual ordering of the nodes it is obvious the algorithm will understand that this is the true tree. However, you can pass any ordering on the nodes and the algorithm will function as well. This was just an example.

@ThomasHoppe
Copy link
Author

Hmmh, shall the fake_tree dictionary just represent a list? Or shall it represent a real tree?

Unfortunately, if you use for each node as option for the parents xi just ances from the tree_xi, the algorithm has no chance to determine whether a different - previously processed - node would give a better f_ch.

Another point is that: alpha and get_N are just limited to binary attributes ! This is hard coded as [0,1] in these functions. The computation of math.factorial( alfa[ 2*j + i]) seems to depend on this binary assumption, too.

It would be much helpful if you make these assumptions in the description of the repo more clear.

By the way, I am currently trying to get rid of the implicite assumption of binar attributes. But couldn't figure out yet, what the correct base instead of 2 needs to be in math.factorial( alfa[ 2*j + i])

alpha returns an index, right? How ist it computed?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants