Edge labels not working for custom dataset #109

Open
j-adamczyk opened this issue May 14, 2024 · 1 comment

j-adamczyk commented May 14, 2024

Describe the bug
I'm trying to create a custom dataset for GraKeL:

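import grakel
import networkx as nx
from grakel.utils import graph_from_networkx
from rdkit.Chem import MolFromSmiles


def smiles_to_grakel_graphs(smiles_list: list[str]) -> list[grakel.Graph]: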
    """
    Transforms list of SMILES strings into list of graphs in GraKeL library format.

    We use atomic numbers as discrete node labels.
    """
    mols = [MolFromSmiles(smiles) for smiles in smiles_list]
    graphs = []

    bond_type_to_int = {
        "SINGLE": 1,
        "DOUBLE": 2,
        "TRIPLE": 3,
        "AROMATIC": 4,
    }

    for mol in mols:
        graph = nx.Graph()

        for atom in mol.GetAtoms():
            graph.add_node(atom.GetIdx(), atom_label=atom.GetAtomicNum())

        for bond in mol.GetBonds():
            # default = OTHER
            bond_type = bond_type_to_int.get(str(bond.GetBondType()), 5)
            graph.add_edge(
                bond.GetBeginAtomIdx(), bond.GetEndAtomIdx(), bond_label=bond_type
            )

        graphs.append(graph)

    graphs = list(
        graph_from_networkx(
            graphs, as_Graph=True, node_labels_tag="atom_label", edge_labels_tag="bond_label"
        )
    )
    return graphs

This should result in graphs with edge labels. However, later in cross-validation, I get:

/home/jakub/.cache/pypoetry/virtualenvs/pesticide-bee-toxicity-prediction-Sj4YDJPR-py3.11/lib/python3.11/site-packages/grakel/graph.py:314: UserWarning: changing format from "adjacency" to "all"
  warnings.warn('changing format from "adjacency" to "all"')
Traceback (most recent call last):
  File "/home/jakub/PycharmProjects/pesticide_bee_toxicity_prediction/src/graph_kernels.py", line 155, in <module>
    train_graph_kernel_SVM(
  File "/home/jakub/PycharmProjects/pesticide_bee_toxicity_prediction/src/graph_kernels.py", line 128, in train_graph_kernel_SVM
    model.fit(graphs_train, y_train)
  File "/home/jakub/.cache/pypoetry/virtualenvs/pesticide-bee-toxicity-prediction-Sj4YDJPR-py3.11/lib/python3.11/site-packages/sklearn/base.py", line 1474, in wrapper
    return fit_method(estimator, *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/jakub/.cache/pypoetry/virtualenvs/pesticide-bee-toxicity-prediction-Sj4YDJPR-py3.11/lib/python3.11/site-packages/sklearn/model_selection/_search.py", line 970, in fit
    self._run_search(evaluate_candidates)
  File "/home/jakub/.cache/pypoetry/virtualenvs/pesticide-bee-toxicity-prediction-Sj4YDJPR-py3.11/lib/python3.11/site-packages/sklearn/model_selection/_search.py", line 1527, in _run_search
    evaluate_candidates(ParameterGrid(self.param_grid))
  File "/home/jakub/.cache/pypoetry/virtualenvs/pesticide-bee-toxicity-prediction-Sj4YDJPR-py3.11/lib/python3.11/site-packages/sklearn/model_selection/_search.py", line 947, in evaluate_candidates
    _warn_or_raise_about_fit_failures(out, self.error_score)
  File "/home/jakub/.cache/pypoetry/virtualenvs/pesticide-bee-toxicity-prediction-Sj4YDJPR-py3.11/lib/python3.11/site-packages/sklearn/model_selection/_validation.py", line 536, in _warn_or_raise_about_fit_failures
    raise ValueError(all_fits_failed_message)
ValueError: 
All the 25 fits failed.
It is very likely that your model is misconfigured.
You can try to debug the error by setting error_score='raise'.

Below are more details about the failures:
--------------------------------------------------------------------------------
25 fits failed with the following error:
Traceback (most recent call last):
  File "/home/jakub/.cache/pypoetry/virtualenvs/pesticide-bee-toxicity-prediction-Sj4YDJPR-py3.11/lib/python3.11/site-packages/sklearn/model_selection/_validation.py", line 895, in _fit_and_score
    estimator.fit(X_train, y_train, **fit_params)
  File "/home/jakub/.cache/pypoetry/virtualenvs/pesticide-bee-toxicity-prediction-Sj4YDJPR-py3.11/lib/python3.11/site-packages/sklearn/base.py", line 1474, in wrapper
    return fit_method(estimator, *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/jakub/.cache/pypoetry/virtualenvs/pesticide-bee-toxicity-prediction-Sj4YDJPR-py3.11/lib/python3.11/site-packages/sklearn/pipeline.py", line 471, in fit
    Xt = self._fit(X, y, routed_params)
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/jakub/.cache/pypoetry/virtualenvs/pesticide-bee-toxicity-prediction-Sj4YDJPR-py3.11/lib/python3.11/site-packages/sklearn/pipeline.py", line 408, in _fit
    X, fitted_transformer = fit_transform_one_cached(
                            ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/jakub/.cache/pypoetry/virtualenvs/pesticide-bee-toxicity-prediction-Sj4YDJPR-py3.11/lib/python3.11/site-packages/joblib/memory.py", line 312, in __call__
    return self.func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/jakub/.cache/pypoetry/virtualenvs/pesticide-bee-toxicity-prediction-Sj4YDJPR-py3.11/lib/python3.11/site-packages/sklearn/pipeline.py", line 1303, in _fit_transform_one
    res = transformer.fit_transform(X, y, **params.get("fit_transform", {}))
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/jakub/.cache/pypoetry/virtualenvs/pesticide-bee-toxicity-prediction-Sj4YDJPR-py3.11/lib/python3.11/site-packages/sklearn/utils/_set_output.py", line 295, in wrapped
    data_to_wrap = f(self, X, *args, **kwargs)
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/jakub/.cache/pypoetry/virtualenvs/pesticide-bee-toxicity-prediction-Sj4YDJPR-py3.11/lib/python3.11/site-packages/grakel/kernels/neighborhood_subgraph_pairwise_distance.py", line 308, in fit_transform
    self.fit(X)
  File "/home/jakub/.cache/pypoetry/virtualenvs/pesticide-bee-toxicity-prediction-Sj4YDJPR-py3.11/lib/python3.11/site-packages/grakel/kernels/kernel.py", line 124, in fit
    self.X = self.parse_input(X)
             ^^^^^^^^^^^^^^^^^^^
  File "/home/jakub/.cache/pypoetry/virtualenvs/pesticide-bee-toxicity-prediction-Sj4YDJPR-py3.11/lib/python3.11/site-packages/grakel/kernels/neighborhood_subgraph_pairwise_distance.py", line 138, in parse_input
    x.get_labels(purpose="adjacency", label_type="edge"))
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/jakub/.cache/pypoetry/virtualenvs/pesticide-bee-toxicity-prediction-Sj4YDJPR-py3.11/lib/python3.11/site-packages/grakel/graph.py", line 750, in get_labels
    raise ValueError('Graph does not have any labels for edges.')
ValueError: Graph does not have any labels for edges.

My pipeline is:

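from grakel.kernels import NeighborhoodSubgraphPairwiseDistance
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.svm import SVC

kernel = NeighborhoodSubgraphPairwiseDistance(normalize=True)
svm = SVC(
    kernel="precomputed",
    probability=True,
    class_weight="balanced",
    cache_size=1024,
    random_state=0,
)
params_grid = {"svm__C": [1e-2, 1e-1, 1, 1e1, 1e2]}
pipeline = Pipeline([("kernel", kernel), ("svm", svm)])
# Named `model` to match the `model.fit(...)` call below and in the traceback.
model = GridSearchCV(
    estimator=pipeline,
    param_grid=params_grid,
    scoring="roc_auc",
    cv=5,
    n_jobs=1,
)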

graphs_train = smiles_to_grakel_graphs(smiles_train)
graphs_test = smiles_to_grakel_graphs(smiles_test)

model.fit(graphs_train, y_train)

EDIT: Interestingly, the labels do seem to be there initially: print(graphs_train[0].edge_labels) results in {(0, 1): 2, (1, 0): 2, (1, 2): 1, (1, 3): 1, (2, 1): 1, (3, 1): 1, (3, 4): 2, (3, 5): 1, (4, 3): 2, (5, 3): 1}. I also tried this without the pipeline, computing only the kernel, but I get the same error.

j-adamczyk (Author) commented

It turns out that I had single-atom molecules in my dataset, and that was the reason for the error: a single-atom molecule has no bonds, so its graph has no edges and therefore no edge labels. However, could the error message be made more descriptive? Also, a graph with no edges and no edge labels is a perfectly valid input in many cases, so I think it should be handled properly.
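
Until the library handles edge-less graphs, one possible workaround is to separate them out before the kernel ever sees them. The sketch below is a minimal illustration, not part of GraKeL: `split_by_edges` is a hypothetical helper name, and it assumes the graphs are still networkx graphs (as built inside `smiles_to_grakel_graphs` above), relying only on their `number_of_edges()` method. Note that dropping single-atom molecules changes the dataset, which may or may not be acceptable for a given task.

```python
from typing import Any, List, Sequence, Tuple


def split_by_edges(
    graphs: Sequence[Any], targets: Sequence[Any]
) -> Tuple[List[Any], List[Any], List[int]]:
    """Separate graphs with at least one edge from edge-less ones.

    Hypothetical helper (not a GraKeL function). Each graph only needs a
    ``number_of_edges()`` method, which ``networkx.Graph`` provides.

    Returns the kept graphs, their matching targets, and the indices of
    the graphs that were dropped because they have no edges.
    """
    if len(graphs) != len(targets):
        raise ValueError("graphs and targets must have the same length")

    kept_graphs: List[Any] = []
    kept_targets: List[Any] = []
    dropped_indices: List[int] = []

    for index, (graph, target) in enumerate(zip(graphs, targets)):
        if graph.number_of_edges() > 0:
            kept_graphs.append(graph)
            kept_targets.append(target)
        else:
            # Edge-less graph, e.g. a single-atom molecule with no bonds.
            dropped_indices.append(index)

    return kept_graphs, kept_targets, dropped_indices
```

Called on the list of networkx graphs and the label vector before `graph_from_networkx`, this keeps graphs and targets aligned, and the returned indices show which SMILES strings produced the edge-less graphs.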
