Edge labels not working for custom dataset #109

j-adamczyk · 2024-05-14T11:04:05Z

Describe the bug
I'm trying to create a custom dataset for Grakel:

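import grakel
import networkx as nx
from grakel.utils import graph_from_networkx
from rdkit.Chem import MolFromSmiles


def smiles_to_grakel_graphs(smiles_list: list[str]) -> list[grakel.Graph]: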
    """
    Transforms list of SMILES strings into list of graphs in GraKeL library format.

    We use atomic numbers as discrete node labels and bond types as discrete edge labels.
    """
    mols = [MolFromSmiles(smiles) for smiles in smiles_list]
    graphs = []

    bond_type_to_int = {
        "SINGLE": 1,
        "DOUBLE": 2,
        "TRIPLE": 3,
        "AROMATIC": 4,
    }

    for mol in mols:
        graph = nx.Graph()

        for atom in mol.GetAtoms():
            graph.add_node(atom.GetIdx(), atom_label=atom.GetAtomicNum())

        for bond in mol.GetBonds():
            # default = OTHER
            bond_type = bond_type_to_int.get(str(bond.GetBondType()), 5)
            graph.add_edge(
                bond.GetBeginAtomIdx(), bond.GetEndAtomIdx(), bond_label=bond_type
            )

        graphs.append(graph)

    graphs = list(
        graph_from_networkx(
            graphs, as_Graph=True, node_labels_tag="atom_label", edge_labels_tag="bond_label"
        )
    )
    return graphs

This should result in graphs with edge labels. However, later in cross-validation, I get:

/home/jakub/.cache/pypoetry/virtualenvs/pesticide-bee-toxicity-prediction-Sj4YDJPR-py3.11/lib/python3.11/site-packages/grakel/graph.py:314: UserWarning: changing format from "adjacency" to "all"
  warnings.warn('changing format from "adjacency" to "all"')
Traceback (most recent call last):
  File "/home/jakub/PycharmProjects/pesticide_bee_toxicity_prediction/src/graph_kernels.py", line 155, in <module>
    train_graph_kernel_SVM(
  File "/home/jakub/PycharmProjects/pesticide_bee_toxicity_prediction/src/graph_kernels.py", line 128, in train_graph_kernel_SVM
    model.fit(graphs_train, y_train)
  File "/home/jakub/.cache/pypoetry/virtualenvs/pesticide-bee-toxicity-prediction-Sj4YDJPR-py3.11/lib/python3.11/site-packages/sklearn/base.py", line 1474, in wrapper
    return fit_method(estimator, *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/jakub/.cache/pypoetry/virtualenvs/pesticide-bee-toxicity-prediction-Sj4YDJPR-py3.11/lib/python3.11/site-packages/sklearn/model_selection/_search.py", line 970, in fit
    self._run_search(evaluate_candidates)
  File "/home/jakub/.cache/pypoetry/virtualenvs/pesticide-bee-toxicity-prediction-Sj4YDJPR-py3.11/lib/python3.11/site-packages/sklearn/model_selection/_search.py", line 1527, in _run_search
    evaluate_candidates(ParameterGrid(self.param_grid))
  File "/home/jakub/.cache/pypoetry/virtualenvs/pesticide-bee-toxicity-prediction-Sj4YDJPR-py3.11/lib/python3.11/site-packages/sklearn/model_selection/_search.py", line 947, in evaluate_candidates
    _warn_or_raise_about_fit_failures(out, self.error_score)
  File "/home/jakub/.cache/pypoetry/virtualenvs/pesticide-bee-toxicity-prediction-Sj4YDJPR-py3.11/lib/python3.11/site-packages/sklearn/model_selection/_validation.py", line 536, in _warn_or_raise_about_fit_failures
    raise ValueError(all_fits_failed_message)
ValueError: 
All the 25 fits failed.
It is very likely that your model is misconfigured.
You can try to debug the error by setting error_score='raise'.

Below are more details about the failures:
--------------------------------------------------------------------------------
25 fits failed with the following error:
Traceback (most recent call last):
  File "/home/jakub/.cache/pypoetry/virtualenvs/pesticide-bee-toxicity-prediction-Sj4YDJPR-py3.11/lib/python3.11/site-packages/sklearn/model_selection/_validation.py", line 895, in _fit_and_score
    estimator.fit(X_train, y_train, **fit_params)
  File "/home/jakub/.cache/pypoetry/virtualenvs/pesticide-bee-toxicity-prediction-Sj4YDJPR-py3.11/lib/python3.11/site-packages/sklearn/base.py", line 1474, in wrapper
    return fit_method(estimator, *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/jakub/.cache/pypoetry/virtualenvs/pesticide-bee-toxicity-prediction-Sj4YDJPR-py3.11/lib/python3.11/site-packages/sklearn/pipeline.py", line 471, in fit
    Xt = self._fit(X, y, routed_params)
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/jakub/.cache/pypoetry/virtualenvs/pesticide-bee-toxicity-prediction-Sj4YDJPR-py3.11/lib/python3.11/site-packages/sklearn/pipeline.py", line 408, in _fit
    X, fitted_transformer = fit_transform_one_cached(
                            ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/jakub/.cache/pypoetry/virtualenvs/pesticide-bee-toxicity-prediction-Sj4YDJPR-py3.11/lib/python3.11/site-packages/joblib/memory.py", line 312, in __call__
    return self.func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/jakub/.cache/pypoetry/virtualenvs/pesticide-bee-toxicity-prediction-Sj4YDJPR-py3.11/lib/python3.11/site-packages/sklearn/pipeline.py", line 1303, in _fit_transform_one
    res = transformer.fit_transform(X, y, **params.get("fit_transform", {}))
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/jakub/.cache/pypoetry/virtualenvs/pesticide-bee-toxicity-prediction-Sj4YDJPR-py3.11/lib/python3.11/site-packages/sklearn/utils/_set_output.py", line 295, in wrapped
    data_to_wrap = f(self, X, *args, **kwargs)
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/jakub/.cache/pypoetry/virtualenvs/pesticide-bee-toxicity-prediction-Sj4YDJPR-py3.11/lib/python3.11/site-packages/grakel/kernels/neighborhood_subgraph_pairwise_distance.py", line 308, in fit_transform
    self.fit(X)
  File "/home/jakub/.cache/pypoetry/virtualenvs/pesticide-bee-toxicity-prediction-Sj4YDJPR-py3.11/lib/python3.11/site-packages/grakel/kernels/kernel.py", line 124, in fit
    self.X = self.parse_input(X)
             ^^^^^^^^^^^^^^^^^^^
  File "/home/jakub/.cache/pypoetry/virtualenvs/pesticide-bee-toxicity-prediction-Sj4YDJPR-py3.11/lib/python3.11/site-packages/grakel/kernels/neighborhood_subgraph_pairwise_distance.py", line 138, in parse_input
    x.get_labels(purpose="adjacency", label_type="edge"))
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/jakub/.cache/pypoetry/virtualenvs/pesticide-bee-toxicity-prediction-Sj4YDJPR-py3.11/lib/python3.11/site-packages/grakel/graph.py", line 750, in get_labels
    raise ValueError('Graph does not have any labels for edges.')
ValueError: Graph does not have any labels for edges.

My pipeline is:

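from grakel.kernels import NeighborhoodSubgraphPairwiseDistance
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.svm import SVC

kernel = NeighborhoodSubgraphPairwiseDistance(normalize=True)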
svm = SVC(
    kernel="precomputed",
    probability=True,
    class_weight="balanced",
    cache_size=1024,
    random_state=0,
)
params_grid = {"svm__C": [1e-2, 1e-1, 1, 1e1, 1e2]}
pipeline = Pipeline([("kernel", kernel), ("svm", svm)])
model = GridSearchCV(
    estimator=pipeline,
    param_grid=params_grid,
    scoring="roc_auc",
    cv=5,
    n_jobs=1,
)

graphs_train = smiles_to_grakel_graphs(smiles_train)
graphs_test = smiles_to_grakel_graphs(smiles_test)

model.fit(graphs_train, y_train)

EDIT: interestingly, the labels do seem to be there initially: print(graphs_train[0].edge_labels) results in {(0, 1): 2, (1, 0): 2, (1, 2): 1, (1, 3): 1, (2, 1): 1, (3, 1): 1, (3, 4): 2, (3, 5): 1, (4, 3): 2, (5, 3): 1}. I also tried computing the kernel directly, without the pipeline, but I get the same error.
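Checking only the first graph can hide a problem further down the list. Below is a minimal sketch of a check over the whole dataset. It assumes each graph exposes the `edge_labels` attribute printed above, and that this attribute is empty or `None` when a graph has no labeled edges; that behavior is an assumption, not something confirmed from the GraKeL documentation. The helper name `find_graphs_without_edge_labels` is hypothetical.

```python
def find_graphs_without_edge_labels(graphs):
    """Return indices of graphs whose edge_labels attribute is missing or empty.

    Assumption: a graph without labeled edges has edge_labels set to an
    empty dict or None (or lacks the attribute entirely).
    """
    return [
        idx
        for idx, graph in enumerate(graphs)
        if not getattr(graph, "edge_labels", None)
    ]
```

Calling `find_graphs_without_edge_labels(graphs_train)` would list the positions of any suspicious graphs, which can then be mapped back to the original SMILES strings.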

j-adamczyk · 2024-05-14T11:56:01Z

It turns out that I had single-atom molecules in my dataset, and that was the reason for the error. However, maybe the error message could be made more descriptive? Also, a graph with no edges, and therefore no edge labels, is a completely valid input in many cases, so I think it should be handled properly.
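Until such inputs are handled by the library, one possible workaround is to drop edgeless graphs before conversion. This is a minimal sketch, not a fix in GraKeL itself: the helper name `drop_edgeless_graphs` is hypothetical, and it only relies on the `number_of_edges()` method that `nx.Graph` provides. Note that it removes single-atom molecules from the dataset, which may or may not be acceptable for a given task.

```python
def drop_edgeless_graphs(graphs, labels):
    """Keep only graphs with at least one edge, together with their labels.

    `graphs` are expected to be networkx graphs (anything exposing
    number_of_edges() works); `labels` is the matching list of targets.
    """
    kept_graphs, kept_labels = [], []
    for graph, label in zip(graphs, labels):
        # Single-atom molecules yield graphs with zero edges.
        if graph.number_of_edges() > 0:
            kept_graphs.append(graph)
            kept_labels.append(label)
    return kept_graphs, kept_labels
```

In `smiles_to_grakel_graphs`, this would run on the list of `nx.Graph` objects before the `graph_from_networkx` call, with the targets filtered alongside so that graphs and labels stay aligned.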

ysig assigned giannisnik Oct 24, 2024
