Move information on numerical/non_numerical/encoded_non_numerical from .uns to .var #630

eroell · 2023-12-18T16:37:49Z

PR Checklist

This comment contains a description of changes (with reason)
Referenced issue is linked
If you've fixed a bug or added code that should be tested, add tests!
Documentation in docs is updated

Description of changes
Resolves #620. Uses .var['ehrapy_column_type'] instead of .uns to record whether each variable is numeric, non_numeric, or non_numeric_encoded.

Technical details
Instead of the three lists .uns['numerical_columns'], .uns['non_numerical_columns'], and .uns['encoded_non_numerical_columns'], a single column in .var, .var['ehrapy_column_type'], holds one of the values numeric, non_numeric, or non_numeric_encoded for each variable.
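As a rough illustration of what the single column makes possible, the sketch below looks up variable names by type tag. It uses a plain pandas DataFrame as a stand-in for adata.var, and columns_of_type is a hypothetical helper written for this example, not an ehrapy function.

```python
import pandas as pd

# Stand-in for adata.var: the index holds the variable names.
var = pd.DataFrame(
    {"ehrapy_column_type": ["numeric", "numeric", "non_numeric"]},
    index=["glucose", "weight", "station"],
)


def columns_of_type(var: pd.DataFrame, column_type: str) -> list:
    """Hypothetical helper: names of all variables carrying the given type tag."""
    return var.index[var["ehrapy_column_type"] == column_type].tolist()


numeric_columns = columns_of_type(var, "numeric")
non_numeric_columns = columns_of_type(var, "non_numeric")
encoded_columns = columns_of_type(var, "non_numeric_encoded")
```

The three lists that used to live in .uns can be derived on demand this way, so nothing has to be stored twice.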

Reading in or transferring CSV files is not affected, and neither are most of the ways users interact with the AnnData object and ehrapy. However, backwards compatibility is not strictly maintained: for example, code that makes custom modifications to .uns['numerical_columns'] and the related keys will break with this update.
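For code that still carries the old .uns lists, a migration could look roughly like the sketch below. uns_lists_to_var_column is a hypothetical helper, not part of ehrapy, and the mapping from old keys to new tags is an assumption based on the names used in this PR. It works on a plain dict and a pandas DataFrame standing in for adata.uns and adata.var.

```python
import pandas as pd

# Assumed mapping from the old .uns keys to the new type tags.
TAG_BY_UNS_KEY = {
    "numerical_columns": "numeric",
    "non_numerical_columns": "non_numeric",
    "encoded_non_numerical_columns": "non_numeric_encoded",
}


def uns_lists_to_var_column(var: pd.DataFrame, uns: dict) -> pd.DataFrame:
    """Hypothetical helper: return a copy of var with an 'ehrapy_column_type' column."""
    tag_by_name = {
        name: tag
        for uns_key, tag in TAG_BY_UNS_KEY.items()
        for name in uns.get(uns_key, [])
    }
    migrated = var.copy()
    # Variables missing from all three lists get None.
    migrated["ehrapy_column_type"] = [tag_by_name.get(name) for name in migrated.index]
    return migrated


old_var = pd.DataFrame({"Unit": ["mg/dl", "kg"]}, index=["glucose", "weight"])
old_uns = {
    "numerical_columns": ["glucose", "weight"],
    "non_numerical_columns": [],
    "encoded_non_numerical_columns": [],
}
new_var = uns_lists_to_var_column(old_var, old_uns)
```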

Additional context
Storing this information at the variable level in .var means it travels with the variables, e.g. when slicing, and removes the overhead of keeping .uns in sync when variables are selected or moved.
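The sync problem can be sketched with plain pandas. Selecting variables on an AnnData object subsets the rows of .var but leaves .uns untouched; the example below mimics that with a DataFrame and a dict standing in for adata.var and adata.uns.

```python
import pandas as pd

# Stand-ins for adata.var and adata.uns before selecting variables.
var = pd.DataFrame(
    {"Unit": ["mg/dl", "kg"], "ehrapy_column_type": ["numeric", "numeric"]},
    index=["glucose", "weight"],
)
uns = {"numerical_columns": ["glucose", "weight"]}

# Keep only glucose, as a variable selection would do for the rows of .var.
var_subset = var.loc[["glucose"]]

# The type tag is still attached to the remaining variable ...
remaining_types = var_subset["ehrapy_column_type"].to_dict()

# ... while the old-style list now names a variable that no longer exists.
stale_names = [name for name in uns["numerical_columns"] if name not in var_subset.index]
```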

Old example of creating a dummy dataset:

import ehrapy as ep
import numpy as np
import pandas as pd
import scanpy as sc

def create_dummy_dataset_numerical_in_obs():
    """
    Create a dummy dataset with numerical and non-numerical variables in obs.
    Also, has numerical variables in .X.
    """
    dummy_obs = {"disease":['A', 'A', 'A', 'A', 'B', 'B', 'B', 'B'],
                "station": ['ICU', 'ICU', 'MICU', 'MICU', 'ICU', 'ICU', 'MICU', 'MICU'],
                "syst_bp_entry": [138, 139, 140, 141, 148, 149, 150, 151],
                "diast_bp_entry": [78, 79, 80, 81, 77, 78, 79, 80]}

    dummy_var = pd.DataFrame({"Unit": ["mg/dl", "kg"],
                              "ehrapy_column_type": ["numerical", "numerical"]})
    dummy_var.index = ['glucose', "weight"]
    dummy_X = np.array([[80, 90, 120, 130, 80, 130, 120, 90],
                        [77, 76, 60, 90, 110, 78, 56, 76]]).T

    adata_dummy = sc.AnnData(X=dummy_X, obs=dummy_obs, var=dummy_var)
    
    adata_dummy.uns['numerical_columns'] = ['glucose', 'weight']
    adata_dummy.uns['non_numerical_columns'] = []
    adata_dummy.uns['encoded_non_numerical_columns'] = []
    
    return adata_dummy

adata_dummy = create_dummy_dataset_numerical_in_obs()

New example of creating a dummy dataset:

import ehrapy as ep
import numpy as np
import pandas as pd
import scanpy as sc

def create_dummy_dataset_numerical_in_obs():
    """
    Create a dummy dataset with numerical and non-numerical variables in obs.
    Also, has numerical variables in .X.
    """
    dummy_obs = {"disease":['A', 'A', 'A', 'A', 'B', 'B', 'B', 'B'],
                "station": ['ICU', 'ICU', 'MICU', 'MICU', 'ICU', 'ICU', 'MICU', 'MICU'],
                "syst_bp_entry": [138, 139, 140, 141, 148, 149, 150, 151],
                "diast_bp_entry": [78, 79, 80, 81, 77, 78, 79, 80]}

    dummy_var = pd.DataFrame({"Unit": ["mg/dl", "kg"]})
    dummy_var.index = ['glucose', "weight"]
    dummy_X = np.array([[80, 90, 120, 130, 80, 130, 120, 90],
                        [77, 76, 60, 90, 110, 78, 56, 76]]).T

    adata_dummy = sc.AnnData(X=dummy_X, obs=dummy_obs, var=dummy_var)
    
    return adata_dummy

adata_dummy = create_dummy_dataset_numerical_in_obs()

Co-authored-by: Lukas Heumos <lukas.heumos@posteo.net>

Zethson

This is great!

We're not using the chatGPT/vscode/something autogenerated types in the docstrings because we only want to use type hints in the function header. Else, it's 2x the maintenance!

ehrapy/anndata/anndata_ext.py

tests/preprocessing/test_normalization.py

eroell and others added 22 commits December 6, 2023 10:11

tests for rank features groups with obs

0c52f13

first drafted feature ranking using obs

dd022ec

fixed encoding names

a818728

Merge branch 'main' into rank-features-groups-obs

0790a80

remove comment

adaa53b

Remove comment

37c6a9a

Co-authored-by: Lukas Heumos <lukas.heumos@posteo.net>

Remove comment

06c2a00

Co-authored-by: Lukas Heumos <lukas.heumos@posteo.net>

Remove comment

6fa3de1

Co-authored-by: Lukas Heumos <lukas.heumos@posteo.net>

Update ehrapy/tools/feature_ranking/_rank_features_groups.py

458520e

Co-authored-by: Lukas Heumos <lukas.heumos@posteo.net>

Remove comment

210bac6

Co-authored-by: Lukas Heumos <lukas.heumos@posteo.net>

Remove comment

ffadf71

Co-authored-by: Lukas Heumos <lukas.heumos@posteo.net>

Update ehrapy/tools/feature_ranking/_rank_features_groups.py

f0f4867

Co-authored-by: Lukas Heumos <lukas.heumos@posteo.net>

Iterable to list and import from future

02148f5

no expensive copy

a6f5606

upated to use layer, obs, or both

bf02fff

this test data should be more stable

213d8d5

Update ehrapy/tools/feature_ranking/_rank_features_groups.py

034f820

Co-authored-by: Lukas Heumos <lukas.heumos@posteo.net>

correct indent of previous commit and added comment on dummy X

6458265

bug fixes, more tests and (fixed) examples in docstring

f444a59

corrected for tests and modified encode

8b3fcce

remove need for fields in .uns, updated use in var

e729baf

merge

32313e1

Zethson reviewed Dec 18, 2023

eroell and others added 5 commits December 18, 2023 22:59

forgot to commit ep.anndata._constants.py

9c24a11

Update ehrapy/anndata/anndata_ext.py

f9e598f

Co-authored-by: Lukas Heumos <lukas.heumos@posteo.net>

Update ehrapy/anndata/anndata_ext.py

9a7c9b2

No type annotation in docstrings Co-authored-by: Lukas Heumos <lukas.heumos@posteo.net>

remove usage of .uns, rewrite tests

3bd7cba

remove type in docstring

1e1d16f

Zethson reviewed Dec 19, 2023

ehrapy/anndata/anndata_ext.py (outdated, resolved)

tests/preprocessing/test_normalization.py (outdated, resolved)

Merge remote-tracking branch 'origin/main' into rank-features-groups-obs

5fb7c21

remove commented code

436a6cc

eroell marked this pull request as ready for review December 19, 2023 08:36

Zethson merged commit 3cbeafd into theislab:main Dec 19, 2023
11 of 14 checks passed
