diff --git a/404.html b/404.html
index 2587775..41b3b28 100644
--- a/404.html
+++ b/404.html
@@ -42,6 +42,13 @@
Below, all the inputs, parameters (including their different options), and outputs are detailed. Source code of the functions is also included.
+cell2cell.analysis
+ analysis
@@ -330,7 +783,7 @@ cell2cell_pipelines
-
BulkInteractions
-Interaction class with all necessary methods to run the cell2cell pipeline
-on a bulk RNA-seq dataset. Cells here could be represented by tissues, samples +
Interaction class with all necessary methods to run the cell2cell pipeline +on a bulk RNA-seq dataset. Cells here could be represented by tissues, samples or any bulk organization of cells.
- -Parameters: | -
-
|
-
---|
Attributes:
-Name | -Type | -Description | -
---|---|---|
rnaseq_data |
- pandas.DataFrame |
- Gene expression data for a bulk RNA-seq experiment. Columns are samples -and rows are genes. |
-
metadata |
- pandas.DataFrame |
- Metadata associated with the samples in the RNA-seq dataset. |
-
index_col |
- str |
- Column-name for the samples in the metadata. |
-
group_col |
- str |
- Column-name for the grouping information associated with the samples -in the metadata. |
-
ppi_data |
- pandas.DataFrame |
- List of protein-protein interactions (or ligand-receptor pairs) used for -inferring the cell-cell interactions and communication. |
-
complex_sep |
- str |
- Symbol that separates the protein subunits in a multimeric complex. -For example, '&' is the complex_sep for a list of ligand-receptor pairs -where a protein partner could be "CD74&CD44". |
-
complex_agg_method |
- str |
- Method to aggregate the expression value of multiple genes in a -complex. -
|
-
ref_ppi |
- pandas.DataFrame |
- Reference list of protein-protein interactions (or ligand-receptor pairs) used -for inferring the cell-cell interactions and communication. It could be the -same as 'ppi_data' if ppi_data is not bidirectional (that is, contains -ProtA-ProtB interaction as well as ProtB-ProtA interaction). ref_ppi must -be undirected (contains only ProtA-ProtB and not ProtB-ProtA interaction). |
-
interaction_columns |
- tuple |
- Contains the names of the columns where to find the partners in a -dataframe of protein-protein interactions. If the list is for -ligand-receptor pairs, the first column is for the ligands and the second -for the receptors. |
-
analysis_setup |
- dict |
- Contains main setup for running the cell-cell interactions and communication -analyses. -Three main setups are needed (passed as keys): -
|
-
cutoff_setup |
- dict |
- Contains two keys: 'type' and 'parameter'. The first key represent the -way to use a cutoff or threshold, while parameter is the value used -to binarize the expression values. -The key 'type' can be: -
|
-
interaction_space |
- cell2cell.core.interaction_space.InteractionSpace |
- Interaction space that contains all the required elements to perform the -cell-cell interaction and communication analysis between every pair of cells. -After performing the analyses, the results are stored in this object. |
-
cci_score : str, default='bray_curtis' + Scoring function to aggregate the communication scores between a pair of + cells. It computes an overall potential of cell-cell interactions. + Options:
+- 'bray_curtis' : Bray-Curtis-like score.
+- 'jaccard' : Jaccard-like score.
+- 'count' : Number of LR pairs that the pair of cells use.
+- 'icellnet' : Sum of the L-R expression product of a pair of cells
+
cci_type : str, default='undirected' + Specifies whether to compute the cci_score in a directed or undirected + way. For a pair of cells A and B, directed means that the ligands are + considered only from cell A and receptors only from cell B, or vice versa. + Undirected, in contrast, simultaneously considers signaling from cell A to + cell B and from cell B to cell A.
+sample_col : str, default='sampleID' + Column-name for the samples in the metadata.
+group_col : str, default='tissue' + Column-name for the grouping information associated with the samples + in the metadata.
+expression_threshold : float, default=10 + Threshold value to binarize gene expression when using + communication_score='expression_thresholding'. Units have to be the + same as the rnaseq_data matrix (e.g., TPMs, counts, etc.).
+complex_sep : str, default=None + Symbol that separates the protein subunits in a multimeric complex. + For example, '&' is the complex_sep for a list of ligand-receptor pairs + where a protein partner could be "CD74&CD44".
+complex_agg_method : str, default='min' + Method to aggregate the expression value of multiple genes in a + complex.
+- 'min' : Minimum expression value among all genes.
+- 'mean' : Average expression value among all genes.
+- 'gmean' : Geometric mean expression value among all genes.
+
verbose : boolean, default=False + Whether to print the steps of the analysis.
+rnaseq_data : pandas.DataFrame + Gene expression data for a bulk RNA-seq experiment. Columns are samples + and rows are genes.
+metadata : pandas.DataFrame + Metadata associated with the samples in the RNA-seq dataset.
+index_col : str + Column-name for the samples in the metadata.
+group_col : str + Column-name for the grouping information associated with the samples + in the metadata.
+ppi_data : pandas.DataFrame + List of protein-protein interactions (or ligand-receptor pairs) used for + inferring the cell-cell interactions and communication.
+complex_sep : str + Symbol that separates the protein subunits in a multimeric complex. + For example, '&' is the complex_sep for a list of ligand-receptor pairs + where a protein partner could be "CD74&CD44".
+complex_agg_method : str + Method to aggregate the expression value of multiple genes in a + complex.
+- 'min' : Minimum expression value among all genes.
+- 'mean' : Average expression value among all genes.
+- 'gmean' : Geometric mean expression value among all genes.
+
ref_ppi : pandas.DataFrame + Reference list of protein-protein interactions (or ligand-receptor pairs) used + for inferring the cell-cell interactions and communication. It could be the + same as 'ppi_data' if ppi_data is not bidirectional (that is, contains + ProtA-ProtB interaction as well as ProtB-ProtA interaction). ref_ppi must + be undirected (contains only ProtA-ProtB and not ProtB-ProtA interaction).
+interaction_columns : tuple + Contains the names of the columns where to find the partners in a + dataframe of protein-protein interactions. If the list is for + ligand-receptor pairs, the first column is for the ligands and the second + for the receptors.
+analysis_setup : dict + Contains main setup for running the cell-cell interactions and communication + analyses. + Three main setups are needed (passed as keys):
+- 'communication_score' : is the type of communication score used to detect
+ active ligand-receptor pairs between each pair of cells.
+ It can be:
+
+ - 'expression_thresholding'
+ - 'expression_product'
+ - 'expression_mean'
+ - 'expression_gmean'
+
+- 'cci_score' : is the scoring function to aggregate the communication
+ scores.
+ It can be:
+
+ - 'bray_curtis'
+ - 'jaccard'
+ - 'count'
+ - 'icellnet'
+
+- 'cci_type' : is the type of interaction between two cells. If it is
+ undirected, all ligands and receptors are considered from both cells.
+ If it is directed, ligands from one cell and receptors from the other
+ are considered separately with respect to ligands from the second
+ cell and receptors from the first one.
+ So, it can be:
+
+ - 'undirected'
+ - 'directed'
+
cutoff_setup : dict + Contains two keys: 'type' and 'parameter'. The first key represents the + way to use a cutoff or threshold, while 'parameter' is the value used + to binarize the expression values. + The key 'type' can be:
+ - 'local_percentile' : computes the value of a given percentile, for each
+ gene independently. In this case, the parameter corresponds to the
+ percentile to compute, as a float value between 0 and 1.
+ - 'global_percentile' : computes the value of a given percentile from all
+ genes and samples simultaneously. In this case, the parameter
+ corresponds to the percentile to compute, as a float value between
+ 0 and 1. All genes have the same cutoff.
+ - 'file' : load a cutoff table from a file. Parameter in this case is the
+ path of that file. It must contain the same genes as index and same
+ samples as columns.
+ - 'multi_col_matrix' : a dataframe must be provided, containing a cutoff
+ for each gene in each sample. This allows using specific cutoffs for
+ each sample. The columns here must be the same as the ones in the
+ rnaseq_data.
+ - 'single_col_matrix' : a dataframe must be provided, containing a cutoff
+ for each gene in only one column. These cutoffs will be applied to
+ all samples.
+ - 'constant_value' : binarizes the expression. Evaluates whether
+ expression is greater than the value input in the parameter.
+
interaction_space : cell2cell.core.interaction_space.InteractionSpace + Interaction space that contains all the required elements to perform the + cell-cell interaction and communication analysis between every pair of cells. + After performing the analyses, the results are stored in this object.
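The 'local_percentile', 'global_percentile', and 'constant_value' cutoff types described above can be illustrated with a minimal pandas sketch. It is not the cell2cell implementation: the helper name `binarize_expression`, the toy matrix, and the strict greater-than comparison for the percentile types are assumptions made for illustration (the source only states the greater-than rule for 'constant_value').

```python
import numpy as np
import pandas as pd


def binarize_expression(rnaseq_data, cutoff_type, parameter):
    """Hypothetical helper: binarize a genes x samples matrix with one cutoff type."""
    if cutoff_type == 'local_percentile':
        # One cutoff per gene, computed across that gene's samples.
        cutoffs = rnaseq_data.quantile(parameter, axis=1)
    elif cutoff_type == 'global_percentile':
        # A single cutoff computed from all genes and samples at once.
        value = np.quantile(rnaseq_data.values, parameter)
        cutoffs = pd.Series(value, index=rnaseq_data.index)
    elif cutoff_type == 'constant_value':
        # The same user-provided value for every gene.
        cutoffs = pd.Series(parameter, index=rnaseq_data.index)
    else:
        raise ValueError(f'Unsupported cutoff type: {cutoff_type}')
    # Assumption: a gene counts as expressed when it is greater than its cutoff.
    return rnaseq_data.gt(cutoffs, axis=0).astype(int)


# Toy matrix: rows are genes and columns are samples, as in rnaseq_data.
toy = pd.DataFrame({'S1': [0.0, 5.0, 20.0],
                    'S2': [2.0, 15.0, 40.0],
                    'S3': [4.0, 25.0, 60.0]},
                   index=['GeneA', 'GeneB', 'GeneC'])

constant = binarize_expression(toy, 'constant_value', 10)
local = binarize_expression(toy, 'local_percentile', 0.5)
global_ = binarize_expression(toy, 'global_percentile', 0.5)
```

With a local percentile every gene gets its own cutoff, so even a lowly expressed gene is marked present in its top samples; with a global percentile or a constant value, all genes share one cutoff.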
+cell2cell/analysis/cell2cell_pipelines.py
class BulkInteractions:
@@ -831,131 +1230,131 @@ self.analysis_setup['communication_score'] = communication_score
self.analysis_setup['cci_score'] = cci_score
self.analysis_setup['cci_type'] = cci_type
-
- # Initialize PPI
- genes = list(rnaseq_data.index)
- ppi_data_ = ppi.filter_ppi_by_proteins(ppi_data=ppi_data,
- proteins=genes,
- complex_sep=complex_sep,
- upper_letter_comparison=False,
- interaction_columns=self.interaction_columns)
-
- self.ppi_data = ppi.remove_ppi_bidirectionality(ppi_data=ppi_data_,
- interaction_columns=self.interaction_columns,
- verbose=verbose)
- if self.analysis_setup['cci_type'] == 'undirected':
- self.ref_ppi = self.ppi_data.copy()
- self.ppi_data = ppi.bidirectional_ppi_for_cci(ppi_data=self.ppi_data,
- interaction_columns=self.interaction_columns,
- verbose=verbose)
- else:
- self.ref_ppi = None
-
- # Thresholding
- self.cutoff_setup['type'] = 'constant_value'
- self.cutoff_setup['parameter'] = expression_threshold
-
- # Interaction Space
- self.interaction_space = initialize_interaction_space(rnaseq_data=self.rnaseq_data,
- ppi_data=self.ppi_data,
- cutoff_setup=self.cutoff_setup,
- analysis_setup=self.analysis_setup,
- complex_sep=complex_sep,
- complex_agg_method=complex_agg_method,
- interaction_columns=self.interaction_columns,
- verbose=verbose)
-
- def compute_pairwise_cci_scores(self, cci_score=None, use_ppi_score=False, verbose=True):
- '''Computes overall CCI scores for each pair of cells.
-
- Parameters
- ----------
- cci_score : str, default=None
- Scoring function to aggregate the communication scores between
- a pair of cells. It computes an overall potential of cell-cell
- interactions. If None, it will use the one stored in the
- attribute analysis_setup of this object.
- Options:
-
- - 'bray_curtis' : Bray-Curtis-like score.
- - 'jaccard' : Jaccard-like score.
- - 'count' : Number of LR pairs that the pair of cells use.
- - 'icellnet' : Sum of the L-R expression product of a pair of cells
-
- use_ppi_score : boolean, default=False
- Whether using a weight of LR pairs specified in the ppi_data
- to compute the scores.
-
- verbose : boolean, default=True
- Whether printing or not steps of the analysis.
- '''
- self.interaction_space.compute_pairwise_cci_scores(cci_score=cci_score,
- use_ppi_score=use_ppi_score,
- verbose=verbose)
-
- def compute_pairwise_communication_scores(self, communication_score=None, use_ppi_score=False, ref_ppi_data=None,
- interaction_columns=None, cells=None, cci_type=None, verbose=True):
- '''Computes the communication scores for each LR pairs in
- a given pair of sender-receiver cell
-
- Parameters
- ----------
- communication_score : str, default=None
- Type of communication score to infer the potential use of
- a given ligand-receptor pair by a pair of cells/tissues/samples.
- If None, the score stored in the attribute analysis_setup
- will be used.
- Available communication_scores are:
-
- - 'expresion_thresholding' : Computes the joint presence of a
- ligand from a sender cell and of
- a receptor on a receiver cell from
- binarizing their gene expression levels.
- - 'expression_mean' : Computes the average between the expression
- of a ligand from a sender cell and the
- expression of a receptor on a receiver cell.
- - 'expression_product' : Computes the product between the expression
- of a ligand from a sender cell and the
- expression of a receptor on a receiver cell.
- - 'expression_gmean' : Computes the geometric mean between the expression
- of a ligand from a sender cell and the
- expression of a receptor on a receiver cell.
-
- use_ppi_score : boolean, default=False
- Whether using a weight of LR pairs specified in the ppi_data
- to compute the scores.
-
- ref_ppi_data : pandas.DataFrame, default=None
- Reference list of protein-protein interactions (or
- ligand-receptor pairs) used for inferring the cell-cell
- interactions and communication. It could be the same as
- 'ppi_data' if ppi_data is not bidirectional (that is,
- contains ProtA-ProtB interaction as well as ProtB-ProtA
- interaction). ref_ppi must be undirected (contains only
- ProtA-ProtB and not ProtB-ProtA interaction). If None
- the one stored in the attribute ref_ppi will be used.
-
- interaction_columns : tuple, default=None
- Contains the names of the columns where to find the
- partners in a dataframe of protein-protein interactions.
- If the list is for ligand-receptor pairs, the first column
- is for the ligands and the second for the receptors. If
- None, the one stored in the attribute interaction_columns
- will be used.
-
- cells : list=None
- List of cells to consider.
-
- cci_type : str, default=None
- Type of interaction between two cells. Used to specify
- if we want to consider a LR pair in both directions.
- It can be:
-
- - 'undirected'
- - 'directed'
-
- If None, the one stored in the attribute analysis_setup
- will be used.
+ self.analysis_setup['ccc_type'] = cci_type
+
+ # Initialize PPI
+ genes = list(rnaseq_data.index)
+ ppi_data_ = ppi.filter_ppi_by_proteins(ppi_data=ppi_data,
+ proteins=genes,
+ complex_sep=complex_sep,
+ upper_letter_comparison=False,
+ interaction_columns=self.interaction_columns)
+
+ self.ppi_data = ppi.remove_ppi_bidirectionality(ppi_data=ppi_data_,
+ interaction_columns=self.interaction_columns,
+ verbose=verbose)
+ if self.analysis_setup['cci_type'] == 'undirected':
+ self.ref_ppi = self.ppi_data.copy()
+ self.ppi_data = ppi.bidirectional_ppi_for_cci(ppi_data=self.ppi_data,
+ interaction_columns=self.interaction_columns,
+ verbose=verbose)
+ else:
+ self.ref_ppi = None
+
+ # Thresholding
+ self.cutoff_setup['type'] = 'constant_value'
+ self.cutoff_setup['parameter'] = expression_threshold
+
+ # Interaction Space
+ self.interaction_space = initialize_interaction_space(rnaseq_data=self.rnaseq_data,
+ ppi_data=self.ppi_data,
+ cutoff_setup=self.cutoff_setup,
+ analysis_setup=self.analysis_setup,
+ complex_sep=complex_sep,
+ complex_agg_method=complex_agg_method,
+ interaction_columns=self.interaction_columns,
+ verbose=verbose)
+
+ def compute_pairwise_cci_scores(self, cci_score=None, use_ppi_score=False, verbose=True):
+ '''Computes overall CCI scores for each pair of cells.
+
+ Parameters
+ ----------
+ cci_score : str, default=None
+ Scoring function to aggregate the communication scores between
+ a pair of cells. It computes an overall potential of cell-cell
+ interactions. If None, it will use the one stored in the
+ attribute analysis_setup of this object.
+ Options:
+
+ - 'bray_curtis' : Bray-Curtis-like score.
+ - 'jaccard' : Jaccard-like score.
+ - 'count' : Number of LR pairs that the pair of cells use.
+ - 'icellnet' : Sum of the L-R expression product of a pair of cells
+
+ use_ppi_score : boolean, default=False
+ Whether to use the weights of LR pairs specified in the ppi_data
+ to compute the scores.
+
+ verbose : boolean, default=True
+ Whether to print the steps of the analysis.
+ '''
+ self.interaction_space.compute_pairwise_cci_scores(cci_score=cci_score,
+ use_ppi_score=use_ppi_score,
+ verbose=verbose)
+
+ def compute_pairwise_communication_scores(self, communication_score=None, use_ppi_score=False, ref_ppi_data=None,
+ interaction_columns=None, cells=None, cci_type=None, verbose=True):
+ '''Computes the communication scores for each LR pair in
+ a given pair of sender-receiver cells.
+
+ Parameters
+ ----------
+ communication_score : str, default=None
+ Type of communication score to infer the potential use of
+ a given ligand-receptor pair by a pair of cells/tissues/samples.
+ If None, the score stored in the attribute analysis_setup
+ will be used.
+ Available communication_scores are:
+
+ - 'expression_thresholding' : Computes the joint presence of a
+ ligand from a sender cell and of
+ a receptor on a receiver cell from
+ binarizing their gene expression levels.
+ - 'expression_mean' : Computes the average between the expression
+ of a ligand from a sender cell and the
+ expression of a receptor on a receiver cell.
+ - 'expression_product' : Computes the product between the expression
+ of a ligand from a sender cell and the
+ expression of a receptor on a receiver cell.
+ - 'expression_gmean' : Computes the geometric mean between the expression
+ of a ligand from a sender cell and the
+ expression of a receptor on a receiver cell.
+
+ use_ppi_score : boolean, default=False
+ Whether to use the weights of LR pairs specified in the ppi_data
+ to compute the scores.
+
+ ref_ppi_data : pandas.DataFrame, default=None
+ Reference list of protein-protein interactions (or
+ ligand-receptor pairs) used for inferring the cell-cell
+ interactions and communication. It could be the same as
+ 'ppi_data' if ppi_data is not bidirectional (that is,
+ contains ProtA-ProtB interaction as well as ProtB-ProtA
+ interaction). ref_ppi must be undirected (contains only
+ ProtA-ProtB and not ProtB-ProtA interaction). If None
+ the one stored in the attribute ref_ppi will be used.
+
+ interaction_columns : tuple, default=None
+ Contains the names of the columns where to find the
+ partners in a dataframe of protein-protein interactions.
+ If the list is for ligand-receptor pairs, the first column
+ is for the ligands and the second for the receptors. If
+ None, the one stored in the attribute interaction_columns
+ will be used.
+
+ cells : list, default=None
+ List of cells to consider.
+
+ cci_type : str, default=None
+ Type of interaction between two cells. Used to specify
+ if we want to consider a LR pair in both directions.
+ It can be:
+
+ - 'undirected'
+ - 'directed'
+
+ If None, 'directed' will be used.
verbose : boolean, default=True
Whether to print the steps of the analysis.
@@ -969,21 +1368,23 @@ if cci_type is None:
cci_type = 'directed'
- self.interaction_space.compute_pairwise_communication_scores(communication_score=communication_score,
- use_ppi_score=use_ppi_score,
- ref_ppi_data=ref_ppi_data,
- interaction_columns=interaction_columns,
- cells=cells,
- cci_type=cci_type,
- verbose=verbose)
-
- @property
- def interaction_elements(self):
- '''Returns the interaction elements within an interaction space.'''
- if hasattr(self.interaction_space, 'interaction_elements'):
- return self.interaction_space.interaction_elements
- else:
- return None
+ self.analysis_setup['ccc_type'] = cci_type
+
+ self.interaction_space.compute_pairwise_communication_scores(communication_score=communication_score,
+ use_ppi_score=use_ppi_score,
+ ref_ppi_data=ref_ppi_data,
+ interaction_columns=interaction_columns,
+ cells=cells,
+ cci_type=cci_type,
+ verbose=verbose)
+
+ @property
+ def interaction_elements(self):
+ '''Returns the interaction elements within an interaction space.'''
+ if hasattr(self.interaction_space, 'interaction_elements'):
+ return self.interaction_space.interaction_elements
+ else:
+ return None
interaction_elements
@@ -1008,7 +1409,7 @@ readonly
compute_pairwise_cci_scores(self, cci_score=None, use_ppi_score=False, verbose=True)
-Computes overall CCI scores for each pair of cells.
+cci_score : str, default=None + Scoring function to aggregate the communication scores between + a pair of cells. It computes an overall potential of cell-cell + interactions. If None, it will use the one stored in the + attribute analysis_setup of this object. + Options:
+- 'bray_curtis' : Bray-Curtis-like score.
+- 'jaccard' : Jaccard-like score.
+- 'count' : Number of LR pairs that the pair of cells use.
+- 'icellnet' : Sum of the L-R expression product of a pair of cells
+
use_ppi_score : boolean, default=False + Whether to use the weights of LR pairs specified in the ppi_data + to compute the scores.
+verbose : boolean, default=True + Whether to print the steps of the analysis.
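To make the cci_score options above more concrete, here is a minimal sketch on binarized LR-pair vectors. The function name `cci_scores` and the exact formulas for the Bray-Curtis-like and Jaccard-like scores are assumptions based on the usual definitions of those indices; the library's own implementation may differ.

```python
import numpy as np


def cci_scores(ligands_a, receptors_b):
    """Hypothetical helper working on binary LR-pair vectors.

    ligands_a[i] is 1 when cell A expresses the ligand of LR pair i;
    receptors_b[i] is 1 when cell B expresses the matching receptor.
    """
    a = np.asarray(ligands_a, dtype=float)
    b = np.asarray(receptors_b, dtype=float)
    shared = float(np.sum(a * b))  # LR pairs with both partners present
    total = float(np.sum(a * a) + np.sum(b * b))
    return {
        # Number of LR pairs that the pair of cells can use.
        'count': shared,
        # Assumed Bray-Curtis-like form: 2 * shared / (|A| + |B|)
        'bray_curtis': 2.0 * shared / total if total else 0.0,
        # Assumed Jaccard-like form: shared / (|A| + |B| - shared)
        'jaccard': shared / (total - shared) if total else 0.0,
    }


# Four LR pairs, evaluated in the direction cell A (ligands) -> cell B (receptors).
scores = cci_scores([1, 1, 0, 1], [1, 0, 0, 1])
```

The 'icellnet' option is described above as the sum of the L-R expression products, which would correspond to the same sum of products applied to continuous rather than binarized expression values.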
-Parameters: | -
-
|
-
---|
cell2cell/analysis/cell2cell_pipelines.py
def compute_pairwise_cci_scores(self, cci_score=None, use_ppi_score=False, verbose=True):
@@ -1106,79 +1497,72 @@
+
compute_pairwise_communication_scores(self, communication_score=None, use_ppi_score=False, ref_ppi_data=None, interaction_columns=None, cells=None, cci_type=None, verbose=True)
-
+
- Computes the communication scores for each LR pairs in
-a given pair of sender-receiver cell
-
-
-
-
-
-
-
-
- Parameters:
-
-
- communication_score (str, default=None
) – Type of communication score to infer the potential use of
-a given ligand-receptor pair by a pair of cells/tissues/samples.
-If None, the score stored in the attribute analysis_setup
-will be used.
-Available communication_scores are:
-
-- 'expresion_thresholding' : Computes the joint presence of a
+
Computes the communication scores for each LR pair in
+a given pair of sender-receiver cells.
+Parameters
+communication_score : str, default=None
+ Type of communication score to infer the potential use of
+ a given ligand-receptor pair by a pair of cells/tissues/samples.
+ If None, the score stored in the attribute analysis_setup
+ will be used.
+ Available communication_scores are:
+- 'expression_thresholding' : Computes the joint presence of a
ligand from a sender cell and of
a receptor on a receiver cell from
- binarizing their gene expression levels.
-- 'expression_mean' : Computes the average between the expression
+ binarizing their gene expression levels.
+- 'expression_mean' : Computes the average between the expression
of a ligand from a sender cell and the
- expression of a receptor on a receiver cell.
-- 'expression_product' : Computes the product between the expression
+ expression of a receptor on a receiver cell.
+- 'expression_product' : Computes the product between the expression
of a ligand from a sender cell and the
- expression of a receptor on a receiver cell.
-- 'expression_gmean' : Computes the geometric mean between the expression
+ expression of a receptor on a receiver cell.
+- 'expression_gmean' : Computes the geometric mean between the expression
of a ligand from a sender cell and the
- expression of a receptor on a receiver cell.
-
- use_ppi_score (boolean, default=False
) – Whether using a weight of LR pairs specified in the ppi_data
-to compute the scores.
- ref_ppi_data (pandas.DataFrame, default=None
) – Reference list of protein-protein interactions (or
-ligand-receptor pairs) used for inferring the cell-cell
-interactions and communication. It could be the same as
-'ppi_data' if ppi_data is not bidirectional (that is,
-contains ProtA-ProtB interaction as well as ProtB-ProtA
-interaction). ref_ppi must be undirected (contains only
-ProtA-ProtB and not ProtB-ProtA interaction). If None
-the one stored in the attribute ref_ppi will be used.
- interaction_columns (tuple, default=None
) – Contains the names of the columns where to find the
-partners in a dataframe of protein-protein interactions.
-If the list is for ligand-receptor pairs, the first column
-is for the ligands and the second for the receptors. If
-None, the one stored in the attribute interaction_columns
-will be used.
- cells (list=None
) – List of cells to consider.
- cci_type (str, default=None
) – Type of interaction between two cells. Used to specify
-if we want to consider a LR pair in both directions.
-It can be:
-
-- 'undirected'
-- 'directed'
-
-If None, the one stored in the attribute analysis_setup
-will be used.
- verbose (boolean, default=True
) – Whether printing or not steps of the analysis.
-
-
-
-
-
+ expression of a receptor on a receiver cell.
+
+
+
use_ppi_score : boolean, default=False
+ Whether to use the weights of LR pairs specified in the ppi_data
+ to compute the scores.
+ref_ppi_data : pandas.DataFrame, default=None
+ Reference list of protein-protein interactions (or
+ ligand-receptor pairs) used for inferring the cell-cell
+ interactions and communication. It could be the same as
+ 'ppi_data' if ppi_data is not bidirectional (that is,
+ contains ProtA-ProtB interaction as well as ProtB-ProtA
+ interaction). ref_ppi must be undirected (contains only
+ ProtA-ProtB and not ProtB-ProtA interaction). If None
+ the one stored in the attribute ref_ppi will be used.
+interaction_columns : tuple, default=None
+ Contains the names of the columns where to find the
+ partners in a dataframe of protein-protein interactions.
+ If the list is for ligand-receptor pairs, the first column
+ is for the ligands and the second for the receptors. If
+ None, the one stored in the attribute interaction_columns
+ will be used.
+cells : list, default=None
+ List of cells to consider.
+cci_type : str, default=None
+ Type of interaction between two cells. Used to specify
+ if we want to consider a LR pair in both directions.
+ It can be:
+- 'undirected'
+- 'directed'
+
+If None, 'directed' will be used.
+
+
+verbose : boolean, default=True
+ Whether to print the steps of the analysis.
+
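The four communication scores listed above can be sketched for a single ligand-receptor pair as follows. This is an illustration only, not the library's code: the function name `communication_score` and its `threshold` argument are hypothetical, and the strict greater-than binarization is an assumption.

```python
import math


def communication_score(ligand_expr, receptor_expr, method, threshold=0.0):
    """Hypothetical helper: score one LR pair for a sender-receiver pair.

    ligand_expr is the ligand expression in the sender cell and
    receptor_expr is the receptor expression in the receiver cell.
    """
    if method == 'expression_thresholding':
        # Joint presence after binarizing both expression values.
        return int(ligand_expr > threshold and receptor_expr > threshold)
    if method == 'expression_mean':
        return (ligand_expr + receptor_expr) / 2
    if method == 'expression_product':
        return ligand_expr * receptor_expr
    if method == 'expression_gmean':
        return math.sqrt(ligand_expr * receptor_expr)
    raise ValueError(f'Unknown communication score: {method}')


methods = ('expression_thresholding', 'expression_mean',
           'expression_product', 'expression_gmean')
# Ligand at 4.0 in the sender, receptor at 9.0 in the receiver, cutoff of 5.0.
scores = {m: communication_score(4.0, 9.0, m, threshold=5.0) for m in methods}
```

In this toy case the thresholding score is 0 because the ligand falls below the cutoff, while the continuous scores still report a non-zero value.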
Source code in cell2cell/analysis/cell2cell_pipelines.py
def compute_pairwise_communication_scores(self, communication_score=None, use_ppi_score=False, ref_ppi_data=None,
@@ -1242,28 +1626,29 @@ - 'undirected'
- 'directed'
- If None, the one stored in the attribute analysis_setup
- will be used.
-
- verbose : boolean, default=True
- Whether printing or not steps of the analysis.
- '''
- if interaction_columns is None:
- interaction_columns = self.interaction_columns # Used only for ref_ppi_data
-
- if ref_ppi_data is None:
- ref_ppi_data = self.ref_ppi
-
- if cci_type is None:
- cci_type = 'directed'
-
- self.interaction_space.compute_pairwise_communication_scores(communication_score=communication_score,
- use_ppi_score=use_ppi_score,
- ref_ppi_data=ref_ppi_data,
- interaction_columns=interaction_columns,
- cells=cells,
- cci_type=cci_type,
- verbose=verbose)
+ If None, 'directed' will be used.
+
+ verbose : boolean, default=True
+ Whether to print the steps of the analysis.
+ '''
+ if interaction_columns is None:
+ interaction_columns = self.interaction_columns # Used only for ref_ppi_data
+
+ if ref_ppi_data is None:
+ ref_ppi_data = self.ref_ppi
+
+ if cci_type is None:
+ cci_type = 'directed'
+
+ self.analysis_setup['ccc_type'] = cci_type
+
+ self.interaction_space.compute_pairwise_communication_scores(communication_score=communication_score,
+ use_ppi_score=use_ppi_score,
+ ref_ppi_data=ref_ppi_data,
+ interaction_columns=interaction_columns,
+ cells=cells,
+ cci_type=cci_type,
+ verbose=verbose)
SingleCellInteractions
-Interaction class with all necessary methods to run the cell2cell pipeline
-on a single-cell RNA-seq dataset.
- -Parameters: | -
-
|
-
---|
Attributes:
-Name | -Type | -Description | -
---|---|---|
rnaseq_data |
- pandas.DataFrame or scanpy.AnnData |
- Gene expression data for a single-cell RNA-seq experiment. If it is a -dataframe columns are single cells and rows are genes, while if it is -a AnnData object, columns are genes and rows are single cells. |
-
metadata |
- pandas.DataFrame |
- Metadata containing the cell types for each single cells in the -RNA-seq dataset. |
-
index_col |
- str |
- Column-name for the single cells in the metadata. |
-
group_col |
- str |
- Column-name in the metadata for the grouping single cells into cell types -by the selected aggregation method. |
-
ppi_data |
- pandas.DataFrame |
- List of protein-protein interactions (or ligand-receptor pairs) used for -inferring the cell-cell interactions and communication. |
-
complex_sep |
- str |
- Symbol that separates the protein subunits in a multimeric complex. -For example, '&' is the complex_sep for a list of ligand-receptor pairs -where a protein partner could be "CD74&CD44". |
-
complex_agg_method |
- str |
- Method to aggregate the expression value of multiple genes in a -complex. -
|
-
ref_ppi |
- pandas.DataFrame |
- Reference list of protein-protein interactions (or ligand-receptor pairs) used -for inferring the cell-cell interactions and communication. It could be the -same as 'ppi_data' if ppi_data is not bidirectional (that is, contains -ProtA-ProtB interaction as well as ProtB-ProtA interaction). ref_ppi must -be undirected (contains only ProtA-ProtB and not ProtB-ProtA interaction). |
-
interaction_columns |
- tuple |
- Contains the names of the columns where to find the partners in a -dataframe of protein-protein interactions. If the list is for -ligand-receptor pairs, the first column is for the ligands and the second -for the receptors. |
-
analysis_setup |
- dict |
- Contains main setup for running the cell-cell interactions and communication -analyses. -Three main setups are needed (passed as keys): -
|
-
cutoff_setup |
- dict |
- Contains two keys: 'type' and 'parameter'. The first key represent the -way to use a cutoff or threshold, while parameter is the value used -to binarize the expression values. -The key 'type' can be: -
|
-
interaction_space |
- cell2cell.core.interaction_space.InteractionSpace |
- Interaction space that contains all the required elements to perform the -cell-cell interaction and communication analysis between every pair of cells. -After performing the analyses, the results are stored in this object. |
-
aggregation_method |
- str |
- Specifies the method to use to aggregate gene expression of single -cells into their respective cell types. Used to perform the CCI -analysis since it is on the cell types rather than single cells. -Options are: -
|
-
ccc_permutation_pvalues |
- pandas.DataFrame |
- Contains the P-values of the permutation analysis on the -communication scores. |
-
cci_permutation_pvalues |
- pandas.DataFrame |
- Contains the P-values of the permutation analysis on the -CCI scores. |
-
__adata |
- boolean |
- Auxiliary variable used for storing whether rnaseq_data -is an AnnData object. |
-
cci_score : str, default='bray_curtis' + Scoring function to aggregate the communication scores between a pair of + cells. It computes an overall potential of cell-cell interactions. + Options:
+- 'bray_curtis' : Bray-Curtis-like score.
+- 'jaccard' : Jaccard-like score.
+- 'count' : Number of LR pairs that the pair of cells use.
+- 'icellnet' : Sum of the L-R expression product of a pair of cells
+
cci_type : str, default='undirected' + Specifies whether to compute the cci_score in a directed or undirected + way. For a pair of cells A and B, directed means that the ligands are + considered only from cell A and receptors only from cell B, or vice versa. + Undirected, in contrast, simultaneously considers signaling from cell A to + cell B and from cell B to cell A.
+expression_threshold : float, default=0.2 + Threshold value to binarize gene expression when using + communication_score='expression_thresholding'. Units have to be the + same as the aggregated gene expression matrix (e.g., counts, fraction + of cells with non-zero counts, etc.).
+aggregation_method : str, default='nn_cell_fraction' + Specifies the method to use to aggregate gene expression of single + cells into their respective cell types. Used to perform the CCI + analysis since it is on the cell types rather than single cells. + Options are:
+- 'nn_cell_fraction' : Among the single cells composing a cell type, it
+ calculates the fraction of single cells with non-zero count values
+ of a given gene.
+- 'average' : Computes the average gene expression among the single cells
+ composing a cell type for a given gene.
+
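The two aggregation options can be illustrated with a minimal sketch in plain Python. The helper `aggregate_gene` and the toy data are hypothetical and are not part of cell2cell; the sketch only mirrors the descriptions given for 'nn_cell_fraction' and 'average'.

```python
def aggregate_gene(values_by_cell, celltype_by_cell, method="nn_cell_fraction"):
    """Aggregate one gene's per-cell expression into per-cell-type values.

    Hypothetical helper for illustration; not the cell2cell implementation.
    """
    # Group the expression values of the gene by cell type.
    grouped = {}
    for cell, value in values_by_cell.items():
        grouped.setdefault(celltype_by_cell[cell], []).append(value)

    aggregated = {}
    for celltype, values in grouped.items():
        if method == "nn_cell_fraction":
            # Fraction of single cells with a non-zero count for this gene.
            aggregated[celltype] = sum(1 for v in values if v != 0) / len(values)
        elif method == "average":
            # Average expression among the single cells of the cell type.
            aggregated[celltype] = sum(values) / len(values)
        else:
            raise ValueError(f"Unknown aggregation method: {method}")
    return aggregated


# Toy counts of a single gene in four cells (made-up values).
counts = {"c1": 0, "c2": 3, "c3": 5, "c4": 0}
labels = {"c1": "T cell", "c2": "T cell", "c3": "B cell", "c4": "B cell"}

fractions = aggregate_gene(counts, labels, method="nn_cell_fraction")
averages = aggregate_gene(counts, labels, method="average")
```

With 'nn_cell_fraction' the aggregated values lie between 0 and 1, so an expression_threshold such as the default 0.2 reads as "at least this fraction of cells expresses the gene".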
barcode_col : str, default='barcodes' + Column-name for the single cells in the metadata.
+celltype_col : str, default='celltypes' + Column-name in the metadata for grouping single cells into cell types + by the selected aggregation method.
+complex_sep : str, default=None + Symbol that separates the protein subunits in a multimeric complex. + For example, '&' is the complex_sep for a list of ligand-receptor pairs + where a protein partner could be "CD74&CD44".
+complex_agg_method : str, default='min' + Method to aggregate the expression value of multiple genes in a + complex.
+- 'min' : Minimum expression value among all genes.
+- 'mean' : Average expression value among all genes.
+- 'gmean' : Geometric mean expression value among all genes.
+
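As a rough illustration of how complex_sep and complex_agg_method interact, the sketch below splits a complex name into its subunits and aggregates their expression. The function `aggregate_complex` and the expression values are assumptions made for this example, not the library's code.

```python
import math


def aggregate_complex(values, method="min"):
    """Aggregate the expression of the subunits of one complex.

    Hypothetical helper; option names follow the documented ones.
    """
    values = list(values)
    if method == "min":
        return min(values)
    if method == "mean":
        return sum(values) / len(values)
    if method == "gmean":
        # Geometric mean: n-th root of the product of n values.
        return math.prod(values) ** (1.0 / len(values))
    raise ValueError(f"Unknown complex_agg_method: {method}")


# Made-up expression values for the subunits of "CD74&CD44".
expression = {"CD74": 4.0, "CD44": 9.0}
subunits = "CD74&CD44".split("&")  # complex_sep='&'
subunit_values = [expression[gene] for gene in subunits]

complex_min = aggregate_complex(subunit_values, "min")
complex_mean = aggregate_complex(subunit_values, "mean")
complex_gmean = aggregate_complex(subunit_values, "gmean")
```

The 'min' option is the most conservative choice, since the complex is treated as limited by its least expressed subunit.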
verbose : boolean, default=False + Whether to print the steps of the analysis.
+rnaseq_data : pandas.DataFrame or scanpy.AnnData + Gene expression data for a single-cell RNA-seq experiment. If it is a + dataframe, columns are single cells and rows are genes, while if it is + an AnnData object, columns are genes and rows are single cells.
+metadata : pandas.DataFrame + Metadata containing the cell type of each single cell in the + RNA-seq dataset.
+index_col : str + Column-name for the single cells in the metadata.
+group_col : str + Column-name in the metadata for grouping single cells into cell types + by the selected aggregation method.
+ppi_data : pandas.DataFrame + List of protein-protein interactions (or ligand-receptor pairs) used for + inferring the cell-cell interactions and communication.
+complex_sep : str + Symbol that separates the protein subunits in a multimeric complex. + For example, '&' is the complex_sep for a list of ligand-receptor pairs + where a protein partner could be "CD74&CD44".
+complex_agg_method : str + Method to aggregate the expression value of multiple genes in a + complex.
+- 'min' : Minimum expression value among all genes.
+- 'mean' : Average expression value among all genes.
+- 'gmean' : Geometric mean expression value among all genes.
+
ref_ppi : pandas.DataFrame + Reference list of protein-protein interactions (or ligand-receptor pairs) used + for inferring the cell-cell interactions and communication. It could be the + same as 'ppi_data' if ppi_data is not bidirectional (a bidirectional list contains + the ProtA-ProtB interaction as well as the ProtB-ProtA interaction). ref_ppi must + be undirected (contains only ProtA-ProtB and not ProtB-ProtA interaction).
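The difference between an undirected reference list and a bidirectional list can be sketched with plain tuples. Both helpers below are hypothetical stand-ins written for this example; they are not the library's `ppi.remove_ppi_bidirectionality` or `ppi.bidirectional_ppi_for_cci`, whose exact behavior is not shown here.

```python
def remove_bidirectionality(pairs):
    """Keep only the first occurrence of each unordered pair (A-B or B-A)."""
    seen = set()
    undirected = []
    for a, b in pairs:
        key = frozenset((a, b))
        if key not in seen:
            seen.add(key)
            undirected.append((a, b))
    return undirected


def make_bidirectional(pairs):
    """Append the swapped version (B-A) of every A-B pair."""
    swapped = [(b, a) for a, b in pairs if a != b]
    return list(pairs) + swapped


# Made-up ligand-receptor pairs; the second entry repeats the first, reversed.
raw_pairs = [("L1", "R1"), ("R1", "L1"), ("L2", "R2")]

ref_pairs = remove_bidirectionality(raw_pairs)  # plays the role of ref_ppi
bidirectional_pairs = make_bidirectional(ref_pairs)
```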
+interaction_columns : tuple + Contains the names of the columns where to find the partners in a + dataframe of protein-protein interactions. If the list is for + ligand-receptor pairs, the first column is for the ligands and the second + for the receptors.
+analysis_setup : dict + Contains main setup for running the cell-cell interactions and communication + analyses. + Three main setups are needed (passed as keys):
+- 'communication_score' : is the type of communication score used to detect
+ active ligand-receptor pairs between each pair of cells.
+ It can be:
+
+ - 'expression_thresholding'
+ - 'expression_product'
+ - 'expression_mean'
+ - 'expression_gmean'
+
+- 'cci_score' : is the scoring function to aggregate the communication
+ scores.
+ It can be:
+
+ - 'bray_curtis'
+ - 'jaccard'
+ - 'count'
+ - 'icellnet'
+
+- 'cci_type' : is the type of interaction between two cells. If it is
+ undirected, all ligands and receptors are considered from both cells.
+ If it is directed, ligands from one cell and receptors from the other
+ are considered separately with respect to ligands from the second
+ cell and receptor from the first one.
+ So, it can be:
+
+ - 'undirected'
+ - 'directed'
+
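For orientation, an analysis_setup dictionary with the three documented keys could look like the sketch below. The validator `check_analysis_setup` is a hypothetical helper added for illustration; only the keys and option names come from the description above.

```python
# Allowed options per key, as listed in the documentation above.
ALLOWED_OPTIONS = {
    "communication_score": {
        "expression_thresholding",
        "expression_product",
        "expression_mean",
        "expression_gmean",
    },
    "cci_score": {"bray_curtis", "jaccard", "count", "icellnet"},
    "cci_type": {"undirected", "directed"},
}


def check_analysis_setup(setup):
    """Return a list of problems found in an analysis_setup dict (hypothetical)."""
    problems = []
    for key, allowed in ALLOWED_OPTIONS.items():
        if key not in setup:
            problems.append(f"missing key: {key}")
        elif setup[key] not in allowed:
            problems.append(f"invalid value for {key}: {setup[key]!r}")
    return problems


analysis_setup = {
    "communication_score": "expression_thresholding",
    "cci_score": "bray_curtis",
    "cci_type": "undirected",
}
valid_problems = check_analysis_setup(analysis_setup)

incomplete_setup = {"communication_score": "expression_mean", "cci_score": "cosine"}
invalid_problems = check_analysis_setup(incomplete_setup)
```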
cutoff_setup : dict + Contains two keys: 'type' and 'parameter'. The first key represents the + way to use a cutoff or threshold, while 'parameter' is the value used + to binarize the expression values. + The key 'type' can be:
+- 'local_percentile' : computes the value of a given percentile, for each
+ gene independently. In this case, the parameter corresponds to the
+ percentile to compute, as a float value between 0 and 1.
+- 'global_percentile' : computes the value of a given percentile from all
+ genes and samples simultaneously. In this case, the parameter
+ corresponds to the percentile to compute, as a float value between
+ 0 and 1. All genes have the same cutoff.
+- 'file' : load a cutoff table from a file. Parameter in this case is the
+ path of that file. It must contain the same genes as index and same
+ samples as columns.
+- 'multi_col_matrix' : a dataframe must be provided, containing a cutoff
+ for each gene in each sample. This allows using specific cutoffs for
+ each sample. The columns here must be the same as the ones in the
+ rnaseq_data.
+- 'single_col_matrix' : a dataframe must be provided, containing a cutoff
+ for each gene in only one column. These cutoffs will be applied to
+ all samples.
+- 'constant_value' : binarizes the expression. Evaluates whether
+ expression is greater than the value input in the parameter.
+
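The simplest cutoff type, 'constant_value', can be sketched as follows. The function `binarize_constant` and the nested-dict layout are assumptions for this example; the strict "greater than" comparison follows the wording above, and how the library treats values equal to the cutoff is not confirmed here.

```python
def binarize_constant(expression, cutoff_setup):
    """Binarize a gene-by-sample table using a single constant cutoff.

    Hypothetical helper; `expression` maps gene -> {sample: value}.
    """
    if cutoff_setup["type"] != "constant_value":
        raise ValueError("This sketch only handles type='constant_value'")
    threshold = cutoff_setup["parameter"]
    return {
        gene: {sample: int(value > threshold) for sample, value in row.items()}
        for gene, row in expression.items()
    }


cutoff_setup = {"type": "constant_value", "parameter": 0.2}

# Made-up aggregated expression (e.g., fraction of expressing cells).
expression = {
    "GeneA": {"s1": 0.1, "s2": 0.5},
    "GeneB": {"s1": 0.2, "s2": 0.0},
}
binary = binarize_constant(expression, cutoff_setup)
```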
interaction_space : cell2cell.core.interaction_space.InteractionSpace + Interaction space that contains all the required elements to perform the + cell-cell interaction and communication analysis between every pair of cells. + After performing the analyses, the results are stored in this object.
+aggregation_method : str + Specifies the method to use to aggregate gene expression of single + cells into their respective cell types. Used to perform the CCI + analysis since it is on the cell types rather than single cells. + Options are:
+- 'nn_cell_fraction' : Among the single cells composing a cell type, it
+ calculates the fraction of single cells with non-zero count values
+ of a given gene.
+- 'average' : Computes the average gene expression among the single cells
+ composing a cell type for a given gene.
+
ccc_permutation_pvalues : pandas.DataFrame + Contains the P-values of the permutation analysis on the + communication scores.
+cci_permutation_pvalues : pandas.DataFrame + Contains the P-values of the permutation analysis on the + CCI scores.
+__adata : boolean + Auxiliary variable used for storing whether rnaseq_data + is an AnnData object.
+cell2cell/analysis/cell2cell_pipelines.py
class SingleCellInteractions:
@@ -1846,155 +2165,167 @@ self.analysis_setup['communication_score'] = communication_score
self.analysis_setup['cci_score'] = cci_score
self.analysis_setup['cci_type'] = cci_type
-
- # Initialize PPI
- ppi_data_ = ppi.filter_ppi_by_proteins(ppi_data=ppi_data,
- proteins=genes,
- complex_sep=complex_sep,
- upper_letter_comparison=False,
- interaction_columns=interaction_columns)
-
- self.ppi_data = ppi.remove_ppi_bidirectionality(ppi_data=ppi_data_,
- interaction_columns=interaction_columns,
- verbose=verbose)
-
- if self.analysis_setup['cci_type'] == 'undirected':
- self.ref_ppi = self.ppi_data
- self.ppi_data = ppi.bidirectional_ppi_for_cci(ppi_data=self.ppi_data,
- interaction_columns=interaction_columns,
- verbose=verbose)
- else:
- self.ref_ppi = None
-
- # Thresholding
- self.cutoff_setup['type'] = 'constant_value'
- self.cutoff_setup['parameter'] = expression_threshold
-
+ self.analysis_setup['ccc_type'] = cci_type
+
+ # Initialize PPI
+ ppi_data_ = ppi.filter_ppi_by_proteins(ppi_data=ppi_data,
+ proteins=genes,
+ complex_sep=complex_sep,
+ upper_letter_comparison=False,
+ interaction_columns=interaction_columns)
+
+ self.ppi_data = ppi.remove_ppi_bidirectionality(ppi_data=ppi_data_,
+ interaction_columns=interaction_columns,
+ verbose=verbose)
+
+ if self.analysis_setup['cci_type'] == 'undirected':
+ self.ref_ppi = self.ppi_data
+ self.ppi_data = ppi.bidirectional_ppi_for_cci(ppi_data=self.ppi_data,
+ interaction_columns=interaction_columns,
+ verbose=verbose)
+ else:
+ self.ref_ppi = None
+
+ # Thresholding
+ self.cutoff_setup['type'] = 'constant_value'
+ self.cutoff_setup['parameter'] = expression_threshold
- # Aggregate single-cell RNA-Seq data
- self.aggregated_expression = rnaseq.aggregate_single_cells(rnaseq_data=self.rnaseq_data,
- metadata=self.metadata,
- barcode_col=self.index_col,
- celltype_col=self.group_col,
- method=self.aggregation_method,
- transposed=self.__adata)
-
- # Interaction Space
- self.interaction_space = initialize_interaction_space(rnaseq_data=self.aggregated_expression,
- ppi_data=self.ppi_data,
- cutoff_setup=self.cutoff_setup,
- analysis_setup=self.analysis_setup,
- complex_sep=self.complex_sep,
- complex_agg_method=self.complex_agg_method,
- interaction_columns=self.interaction_columns,
- verbose=verbose)
-
- def permute_cell_labels(self, permutations=100, evaluation='communication', fdr_correction=True, random_state=None,
- verbose=False):
- '''Performs permutation analysis of cell-type labels. Detects
- significant CCI or communication scores.
-
- Parameters
- ----------
- permutations : int, default=100
- Number of permutations where in each of them a random
- shuffle of cell-type labels is performed, followed of
- computing CCI or communication scores to create a null
- distribution.
-
- evaluation : str, default='communication'
- Whether calculating P-values for CCI or communication scores.
-
- - 'interactions' : For CCI scores.
- - 'communication' : For communication scores.
-
- fdr_correction : boolean, default=True
- Whether performing a multiple test correction after
- computing P-values. In this case corresponds to an
- FDR or Benjamini-Hochberg correction, using an alpha
- equal to 0.05.
-
- random_state : int, default=None
- Seed for randomization.
-
- verbose : boolean, default=False
- Whether printing or not steps of the analysis.
- '''
- if evaluation == 'communication':
- if 'communication_matrix' not in self.interaction_space.interaction_elements.keys():
- raise ValueError('Run the method compute_pairwise_communication_scores() before permutation analysis.')
- score = self.interaction_space.interaction_elements['communication_matrix']
- elif evaluation == 'interactions':
- if not hasattr(self.interaction_space, 'distance_matrix'):
- raise ValueError('Run the method compute_pairwise_interactions() before permutation analysis.')
- score = self.interaction_space.interaction_elements['cci_matrix']
- else:
- raise ValueError('Not a valid evaluation')
-
- randomized_scores = []
-
- for i in tqdm(range(permutations), disable=not verbose):
- if random_state is not None:
- seed = random_state + i
- else:
- seed = random_state
-
- randomized_meta = manipulate_dataframes.shuffle_cols_in_df(df=self.metadata.reset_index(),
- columns=self.group_col,
- random_state=seed)
-
- aggregated_expression = rnaseq.aggregate_single_cells(rnaseq_data=self.rnaseq_data,
- metadata=randomized_meta,
- barcode_col=self.index_col,
- celltype_col=self.group_col,
- method=self.aggregation_method,
- transposed=self.__adata)
-
- interaction_space = initialize_interaction_space(rnaseq_data=aggregated_expression,
- ppi_data=self.ppi_data,
- cutoff_setup=self.cutoff_setup,
- analysis_setup=self.analysis_setup,
- complex_sep=self.complex_sep,
- complex_agg_method=self.complex_agg_method,
- interaction_columns=self.interaction_columns,
- verbose=False)
-
- if evaluation == 'communication':
- interaction_space.compute_pairwise_communication_scores(verbose=False)
- randomized_scores.append(interaction_space.interaction_elements['communication_matrix'].values.flatten())
- elif evaluation == 'interactions':
- interaction_space.compute_pairwise_cci_scores(verbose=False)
- randomized_scores.append(interaction_space.interaction_elements['cci_matrix'].values.flatten())
-
- randomized_scores = np.array(randomized_scores)
- base_scores = score.values.flatten()
- pvals = np.ones(base_scores.shape)
- n_pvals = len(base_scores)
- randomized_scores = randomized_scores.reshape((-1, n_pvals))
- for i in range(n_pvals):
- dist = randomized_scores[:, i]
- dist = np.append(dist, base_scores[i])
- pvals[i] = permutation.compute_pvalue_from_dist(obs_value=base_scores[i],
- dist=dist,
- consider_size=True,
- comparison='different'
- )
- pval_df = pd.DataFrame(pvals.reshape(score.shape), index=score.index, columns=score.columns)
-
- if fdr_correction:
- symmetric = manipulate_dataframes.check_symmetry(df=pval_df)
- if symmetric:
- pval_df = multitest.compute_fdrcorrection_symmetric_matrix(X=pval_df,
- alpha=0.05)
- else:
- pval_df = multitest.compute_fdrcorrection_asymmetric_matrix(X=pval_df,
- alpha=0.05)
-
- if evaluation == 'communication':
- self.ccc_permutation_pvalues = pval_df
- elif evaluation == 'interactions':
- self.cci_permutation_pvalues = pval_df
- return pval_df
+
+ # Aggregate single-cell RNA-Seq data
+ self.aggregated_expression = rnaseq.aggregate_single_cells(rnaseq_data=self.rnaseq_data,
+ metadata=self.metadata,
+ barcode_col=self.index_col,
+ celltype_col=self.group_col,
+ method=self.aggregation_method,
+ transposed=self.__adata)
+
+ # Interaction Space
+ self.interaction_space = initialize_interaction_space(rnaseq_data=self.aggregated_expression,
+ ppi_data=self.ppi_data,
+ cutoff_setup=self.cutoff_setup,
+ analysis_setup=self.analysis_setup,
+ complex_sep=self.complex_sep,
+ complex_agg_method=self.complex_agg_method,
+ interaction_columns=self.interaction_columns,
+ verbose=verbose)
+
+ def permute_cell_labels(self, permutations=100, evaluation='communication', fdr_correction=True, random_state=None,
+ verbose=False):
+ '''Performs permutation analysis of cell-type labels. Detects
+ significant CCI or communication scores.
+
+ Parameters
+ ----------
+ permutations : int, default=100
+ Number of permutations where in each of them a random
+ shuffle of cell-type labels is performed, followed by
+ computing CCI or communication scores to create a null
+ distribution.
+
+ evaluation : str, default='communication'
+ Whether calculating P-values for CCI or communication scores.
+
+ - 'interactions' : For CCI scores.
+ - 'communication' : For communication scores.
+
+ fdr_correction : boolean, default=True
+ Whether performing a multiple test correction after
+ computing P-values. In this case corresponds to an
+ FDR or Benjamini-Hochberg correction, using an alpha
+ equal to 0.05.
+
+ random_state : int, default=None
+ Seed for randomization.
+
+ verbose : boolean, default=False
+ Whether printing or not steps of the analysis.
+ '''
+ if evaluation == 'communication':
+ if 'communication_matrix' not in self.interaction_space.interaction_elements.keys():
+ raise ValueError('Run the method compute_pairwise_communication_scores() before permutation analysis.')
+ score = self.interaction_space.interaction_elements['communication_matrix'].copy()
+ elif evaluation == 'interactions':
+ if not hasattr(self.interaction_space, 'distance_matrix'):
+ raise ValueError('Run the method compute_pairwise_cci_scores() before permutation analysis.')
+ score = self.interaction_space.interaction_elements['cci_matrix'].copy()
+ else:
+ raise ValueError('Not a valid evaluation')
+
+ randomized_scores = []
+
+ analysis_setup = self.analysis_setup.copy()
+ ppi_data = self.ppi_data
+ if evaluation == 'communication' and self.analysis_setup['cci_type'] != self.analysis_setup['ccc_type']:
+ analysis_setup['cci_type'] = analysis_setup['ccc_type']
+ if self.analysis_setup['cci_type'] == 'directed':
+ ppi_data = ppi.bidirectional_ppi_for_cci(ppi_data=self.ppi_data,
+ interaction_columns=self.interaction_columns,
+ verbose=verbose)
+ elif self.analysis_setup['cci_type'] == 'undirected':
+ ppi_data = self.ref_ppi
+
+ for i in tqdm(range(permutations), disable=not verbose):
+ if random_state is not None:
+ seed = random_state + i
+ else:
+ seed = random_state
+
+ randomized_meta = manipulate_dataframes.shuffle_cols_in_df(df=self.metadata.reset_index(),
+ columns=self.group_col,
+ random_state=seed)
+
+ aggregated_expression = rnaseq.aggregate_single_cells(rnaseq_data=self.rnaseq_data,
+ metadata=randomized_meta,
+ barcode_col=self.index_col,
+ celltype_col=self.group_col,
+ method=self.aggregation_method,
+ transposed=self.__adata)
+
+ interaction_space = initialize_interaction_space(rnaseq_data=aggregated_expression,
+ ppi_data=ppi_data,
+ cutoff_setup=self.cutoff_setup,
+ analysis_setup=analysis_setup,
+ complex_sep=self.complex_sep,
+ complex_agg_method=self.complex_agg_method,
+ interaction_columns=self.interaction_columns,
+ verbose=False)
+
+ if evaluation == 'communication':
+ interaction_space.compute_pairwise_communication_scores(verbose=False)
+ randomized_scores.append(interaction_space.interaction_elements['communication_matrix'].values.flatten())
+ elif evaluation == 'interactions':
+ interaction_space.compute_pairwise_cci_scores(verbose=False)
+ randomized_scores.append(interaction_space.interaction_elements['cci_matrix'].values.flatten())
+
+ randomized_scores = np.array(randomized_scores)
+ base_scores = score.values.flatten()
+ pvals = np.ones(base_scores.shape)
+ n_pvals = len(base_scores)
+ randomized_scores = randomized_scores.reshape((-1, n_pvals))
+ for i in range(n_pvals):
+ dist = randomized_scores[:, i]
+ dist = np.append(dist, base_scores[i])
+ pvals[i] = permutation.compute_pvalue_from_dist(obs_value=base_scores[i],
+ dist=dist,
+ consider_size=True,
+ comparison='different'
+ )
+ pval_df = pd.DataFrame(pvals.reshape(score.shape), index=score.index, columns=score.columns)
+
+ if fdr_correction:
+ symmetric = manipulate_dataframes.check_symmetry(df=pval_df)
+ if symmetric:
+ pval_df = multitest.compute_fdrcorrection_symmetric_matrix(X=pval_df,
+ alpha=0.05)
+ else:
+ pval_df = multitest.compute_fdrcorrection_asymmetric_matrix(X=pval_df,
+ alpha=0.05)
+
+ if evaluation == 'communication':
+ self.ccc_permutation_pvalues = pval_df
+ elif evaluation == 'interactions':
+ self.cci_permutation_pvalues = pval_df
+ return pval_df
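The overall shape of the permutation analysis above (shuffle labels with a per-permutation seed, recompute a score, compare the observed score with the null distribution) can be sketched in plain Python. Everything here is a simplified assumption: the toy statistic replaces the CCI or communication scores, and `empirical_pvalue` is a generic two-sided empirical P-value, not necessarily what `permutation.compute_pvalue_from_dist` computes.

```python
import random


def shuffled_labels(labels, seed):
    """Return a shuffled copy of the labels using a dedicated generator."""
    rng = random.Random(seed)
    shuffled = list(labels)
    rng.shuffle(shuffled)
    return shuffled


def empirical_pvalue(observed, null_scores):
    """Two-sided empirical P-value; the observed value is added to the null."""
    dist = list(null_scores) + [observed]
    greater_equal = sum(1 for s in dist if s >= observed)
    less_equal = sum(1 for s in dist if s <= observed)
    return min(1.0, 2 * min(greater_equal, less_equal) / len(dist))


def permutation_pvalue(values, labels, permutations=100, random_state=None):
    """Permutation test on the difference of group means (toy statistic)."""

    def statistic(current_labels):
        a = [v for v, lab in zip(values, current_labels) if lab == "A"]
        b = [v for v, lab in zip(values, current_labels) if lab == "B"]
        return sum(a) / len(a) - sum(b) / len(b)

    observed = statistic(labels)
    null_scores = []
    for i in range(permutations):
        # As in the method above, each permutation gets its own seed.
        seed = random_state + i if random_state is not None else None
        null_scores.append(statistic(shuffled_labels(labels, seed)))
    return empirical_pvalue(observed, null_scores)


values = [5.1, 4.9, 5.3, 1.0, 1.2, 0.8]
labels = ["A", "A", "A", "B", "B", "B"]
pval_first = permutation_pvalue(values, labels, permutations=100, random_state=0)
pval_second = permutation_pvalue(values, labels, permutations=100, random_state=0)
```

Deriving the seed from `random_state + i` keeps the whole run reproducible while still giving each permutation a different shuffle.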
interaction_elements
@@ -2019,7 +2350,7 @@ readonly
compute_pairwise_cci_scores(self, cci_score=None, use_ppi_score=False, verbose=True)
-Computes overall CCI scores for each pair of cells.
+cci_score : str, default=None + Scoring function to aggregate the communication scores between + a pair of cells. It computes an overall potential of cell-cell + interactions. If None, it will use the one stored in the + attribute analysis_setup of this object. + Options:
+- 'bray_curtis' : Bray-Curtis-like score.
+- 'jaccard' : Jaccard-like score.
+- 'count' : Number of LR pairs that the pair of cells use.
+- 'icellnet' : Sum of the L-R expression product of a pair of cells
+
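To give an intuition for the scoring options, the sketch below computes count, Bray-Curtis-like and Jaccard-like scores from binarized ligand and receptor vectors. The formulas are an assumption based on the usual definitions of these indices applied to the products of ligand-receptor pairs; they are not taken from the source shown here, and returning 0.0 for an empty denominator is a choice made for this example.

```python
def cci_scores(ligands_sender, receptors_receiver):
    """Aggregate per-LR-pair values into overall scores (hypothetical sketch).

    Both inputs are aligned by LR pair: position i holds the ligand value in
    the sender and the receptor value in the receiver for the i-th pair.
    """
    shared = sum(l * r for l, r in zip(ligands_sender, receptors_receiver))
    total = sum(l * l for l in ligands_sender) + sum(r * r for r in receptors_receiver)
    return {
        # Number of LR pairs with both partners present (binary inputs).
        "count": shared,
        # Bray-Curtis-like: shared pairs relative to all partners present.
        "bray_curtis": 2 * shared / total if total else 0.0,
        # Jaccard-like: shared pairs relative to the union of partners.
        "jaccard": shared / (total - shared) if total - shared else 0.0,
    }


# Four LR pairs: 1 means the gene passed the expression threshold.
sender_ligands = [1, 1, 0, 1]
receiver_receptors = [1, 0, 0, 1]
scores = cci_scores(sender_ligands, receiver_receptors)
```

With continuous instead of binarized inputs, the same sum of products corresponds to the description given for 'icellnet'.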
use_ppi_score : boolean, default=False + Whether using a weight of LR pairs specified in the ppi_data + to compute the scores.
+verbose : boolean, default=True + Whether to print the steps of the analysis.
cell2cell/analysis/cell2cell_pipelines.py
def compute_pairwise_cci_scores(self, cci_score=None, use_ppi_score=False, verbose=True):
@@ -2117,79 +2438,72 @@
+
compute_pairwise_communication_scores(self, communication_score=None, use_ppi_score=False, ref_ppi_data=None, interaction_columns=None, cells=None, cci_type=None, verbose=True)
-
+
- Computes the communication scores for each LR pairs in
-a given pair of sender-receiver cell
-
-
-
-
-
-
-
-
- Parameters:
-
-
- communication_score (str, default=None
) – Type of communication score to infer the potential use of
-a given ligand-receptor pair by a pair of cells/tissues/samples.
-If None, the score stored in the attribute analysis_setup
-will be used.
-Available communication_scores are:
-
-- 'expresion_thresholding' : Computes the joint presence of a
+
Computes the communication scores for each LR pair in
+a given pair of sender-receiver cells.
+Parameters
+communication_score : str, default=None
+ Type of communication score to infer the potential use of
+ a given ligand-receptor pair by a pair of cells/tissues/samples.
+ If None, the score stored in the attribute analysis_setup
+ will be used.
+ Available communication_scores are:
+- 'expression_thresholding' : Computes the joint presence of a
ligand from a sender cell and of
a receptor on a receiver cell from
- binarizing their gene expression levels.
-- 'expression_mean' : Computes the average between the expression
+ binarizing their gene expression levels.
+- 'expression_mean' : Computes the average between the expression
of a ligand from a sender cell and the
- expression of a receptor on a receiver cell.
-- 'expression_product' : Computes the product between the expression
+ expression of a receptor on a receiver cell.
+- 'expression_product' : Computes the product between the expression
of a ligand from a sender cell and the
- expression of a receptor on a receiver cell.
-- 'expression_gmean' : Computes the geometric mean between the expression
+ expression of a receptor on a receiver cell.
+- 'expression_gmean' : Computes the geometric mean between the expression
of a ligand from a sender cell and the
- expression of a receptor on a receiver cell.
-
- use_ppi_score (boolean, default=False
) – Whether using a weight of LR pairs specified in the ppi_data
-to compute the scores.
- ref_ppi_data (pandas.DataFrame, default=None
) – Reference list of protein-protein interactions (or
-ligand-receptor pairs) used for inferring the cell-cell
-interactions and communication. It could be the same as
-'ppi_data' if ppi_data is not bidirectional (that is,
-contains ProtA-ProtB interaction as well as ProtB-ProtA
-interaction). ref_ppi must be undirected (contains only
-ProtA-ProtB and not ProtB-ProtA interaction). If None
-the one stored in the attribute ref_ppi will be used.
- interaction_columns (tuple, default=None
) – Contains the names of the columns where to find the
-partners in a dataframe of protein-protein interactions.
-If the list is for ligand-receptor pairs, the first column
-is for the ligands and the second for the receptors. If
-None, the one stored in the attribute interaction_columns
-will be used.
- cells (list=None
) – List of cells to consider.
- cci_type (str, default=None
) – Type of interaction between two cells. Used to specify
-if we want to consider a LR pair in both directions.
-It can be:
-
-- 'undirected'
-- 'directed'
-
-If None, the one stored in the attribute analysis_setup
-will be used.
- verbose (boolean, default=True
) – Whether printing or not steps of the analysis.
-
-
-
-
-
+ expression of a receptor on a receiver cell.
+
+
+
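The four communication scores listed above can be written out for a single ligand-receptor pair. This is a minimal sketch: the function `communication_score` and its `threshold` argument are hypothetical, and the strict comparison used for thresholding is an assumption.

```python
import math


def communication_score(ligand, receptor, method="expression_product", threshold=None):
    """Score one LR pair from the ligand expression in the sender cell and
    the receptor expression in the receiver cell (hypothetical helper)."""
    if method == "expression_thresholding":
        if threshold is None:
            raise ValueError("A threshold is needed to binarize the expression")
        # Joint presence of ligand and receptor after binarization.
        return int(ligand > threshold and receptor > threshold)
    if method == "expression_mean":
        return (ligand + receptor) / 2
    if method == "expression_product":
        return ligand * receptor
    if method == "expression_gmean":
        return math.sqrt(ligand * receptor)
    raise ValueError(f"Unknown communication_score: {method}")


# Made-up expression values for one ligand and one receptor.
ligand_expr, receptor_expr = 4.0, 9.0

score_mean = communication_score(ligand_expr, receptor_expr, "expression_mean")
score_product = communication_score(ligand_expr, receptor_expr, "expression_product")
score_gmean = communication_score(ligand_expr, receptor_expr, "expression_gmean")
score_absent = communication_score(ligand_expr, receptor_expr, "expression_thresholding", threshold=5.0)
score_present = communication_score(ligand_expr, receptor_expr, "expression_thresholding", threshold=3.0)
```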
use_ppi_score : boolean, default=False
+ Whether using a weight of LR pairs specified in the ppi_data
+ to compute the scores.
+ref_ppi_data : pandas.DataFrame, default=None
+ Reference list of protein-protein interactions (or
+ ligand-receptor pairs) used for inferring the cell-cell
+ interactions and communication. It could be the same as
+ 'ppi_data' if ppi_data is not bidirectional (that is,
+ contains ProtA-ProtB interaction as well as ProtB-ProtA
+ interaction). ref_ppi must be undirected (contains only
+ ProtA-ProtB and not ProtB-ProtA interaction). If None
+ the one stored in the attribute ref_ppi will be used.
+interaction_columns : tuple, default=None
+ Contains the names of the columns where to find the
+ partners in a dataframe of protein-protein interactions.
+ If the list is for ligand-receptor pairs, the first column
+ is for the ligands and the second for the receptors. If
+ None, the one stored in the attribute interaction_columns
+ will be used.
+cells : list, default=None
+ List of cells to consider.
+cci_type : str, default=None
+ Type of interaction between two cells. Used to specify
+ if we want to consider a LR pair in both directions.
+ It can be:
+- 'undirected'
+- 'directed'
+
+If None, 'directed' will be used.
+
+
+verbose : boolean, default=True
+ Whether to print the steps of the analysis.
+
Source code in cell2cell/analysis/cell2cell_pipelines.py
def compute_pairwise_communication_scores(self, communication_score=None, use_ppi_score=False, ref_ppi_data=None,
@@ -2253,28 +2567,29 @@ - 'undirected'
- 'directed'
- If None, the one stored in the attribute analysis_setup
- will be used.
-
- verbose : boolean, default=True
- Whether printing or not steps of the analysis.
- '''
- if interaction_columns is None:
- interaction_columns = self.interaction_columns # Used only for ref_ppi_data
-
- if ref_ppi_data is None:
- ref_ppi_data = self.ref_ppi
-
- if cci_type is None:
- cci_type = 'directed'
-
- self.interaction_space.compute_pairwise_communication_scores(communication_score=communication_score,
- use_ppi_score=use_ppi_score,
- ref_ppi_data=ref_ppi_data,
- interaction_columns=interaction_columns,
- cells=cells,
- cci_type=cci_type,
- verbose=verbose)
+ If None, 'directed' will be used.
+
+ verbose : boolean, default=True
+ Whether printing or not steps of the analysis.
+ '''
+ if interaction_columns is None:
+ interaction_columns = self.interaction_columns # Used only for ref_ppi_data
+
+ if ref_ppi_data is None:
+ ref_ppi_data = self.ref_ppi
+
+ if cci_type is None:
+ cci_type = 'directed'
+
+ self.analysis_setup['ccc_type'] = cci_type
+
+ self.interaction_space.compute_pairwise_communication_scores(communication_score=communication_score,
+ use_ppi_score=use_ppi_score,
+ ref_ppi_data=ref_ppi_data,
+ interaction_columns=interaction_columns,
+ cells=cells,
+ cci_type=cci_type,
+ verbose=verbose)
permute_cell_labels(self, permutations=100, evaluation='communication', fdr_correction=True, random_state=None, verbose=False)
-
+Performs permutation analysis of cell-type labels. Detects
-significant CCI or communication scores.
+Performs permutation analysis of cell-type labels. Detects +significant CCI or communication scores.
+permutations : int, default=100 + Number of permutations. In each of them, a random + shuffle of cell-type labels is performed, followed by + computing CCI or communication scores to create a null + distribution.
+evaluation : str, default='communication' + Whether calculating P-values for CCI or communication scores.
+- 'interactions' : For CCI scores.
+- 'communication' : For communication scores.
+
fdr_correction : boolean, default=True + Whether to perform a multiple-testing correction after + computing P-values. In this case, it corresponds to an + FDR (Benjamini-Hochberg) correction, using an alpha + equal to 0.05.
+random_state : int, default=None + Seed for randomization.
+verbose : boolean, default=False + Whether to print the steps of the analysis.
cell2cell/analysis/cell2cell_pipelines.py
def permute_cell_labels(self, permutations=100, evaluation='communication', fdr_correction=True, random_state=None,
@@ -2364,78 +2670,89 @@ if evaluation == 'communication':
if 'communication_matrix' not in self.interaction_space.interaction_elements.keys():
raise ValueError('Run the method compute_pairwise_communication_scores() before permutation analysis.')
- score = self.interaction_space.interaction_elements['communication_matrix']
+ score = self.interaction_space.interaction_elements['communication_matrix'].copy()
elif evaluation == 'interactions':
if not hasattr(self.interaction_space, 'distance_matrix'):
raise ValueError('Run the method compute_pairwise_cci_scores() before permutation analysis.')
- score = self.interaction_space.interaction_elements['cci_matrix']
+ score = self.interaction_space.interaction_elements['cci_matrix'].copy()
else:
raise ValueError('Not a valid evaluation')
randomized_scores = []
- for i in tqdm(range(permutations), disable=not verbose):
- if random_state is not None:
- seed = random_state + i
- else:
- seed = random_state
-
- randomized_meta = manipulate_dataframes.shuffle_cols_in_df(df=self.metadata.reset_index(),
- columns=self.group_col,
- random_state=seed)
-
- aggregated_expression = rnaseq.aggregate_single_cells(rnaseq_data=self.rnaseq_data,
- metadata=randomized_meta,
- barcode_col=self.index_col,
- celltype_col=self.group_col,
- method=self.aggregation_method,
- transposed=self.__adata)
+ analysis_setup = self.analysis_setup.copy()
+ ppi_data = self.ppi_data
+ if evaluation == 'communication' and self.analysis_setup['cci_type'] != self.analysis_setup['ccc_type']:
+ analysis_setup['cci_type'] = analysis_setup['ccc_type']
+ if self.analysis_setup['cci_type'] == 'directed':
+ ppi_data = ppi.bidirectional_ppi_for_cci(ppi_data=self.ppi_data,
+ interaction_columns=self.interaction_columns,
+ verbose=verbose)
+ elif self.analysis_setup['cci_type'] == 'undirected':
+ ppi_data = self.ref_ppi
+
+ for i in tqdm(range(permutations), disable=not verbose):
+ if random_state is not None:
+ seed = random_state + i
+ else:
+ seed = random_state
- interaction_space = initialize_interaction_space(rnaseq_data=aggregated_expression,
- ppi_data=self.ppi_data,
- cutoff_setup=self.cutoff_setup,
- analysis_setup=self.analysis_setup,
- complex_sep=self.complex_sep,
- complex_agg_method=self.complex_agg_method,
- interaction_columns=self.interaction_columns,
- verbose=False)
-
- if evaluation == 'communication':
- interaction_space.compute_pairwise_communication_scores(verbose=False)
- randomized_scores.append(interaction_space.interaction_elements['communication_matrix'].values.flatten())
- elif evaluation == 'interactions':
- interaction_space.compute_pairwise_cci_scores(verbose=False)
- randomized_scores.append(interaction_space.interaction_elements['cci_matrix'].values.flatten())
-
- randomized_scores = np.array(randomized_scores)
- base_scores = score.values.flatten()
- pvals = np.ones(base_scores.shape)
- n_pvals = len(base_scores)
- randomized_scores = randomized_scores.reshape((-1, n_pvals))
- for i in range(n_pvals):
- dist = randomized_scores[:, i]
- dist = np.append(dist, base_scores[i])
- pvals[i] = permutation.compute_pvalue_from_dist(obs_value=base_scores[i],
- dist=dist,
- consider_size=True,
- comparison='different'
- )
- pval_df = pd.DataFrame(pvals.reshape(score.shape), index=score.index, columns=score.columns)
-
- if fdr_correction:
- symmetric = manipulate_dataframes.check_symmetry(df=pval_df)
- if symmetric:
- pval_df = multitest.compute_fdrcorrection_symmetric_matrix(X=pval_df,
- alpha=0.05)
- else:
- pval_df = multitest.compute_fdrcorrection_asymmetric_matrix(X=pval_df,
- alpha=0.05)
-
- if evaluation == 'communication':
- self.ccc_permutation_pvalues = pval_df
- elif evaluation == 'interactions':
- self.cci_permutation_pvalues = pval_df
- return pval_df
+ randomized_meta = manipulate_dataframes.shuffle_cols_in_df(df=self.metadata.reset_index(),
+ columns=self.group_col,
+ random_state=seed)
+
+ aggregated_expression = rnaseq.aggregate_single_cells(rnaseq_data=self.rnaseq_data,
+ metadata=randomized_meta,
+ barcode_col=self.index_col,
+ celltype_col=self.group_col,
+ method=self.aggregation_method,
+ transposed=self.__adata)
+
+ interaction_space = initialize_interaction_space(rnaseq_data=aggregated_expression,
+ ppi_data=ppi_data,
+ cutoff_setup=self.cutoff_setup,
+ analysis_setup=analysis_setup,
+ complex_sep=self.complex_sep,
+ complex_agg_method=self.complex_agg_method,
+ interaction_columns=self.interaction_columns,
+ verbose=False)
+
+ if evaluation == 'communication':
+ interaction_space.compute_pairwise_communication_scores(verbose=False)
+ randomized_scores.append(interaction_space.interaction_elements['communication_matrix'].values.flatten())
+ elif evaluation == 'interactions':
+ interaction_space.compute_pairwise_cci_scores(verbose=False)
+ randomized_scores.append(interaction_space.interaction_elements['cci_matrix'].values.flatten())
+
+ randomized_scores = np.array(randomized_scores)
+ base_scores = score.values.flatten()
+ pvals = np.ones(base_scores.shape)
+ n_pvals = len(base_scores)
+ randomized_scores = randomized_scores.reshape((-1, n_pvals))
+ for i in range(n_pvals):
+ dist = randomized_scores[:, i]
+ dist = np.append(dist, base_scores[i])
+ pvals[i] = permutation.compute_pvalue_from_dist(obs_value=base_scores[i],
+ dist=dist,
+ consider_size=True,
+ comparison='different'
+ )
+ pval_df = pd.DataFrame(pvals.reshape(score.shape), index=score.index, columns=score.columns)
+
+ if fdr_correction:
+ symmetric = manipulate_dataframes.check_symmetry(df=pval_df)
+ if symmetric:
+ pval_df = multitest.compute_fdrcorrection_symmetric_matrix(X=pval_df,
+ alpha=0.05)
+ else:
+ pval_df = multitest.compute_fdrcorrection_asymmetric_matrix(X=pval_df,
+ alpha=0.05)
+
+ if evaluation == 'communication':
+ self.ccc_permutation_pvalues = pval_df
+ elif evaluation == 'interactions':
+ self.cci_permutation_pvalues = pval_df
+ return pval_df
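The loop above compares each observed score against the scores obtained after shuffling the group labels. A minimal sketch of that idea follows. It assumes a two-sided empirical p-value; the helper `empirical_two_sided_pvalue` is hypothetical and is not the implementation of `permutation.compute_pvalue_from_dist`, whose exact rule is not shown here.

```python
import numpy as np

def empirical_two_sided_pvalue(obs_value, null_scores):
    """Hypothetical two-sided empirical p-value (illustration only)."""
    # Append the observed value to the null distribution, as the loop above does,
    # so the resulting p-value can never be exactly zero.
    dist = np.append(np.asarray(null_scores, dtype=float), obs_value)
    upper = np.mean(dist >= obs_value)  # fraction of values at least as large
    lower = np.mean(dist <= obs_value)  # fraction of values at least as small
    return float(min(1.0, 2.0 * min(upper, lower)))

rng = np.random.default_rng(0)
null_scores = rng.normal(loc=0.0, scale=1.0, size=999)  # toy permutation scores

p_extreme = empirical_two_sided_pvalue(10.0, null_scores)  # far outside the null
p_typical = empirical_two_sided_pvalue(0.0, null_scores)   # near the null center
```

With 999 permutations, the smallest p-value this sketch can return is 2/1000, which is why more permutations are needed to resolve very small p-values.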
-SpatialSingleCellInteractions
-
-
-
-initialize_interaction_space(rnaseq_data, ppi_data, cutoff_setup, analysis_setup, excluded_cells=None, complex_sep=None, complex_agg_method='min', interaction_columns=('A', 'B'), verbose=True)
-Initializes an InteractionSpace object to perform the analyses
+rnaseq_data : pandas.DataFrame
+ Gene expression data for a bulk RNA-seq experiment or a single-cell
+ experiment after aggregation into cell types. Columns are samples
+ and rows are genes.
+ppi_data : pandas.DataFrame
+ List of protein-protein interactions (or ligand-receptor pairs) used
+ for inferring the cell-cell interactions and communication.
+cutoff_setup : dict
+ Contains two keys: 'type' and 'parameter'. The first key represents the
+ way to use a cutoff or threshold, while 'parameter' is the value used
+ to binarize the expression values.
+ The key 'type' can be:
+- 'local_percentile' : computes the value of a given percentile, for each
+ gene independently. In this case, the parameter corresponds to the
+ percentile to compute, as a float value between 0 and 1.
+- 'global_percentile' : computes the value of a given percentile from all
+ genes and samples simultaneously. In this case, the parameter
+ corresponds to the percentile to compute, as a float value between
+ 0 and 1. All genes have the same cutoff.
+- 'file' : load a cutoff table from a file. Parameter in this case is the
+ path of that file. It must contain the same genes as index and same
+ samples as columns.
+- 'multi_col_matrix' : a dataframe must be provided, containing a cutoff
+ for each gene in each sample. This allows using specific cutoffs for
+ each sample. The columns here must be the same as the ones in the
+ rnaseq_data.
+- 'single_col_matrix' : a dataframe must be provided, containing a cutoff
+ for each gene in only one column. These cutoffs will be applied to
+ all samples.
+- 'constant_value' : binarizes the expression. Evaluates whether
+ expression is greater than the value input in the parameter.
+
analysis_setup : dict
+ Contains main setup for running the cell-cell interactions and communication
+ analyses.
+ Three main setups are needed (passed as keys):
+- 'communication_score' : is the type of communication score used to detect
+ active ligand-receptor pairs between each pair of cells.
+ It can be:
+
+ - 'expression_thresholding'
+ - 'expression_product'
+ - 'expression_mean'
+ - 'expression_gmean'
+
+- 'cci_score' : is the scoring function to aggregate the communication
+ scores.
+ It can be:
+
+ - 'bray_curtis'
+ - 'jaccard'
+ - 'count'
+ - 'icellnet'
+
+- 'cci_type' : is the type of interaction between two cells. If it is
+ undirected, all ligands and receptors are considered from both cells.
+ If it is directed, ligands from one cell and receptors from the other
+ are considered separately with respect to ligands from the second
+ cell and receptors from the first one.
+ So, it can be:
+
+ - 'undirected'
+ - 'directed'
+
excluded_cells : list, default=None
+ List of cells in the rnaseq_data to be excluded. If None, all cells
+ are considered.
+complex_sep : str, default=None
+ Symbol that separates the protein subunits in a multimeric complex.
+ For example, '&' is the complex_sep for a list of ligand-receptor pairs
+ where a protein partner could be "CD74&CD44".
+complex_agg_method : str, default='min'
+ Method to aggregate the expression value of multiple genes in a
+ complex.
+- 'min' : Minimum expression value among all genes.
+- 'mean' : Average expression value among all genes.
+
interaction_columns : tuple, default=('A', 'B')
+ Contains the names of the columns where to find the partners in a
+ dataframe of protein-protein interactions. If the list is for
+ ligand-receptor pairs, the first column is for the ligands and the second
+ for the receptors.
+verbose : boolean, default=True
+ Whether to print the steps of the analysis.
+interaction_space : cell2cell.core.interaction_space.InteractionSpace
+ Interaction space that contains all the required elements to perform the
+ cell-cell interaction and communication analysis between every pair of cells.
+ After performing the analyses, the results are stored in this object.
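The setup dictionaries described above can be sketched as follows. The keys and option names come from the parameter descriptions; the toy expression table and the pandas-based binarization are assumptions for illustration only and do not reproduce cell2cell's internal code (in particular, the use of a strict "greater than" comparison for percentile cutoffs is assumed).

```python
import numpy as np
import pandas as pd

# Setup dictionaries using the documented keys and options.
cutoff_setup = {'type': 'local_percentile', 'parameter': 0.5}
analysis_setup = {'communication_score': 'expression_thresholding',
                  'cci_score': 'bray_curtis',
                  'cci_type': 'undirected'}

# Toy expression table (hypothetical): rows are genes, columns are samples.
rnaseq_data = pd.DataFrame([[0.0, 2.0, 4.0],
                            [10.0, 20.0, 30.0]],
                           index=['GeneA', 'GeneB'],
                           columns=['S1', 'S2', 'S3'])

percentile = cutoff_setup['parameter']

# 'local_percentile': one cutoff per gene, computed across samples.
local_cutoffs = rnaseq_data.quantile(percentile, axis=1)
binary_local = rnaseq_data.gt(local_cutoffs, axis=0).astype(int)

# 'global_percentile': a single cutoff from all genes and samples together.
global_cutoff = np.quantile(rnaseq_data.values, percentile)
binary_global = (rnaseq_data > global_cutoff).astype(int)
```

In this toy table the local cutoffs keep the highest sample of each gene, while the global cutoff marks every sample of the highly expressed gene as active and none of the other, which illustrates why the choice of cutoff type matters.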
cell2cell/analysis/cell2cell_pipelines.py
def initialize_interaction_space(rnaseq_data, ppi_data, cutoff_setup, analysis_setup, excluded_cells=None,
@@ -2766,12 +3028,12 @@
+
tensor_downstream
-
+
@@ -2785,188 +3047,47 @@
-Functions
-
-
-
-
-
-
-get_joint_loadings(result, dim1, dim2, factor)
-
-
-
-
-
-
- Creates the joint loading distribution between two tensor dimensions for a
-given factor output from decomposition.
-
-
-
-
-
-
-
-
- Parameters:
-
-
- result (any Tensor class in cell2cell.tensor.tensor or a dict) – Either a Tensor type or a dictionary which resulted from the tensor
-decomposition. If it is a dict, it should be the one in, for example,
-InteractionTensor.factors
- dim1 (str) – One of the tensor dimensions (options are in the keys of the dict,
-or interaction.factors.keys())
- dim2 (str) – A second tensor dimension (options are in the keys of the dict,
-or interaction.factors.keys())
- factor (str) – One of the factors output from the decomposition (e.g. 'Factor 1').
-
-
-
-
-
-
-
-
-
-
-
-
- Returns:
-
-
- pandas.DataFrame – Joint distribution of factor loadings for the specified dimensions.
-Rows correspond to elements in dim1 and columns to elements in dim2.
-
-
-
-
-
-
- Source code in cell2cell/analysis/tensor_downstream.py
- def get_joint_loadings(result, dim1, dim2, factor):
- """
- Creates the joint loading distribution between two tensor dimensions for a
- given factor output from decomposition.
-
- Parameters
- ----------
- result : any Tensor class in cell2cell.tensor.tensor or a dict
- Either a Tensor type or a dictionary which resulted from the tensor
- decomposition. If it is a dict, it should be the one in, for example,
- InteractionTensor.factors
-
- dim1 : str
- One of the tensor dimensions (options are in the keys of the dict,
- or interaction.factors.keys())
-
- dim2 : str
- A second tensor dimension (options are in the keys of the dict,
- or interaction.factors.keys())
-
- factor: str
- One of the factors output from the decomposition (e.g. 'Factor 1').
-
- Returns
- -------
- joint_dist : pandas.DataFrame
- Joint distribution of factor loadings for the specified dimensions.
- Rows correspond to elements in dim1 and columns to elements in dim2.
- """
- if hasattr(result, 'factors'):
- result = result.factors
- if result is None:
- raise ValueError('A tensor factorization must be run on the tensor before calling this function.')
- elif isinstance(result, dict):
- pass
- else:
- raise ValueError('result is not of a valid type. It must be an InteractionTensor or a dict.')
-
- assert dim1 in result.keys(), 'The specified dimension ' + dim1 + ' is not present in the `result` input'
- assert dim2 in result.keys(), 'The specified dimension ' + dim2 + ' is not present in the `result` input'
-
- vec1 = result[dim1][factor]
- vec2 = result[dim2][factor]
-
- # Calculate the outer product
- joint_dist = pd.DataFrame(data=np.outer(vec1, vec2),
- index=vec1.index,
- columns=vec2.index)
-
- joint_dist.index.name = dim1
- joint_dist.columns.name = dim2
- return joint_dist
-
-
-
-
-
-
-
-get_factor_specific_ccc_networks(result, sender_label='Sender Cells', receiver_label='Receiver Cells')
+
+compute_gini_coefficients(result, sender_label='Sender Cells', receiver_label='Receiver Cells')
-
+
- Generates adjacency matrices for each of the factors
-obtained from a tensor decomposition. These matrices represent a
-cell-cell communication directed network.
+ Computes the Gini coefficient on the distribution of edge weights
+in each factor-specific cell-cell communication network. Factors are
+obtained from the tensor decomposition with Tensor-cell2cell.
+Parameters
+result : any Tensor class in cell2cell.tensor.tensor or a dict
+ Either a Tensor type or a dictionary which resulted from the tensor
+ decomposition. If it is a dict, it should be the one in, for example,
+ InteractionTensor.factors
+sender_label : str
+ Label for the dimension of sender cells. Usually found in
+ InteractionTensor.order_labels
+receiver_label : str
+ Label for the dimension of receiver cells. Usually found in
+ InteractionTensor.order_labels
+Returns
+gini_df : pandas.DataFrame
+ Dataframe containing the Gini coefficient of each factor from
+ a tensor decomposition. Calculated on the factor-specific
+ cell-cell communication networks.
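As a rough illustration of what a Gini coefficient on edge weights captures, the sketch below builds two toy factor-specific networks and scores them. The `gini` helper is hypothetical and uses the standard sorted-values formula; it is not cell2cell's `gini_coefficient`, and the loadings are made up.

```python
import numpy as np

def gini(values):
    """Hypothetical Gini coefficient for non-negative values (illustration only)."""
    x = np.sort(np.asarray(values, dtype=float))  # ascending order
    n = x.size
    ranks = np.arange(1, n + 1)
    # Standard formula: sum_i (2i - n - 1) * x_i / (n * sum_i x_i)
    return float(np.sum((2 * ranks - n - 1) * x) / (n * np.sum(x)))

# Toy sender/receiver loadings for a single factor (made-up numbers).
concentrated = np.outer([1.0, 0.0], [1.0, 0.0])  # a single nonzero edge
uniform = np.outer([0.5, 0.5], [0.5, 0.5])       # all edges equally weighted

gini_concentrated = gini(concentrated.flatten())
gini_uniform = gini(uniform.flatten())
```

A value near 0 means communication is spread evenly over all cell-cell pairs in the factor, while larger values mean a few pairs dominate.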
-
-
-
-
-
-
-
- Parameters:
-
-
- result (any Tensor class in cell2cell.tensor.tensor or a dict) – Either a Tensor type or a dictionary which resulted from the tensor
-decomposition. If it is a dict, it should be the one in, for example,
-InteractionTensor.factors
- sender_label (str) – Label for the dimension of sender cells. Usually found in
-InteractionTensor.order_labels
- receiver_label (str) – Label for the dimension of receiver cells. Usually found in
-InteractionTensor.order_labels
-
-
-
-
-
-
-
-
-
-
-
-
- Returns:
-
-
- dict – A dictionary containing a pandas.DataFrame for each of the factors
-(factor names are the keys of the dict). These dataframes are the
-adjacency matrices of the CCC networks.
-
-
-
-
-
Source code in cell2cell/analysis/tensor_downstream.py
- def get_factor_specific_ccc_networks(result, sender_label='Sender Cells', receiver_label='Receiver Cells'):
+ def compute_gini_coefficients(result, sender_label='Sender Cells', receiver_label='Receiver Cells'):
'''
- Generates adjacency matrices for each of the factors
- obtained from a tensor decomposition. These matrices represent a
- cell-cell communication directed network.
+ Computes the Gini coefficient on the distribution of edge weights
+ in each factor-specific cell-cell communication network. Factors are
+ obtained from the tensor decomposition with Tensor-cell2cell.
Parameters
----------
@@ -2985,10 +3106,10 @@
Returns
-------
- networks : dict
- A dictionary containing a pandas.DataFrame for each of the factors
- (factor names are the keys of the dict). These dataframes are the
- adjacency matrices of the CCC networks.
+ gini_df : pandas.DataFrame
+ Dataframe containing the Gini coefficient of each factor from
+ a tensor decomposition. Calculated on the factor-specific
+ cell-cell communication networks.
'''
if hasattr(result, 'factors'):
result = result.factors
@@ -3001,14 +3122,17 @@
factors = sorted(list(set(result[sender_label].columns) & set(result[receiver_label].columns)))
- networks = dict()
+ ginis = []
for f in factors:
- networks[f] = get_joint_loadings(result=result,
- dim1=sender_label,
- dim2=receiver_label,
- factor=f
- )
- return networks
+ factor_net = get_joint_loadings(result=result,
+ dim1=sender_label,
+ dim2=receiver_label,
+ factor=f
+ )
+ gini = gini_coefficient(factor_net.values.flatten())
+ ginis.append((f, gini))
+ gini_df = pd.DataFrame.from_records(ginis, columns=['Factor', 'Gini'])
+ return gini_df
@@ -3021,57 +3145,32 @@
+
flatten_factor_ccc_networks(networks, orderby='senders')
-
+
- Flattens all adjacency matrices in the factor-specific
-cell-cell communication networks. It generates a matrix
+
Flattens all adjacency matrices in the factor-specific
+cell-cell communication networks. It generates a matrix
where rows are factors and columns are cell-cell pairs.
+Parameters
+networks : dict
+ A dictionary containing a pandas.DataFrame for each of the factors
+ (factor names are the keys of the dict). These dataframes are the
+ adjacency matrices of the CCC networks.
+orderby : str
+ Order of the flattened cell-cell pairs. Options are 'senders' and
+ 'receivers'. 'senders' means to flatten the matrices in a way that
+ all cell-cell pairs with the same sender cell are put next to each other.
+ 'receivers' means the same, but by considering the receiver cell instead.
+Returns
+flatten_networks : pandas.DataFrame
+ A dataframe wherein each row contains a factor-specific network. Columns are
+ the directed cell-cell pairs.
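The two orderings described above can be sketched with plain pandas. The function `flatten_networks_sketch` and the `;` separator used in the pair labels are assumptions made for this example; the actual implementation and label format may differ.

```python
import pandas as pd

def flatten_networks_sketch(networks, orderby='senders'):
    """Hypothetical flattening of {factor: adjacency matrix} into one row per factor."""
    if orderby not in ('senders', 'receivers'):
        raise ValueError("orderby must be 'senders' or 'receivers'")
    rows, labels = [], None
    for net in networks.values():
        if orderby == 'senders':
            # Pairs sharing a sender are adjacent (row-major traversal).
            pairs = [(s, r) for s in net.index for r in net.columns]
        else:
            # Pairs sharing a receiver are adjacent (column-major traversal).
            pairs = [(s, r) for r in net.columns for s in net.index]
        rows.append([net.loc[s, r] for s, r in pairs])
        labels = [f'{s};{r}' for s, r in pairs]
    return pd.DataFrame(rows, index=list(networks.keys()), columns=labels)

# Toy adjacency matrix: rows are senders, columns are receivers.
net = pd.DataFrame([[1, 2], [3, 4]], index=['A', 'B'], columns=['A', 'B'])
networks = {'Factor 1': net}

by_senders = flatten_networks_sketch(networks, orderby='senders')
by_receivers = flatten_networks_sketch(networks, orderby='receivers')
```

Both outputs contain the same values; only the column order changes, which mainly affects how neighbouring columns look in a heatmap or clustering plot.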
-
-
-
-
-
-
-
- Parameters:
-
-
- networks (dict) – A dictionary containing a pandas.DataFrame for each of the factors
-(factor names are the keys of the dict). These dataframes are the
-adjacency matrices of the CCC networks.
- orderby (str) – Order of the flattened cell-cell pairs. Options are 'senders' and
-'receivers'. 'senders' means to flatten the matrices in a way that
-all cell-cell pairs with the same sender cell are put next to each other.
-'receivers' means the same, but by considering the receiver cell instead.
-
-
-
-
-
-
-
-
-
-
-
-
- Returns:
-
-
- pandas.DataFrame – A dataframe wherein each row contains a factor-specific network. Columns are
-the directed cell-cell pairs.
-
-
-
-
-
Source code in cell2cell/analysis/tensor_downstream.py
def flatten_factor_ccc_networks(networks, orderby='senders'):
@@ -3129,65 +3228,41 @@
-compute_gini_coefficients(result, sender_label='Sender Cells', receiver_label='Receiver Cells')
+
+get_factor_specific_ccc_networks(result, sender_label='Sender Cells', receiver_label='Receiver Cells')
-
+
- Computes Gini coefficient on the distribution of edge weights
-in each factor-specific cell-cell communication network. Factors
-obtained from the tensor decomposition with Tensor-cell2cell.
+ Generates adjacency matrices for each of the factors
+obtained from a tensor decomposition. These matrices represent a
+cell-cell communication directed network.
+Parameters
+result : any Tensor class in cell2cell.tensor.tensor or a dict
+ Either a Tensor type or a dictionary which resulted from the tensor
+ decomposition. If it is a dict, it should be the one in, for example,
+ InteractionTensor.factors
+sender_label : str
+ Label for the dimension of sender cells. Usually found in
+ InteractionTensor.order_labels
+receiver_label : str
+ Label for the dimension of receiver cells. Usually found in
+ InteractionTensor.order_labels
+Returns
+networks : dict
+ A dictionary containing a pandas.DataFrame for each of the factors
+ (factor names are the keys of the dict). These dataframes are the
+ adjacency matrices of the CCC networks.
-
-
-
-
-
-
-
- Parameters:
-
-
- result (any Tensor class in cell2cell.tensor.tensor or a dict) – Either a Tensor type or a dictionary which resulted from the tensor
-decomposition. If it is a dict, it should be the one in, for example,
-InteractionTensor.factors
- sender_label (str) – Label for the dimension of sender cells. Usually found in
-InteractionTensor.order_labels
- receiver_label (str) – Label for the dimension of receiver cells. Usually found in
-InteractionTensor.order_labels
-
-
-
-
-
-
-
-
-
-
-
-
- Returns:
-
-
- pandas.DataFrame – Dataframe containing the Gini coefficient of each factor from
-a tensor decomposition. Calculated on the factor-specific
-cell-cell communication networks.
-
-
-
-
-
Source code in cell2cell/analysis/tensor_downstream.py
- def compute_gini_coefficients(result, sender_label='Sender Cells', receiver_label='Receiver Cells'):
+ def get_factor_specific_ccc_networks(result, sender_label='Sender Cells', receiver_label='Receiver Cells'):
'''
- Computes Gini coefficient on the distribution of edge weights
- in each factor-specific cell-cell communication network. Factors
- obtained from the tensor decomposition with Tensor-cell2cell.
+ Generates adjacency matrices for each of the factors
+ obtained from a tensor decomposition. These matrices represent a
+ cell-cell communication directed network.
Parameters
----------
@@ -3206,10 +3281,10 @@
Returns
-------
- gini_df : pandas.DataFrame
- Dataframe containing the Gini coefficient of each factor from
- a tensor decomposition. Calculated on the factor-specific
- cell-cell communication networks.
+ networks : dict
+ A dictionary containing a pandas.DataFrame for each of the factors
+ (factor names are the keys of the dict). These dataframes are the
+ adjacency matrices of the CCC networks.
'''
if hasattr(result, 'factors'):
result = result.factors
@@ -3222,19 +3297,112 @@
factors = sorted(list(set(result[sender_label].columns) & set(result[receiver_label].columns)))
- ginis = []
+ networks = dict()
for f in factors:
- factor_net = get_joint_loadings(result=result,
- dim1=sender_label,
- dim2=receiver_label,
- factor=f
- )
- gini = gini_coefficient(factor_net.values.flatten())
- ginis.append((f, gini))
- gini_df = pd.DataFrame.from_records(ginis, columns=['Factor', 'Gini'])
- return gini_df
-
-
+ networks[f] = get_joint_loadings(result=result,
+ dim1=sender_label,
+ dim2=receiver_label,
+ factor=f
+ )
+ return networks
+
+
+
+
+
+
+
+
+
+
+
+
+
+get_joint_loadings(result, dim1, dim2, factor)
+
+
+
+
+
+
+ Creates the joint loading distribution between two tensor dimensions for a
+given factor output from decomposition.
+Parameters
+result : any Tensor class in cell2cell.tensor.tensor or a dict
+ Either a Tensor type or a dictionary which resulted from the tensor
+ decomposition. If it is a dict, it should be the one in, for example,
+ InteractionTensor.factors
+dim1 : str
+ One of the tensor dimensions (options are in the keys of the dict,
+ or interaction.factors.keys())
+dim2 : str
+ A second tensor dimension (options are in the keys of the dict,
+ or interaction.factors.keys())
+factor : str
+ One of the factors output from the decomposition (e.g. 'Factor 1').
+Returns
+joint_dist : pandas.DataFrame
+ Joint distribution of factor loadings for the specified dimensions.
+ Rows correspond to elements in dim1 and columns to elements in dim2.
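The joint loading distribution described above is an outer product of two loading vectors. The sketch below reproduces that computation on a made-up `factors` dictionary; the cell names and loading values are hypothetical.

```python
import numpy as np
import pandas as pd

# Toy decomposition result: one DataFrame of loadings per tensor dimension.
factors = {
    'Sender Cells': pd.DataFrame({'Factor 1': [0.8, 0.2]}, index=['T cell', 'B cell']),
    'Receiver Cells': pd.DataFrame({'Factor 1': [0.5, 0.5]}, index=['T cell', 'B cell']),
}

vec1 = factors['Sender Cells']['Factor 1']
vec2 = factors['Receiver Cells']['Factor 1']

# Outer product: entry (i, j) is the loading of sender i times the loading of receiver j.
joint_dist = pd.DataFrame(np.outer(vec1, vec2), index=vec1.index, columns=vec2.index)
joint_dist.index.name = 'Sender Cells'
joint_dist.columns.name = 'Receiver Cells'
```

Here the T cell sender row carries four times the weight of the B cell sender row, because its sender loading is four times larger while the receiver loadings are equal.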
+
+
+ Source code in cell2cell/analysis/tensor_downstream.py
+ def get_joint_loadings(result, dim1, dim2, factor):
+ """
+ Creates the joint loading distribution between two tensor dimensions for a
+ given factor output from decomposition.
+
+ Parameters
+ ----------
+ result : any Tensor class in cell2cell.tensor.tensor or a dict
+ Either a Tensor type or a dictionary which resulted from the tensor
+ decomposition. If it is a dict, it should be the one in, for example,
+ InteractionTensor.factors
+
+ dim1 : str
+ One of the tensor dimensions (options are in the keys of the dict,
+ or interaction.factors.keys())
+
+ dim2 : str
+ A second tensor dimension (options are in the keys of the dict,
+ or interaction.factors.keys())
+
+ factor : str
+ One of the factors output from the decomposition (e.g. 'Factor 1').
+
+ Returns
+ -------
+ joint_dist : pandas.DataFrame
+ Joint distribution of factor loadings for the specified dimensions.
+ Rows correspond to elements in dim1 and columns to elements in dim2.
+ """
+ if hasattr(result, 'factors'):
+ result = result.factors
+ if result is None:
+ raise ValueError('A tensor factorization must be run on the tensor before calling this function.')
+ elif isinstance(result, dict):
+ pass
+ else:
+ raise ValueError('result is not of a valid type. It must be an InteractionTensor or a dict.')
+
+ assert dim1 in result.keys(), 'The specified dimension ' + dim1 + ' is not present in the `result` input'
+ assert dim2 in result.keys(), 'The specified dimension ' + dim2 + ' is not present in the `result` input'
+
+ vec1 = result[dim1][factor]
+ vec2 = result[dim2][factor]
+
+ # Calculate the outer product
+ joint_dist = pd.DataFrame(data=np.outer(vec1, vec2),
+ index=vec1.index,
+ columns=vec2.index)
+
+ joint_dist.index.name = dim1
+ joint_dist.columns.name = dim2
+ return joint_dist
+
+
@@ -3245,75 +3413,56 @@
+
get_lr_by_cell_pairs(result, lr_label, sender_label, receiver_label, order_cells_by='receivers', factor=None, cci_threshold=None, lr_threshold=None)
-
-
-
-
- Returns a dataframe containing the product loadings of a specific combination
-of ligand-receptor pair and sender-receiver pair.
-
-
-
-
-
-
-
-
- Parameters:
-
-
- result (any Tensor class in cell2cell.tensor.tensor or a dict) – Either a Tensor type or a dictionary which resulted from the tensor
-decomposition. If it is a dict, it should be the one in, for example,
-InteractionTensor.factors
- lr_label (str) – Label for the dimension of the ligand-receptor pairs. Usually found in
-InteractionTensor.order_labels
- sender_label (str) – Label for the dimension of sender cells. Usually found in
-InteractionTensor.order_labels
- receiver_label (str) – Label for the dimension of receiver cells. Usually found in
-InteractionTensor.order_labels
- order_cells_by (str, default='receivers') – Order of the returned dataframe. Options are 'senders' and
-'receivers'. 'senders' means to order the dataframe in a way that
-all cell-cell pairs with the same sender cell are put next to each other.
-'receivers' means the same, but by considering the receiver cell instead.
- factor (str, default=None) – Name of the factor to be used to compute the product loadings.
-If None, all factors will be included to compute them.
- cci_threshold (float, default=None) – Threshold to be applied on the product loadings of the sender-receiver pairs.
-If specified, only cell-cell pairs with a product loading above the
-threshold in at least one of the included factors will be kept
-in the returned dataframe.
- lr_threshold (float, default=None) – Threshold to be applied on the ligand-receptor loadings.
-If specified, only LR pairs with a loading above the
-threshold in at least one of the included factors will be kept
-in the returned dataframe.
-
-
-
-
-
-
-
-
-
-
-
-
- Returns:
-
-
- pandas.DataFrame – Dataframe containing the product loadings of a specific combination
-of ligand-receptor pair and sender-receiver pair. If the factor is specified,
-the returned dataframe will contain the product loadings of that factor.
-If the factor is not specified, the returned dataframe will contain the
-product loadings across all factors.
-
-
-
-
-
+
+
+
+
+ Returns a dataframe containing the product loadings of a specific combination
+of ligand-receptor pair and sender-receiver pair.
+Parameters
+result : any Tensor class in cell2cell.tensor.tensor or a dict
+ Either a Tensor type or a dictionary which resulted from the tensor
+ decomposition. If it is a dict, it should be the one in, for example,
+ InteractionTensor.factors
+lr_label : str
+ Label for the dimension of the ligand-receptor pairs. Usually found in
+ InteractionTensor.order_labels
+sender_label : str
+ Label for the dimension of sender cells. Usually found in
+ InteractionTensor.order_labels
+receiver_label : str
+ Label for the dimension of receiver cells. Usually found in
+ InteractionTensor.order_labels
+order_cells_by : str, default='receivers'
+ Order of the returned dataframe. Options are 'senders' and
+ 'receivers'. 'senders' means to order the dataframe in a way that
+ all cell-cell pairs with the same sender cell are put next to each other.
+ 'receivers' means the same, but by considering the receiver cell instead.
+factor : str, default=None
+ Name of the factor to be used to compute the product loadings.
+ If None, all factors will be included to compute them.
+cci_threshold : float, default=None
+ Threshold to be applied on the product loadings of the sender-receiver pairs.
+ If specified, only cell-cell pairs with a product loading above the
+ threshold in at least one of the included factors will be kept
+ in the returned dataframe.
+lr_threshold : float, default=None
+ Threshold to be applied on the ligand-receptor loadings.
+ If specified, only LR pairs with a loading above the
+ threshold in at least one of the included factors will be kept
+ in the returned dataframe.
+Returns
+cci_lr : pandas.DataFrame
+ Dataframe containing the product loadings of a specific combination
+ of ligand-receptor pair and sender-receiver pair. If the factor is specified,
+ the returned dataframe will contain the product loadings of that factor.
+ If the factor is not specified, the returned dataframe will contain the
+ product loadings across all factors.
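For a single factor, the product loadings described above can be pictured as the ligand-receptor loadings multiplied by the sender-receiver joint loadings, with the two thresholds applied first. The sketch below is an assumption-laden illustration: the loadings, the `;` pair separator, the output orientation (LR pairs as rows) and the single-factor restriction are all hypothetical, and the way multiple factors are combined is not covered.

```python
import numpy as np
import pandas as pd

# Toy loadings for one factor (made-up names and numbers).
lr = pd.Series([0.9, 0.1], index=['L1^R1', 'L2^R2'])
sender = pd.Series([0.8, 0.2], index=['T cell', 'B cell'])
receiver = pd.Series([0.5, 0.5], index=['T cell', 'B cell'])

# Joint sender-receiver loadings, flattened so pairs sharing a sender are adjacent.
cci = pd.DataFrame(np.outer(sender, receiver), index=sender.index, columns=receiver.index)
pairs = [(s, r) for s in cci.index for r in cci.columns]
cci_flat = pd.Series([cci.loc[s, r] for s, r in pairs],
                     index=[f'{s};{r}' for s, r in pairs])

# Keep only elements above each threshold.
cci_threshold, lr_threshold = 0.2, 0.5
cci_kept = cci_flat[cci_flat > cci_threshold]
lr_kept = lr[lr > lr_threshold]

# Product loadings: LR pairs as rows, sender-receiver pairs as columns.
cci_lr = pd.DataFrame(np.outer(lr_kept, cci_kept),
                      index=lr_kept.index, columns=cci_kept.index)
```

Filtering before the outer product keeps the resulting table small, which matters when there are thousands of ligand-receptor pairs and many cell types.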
+
Source code in cell2cell/analysis/tensor_downstream.py
def get_lr_by_cell_pairs(result, lr_label, sender_label, receiver_label, order_cells_by='receivers', factor=None,
@@ -3446,12 +3595,12 @@
+
tensor_pipelines
-
+
@@ -3465,103 +3614,98 @@
-Functions
+
-
+
run_tensor_cell2cell_pipeline(interaction_tensor, tensor_metadata, copy_tensor=False, rank=None, tf_optimization='regular', random_state=None, backend=None, device=None, elbow_metric='error', smooth_elbow=False, upper_rank=25, tf_init='random', tf_svd='numpy_svd', cmaps=None, sample_col='Element', group_col='Category', fig_fontsize=14, output_folder=None, output_fig=True, fig_format='pdf', **kwargs)
-
+
Runs basic pipeline of Tensor-cell2cell (excluding downstream analyses).
+Parameters
+interaction_tensor : cell2cell.tensor.BaseTensor
+ A communication tensor generated with any of the tensor class in
+ cell2cell.tensor.
+tensor_metadata : list
+ List of pandas dataframes with metadata information for elements of each
+ dimension in the tensor. A column named by the variable sample_col contains
+ the name of each element in the tensor while another column named by the
+ variable group_col contains the metadata or grouping information of each
+ element.
+copy_tensor : boolean, default=False
+ Whether to generate a copy of the original tensor to avoid modifying it.
+rank : int, default=None
+ Rank of the Tensor Factorization (number of factors to deconvolve the original
+ tensor). If None, it will be automatically inferred from an elbow analysis.
+tf_optimization : str, default='regular'
+ Defines whether to perform an optimization with a higher number of iterations,
+ independent factorization runs, and higher resolution (lower tolerance),
+ or with a lower number of iterations, factorization runs, and resolution.
+ Options are:
+- 'regular' : It uses 100 max iterations, 1 factorization run, and 10e-7 tolerance.
+ Faster to run.
+- 'robust' : It uses 500 max iterations, 100 factorization runs, and 10e-8 tolerance.
+ Slower to run.
+
+
+random_state : int, default=None
+ Seed for randomization.
+backend : str, default=None
+ Backend that TensorLy will use to perform calculations
+ on this tensor. When None, the default backend used is
+ the currently active backend, usually 'numpy'. Options are:
+device : str, default=None
+ Device to use when backend allows multiple devices. Options are:
+elbow_metric : str, default='error'
+ Metric to perform the elbow analysis (y-axis).
+ - 'error' : Normalized error to compute the elbow.
+ - 'similarity' : Similarity based on CorrIndex (1-CorrIndex).
+
+
+smooth_elbow : boolean, default=False
+ Whether to smooth the elbow-analysis curve with a Savitzky-Golay filter.
+upper_rank : int, default=25
+ Upper bound of ranks to explore with the elbow analysis.
+tf_init : str, default='random'
+ Initialization method for computing the Tensor Factorization.
+tf_svd : str, default='numpy_svd'
+ Function to compute the SVD for initializing the Tensor Factorization,
+ acceptable values in tensorly.SVD_FUNS
+cmaps : list, default=None
+ A list of colormaps used for coloring elements in each dimension. The length
+ of this list is equal to the number of dimensions of the tensor. If None, all
+ dimensions will be colored with the colormap 'gist_rainbow'.
+sample_col : str, default='Element'
+ Name of the column containing the element names in the metadata.
+group_col : str, default='Category'
+ Name of the column containing the metadata or grouping information for each
+ element in the metadata.
+fig_fontsize : int, default=14
+ Font size of the tick labels. Axis labels will be 1.2 times the fontsize.
+output_folder : str, default=None
+ Path to the folder where the figures generated will be saved.
+ If None, figures will not be saved.
+output_fig : boolean, default=True
+ Whether to generate the figures with matplotlib.
+fig_format : str, default='pdf'
+ Format to store figures when an output_folder is specified
+ and output_fig is True. Otherwise, this is not necessary.
+**kwargs : dict
+ Extra arguments for the tensor factorization according to inputs in
+ tensorly.
+Returns
+interaction_tensor : cell2cell.tensor.tensor.BaseTensor
+ Either the original input interaction_tensor or a copy of it.
+ This also stores the results from running the Tensor-cell2cell
+ pipeline in the corresponding attributes.
-
-
-
-
-
-
-
- Parameters:
-
-
- interaction_tensor (cell2cell.tensor.BaseTensor) – A communication tensor generated with any of the tensor class in
-cell2cell.tensor.
- tensor_metadata (list) – List of pandas dataframes with metadata information for elements of each
-dimension in the tensor. A column named by the variable sample_col contains
-the name of each element in the tensor while another column named by the
-variable group_col contains the metadata or grouping information of each
-element.
- copy_tensor (boolean, default=False) – Whether to generate a copy of the original tensor to avoid modifying it.
- rank (int, default=None) – Rank of the Tensor Factorization (number of factors to deconvolve the original
-tensor). If None, it will be automatically inferred from an elbow analysis.
- tf_optimization (str, default='regular') – Defines whether to perform an optimization with a higher number of iterations,
-independent factorization runs, and higher resolution (lower tolerance),
-or with a lower number of iterations, factorization runs, and resolution.
-Options are:
-
-- 'regular' : It uses 100 max iterations, 1 factorization run, and 10e-7 tolerance.
- Faster to run.
-- 'robust' : It uses 500 max iterations, 100 factorization runs, and 10e-8 tolerance.
- Slower to run.
-
- random_state (int, default=None) – Seed for randomization.
- backend (str, default=None) – Backend that TensorLy will use to perform calculations
-on this tensor. When None, the default backend used is
-the currently active backend, usually 'numpy'. Options are:
- device (str, default=None) – Device to use when backend allows multiple devices. Options are:
- elbow_metric (str, default='error') – Metric to perform the elbow analysis (y-axis).
-- 'error' : Normalized error to compute the elbow.
-- 'similarity' : Similarity based on CorrIndex (1-CorrIndex).
-
- smooth_elbow (boolean, default=False) – Whether to smooth the elbow-analysis curve with a Savitzky-Golay filter.
- upper_rank (int, default=25) – Upper bound of ranks to explore with the elbow analysis.
- tf_init (str, default='random') – Initialization method for computing the Tensor Factorization.
- tf_svd (str, default='numpy_svd') – Function to compute the SVD for initializing the Tensor Factorization,
-acceptable values in tensorly.SVD_FUNS
- cmaps (list, default=None) – A list of colormaps used for coloring elements in each dimension. The length
-of this list is equal to the number of dimensions of the tensor. If None, all
-dimensions will be colored with the colormap 'gist_rainbow'.
- sample_col (str, default='Element') – Name of the column containing the element names in the metadata.
- group_col (str, default='Category') – Name of the column containing the metadata or grouping information for each
-element in the metadata.
- fig_fontsize (int, default=14) – Font size of the tick labels. Axis labels will be 1.2 times the fontsize.
- output_folder (str, default=None) – Path to the folder where the figures generated will be saved.
-If None, figures will not be saved.
- output_fig (boolean, default=True) – Whether to generate the figures with matplotlib.
- fig_format (str, default='pdf') – Format to store figures when an output_folder is specified
-and output_fig is True. Otherwise, this is not necessary.
- **kwargs (dict) – Extra arguments for the tensor factorization according to inputs in
-tensorly.
-
-
-
-
-
-
-
-
-
-
-
-
- Returns:
-
-
- cell2cell.tensor.tensor.BaseTensor
– Either the original input interaction_tensor
or a copy of it.
-This also stores the results from running the Tensor-cell2cell
-pipeline in the corresponding attributes.
-
-
-
-
-
Source code in cell2cell/analysis/tensor_pipelines.py
def run_tensor_cell2cell_pipeline(interaction_tensor, tensor_metadata, copy_tensor=False, rank=None,
@@ -3804,7 +3948,7 @@
- cell2cell.clustering
+ clustering
@@ -3813,7 +3957,7 @@
-
+
@@ -3827,18 +3971,18 @@
-Modules
+
-
+
cluster_interactions
-
+
@@ -3852,68 +3996,44 @@
-Functions
+
-
+
compute_distance(data_matrix, axis=0, metric='euclidean')
-
+
- Computes the pairwise distance between elements in a
-matrix of shape m x n. Uses the function
+
Computes the pairwise distance between elements in a
+matrix of shape m x n. Uses the function
scipy.spatial.distance.pdist
+Parameters
+data_matrix : pandas.DataFrame or ndarray
+ A m x n matrix used to compute the distances
+axis : int, default=0
+ To decide on which elements to compute the distance.
+ If axis=0, the distances will be between elements in
+ the rows, while axis=1 will lead to distances between
+ elements in the columns.
+metric : str, default='euclidean'
+ The distance metric to use. The distance function can be 'braycurtis',
+ 'canberra', 'chebyshev', 'cityblock', 'correlation', 'cosine', 'dice',
+ 'euclidean', 'hamming', 'jaccard', 'jensenshannon', 'kulsinski',
+ 'mahalanobis', 'matching', 'minkowski', 'rogerstanimoto', 'russellrao',
+ 'seuclidean', 'sokalmichener', 'sokalsneath', 'sqeuclidean', 'yule'.
+Returns
+D : ndarray
+ Returns a condensed distance matrix Y. For each i and j (where i < j < m,
+ with m being the number of original observations), the metric
+ dist(u=X[i], v=X[j]) is computed and stored in entry
+ m * i + j - ((i + 2) * (i + 1)) // 2.
-
-
-
-
-
-
-
- Parameters:
-
-
- data_matrix (pandas.DataFrame or ndarray
) – A m x n matrix used to compute the distances
- axis (int, default=0
) – To decide on which elements to compute the distance.
-If axis=0, the distances will be between elements in
-the rows, while axis=1 will lead to distances between
-elements in the columns.
- metric (str, default='euclidean'
) – The distance metric to use. The distance function can be 'braycurtis',
-'canberra', 'chebyshev', 'cityblock', 'correlation', 'cosine', 'dice',
-'euclidean', 'hamming', 'jaccard', 'jensenshannon', 'kulsinski',
-'mahalanobis', 'matching', 'minkowski', 'rogerstanimoto', 'russellrao',
-'seuclidean', 'sokalmichener', 'sokalsneath', 'sqeuclidean', 'yule'.
-
-
-
-
-
-
-
-
-
-
-
-
- Returns:
-
-
- ndarray
– Returns a condensed distance matrix Y. For each i and j (where i < j < m,
-with m being the number of original observations), the metric
-dist(u=X[i], v=X[j]) is computed and stored in entry
-m * i + j - ((i + 2) * (i + 1)) // 2.
-
-
-
-
-
Source code in cell2cell/clustering/cluster_interactions.py
def compute_distance(data_matrix, axis=0, metric='euclidean'):
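The index formula in the Returns description can be checked with a small sketch. It uses only the Python standard library rather than scipy.spatial.distance.pdist, and the helper name `condensed_index` and the sample points are assumptions made up for illustration.

```python
from itertools import combinations
from math import dist


def condensed_index(i, j, m):
    """Position of the pair (i, j), with i < j < m, in a condensed distance vector."""
    return m * i + j - ((i + 2) * (i + 1)) // 2


# Four observations (rows) with two features each; illustrative values only.
points = [(0.0, 0.0), (3.0, 4.0), (6.0, 8.0), (0.0, 1.0)]
m = len(points)

# Pairwise Euclidean distances for all i < j, in row-major pair order.
condensed = [dist(points[i], points[j]) for i, j in combinations(range(m), 2)]

# Look up the distance between observations 1 and 2 without a square matrix.
d_12 = condensed[condensed_index(1, 2, m)]
```

A condensed vector stores only the m * (m - 1) / 2 distances above the diagonal, which is why a pair has to be translated into a single position before lookup.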
@@ -3970,64 +4090,40 @@
+
compute_linkage(distance_matrix, method='ward', optimal_ordering=True)
-
+
Returns a linkage for a given distance matrix using a specific method.
+Parameters
+distance_matrix : numpy.ndarray
+ A square array containing the distance between a given row and a
+ given column. Diagonal elements must be zero.
+method : str, 'ward' by default
+ Method to compute the linkage. It could be:
+- 'single'
+- 'complete'
+- 'average'
+- 'weighted'
+- 'centroid'
+- 'median'
+- 'ward'
+For more details, go to:
+https://docs.scipy.org/doc/scipy-0.19.0/reference/generated/scipy.cluster.hierarchy.linkage.html
+
+
+optimal_ordering : boolean, default=True
+ Whether to sort the leaves of the dendrogram to have a minimal distance
+ between successive leaves. For more information, see
+ scipy.cluster.hierarchy.optimal_leaf_ordering
+Returns
+Z : numpy.ndarray
+ The hierarchical clustering encoded as a linkage matrix.
-
-
-
-
-
-
-
- Parameters:
-
-
- distance_matrix (numpy.ndarray
) – A square array containing the distance between a given row and a
-given column. Diagonal elements must be zero.
- method (str, 'ward' by default
) – Method to compute the linkage. It could be:
-
-- 'single'
-- 'complete'
-- 'average'
-- 'weighted'
-- 'centroid'
-- 'median'
-- 'ward'
-For more details, go to:
-https://docs.scipy.org/doc/scipy-0.19.0/reference/generated/scipy.cluster.hierarchy.linkage.html
-
- optimal_ordering (boolean, default=True
) – Whether to sort the leaves of the dendrogram to have a minimal distance
-between successive leaves. For more information, see
-scipy.cluster.hierarchy.optimal_leaf_ordering
-
-
-
-
-
-
-
-
-
-
-
-
- Returns:
-
-
- numpy.ndarray
– The hierarchical clustering encoded as a linkage matrix.
-
-
-
-
-
Source code in cell2cell/clustering/cluster_interactions.py
def compute_linkage(distance_matrix, method='ward', optimal_ordering=True):
@@ -4088,57 +4184,34 @@
+
get_clusters_from_linkage(linkage, threshold, criterion='maxclust', labels=None)
-
+
Gets clusters from a linkage given a threshold and a criterion.
+Parameters
+linkage : numpy.ndarray
+ The hierarchical clustering encoded with the matrix returned by
+ the linkage function (Z).
+threshold : float
+ The threshold to apply when forming flat clusters.
+criterion : str, 'maxclust' by default
+ The criterion to use in forming flat clusters. Depending on the
+ criterion, the threshold has different meanings. More information on:
+ https://docs.scipy.org/doc/scipy-0.19.0/reference/generated/scipy.cluster.hierarchy.fcluster.html
+labels : array-like, None by default
+ List of labels of the elements contained in the linkage. The order
+ must match the order they were provided when generating the linkage.
+Returns
+clusters : dict
+ A dictionary containing the clusters obtained. The keys correspond to
+ the cluster numbers and the values to a list with element names given the
+ labels, or the element index based on the linkage.
-
-
-
-
-
-
-
- Parameters:
-
-
- linkage (numpy.ndarray
) – The hierarchical clustering encoded with the matrix returned by
-the linkage function (Z).
- threshold (float
) – The threshold to apply when forming flat clusters.
- criterion (str, 'maxclust' by default
) – The criterion to use in forming flat clusters. Depending on the
-criterion, the threshold has different meanings. More information on:
-https://docs.scipy.org/doc/scipy-0.19.0/reference/generated/scipy.cluster.hierarchy.fcluster.html
- labels (array-like, None by default
) – List of labels of the elements contained in the linkage. The order
-must match the order they were provided when generating the linkage.
-
-
-
-
-
-
-
-
-
-
-
-
- Returns:
-
-
- dict
– A dictionary containing the clusters obtained. The keys correspond to
-the cluster numbers and the values to a list with element names given the
-labels, or the element index based on the linkage.
-
-
-
-
-
Source code in cell2cell/clustering/cluster_interactions.py
def get_clusters_from_linkage(linkage, threshold, criterion='maxclust', labels=None):
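The returned dictionary can be pictured with a short sketch. It assumes a flat cluster assignment with one cluster number per element, which is the form produced by scipy.cluster.hierarchy.fcluster. The helper name `group_by_cluster` and the sample values are hypothetical and are not part of cell2cell.

```python
from collections import defaultdict


def group_by_cluster(assignments, labels=None):
    """Group elements by cluster number.

    assignments : one cluster number per element, in linkage order.
    labels : optional element names in the same order. When None,
        the element index is stored instead.
    """
    clusters = defaultdict(list)
    for idx, cluster_id in enumerate(assignments):
        element = labels[idx] if labels is not None else idx
        clusters[cluster_id].append(element)
    return dict(clusters)


# Illustrative flat assignment for five elements split into two clusters.
assignments = [1, 1, 2, 1, 2]
named = group_by_cluster(assignments, labels=['T', 'B', 'NK', 'Mono', 'DC'])
indexed = group_by_cluster(assignments)
```

Passing labels in a different order from the one used to build the linkage would silently assign names to the wrong clusters, which is why the order requirement in the parameter description matters.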
@@ -4215,7 +4288,7 @@
- cell2cell.core
+ core
@@ -4224,7 +4297,7 @@
-
+
@@ -4238,18 +4311,18 @@
-Modules
+
-
+
cci_scores
-
+
@@ -4263,173 +4336,38 @@
-Functions
-
-
-
-
-
-
-compute_jaccard_like_cci_score(cell1, cell2, ppi_score=None)
-
-
-
-
-
-
- Calculates a Jaccard-like score for the interaction between
-two cells based on their intercellular protein-protein
-interactions such as ligand-receptor interactions.
-
-
-
-
-
-
-
-
- Parameters:
-
-
- cell1 (cell2cell.core.cell.Cell
) – First cell-type/tissue/sample to compute interaction
-between a pair of them. In a directed interaction,
-this is the sender.
- cell2 (cell2cell.core.cell.Cell
) – Second cell-type/tissue/sample to compute interaction
-between a pair of them. In a directed interaction,
-this is the receiver.
-
-
-
-
-
-
-
-
-
-
-
-
- Returns:
-
-
- float
– Overall score for the interaction between a pair of
-cell-types/tissues/samples. In this case it is a
-Jaccard-like score.
-
-
-
-
-
-
- Source code in cell2cell/core/cci_scores.py
- def compute_jaccard_like_cci_score(cell1, cell2, ppi_score=None):
- '''Calculates a Jaccard-like score for the interaction between
- two cells based on their intercellular protein-protein
- interactions such as ligand-receptor interactions.
-
- Parameters
- ----------
- cell1 : cell2cell.core.cell.Cell
- First cell-type/tissue/sample to compute interaction
- between a pair of them. In a directed interaction,
- this is the sender.
-
- cell2 : cell2cell.core.cell.Cell
- Second cell-type/tissue/sample to compute interaction
- between a pair of them. In a directed interaction,
- this is the receiver.
-
- Returns
- -------
- cci_score : float
- Overall score for the interaction between a pair of
- cell-types/tissues/samples. In this case it is a
- Jaccard-like score.
- '''
- c1 = cell1.weighted_ppi['A'].values
- c2 = cell2.weighted_ppi['B'].values
-
- if (len(c1) == 0) or (len(c2) == 0):
- return 0.0
-
- if ppi_score is None:
- ppi_score = np.array([1.0] * len(c1))
-
- # Extended Jaccard similarity
- numerator = np.nansum(c1 * c2 * ppi_score)
- denominator = np.nansum(c1 * c1 * ppi_score) + np.nansum(c2 * c2 * ppi_score) - numerator
-
- if denominator == 0.0:
- return 0.0
-
- cci_score = numerator / denominator
-
- if cci_score is np.nan:
- return 0.0
- return cci_score
-
-
-
-
-
-
-
+
compute_braycurtis_like_cci_score(cell1, cell2, ppi_score=None)
-
+
- Calculates a Bray-Curtis-like score for the interaction between
-two cells based on their intercellular protein-protein
+
Calculates a Bray-Curtis-like score for the interaction between
+two cells based on their intercellular protein-protein
interactions such as ligand-receptor interactions.
+Parameters
+cell1 : cell2cell.core.cell.Cell
+ First cell-type/tissue/sample to compute interaction
+ between a pair of them. In a directed interaction,
+ this is the sender.
+cell2 : cell2cell.core.cell.Cell
+ Second cell-type/tissue/sample to compute interaction
+ between a pair of them. In a directed interaction,
+ this is the receiver.
+Returns
+cci_score : float
+ Overall score for the interaction between a pair of
+ cell-types/tissues/samples. In this case it is a
+ Bray-Curtis-like score.
-
-
-
-
-
-
-
- Parameters:
-
-
- cell1 (cell2cell.core.cell.Cell
) – First cell-type/tissue/sample to compute interaction
-between a pair of them. In a directed interaction,
-this is the sender.
- cell2 (cell2cell.core.cell.Cell
) – Second cell-type/tissue/sample to compute interaction
-between a pair of them. In a directed interaction,
-this is the receiver.
-
-
-
-
-
-
-
-
-
-
-
-
- Returns:
-
-
- float
– Overall score for the interaction between a pair of
-cell-types/tissues/samples. In this case it is a
-Bray-Curtis-like score.
-
-
-
-
-
Source code in cell2cell/core/cci_scores.py
def compute_braycurtis_like_cci_score(cell1, cell2, ppi_score=None):
@@ -4489,56 +4427,31 @@
+
compute_count_score(cell1, cell2, ppi_score=None)
-
+
- Calculates the number of active protein-protein interactions
-for the interaction between two cells, which could be the number
+
Calculates the number of active protein-protein interactions
+for the interaction between two cells, which could be the number
of active ligand-receptor interactions.
+Parameters
+cell1 : cell2cell.core.cell.Cell
+ First cell-type/tissue/sample to compute interaction
+ between a pair of them. In a directed interaction,
+ this is the sender.
+cell2 : cell2cell.core.cell.Cell
+ Second cell-type/tissue/sample to compute interaction
+ between a pair of them. In a directed interaction,
+ this is the receiver.
+Returns
+cci_score : float
+ Overall score for the interaction between a pair of
+ cell-types/tissues/samples.
-
-
-
-
-
-
-
- Parameters:
-
-
- cell1 (cell2cell.core.cell.Cell
) – First cell-type/tissue/sample to compute interaction
-between a pair of them. In a directed interaction,
-this is the sender.
- cell2 (cell2cell.core.cell.Cell
) – Second cell-type/tissue/sample to compute interaction
-between a pair of them. In a directed interaction,
-this is the receiver.
-
-
-
-
-
-
-
-
-
-
-
-
- Returns:
-
-
- float
– Overall score for the interaction between a pair of
-cell-types/tissues/samples.
-
-
-
-
-
Source code in cell2cell/core/cci_scores.py
def compute_count_score(cell1, cell2, ppi_score=None):
@@ -4591,55 +4504,30 @@
-
+
compute_icellnet_score(cell1, cell2, ppi_score=None)
-
+
- Calculates the sum of communication scores
-for the interaction between two cells. Based on ICELLNET.
+ Calculates the sum of communication scores
+for the interaction between two cells. Based on ICELLNET.
+Parameters
+cell1 : cell2cell.core.cell.Cell
+ First cell-type/tissue/sample to compute interaction
+ between a pair of them. In a directed interaction,
+ this is the sender.
+cell2 : cell2cell.core.cell.Cell
+ Second cell-type/tissue/sample to compute interaction
+ between a pair of them. In a directed interaction,
+ this is the receiver.
+Returns
+cci_score : float
+ Overall score for the interaction between a pair of
+ cell-types/tissues/samples.
-
-
-
-
-
-
-
- Parameters:
-
-
- cell1 (cell2cell.core.cell.Cell
) – First cell-type/tissue/sample to compute interaction
-between a pair of them. In a directed interaction,
-this is the sender.
- cell2 (cell2cell.core.cell.Cell
) – Second cell-type/tissue/sample to compute interaction
-between a pair of them. In a directed interaction,
-this is the receiver.
-
-
-
-
-
-
-
-
-
-
-
-
- Returns:
-
-
- float
– Overall score for the interaction between a pair of
-cell-types/tissues/samples.
-
-
-
-
-
Source code in cell2cell/core/cci_scores.py
def compute_icellnet_score(cell1, cell2, ppi_score=None):
@@ -4691,95 +4579,79 @@
-matmul_jaccard_like(A_scores, B_scores, ppi_score=None)
+
+compute_jaccard_like_cci_score(cell1, cell2, ppi_score=None)
-
+
- Computes Jaccard-like scores using matrices of proteins by
-cell-types/tissues/samples.
+ Calculates a Jaccard-like score for the interaction between
+two cells based on their intercellular protein-protein
+interactions such as ligand-receptor interactions.
+Parameters
+cell1 : cell2cell.core.cell.Cell
+ First cell-type/tissue/sample to compute interaction
+ between a pair of them. In a directed interaction,
+ this is the sender.
+cell2 : cell2cell.core.cell.Cell
+ Second cell-type/tissue/sample to compute interaction
+ between a pair of them. In a directed interaction,
+ this is the receiver.
+Returns
+cci_score : float
+ Overall score for the interaction between a pair of
+ cell-types/tissues/samples. In this case it is a
+ Jaccard-like score.
-
-
-
-
-
-
-
- Parameters:
-
-
- A_scores (array-like
) – Matrix of size NxM, where N are the proteins in the first
-column of a list of PPIs and M are the
-cell-types/tissues/samples.
- B_scores (array-like
) – Matrix of size NxM, where N are the proteins in the first
-column of a list of PPIs and M are the
-cell-types/tissues/samples.
-
-
-
-
-
-
-
-
-
-
-
-
- Returns:
-
-
- numpy.array
– Matrix MxM, representing the CCI score for all pairs of
-cell-types/tissues/samples. In directed interactions,
-the vertical axis (axis 0) represents the senders, while
-the horizontal axis (axis 1) represents the receivers.
-
-
-
-
-
Source code in cell2cell/core/cci_scores.py
- def matmul_jaccard_like(A_scores, B_scores, ppi_score=None):
- '''Computes Jaccard-like scores using matrices of proteins by
- cell-types/tissues/samples.
-
- Parameters
- ----------
- A_scores : array-like
- Matrix of size NxM, where N are the proteins in the first
- column of a list of PPIs and M are the
- cell-types/tissues/samples.
-
- B_scores : array-like
- Matrix of size NxM, where N are the proteins in the first
- column of a list of PPIs and M are the
- cell-types/tissues/samples.
-
- Returns
- -------
- jaccard : numpy.array
- Matrix MxM, representing the CCI score for all pairs of
- cell-types/tissues/samples. In directed interactions,
- the vertical axis (axis 0) represents the senders, while
- the horizontal axis (axis 1) represents the receivers.
+ def compute_jaccard_like_cci_score(cell1, cell2, ppi_score=None):
+ '''Calculates a Jaccard-like score for the interaction between
+ two cells based on their intercellular protein-protein
+ interactions such as ligand-receptor interactions.
+
+ Parameters
+ ----------
+ cell1 : cell2cell.core.cell.Cell
+ First cell-type/tissue/sample to compute interaction
+ between a pair of them. In a directed interaction,
+ this is the sender.
+
+ cell2 : cell2cell.core.cell.Cell
+ Second cell-type/tissue/sample to compute interaction
+ between a pair of them. In a directed interaction,
+ this is the receiver.
+
+ Returns
+ -------
+ cci_score : float
+ Overall score for the interaction between a pair of
+ cell-types/tissues/samples. In this case it is a
+ Jaccard-like score.
'''
- if ppi_score is None:
- ppi_score = np.array([1.0] * A_scores.shape[0])
- ppi_score = ppi_score.reshape((len(ppi_score), 1))
-
- numerator = np.matmul(np.multiply(A_scores, ppi_score).transpose(), B_scores)
+ c1 = cell1.weighted_ppi['A'].values
+ c2 = cell2.weighted_ppi['B'].values
+
+ if (len(c1) == 0) or (len(c2) == 0):
+ return 0.0
- A_module = np.sum(np.multiply(np.multiply(A_scores, A_scores), ppi_score), axis=0)
- B_module = np.sum(np.multiply(np.multiply(B_scores, B_scores), ppi_score), axis=0)
- denominator = A_module.reshape((A_module.shape[0], 1)) + B_module - numerator
-
- jaccard = np.divide(numerator, denominator)
- return jaccard
+ if ppi_score is None:
+ ppi_score = np.array([1.0] * len(c1))
+
+ # Extended Jaccard similarity
+ numerator = np.nansum(c1 * c2 * ppi_score)
+ denominator = np.nansum(c1 * c1 * ppi_score) + np.nansum(c2 * c2 * ppi_score) - numerator
+
+ if denominator == 0.0:
+ return 0.0
+
+ cci_score = numerator / denominator
+
+ if cci_score is np.nan:
+ return 0.0
+ return cci_score
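To see how the score behaves on concrete numbers, the following sketch applies the same arithmetic to plain arrays instead of Cell objects. The function name `jaccard_like` and the input vectors are assumptions for illustration. In the library the vectors come from the 'A' and 'B' columns of each cell's weighted_ppi.

```python
import numpy as np


def jaccard_like(c1, c2, ppi_score=None):
    """Jaccard-like score between two expression vectors (one value per PPI)."""
    c1 = np.asarray(c1, dtype=float)
    c2 = np.asarray(c2, dtype=float)
    if c1.size == 0 or c2.size == 0:
        return 0.0
    if ppi_score is None:
        ppi_score = np.ones(len(c1))
    ppi_score = np.asarray(ppi_score, dtype=float)

    # Extended Jaccard similarity: shared signal over total signal.
    numerator = np.nansum(c1 * c2 * ppi_score)
    denominator = np.nansum(c1 * c1 * ppi_score) + np.nansum(c2 * c2 * ppi_score) - numerator
    if denominator == 0.0:
        return 0.0
    return float(numerator / denominator)


identical = jaccard_like([1, 0, 1], [1, 0, 1])   # every active PPI is shared
disjoint = jaccard_like([1, 0], [0, 1])          # no PPI is active on both sides
partial = jaccard_like([1, 1, 0], [1, 0, 1])     # one shared PPI out of three
```

With binary inputs the score reduces to the classic Jaccard index, that is, the shared interactions divided by the interactions active in either cell.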
@@ -4792,57 +4664,32 @@
-
+
matmul_bray_curtis_like(A_scores, B_scores, ppi_score=None)
-
+
- Computes Bray-Curtis-like scores using matrices of proteins by
-cell-types/tissues/samples.
+ Computes Bray-Curtis-like scores using matrices of proteins by
+cell-types/tissues/samples.
+Parameters
+A_scores : array-like
+ Matrix of size NxM, where N are the proteins in the first
+ column of a list of PPIs and M are the
+ cell-types/tissues/samples.
+B_scores : array-like
+ Matrix of size NxM, where N are the proteins in the first
+ column of a list of PPIs and M are the
+ cell-types/tissues/samples.
+Returns
+bray_curtis : numpy.array
+ Matrix MxM, representing the CCI score for all pairs of
+ cell-types/tissues/samples. In directed interactions,
+ the vertical axis (axis 0) represents the senders, while
+ the horizontal axis (axis 1) represents the receivers.
-
-
-
-
-
-
-
- Parameters:
-
-
- A_scores (array-like
) – Matrix of size NxM, where N are the proteins in the first
-column of a list of PPIs and M are the
-cell-types/tissues/samples.
- B_scores (array-like
) – Matrix of size NxM, where N are the proteins in the first
-column of a list of PPIs and M are the
-cell-types/tissues/samples.
-
-
-
-
-
-
-
-
-
-
-
-
- Returns:
-
-
- numpy.array
– Matrix MxM, representing the CCI score for all pairs of
-cell-types/tissues/samples. In directed interactions,
-the vertical axis (axis 0) represents the senders, while
-the horizontal axis (axis 1) represents the receivers.
-
-
-
-
-
Source code in cell2cell/core/cci_scores.py
def matmul_bray_curtis_like(A_scores, B_scores, ppi_score=None):
@@ -4893,58 +4740,109 @@
+
+matmul_cosine(A_scores, B_scores, ppi_score=None)
+
+
+
+
+
+
+ Computes cosine-similarity scores using matrices of proteins by
+cell-types/tissues/samples.
+Parameters
+A_scores : array-like
+ Matrix of size NxM, where N are the proteins in the first
+ column of a list of PPIs and M are the
+ cell-types/tissues/samples.
+B_scores : array-like
+ Matrix of size NxM, where N are the proteins in the first
+ column of a list of PPIs and M are the
+ cell-types/tissues/samples.
+Returns
+cosine : numpy.array
+ Matrix MxM, representing the CCI score for all pairs of
+ cell-types/tissues/samples. In directed interactions,
+ the vertical axis (axis 0) represents the senders, while
+ the horizontal axis (axis 1) represents the receivers.
+
+
+ Source code in cell2cell/core/cci_scores.py
+ def matmul_cosine(A_scores, B_scores, ppi_score=None):
+ '''Computes cosine-similarity scores using matrices of proteins by
+ cell-types/tissues/samples.
+
+ Parameters
+ ----------
+ A_scores : array-like
+ Matrix of size NxM, where N are the proteins in the first
+ column of a list of PPIs and M are the
+ cell-types/tissues/samples.
+
+ B_scores : array-like
+ Matrix of size NxM, where N are the proteins in the first
+ column of a list of PPIs and M are the
+ cell-types/tissues/samples.
+
+ Returns
+ -------
+ cosine : numpy.array
+ Matrix MxM, representing the CCI score for all pairs of
+ cell-types/tissues/samples. In directed interactions,
+ the vertical axis (axis 0) represents the senders, while
+ the horizontal axis (axis 1) represents the receivers.
+ '''
+ if ppi_score is None:
+ ppi_score = np.array([1.0] * A_scores.shape[0])
+ ppi_score = ppi_score.reshape((len(ppi_score), 1))
+
+ numerator = np.matmul(np.multiply(A_scores, ppi_score).transpose(), B_scores)
+
+ A_module = np.sum(np.multiply(np.multiply(A_scores, A_scores), ppi_score), axis=0) ** 0.5
+ B_module = np.sum(np.multiply(np.multiply(B_scores, B_scores), ppi_score), axis=0) ** 0.5
+ denominator = A_module.reshape((A_module.shape[0], 1)) * B_module
+
+ cosine = np.divide(numerator, denominator)
+ return cosine
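The matrix form can be compared against a pair-by-pair computation on a toy input. This is a minimal sketch with unit PPI weights. The matrices below are invented for illustration, with 3 PPIs as rows and 2 cell-types as columns.

```python
import numpy as np

# Rows are PPIs, columns are cell-types/tissues/samples (illustrative values).
A_scores = np.array([[1.0, 0.0],
                     [0.0, 1.0],
                     [1.0, 1.0]])   # first partner of each PPI (sender side)
B_scores = np.array([[1.0, 1.0],
                     [0.0, 1.0],
                     [1.0, 0.0]])   # second partner of each PPI (receiver side)
weights = np.ones((A_scores.shape[0], 1))

# Matrix form: one matmul gives the dot products for all sender-receiver pairs.
numerator = (A_scores * weights).T @ B_scores
A_norm = np.sqrt((A_scores * A_scores * weights).sum(axis=0))
B_norm = np.sqrt((B_scores * B_scores * weights).sum(axis=0))
cosine = numerator / (A_norm[:, None] * B_norm)

# Pair-by-pair form, for comparison: senders on axis 0, receivers on axis 1.
n_cells = A_scores.shape[1]
pairwise = np.zeros((n_cells, n_cells))
for i in range(n_cells):
    for j in range(n_cells):
        a, b = A_scores[:, i], B_scores[:, j]
        pairwise[i, j] = a @ b / (np.linalg.norm(a) * np.linalg.norm(b))
```

Here `cosine[0, 1]` differs from `cosine[1, 1]` even though both use the same receiver, which illustrates that rows index the sender and columns index the receiver.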
+
+
+
+
+
+
+
+
+
+
+
+
+
matmul_count_active(A_scores, B_scores, ppi_score=None)
-
+
- Computes the count of active protein-protein interactions
-used for intercellular communication using matrices of proteins by
+
Computes the count of active protein-protein interactions
+used for intercellular communication using matrices of proteins by
cell-types/tissues/samples.
+Parameters
+A_scores : array-like
+ Matrix of size NxM, where N are the proteins in the first
+ column of a list of PPIs and M are the
+ cell-types/tissues/samples.
+B_scores : array-like
+ Matrix of size NxM, where N are the proteins in the first
+ column of a list of PPIs and M are the
+ cell-types/tissues/samples.
+Returns
+counts : numpy.array
+ Matrix MxM, representing the CCI score for all pairs of
+ cell-types/tissues/samples. In directed interactions,
+ the vertical axis (axis 0) represents the senders, while
+ the horizontal axis (axis 1) represents the receivers.
-
-
-
-
-
-
-
- Parameters:
-
-
- A_scores (array-like
) – Matrix of size NxM, where N are the proteins in the first
-column of a list of PPIs and M are the
-cell-types/tissues/samples.
- B_scores (array-like
) – Matrix of size NxM, where N are the proteins in the first
-column of a list of PPIs and M are the
-cell-types/tissues/samples.
-
-
-
-
-
-
-
-
-
-
-
-
- Returns:
-
-
- numpy.array
– Matrix MxM, representing the CCI score for all pairs of
-cell-types/tissues/samples. In directed interactions,
-the vertical axis (axis 0) represents the senders, while
-the horizontal axis (axis 1) represents the receivers.
-
-
-
-
-
Source code in cell2cell/core/cci_scores.py
def matmul_count_active(A_scores, B_scores, ppi_score=None):
@@ -4990,61 +4888,36 @@
-
-matmul_cosine(A_scores, B_scores, ppi_score=None)
+
+matmul_jaccard_like(A_scores, B_scores, ppi_score=None)
-
+
- Computes cosine-similarity scores using matrices of proteins by
-cell-types/tissues/samples.
+ Computes Jaccard-like scores using matrices of proteins by
+cell-types/tissues/samples.
+Parameters
+A_scores : array-like
+ Matrix of size NxM, where N are the proteins in the first
+ column of a list of PPIs and M are the
+ cell-types/tissues/samples.
+B_scores : array-like
+ Matrix of size NxM, where N are the proteins in the first
+ column of a list of PPIs and M are the
+ cell-types/tissues/samples.
+Returns
+jaccard : numpy.array
+ Matrix MxM, representing the CCI score for all pairs of
+ cell-types/tissues/samples. In directed interactions,
+ the vertical axis (axis 0) represents the senders, while
+ the horizontal axis (axis 1) represents the receivers.
-
-
-
-
-
-
-
- Parameters:
-
-
- A_scores (array-like
) – Matrix of size NxM, where N are the proteins in the first
-column of a list of PPIs and M are the
-cell-types/tissues/samples.
- B_scores (array-like
) – Matrix of size NxM, where N are the proteins in the first
-column of a list of PPIs and M are the
-cell-types/tissues/samples.
-
-
-
-
-
-
-
-
-
-
-
-
- Returns:
-
-
- numpy.array
– Matrix MxM, representing the CCI score for all pairs of
-cell-types/tissues/samples. In directed interactions,
-the vertical axis (axis 0) represents the senders, while
-the horizontal axis (axis 1) represents the receivers.
-
-
-
-
-
Source code in cell2cell/core/cci_scores.py
- def matmul_cosine(A_scores, B_scores, ppi_score=None):
- '''Computes cosine-similarity scores using matrices of proteins by
+ def matmul_jaccard_like(A_scores, B_scores, ppi_score=None):
+ '''Computes Jaccard-like scores using matrices of proteins by
cell-types/tissues/samples.
Parameters
@@ -5061,7 +4934,7 @@
Returns
-------
- cosine : numpy.array
+ jaccard : numpy.array
Matrix MxM, representing the CCI score for all pairs of
cell-types/tissues/samples. In directed interactions,
the vertical axis (axis 0) represents the senders, while
@@ -5073,12 +4946,12 @@
numerator = np.matmul(np.multiply(A_scores, ppi_score).transpose(), B_scores)
- A_module = np.sum(np.multiply(np.multiply(A_scores, A_scores), ppi_score), axis=0) ** 0.5
- B_module = np.sum(np.multiply(np.multiply(B_scores, B_scores), ppi_score), axis=0) ** 0.5
- denominator = A_module.reshape((A_module.shape[0], 1)) * B_module
+ A_module = np.sum(np.multiply(np.multiply(A_scores, A_scores), ppi_score), axis=0)
+ B_module = np.sum(np.multiply(np.multiply(B_scores, B_scores), ppi_score), axis=0)
+ denominator = A_module.reshape((A_module.shape[0], 1)) + B_module - numerator
- cosine = np.divide(numerator, denominator)
- return cosine
+ jaccard = np.divide(numerator, denominator)
+ return jaccard
@@ -5102,12 +4975,12 @@
-
+
cell
-
+
@@ -5120,79 +4993,44 @@
-Classes
+
-
+
Cell
-
+
Specific cell-type/tissue/organ element in a RNAseq dataset.
+Parameters
+sc_rnaseq_data : pandas.DataFrame
+ A gene expression matrix. Contains only one column that
+ corresponds to a cell-type/tissue/sample, while the genes
+ are rows. The column name will be the label
+ of the instance.
+verbose : boolean, default=True
+ Whether to print the steps of the analysis.
+Attributes
+id : int
+ ID number of the instance generated.
+type : str
+ Name of the respective cell-type/tissue/sample.
+rnaseq_data : pandas.DataFrame
+ Copy of sc_rnaseq_data.
+weighted_ppi : pandas.DataFrame
+ Dataframe created from a list of protein-protein interactions,
+ here the columns of the interacting proteins are replaced by
+ a score or a preprocessed gene expression of the respective
+ proteins.
-
-
-
-
-
-
-
- Parameters:
-
-
- sc_rnaseq_data (pandas.DataFrame
) – A gene expression matrix. Contains only one column that
-corresponds to a cell-type/tissue/sample, while the genes
-are rows. The column name will be the label
-of the instance.
- verbose (boolean, default=True
) – Whether to print the steps of the analysis.
-
-
-
-
-
-Attributes:
-
-
-
- Name
- Type
- Description
-
-
-
-
- id
- int
- ID number of the instance generated.
-
-
- type
- str
- Name of the respective cell-type/tissue/sample.
-
-
- rnaseq_data
- pandas.DataFrame
- Copy of sc_rnaseq_data.
-
-
- weighted_ppi
- pandas.DataFrame
- Dataframe created from a list of protein-protein interactions,
-here the columns of the interacting proteins are replaced by
-a score or a preprocessed gene expression of the respective
-proteins.
-
-
-
-Functions
+
-
+
get_cells_from_rnaseq(rnaseq_data, cell_columns=None, verbose=True)
-
+
- Creates new instances of Cell based on the RNAseq data of each
-cell-type/tissue/sample in a gene expression matrix.
+ Creates new instances of Cell based on the RNAseq data of each
+cell-type/tissue/sample in a gene expression matrix.
+Parameters
+rnaseq_data : pandas.DataFrame
+ Gene expression data for a RNA-seq experiment. Columns are
+ cell-types/tissues/samples and rows are genes.
+cell_columns : array-like, default=None
+ List of names of cell-types/tissues/samples in the dataset
+ to be used. If None, all columns will be used.
+verbose : boolean, default=True
+ Whether to print the steps of the analysis.
+Returns
+cells : dict
+ Dictionary containing all Cell instances generated from a RNAseq dataset.
+ The keys of this dictionary are the names of the corresponding Cell instances.
-
-
-
-
-
-
-
- Parameters:
-
-
- rnaseq_data (pandas.DataFrame
) – Gene expression data for a RNA-seq experiment. Columns are
-cell-types/tissues/samples and rows are genes.
- cell_columns (array-like, default=None
) – List of names of cell-types/tissues/samples in the dataset
-to be used. If None, all columns will be used.
- verbose (boolean, default=True
) – Whether to print the steps of the analysis.
-
-
-
-
-
-
-
-
-
-
-
-
- Returns:
-
-
- dict
– Dictionary containing all Cell instances generated from a RNAseq dataset.
-The keys of this dictionary are the names of the corresponding Cell instances.
-
-
-
-
-
Source code in cell2cell/core/cell.py
def get_cells_from_rnaseq(rnaseq_data, cell_columns=None, verbose=True):
@@ -5375,12 +5236,12 @@
-
+
communication_scores
-
+
@@ -5394,192 +5255,327 @@
-Functions
+
-
-get_binary_scores(cell1, cell2, ppi_score=None)
+
+aggregate_ccc_matrices(ccc_matrices, method='gmean')
-
+
- Computes binary communication scores for all
-protein-protein interactions between a pair of
-cell-types/tissues/samples. This corresponds to
-an AND function between binary values for each
-interacting protein coming from each cell.
-
-
-
-
-
-
-
-
- Parameters:
-
-
- cell1 (cell2cell.core.cell.Cell
) – First cell-type/tissue/sample to compute the communication
-score. In a directed interaction, this is the sender.
- cell2 (cell2cell.core.cell.Cell
) – Second cell-type/tissue/sample to compute the communication
-score. In a directed interaction, this is the receiver.
- ppi_score (array-like, default=None
) – An array with a weight for each PPI. The weight
-multiplies the communication scores.
-
-
-
-
-
-
-
-
-
-
-
-
- Returns:
-
-
- numpy.array
– An array with the communication scores for each intercellular
-PPI.
-
-
-
-
-
-
- Source code in cell2cell/core/communication_scores.py
- def get_binary_scores(cell1, cell2, ppi_score=None):
- '''Computes binary communication scores for all
- protein-protein interactions between a pair of
- cell-types/tissues/samples. This corresponds to
- an AND function between binary values for each
- interacting protein coming from each cell.
-
- Parameters
- ----------
- cell1 : cell2cell.core.cell.Cell
- First cell-type/tissue/sample to compute the communication
- score. In a directed interaction, this is the sender.
-
- cell2 : cell2cell.core.cell.Cell
- Second cell-type/tissue/sample to compute the communication
- score. In a directed interaction, this is the receiver.
-
- ppi_score : array-like, default=None
- An array with a weight for each PPI. The weight
- multiplies the communication scores.
-
- Returns
- -------
- communication_scores : numpy.array
- An array with the communication scores for each intercellular
- PPI.
- '''
- c1 = cell1.weighted_ppi['A'].values
- c2 = cell2.weighted_ppi['B'].values
-
- if (len(c1) == 0) or (len(c2) == 0):
- return 0.0
-
- if ppi_score is None:
- ppi_score = np.array([1.0] * len(c1))
-
- communication_scores = c1 * c2 * ppi_score
- return communication_scores
+ Aggregates matrices of communication scores. Each
+matrix has the communication scores across all pairs
+of cell-types/tissues/samples for a different
+pair of interacting proteins.
+Parameters
+ccc_matrices : list
+ List of matrices of communication scores. Each matrix
+ is for a specific pair of interacting proteins.
+method : str, default='gmean'.
+ Method to aggregate the matrices element-wise.
+ Options are:
+- 'gmean' : Geometric mean in an element-wise way.
+- 'sum' : Sum in an element-wise way.
+- 'mean' : Mean in an element-wise way.
-
-
-
-
-
-
-
-
-
-
-
-
-get_continuous_scores(cell1, cell2, ppi_score=None, method='expression_product')
-
-
-
-
-
- Computes continuous communication scores for all
-protein-protein interactions between a pair of
-cell-types/tissues/samples. This corresponds to
-a specific scoring function between preprocessed continuous
-expression values for each interacting protein coming from
-each cell.
+Returns
+aggregated_ccc_matrix : numpy.array
+ A matrix containing aggregated communication scores
+ from multiple PPIs. Its shape is MxM, where M is the number of
+ cell-types/tissues/samples. In directed interactions, the
+ vertical axis (axis 0) represents the senders, while the
+ horizontal axis (axis 1) represents the receivers.
-
-
-
-
-
-
-
- Parameters:
-
-
- cell1 (cell2cell.core.cell.Cell
) – First cell-type/tissue/sample to compute the communication
-score. In a directed interaction, this is the sender.
- cell2 (cell2cell.core.cell.Cell
) – Second cell-type/tissue/sample to compute the communication
-score. In a directed interaction, this is the receiver.
- ppi_score (array-like, default=None
) – An array with a weight for each PPI. The weight
-multiplies the communication scores.
- method (str, default='expression_product'
) – Scoring function for computing the communication score.
-Options are:
- - 'expression_product' : Multiplication between the expression
- of the interacting proteins. One coming from cell1 and the
- other from cell2.
- - 'expression_mean' : Average between the expression
- of the interacting proteins. One coming from cell1 and the
- other from cell2.
- - 'expression_gmean' : Geometric mean between the expression
- of the interacting proteins. One coming from cell1 and the
- other from cell2.
-
-
-
-
-
-
-
-
-
-
-
-
- Returns:
-
-
- numpy.array
– An array with the communication scores for each intercellular
-PPI.
-
-
-
-
-
Source code in cell2cell/core/communication_scores.py
- def get_continuous_scores(cell1, cell2, ppi_score=None, method='expression_product'):
- '''Computes continuous communication scores for all
- protein-protein interactions between a pair of
- cell-types/tissues/samples. This corresponds to
- a specific scoring function between preprocessed continuous
- expression values for each interacting protein coming from
- each cell.
-
- Parameters
- ----------
+ def aggregate_ccc_matrices(ccc_matrices, method='gmean'):
+ '''Aggregates matrices of communication scores. Each
+ matrix has the communication scores across all pairs
+ of cell-types/tissues/samples for a different
+ pair of interacting proteins.
+
+ Parameters
+ ----------
+ ccc_matrices : list
+ List of matrices of communication scores. Each matrix
+ is for a specific pair of interacting proteins.
+
+ method : str, default='gmean'.
+ Method to aggregate the matrices element-wise.
+ Options are:
+
+ - 'gmean' : Geometric mean in an element-wise way.
+ - 'sum' : Sum in an element-wise way.
+ - 'mean' : Mean in an element-wise way.
+
+ Returns
+ -------
+ aggregated_ccc_matrix : numpy.array
+ A matrix containing aggregated communication scores
+ from multiple PPIs. Its shape is MxM, where M is the number of
+ cell-types/tissues/samples. In directed interactions, the
+ vertical axis (axis 0) represents the senders, while the
+ horizontal axis (axis 1) represents the receivers.
+ '''
+ if method == 'gmean':
+ aggregated_ccc_matrix = gmean(ccc_matrices)
+ elif method == 'sum':
+ aggregated_ccc_matrix = np.nansum(ccc_matrices, axis=0)
+ elif method == 'mean':
+ aggregated_ccc_matrix = np.nanmean(ccc_matrices, axis=0)
+ else:
+ raise ValueError("Not a valid method")
+
+ return aggregated_ccc_matrix
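
As a rough illustration of the three aggregation options, the sketch below applies them to two made-up 2x2 score matrices. It uses only NumPy: the geometric mean is written out as `exp(mean(log(x)))` rather than calling `gmean`, and the matrix values and variable names are assumptions chosen for the example.

```python
import numpy as np

# Two hypothetical score matrices, one per ligand-receptor pair.
# Rows are senders and columns are receivers.
m1 = np.array([[1.0, 4.0], [2.0, 8.0]])
m2 = np.array([[4.0, 1.0], [8.0, 2.0]])
ccc_matrices = [m1, m2]

# Element-wise geometric mean across matrices (axis 0 indexes the PPIs).
agg_gmean = np.exp(np.mean(np.log(ccc_matrices), axis=0))

# Element-wise sum and mean, ignoring NaN entries as in the source above.
agg_sum = np.nansum(ccc_matrices, axis=0)
agg_mean = np.nanmean(ccc_matrices, axis=0)

print(agg_gmean)
print(agg_sum)
print(agg_mean)
```

`np.nansum` and `np.nanmean` skip NaN entries, so the 'sum' and 'mean' options tolerate missing scores, while the geometric mean as written in this sketch propagates them.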
+
+
+
+
+
+
+
+
+
+
+
+
+
+compute_ccc_matrix(prot_a_exp, prot_b_exp, communication_score='expression_product')
+
+
+
+
+
+
+ Computes communication scores for a specific
+protein-protein interaction using vectors of gene expression
+levels for a given interacting protein produced by
+different cell-types/tissues/samples.
+Parameters
+prot_a_exp : array-like
+ Vector with gene expression levels for an interacting protein A
+ in a given PPI. Coordinates are different cell-types/tissues/samples.
+prot_b_exp : array-like
+ Vector with gene expression levels for an interacting protein B
+ in a given PPI. Coordinates are different cell-types/tissues/samples.
+communication_score : str, default='expression_product'
+ Scoring function for computing the communication score.
+ Options are:
+- 'expression_product' : Multiplication between the expression
+ of the interacting proteins.
+- 'expression_mean' : Average between the expression
+ of the interacting proteins.
+- 'expression_gmean' : Geometric mean between the expression
+ of the interacting proteins.
+
+
+Returns
+communication_scores : numpy.array
+ Matrix MxM, representing the CCC scores of a specific PPI
+ across all pairs of cell-types/tissues/samples. M is the number of
+ cell-types/tissues/samples. In directed interactions, the
+ vertical axis (axis 0) represents the senders, while the
+ horizontal axis (axis 1) represents the receivers.
+
+
+ Source code in cell2cell/core/communication_scores.py
+ def compute_ccc_matrix(prot_a_exp, prot_b_exp, communication_score='expression_product'):
+ '''Computes communication scores for a specific
+ protein-protein interaction using vectors of gene expression
+ levels for a given interacting protein produced by
+ different cell-types/tissues/samples.
+
+ Parameters
+ ----------
+ prot_a_exp : array-like
+ Vector with gene expression levels for an interacting protein A
+ in a given PPI. Coordinates are different cell-types/tissues/samples.
+
+ prot_b_exp : array-like
+ Vector with gene expression levels for an interacting protein B
+ in a given PPI. Coordinates are different cell-types/tissues/samples.
+
+ communication_score : str, default='expression_product'
+ Scoring function for computing the communication score.
+ Options are:
+
+ - 'expression_product' : Multiplication between the expression
+ of the interacting proteins.
+ - 'expression_mean' : Average between the expression
+ of the interacting proteins.
+ - 'expression_gmean' : Geometric mean between the expression
+ of the interacting proteins.
+
+ Returns
+ -------
+ communication_scores : numpy.array
+ Matrix MxM, representing the CCC scores of a specific PPI
+ across all pairs of cell-types/tissues/samples. M is the number of
+ cell-types/tissues/samples. In directed interactions, the
+ vertical axis (axis 0) represents the senders, while the
+ horizontal axis (axis 1) represents the receivers.
+ '''
+ if communication_score == 'expression_product':
+ communication_scores = np.outer(prot_a_exp, prot_b_exp)
+ elif communication_score == 'expression_mean':
+ communication_scores = (np.outer(prot_a_exp, np.ones(prot_b_exp.shape)) + np.outer(np.ones(prot_a_exp.shape), prot_b_exp)) / 2.
+ elif communication_score == 'expression_gmean':
+ communication_scores = np.sqrt(np.outer(prot_a_exp, prot_b_exp))
+ else:
+ raise ValueError("Not a valid communication_score")
+ return communication_scores
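
The outer-product construction above can be checked on a small case. The sketch below reuses the same NumPy expressions on two hypothetical expression vectors for M = 2 samples; the values are invented for illustration. Entry `[i, j]` pairs protein A from sample `i` (sender) with protein B from sample `j` (receiver).

```python
import numpy as np

# Hypothetical expression of protein A and protein B in two samples.
prot_a_exp = np.array([1.0, 2.0])
prot_b_exp = np.array([3.0, 4.0])

# 'expression_product': A[i] * B[j]
product = np.outer(prot_a_exp, prot_b_exp)

# 'expression_mean': (A[i] + B[j]) / 2, built from two outer products
mean = (np.outer(prot_a_exp, np.ones(prot_b_exp.shape))
        + np.outer(np.ones(prot_a_exp.shape), prot_b_exp)) / 2.

# 'expression_gmean': sqrt(A[i] * B[j])
gmean_scores = np.sqrt(product)

print(product)
print(mean)
```

Because the 'expression_mean' branch reads `.shape` from its inputs, passing NumPy arrays (rather than plain lists) is the safer choice.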
+
+
+
+
+
+
+
+
+
+
+
+
+
+get_binary_scores(cell1, cell2, ppi_score=None)
+
+
+
+
+
+
+ Computes binary communication scores for all
+protein-protein interactions between a pair of
+cell-types/tissues/samples. This corresponds to
+an AND function between binary values for each
+interacting protein coming from each cell.
+Parameters
+cell1 : cell2cell.core.cell.Cell
+ First cell-type/tissue/sample to compute the communication
+ score. In a directed interaction, this is the sender.
+cell2 : cell2cell.core.cell.Cell
+ Second cell-type/tissue/sample to compute the communication
+ score. In a directed interaction, this is the receiver.
+ppi_score : array-like, default=None
+ An array with a weight for each PPI. The weight
+ multiplies the communication scores.
+Returns
+communication_scores : numpy.array
+ An array with the communication scores for each intercellular
+ PPI.
+
+
+ Source code in cell2cell/core/communication_scores.py
+ def get_binary_scores(cell1, cell2, ppi_score=None):
+ '''Computes binary communication scores for all
+ protein-protein interactions between a pair of
+ cell-types/tissues/samples. This corresponds to
+ an AND function between binary values for each
+ interacting protein coming from each cell.
+
+ Parameters
+ ----------
+ cell1 : cell2cell.core.cell.Cell
+ First cell-type/tissue/sample to compute the communication
+ score. In a directed interaction, this is the sender.
+
+ cell2 : cell2cell.core.cell.Cell
+ Second cell-type/tissue/sample to compute the communication
+ score. In a directed interaction, this is the receiver.
+
+ ppi_score : array-like, default=None
+ An array with a weight for each PPI. The weight
+ multiplies the communication scores.
+
+ Returns
+ -------
+ communication_scores : numpy.array
+ An array with the communication scores for each intercellular
+ PPI.
+ '''
+ c1 = cell1.weighted_ppi['A'].values
+ c2 = cell2.weighted_ppi['B'].values
+
+ if (len(c1) == 0) or (len(c2) == 0):
+ return 0.0
+
+ if ppi_score is None:
+ ppi_score = np.array([1.0] * len(c1))
+
+ communication_scores = c1 * c2 * ppi_score
+ return communication_scores
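
To make the AND behaviour concrete, the sketch below multiplies two hypothetical binary vectors directly, standing in for the `weighted_ppi['A']` and `weighted_ppi['B']` columns of two `Cell` objects. The values and the weight vector are assumptions for illustration only.

```python
import numpy as np

# Binarized expression of partner A in cell1 and partner B in cell2,
# one entry per PPI (hypothetical values).
c1 = np.array([1, 0, 1, 1])
c2 = np.array([1, 1, 0, 1])

# Unweighted case: the product of 0/1 values acts as a logical AND.
unweighted = c1 * c2 * np.array([1.0] * len(c1))

# Weighted case: each PPI score is multiplied by its weight.
ppi_score = np.array([0.5, 1.0, 1.0, 2.0])
weighted = c1 * c2 * ppi_score

print(unweighted)
print(weighted)
```

A PPI scores non-zero only when both partners are marked as expressed. When either input is empty, the function above returns the scalar `0.0` instead of an array, which callers may need to handle.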
+
+
+
+
+
+
+
+
+
+
+
+
+
+get_continuous_scores(cell1, cell2, ppi_score=None, method='expression_product')
+
+
+
+
+
+
+ Computes continuous communication scores for all
+protein-protein interactions between a pair of
+cell-types/tissues/samples. This corresponds to
+a specific scoring function between preprocessed continuous
+expression values for each interacting protein coming from
+each cell.
+Parameters
+cell1 : cell2cell.core.cell.Cell
+ First cell-type/tissue/sample to compute the communication
+ score. In a directed interaction, this is the sender.
+cell2 : cell2cell.core.cell.Cell
+ Second cell-type/tissue/sample to compute the communication
+ score. In a directed interaction, this is the receiver.
+ppi_score : array-like, default=None
+ An array with a weight for each PPI. The weight
+ multiplies the communication scores.
+method : str, default='expression_product'
+ Scoring function for computing the communication score.
+ Options are:
+ - 'expression_product' : Multiplication between the expression
+ of the interacting proteins. One coming from cell1 and the
+ other from cell2.
+ - 'expression_mean' : Average between the expression
+ of the interacting proteins. One coming from cell1 and the
+ other from cell2.
+ - 'expression_gmean' : Geometric mean between the expression
+ of the interacting proteins. One coming from cell1 and the
+ other from cell2.
+Returns
+communication_scores : numpy.array
+ An array with the communication scores for each intercellular
+ PPI.
+
+
+ Source code in cell2cell/core/communication_scores.py
+ def get_continuous_scores(cell1, cell2, ppi_score=None, method='expression_product'):
+ '''Computes continuous communication scores for all
+ protein-protein interactions between a pair of
+ cell-types/tissues/samples. This corresponds to
+ a specific scoring function between preprocessed continuous
+ expression values for each interacting protein coming from
+ each cell.
+
+ Parameters
+ ----------
cell1 : cell2cell.core.cell.Cell
First cell-type/tissue/sample to compute the communication
score. In a directed interaction, this is the sender.
@@ -5640,56 +5636,31 @@
-score_expression_product(c1, c2)
+
+score_expression_mean(c1, c2)
-
+
Computes the expression product score
+Parameters
+c1 : array-like
+ A 1D-array containing the preprocessed expression values
+ for the interactors in the first column of a list of
+ protein-protein interactions.
+c2 : array-like
+ A 1D-array containing the preprocessed expression values
+ for the interactors in the second column of a list of
+ protein-protein interactions.
+Returns
+(c1 + c2)/2. : array-like
+ Average of vectors.
-
-
-
-
-
-
-
- Parameters:
-
-
- c1 (array-like
) – A 1D-array containing the preprocessed expression values
-for the interactors in the first column of a list of
-protein-protein interactions.
- c2 (array-like
) – A 1D-array containing the preprocessed expression values
-for the interactors in the second column of a list of
-protein-protein interactions.
-
-
-
-
-
-
-
-
-
-
-
-
- Returns:
-
-
- array-like
– Multiplication of vectors.
-
-
-
-
-
Source code in cell2cell/core/communication_scores.py
- def score_expression_product(c1, c2):
+ def score_expression_mean(c1, c2):
'''Computes the expression product score
Parameters
@@ -5706,12 +5677,12 @@
Returns
-------
- c1 * c2 : array-like
- Multiplication of vectors.
+ (c1 + c2)/2. : array-like
+ Average of vectors.
'''
if (len(c1) == 0) or (len(c2) == 0):
return 0.0
- return c1 * c2
+ return (c1 + c2)/2.
@@ -5724,56 +5695,31 @@
-score_expression_mean(c1, c2)
+
+score_expression_product(c1, c2)
-
+
Computes the expression product score
+Parameters
+c1 : array-like
+ A 1D-array containing the preprocessed expression values
+ for the interactors in the first column of a list of
+ protein-protein interactions.
+c2 : array-like
+ A 1D-array containing the preprocessed expression values
+ for the interactors in the second column of a list of
+ protein-protein interactions.
+Returns
+c1 * c2 : array-like
+ Multiplication of vectors.
-
-
-
-
-
-
-
- Parameters:
-
-
- c1 (array-like
) – A 1D-array containing the preprocessed expression values
-for the interactors in the first column of a list of
-protein-protein interactions.
- c2 (array-like
) – A 1D-array containing the preprocessed expression values
-for the interactors in the second column of a list of
-protein-protein interactions.
-
-
-
-
-
-
-
-
-
-
-
-
- Returns:
-
-
- array-like
– Average of vectors.
-
-
-
-
-
Source code in cell2cell/core/communication_scores.py
- def score_expression_mean(c1, c2):
+ def score_expression_product(c1, c2):
'''Computes the expression product score
Parameters
@@ -5790,12 +5736,12 @@
Returns
-------
- (c1 + c2)/2. : array-like
- Average of vectors.
+ c1 * c2 : array-like
+ Multiplication of vectors.
'''
if (len(c1) == 0) or (len(c2) == 0):
return 0.0
- return (c1 + c2)/2.
+ return c1 * c2
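
The practical difference between the product and mean scores shows up when one partner is not expressed. A minimal sketch with invented expression values:

```python
import numpy as np

# Hypothetical preprocessed expression values, one entry per PPI.
c1 = np.array([0.0, 2.0, 4.0])  # first-column interactors
c2 = np.array([3.0, 2.0, 1.0])  # second-column interactors

product = c1 * c2        # as in score_expression_product
mean = (c1 + c2) / 2.    # as in score_expression_mean

print(product)
print(mean)
```

For the first PPI the product is zero because one partner is absent, while the mean is still positive. This may matter when choosing a score for sparse data.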
@@ -5804,254 +5750,48 @@
-
-compute_ccc_matrix(prot_a_exp, prot_b_exp, communication_score='expression_product')
+
+
+
+
+
+
+
+
+
+
+
+
+
+ interaction_space
-
+
+
- Computes communication scores for an specific
-protein-protein interaction using vectors of gene expression
-levels for a given interacting protein produced by
-different cell-types/tissues/samples.
-
-
-
-
-
-
-
-
- Parameters:
-
-
- prot_a_exp (array-like
) – Vector with gene expression levels for an interacting protein A
-in a given PPI. Coordinates are different cell-types/tissues/samples.
- prot_b_exp (array-like
) – Vector with gene expression levels for an interacting protein B
-in a given PPI. Coordinates are different cell-types/tissues/samples.
- communication_score (str, default='expression_product'
) – Scoring function for computing the communication score.
-Options are:
-
-- 'expression_product' : Multiplication between the expression
- of the interacting proteins.
-- 'expression_mean' : Average between the expression
- of the interacting proteins.
-- 'expression_gmean' : Geometric mean between the expression
- of the interacting proteins.
-
-
-
-
-
-
-
-
-
-
-
-
-
- Returns:
-
-
- numpy.array
– Matrix MxM, representing the CCC scores of an specific PPI
-across all pairs of cell-types/tissues/samples. M are all
-cell-types/tissues/samples. In directed interactions, the
-vertical axis (axis 0) represents the senders, while the
-horizontal axis (axis 1) represents the receivers.
-
-
-
-
-
-
- Source code in cell2cell/core/communication_scores.py
- def compute_ccc_matrix(prot_a_exp, prot_b_exp, communication_score='expression_product'):
- '''Computes communication scores for an specific
- protein-protein interaction using vectors of gene expression
- levels for a given interacting protein produced by
- different cell-types/tissues/samples.
-
- Parameters
- ----------
- prot_a_exp : array-like
- Vector with gene expression levels for an interacting protein A
- in a given PPI. Coordinates are different cell-types/tissues/samples.
-
- prot_b_exp : array-like
- Vector with gene expression levels for an interacting protein B
- in a given PPI. Coordinates are different cell-types/tissues/samples.
-
- communication_score : str, default='expression_product'
- Scoring function for computing the communication score.
- Options are:
-
- - 'expression_product' : Multiplication between the expression
- of the interacting proteins.
- - 'expression_mean' : Average between the expression
- of the interacting proteins.
- - 'expression_gmean' : Geometric mean between the expression
- of the interacting proteins.
-
- Returns
- -------
- communication_scores : numpy.array
- Matrix MxM, representing the CCC scores of an specific PPI
- across all pairs of cell-types/tissues/samples. M are all
- cell-types/tissues/samples. In directed interactions, the
- vertical axis (axis 0) represents the senders, while the
- horizontal axis (axis 1) represents the receivers.
- '''
- if communication_score == 'expression_product':
- communication_scores = np.outer(prot_a_exp, prot_b_exp)
- elif communication_score == 'expression_mean':
- communication_scores = (np.outer(prot_a_exp, np.ones(prot_b_exp.shape)) + np.outer(np.ones(prot_a_exp.shape), prot_b_exp)) / 2.
- elif communication_score == 'expression_gmean':
- communication_scores = np.sqrt(np.outer(prot_a_exp, prot_b_exp))
- else:
- raise ValueError("Not a valid communication_score")
- return communication_scores
-
-
-
-
-
-
-
-
-
-
-
-
-
-aggregate_ccc_matrices(ccc_matrices, method='gmean')
-
-
-
-
-
-
- Aggregates matrices of communication scores. Each
-matrix has the communication scores across all pairs
-of cell-types/tissues/samples for a different
-pair of interacting proteins.
-
-
-
-
-
-
-
-
- Parameters:
-
-
- ccc_matrices (list
) – List of matrices of communication scores. Each matrix
-is for an specific pair of interacting proteins.
- method (str, default='gmean'.
) – Method to aggregate the matrices element-wise.
-Options are:
-
-- 'gmean' : Geometric mean in an element-wise way.
-- 'sum' : Sum in an element-wise way.
-- 'mean' : Mean in an element-wise way.
-
-
-
-
-
-
-
-
-
-
-
-
-
- Returns:
-
-
- numpy.array
– A matrix contiaining aggregated communication scores
-from multiple PPIs. It's shape is of MxM, where M are all
-cell-types/tissues/samples. In directed interactions, the
-vertical axis (axis 0) represents the senders, while the
-horizontal axis (axis 1) represents the receivers.
-
-
-
-
-
-
- Source code in cell2cell/core/communication_scores.py
- def aggregate_ccc_matrices(ccc_matrices, method='gmean'):
- '''Aggregates matrices of communication scores. Each
- matrix has the communication scores across all pairs
- of cell-types/tissues/samples for a different
- pair of interacting proteins.
-
- Parameters
- ----------
- ccc_matrices : list
- List of matrices of communication scores. Each matrix
- is for an specific pair of interacting proteins.
-
- method : str, default='gmean'.
- Method to aggregate the matrices element-wise.
- Options are:
-
- - 'gmean' : Geometric mean in an element-wise way.
- - 'sum' : Sum in an element-wise way.
- - 'mean' : Mean in an element-wise way.
-
- Returns
- -------
- aggregated_ccc_matrix : numpy.array
- A matrix contiaining aggregated communication scores
- from multiple PPIs. It's shape is of MxM, where M are all
- cell-types/tissues/samples. In directed interactions, the
- vertical axis (axis 0) represents the senders, while the
- horizontal axis (axis 1) represents the receivers.
- '''
- if method == 'gmean':
- aggregated_ccc_matrix = gmean(ccc_matrices)
- elif method == 'sum':
- aggregated_ccc_matrix = np.nansum(ccc_matrices, axis=0)
- elif method == 'mean':
- aggregated_ccc_matrix = np.nanmean(ccc_matrices, axis=0)
- else:
- raise ValueError("Not a valid method")
-
- return aggregated_ccc_matrix
-
-
-
-
-
-
+
-
-
-
-
+
-
- interaction_space
+
+
+InteractionSpace
@@ -6059,209 +5799,149 @@
+ Interaction space that contains all the required elements to perform the analysis between every pair of cells.
+Parameters
+rnaseq_data : pandas.DataFrame
+ Gene expression data for a bulk RNA-seq experiment or a single-cell
+ experiment after aggregation into cell types. Columns are
+ cell-types/tissues/samples and rows are genes.
+ppi_data : pandas.DataFrame
+ List of protein-protein interactions (or ligand-receptor pairs) used
+ for inferring the cell-cell interactions and communication.
+gene_cutoffs : dict
+ Contains two keys: 'type' and 'parameter'. The first key represents the
+ way to use a cutoff or threshold, while parameter is the value used
+ to binarize the expression values.
+ The key 'type' can be:
+- 'local_percentile' : computes the value of a given percentile, for each
+ gene independently. In this case, the parameter corresponds to the
+ percentile to compute, as a float value between 0 and 1.
+- 'global_percentile' : computes the value of a given percentile from all
+ genes and samples simultaneously. In this case, the parameter
+ corresponds to the percentile to compute, as a float value between
+ 0 and 1. All genes have the same cutoff.
+- 'file' : load a cutoff table from a file. Parameter in this case is the
+ path of that file. It must contain the same genes as index and same
+ samples as columns.
+- 'multi_col_matrix' : a dataframe must be provided, containing a cutoff
+ for each gene in each sample. This allows using specific cutoffs for
+ each sample. The columns here must be the same as the ones in the
+ rnaseq_data.
+- 'single_col_matrix' : a dataframe must be provided, containing a cutoff
+ for each gene in only one column. These cutoffs will be applied to
+ all samples.
+- 'constant_value' : binarizes the expression. Evaluates whether
+ expression is greater than the value input in the parameter.
+
+communication_score : str, default='expression_thresholding'
+ Type of communication score used to detect active ligand-receptor
+ pairs between each pair of cells. See
+ cell2cell.core.communication_scores for more details.
+ It can be:
+- 'expression_thresholding'
+- 'expression_product'
+- 'expression_mean'
+- 'expression_gmean'
+
+cci_score : str, default='bray_curtis'
+ Scoring function to aggregate the communication scores. See
+ cell2cell.core.cci_scores for more details.
+ It can be:
+- 'bray_curtis'
+- 'jaccard'
+- 'count'
+- 'icellnet'
+
-
-
-
-
-
-
-Classes
-
-
-
-
-
-
-
-InteractionSpace
+
cci_type : str, default='undirected'
+ Type of interaction between two cells. If it is undirected, all ligands
+ and receptors are considered from both cells. If it is directed, ligands
+ from one cell and receptors from the other are considered separately with
+ respect to ligands from the second cell and receptors from the first one.
+ So, it can be:
+- 'undirected'
+- 'directed'
+
+cci_matrix_template : pandas.DataFrame, default=None
+ A matrix of shape MxM where M are cell-types/tissues/samples. This
+ is used as template for storing CCI scores. It may be useful
+ for specifying which pairs of cells to consider.
+complex_sep : str, default=None
+ Symbol that separates the protein subunits in a multimeric complex.
+ For example, '&' is the complex_sep for a list of ligand-receptor pairs
+ where a protein partner could be "CD74&CD44".
+complex_agg_method : str, default='min'
+ Method to aggregate the expression value of multiple genes in a
+ complex.
+- 'min' : Minimum expression value among all genes.
+- 'mean' : Average expression value among all genes.
+- 'gmean' : Geometric mean expression value among all genes.
+
+interaction_columns : tuple, default=('A', 'B')
+ Contains the names of the columns where to find the partners in a
+ dataframe of protein-protein interactions. If the list is for
+ ligand-receptor pairs, the first column is for the ligands and the second
+ for the receptors.
+verbose : boolean, default=True
+ Whether to print the steps of the analysis.
+Attributes
+communication_score : str
+ Type of communication score used to detect active ligand-receptor
+ pairs between each pair of cells. See
+ cell2cell.core.communication_scores for more details.
+ It can be:
+- 'expression_thresholding'
+- 'expression_product'
+- 'expression_mean'
+- 'expression_gmean'
+
-
+cci_score : str
+ Scoring function to aggregate the communication scores. See
+ cell2cell.core.cci_scores for more details.
+ It can be:
+- 'bray_curtis'
+- 'jaccard'
+- 'count'
+- 'icellnet'
+
-
+cci_type : str
+ Type of interaction between two cells. If it is undirected, all ligands
+ and receptors are considered from both cells. If it is directed, ligands
+ from one cell and receptors from the other are considered separately with
+ respect to ligands from the second cell and receptors from the first one.
+ So, it can be:
+- 'undirected'
+- 'directed'
+
- Interaction space that contains all the required elements to perform the analysis between every pair of cells.
+ppi_data : pandas.DataFrame
+ List of protein-protein interactions (or ligand-receptor pairs) used
+ for inferring the cell-cell interactions and communication.
+modified_rnaseq_data : pandas.DataFrame
+ Preprocessed gene expression data for a bulk or single-cell RNA-seq experiment.
+ Columns are cell-types/tissues/samples and rows are genes. The preprocessing
+ may correspond to scoring the gene expression as binary or continuous values
+ depending on the scoring function for cell-cell interactions/communication.
+interaction_elements : dict
+ Dictionary containing all the pairs of cells considered (under
+ the key of 'pairs'), Cell instances (under key 'cells')
+ which include all cells/tissues/organs with their associated datasets
+ (rna_seq, weighted_ppi, etc) and a Cell-Cell Interaction Matrix
+ to store CCI scores (under key 'cci_matrix'). A communication matrix
+ is also stored in this object when the communication scores are
+ computed in the InteractionSpace class (under key
+ 'communication_matrix').
+distance_matrix : pandas.DataFrame
+ Contains distances for each pair of cells, computed from
+ the CCI scores previously obtained (and stored in
+ interaction_elements['cci_matrix']).
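
The `gene_cutoffs` options described above can be pictured with a small pandas sketch. This is not the library's implementation: the toy expression table, the strict `>` comparison for the percentile modes, and the variable names are all assumptions made for illustration.

```python
import numpy as np
import pandas as pd

# Toy expression table: rows are genes, columns are samples.
rnaseq = pd.DataFrame([[0.0, 5.0, 10.0],
                       [2.0, 2.0, 8.0]],
                      index=['GeneA', 'GeneB'],
                      columns=['S1', 'S2', 'S3'])

# 'constant_value': expression greater than a fixed value.
gene_cutoffs = {'type': 'constant_value', 'parameter': 4.0}
binary_const = (rnaseq > gene_cutoffs['parameter']).astype(int)

# 'local_percentile': one cutoff per gene, computed across samples.
local_cut = rnaseq.quantile(0.5, axis=1)
binary_local = rnaseq.gt(local_cut, axis=0).astype(int)

# 'global_percentile': a single cutoff from all genes and samples.
global_cut = np.quantile(rnaseq.values, 0.5)

print(binary_const)
print(binary_local)
print(global_cut)
```

With a local percentile each gene is compared against its own distribution, so a gene with low overall expression can still be marked as active in its highest samples.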
-
-
-
-
-
-
-
- Parameters:
-
-
- rnaseq_data (pandas.DataFrame
) – Gene expression data for a bulk RNA-seq experiment or a single-cell
-experiment after aggregation into cell types. Columns are
-cell-types/tissues/samples and rows are genes.
- ppi_data (pandas.DataFrame
) – List of protein-protein interactions (or ligand-receptor pairs) used
-for inferring the cell-cell interactions and communication.
- gene_cutoffs (dict
) – Contains two keys: 'type' and 'parameter'. The first key represent the
-way to use a cutoff or threshold, while parameter is the value used
-to binarize the expression values.
-The key 'type' can be:
-
-- 'local_percentile' : computes the value of a given percentile, for each
- gene independently. In this case, the parameter corresponds to the
- percentile to compute, as a float value between 0 and 1.
-- 'global_percentile' : computes the value of a given percentile from all
- genes and samples simultaneously. In this case, the parameter
- corresponds to the percentile to compute, as a float value between
- 0 and 1. All genes have the same cutoff.
-- 'file' : load a cutoff table from a file. Parameter in this case is the
- path of that file. It must contain the same genes as index and same
- samples as columns.
-- 'multi_col_matrix' : a dataframe must be provided, containing a cutoff
- for each gene in each sample. This allows to use specific cutoffs for
- each sample. The columns here must be the same as the ones in the
- rnaseq_data.
-- 'single_col_matrix' : a dataframe must be provided, containing a cutoff
- for each gene in only one column. These cutoffs will be applied to
- all samples.
-- 'constant_value' : binarizes the expression. Evaluates whether
- expression is greater than the value input in the parameter.
-
- communication_score (str, default='expression_thresholding'
) – Type of communication score used to detect active ligand-receptor
-pairs between each pair of cell. See
-cell2cell.core.communication_scores for more details.
-It can be:
-
-- 'expression_thresholding'
-- 'expression_product'
-- 'expression_mean'
-- 'expression_gmean'
-
- cci_score (str, default='bray_curtis'
) – Scoring function to aggregate the communication scores. See
-cell2cell.core.cci_scores for more details.
-It can be:
-
-- 'bray_curtis'
-- 'jaccard'
-- 'count'
-- 'icellnet'
-
- cci_type (str, default='undirected'
) – Type of interaction between two cells. If it is undirected, all ligands
-and receptors are considered from both cells. If it is directed, ligands
-from one cell and receptors from the other are considered separately with
-respect to ligands from the second cell and receptor from the first one.
-So, it can be:
-
-- 'undirected'
-- 'directed'
-
- cci_matrix_template (pandas.DataFrame, default=None
) – A matrix of shape MxM where M are cell-types/tissues/samples. This
-is used as template for storing CCI scores. It may be useful
-for specifying which pairs of cells to consider.
- complex_sep (str, default=None
) – Symbol that separates the protein subunits in a multimeric complex.
-For example, '&' is the complex_sep for a list of ligand-receptor pairs
-where a protein partner could be "CD74&CD44".
- complex_agg_method (str, default='min'
) – Method to aggregate the expression value of multiple genes in a
-complex.
-
-- 'min' : Minimum expression value among all genes.
-- 'mean' : Average expression value among all genes.
-- 'gmean' : Geometric mean expression value among all genes.
-
- interaction_columns (tuple, default=('A', 'B')
) – Contains the names of the columns where to find the partners in a
-dataframe of protein-protein interactions. If the list is for
-ligand-receptor pairs, the first column is for the ligands and the second
-for the receptors.
- verbose (boolean, default=True
) – Whether printing or not steps of the analysis.
-
-
-
-
-
-Attributes:
-
-
-
- Name
- Type
- Description
-
-
-
-
- communication_score
- str
- Type of communication score used to detect active ligand-receptor
-pairs between each pair of cell. See
-cell2cell.core.communication_scores for more details.
-It can be:
-
-- 'expression_thresholding'
-- 'expression_product'
-- 'expression_mean'
-- 'expression_gmean'
-
-
-
- cci_score
- str
- Scoring function to aggregate the communication scores. See
-cell2cell.core.cci_scores for more details.
-It can be:
-
-- 'bray_curtis'
-- 'jaccard'
-- 'count'
-- 'icellnet'
-
-
-
- cci_type
- str
- Type of interaction between two cells. If it is undirected, all ligands
-and receptors are considered from both cells. If it is directed, ligands
-from one cell and receptors from the other are considered separately with
-respect to ligands from the second cell and receptor from the first one.
-So, it can be:
-
-- 'undirected'
-- 'directed'
-
-
-
- ppi_data
- pandas.DataFrame
- List of protein-protein interactions (or ligand-receptor pairs) used
-for inferring the cell-cell interactions and communication.
-
-
- modified_rnaseq_data
- pandas.DataFrame
- Preprocessed gene expression data for a bulk or single-cell RNA-seq experiment.
-Columns are are cell-types/tissues/samples and rows are genes. The preprocessing
-may correspond to scoring the gene expression as binary or continuous values
-depending on the scoring function for cell-cell interactions/communication.
-
-
- interaction_elements
- dict
- Dictionary containing all the pairs of cells considered (under
-the key of 'pairs'), Cell instances (under key 'cells')
-which include all cells/tissues/organs with their associated datasets
-(rna_seq, weighted_ppi, etc) and a Cell-Cell Interaction Matrix
-to store CCI scores (under key 'cci_matrix'). A communication matrix
-is also stored in this object when the communication scores are
-computed in the InteractionSpace class (under key
-'communication_matrix')
-
-
- distance_matrix
- pandas.DataFrame
- Contains distances for each pair of cells, computed from
-the CCI scores previously obtained (and stored in
-interaction_elements['cci_matrix']).
-
-
-
Source code in cell2cell/core/interaction_space.py
class InteractionSpace():
@@ -6816,224 +6496,66 @@ Methods
+
+
-
-pair_cci_score(self, cell1, cell2, cci_score='bray_curtis', use_ppi_score=False, verbose=True)
+
+compute_pairwise_cci_scores(self, cci_score=None, use_ppi_score=False, verbose=True)
-
+
- Computes a CCI score for a pair of cells.
+ Computes overall CCI scores for each pair of cells.
+Parameters
+cci_score : str, default=None
+ Scoring function to aggregate the communication scores between
+ a pair of cells. It computes an overall potential of cell-cell
+ interactions. If None, it will use the one stored in the
+ attribute analysis_setup of this object.
+ Options:
+- 'bray_curtis' : Bray-Curtis-like score
+- 'jaccard' : Jaccard-like score
+- 'count' : Number of LR pairs that the pair of cells uses
+- 'icellnet' : Sum of the L-R expression product of a pair of cells
+
+
+use_ppi_score : boolean, default=False
+ Whether using a weight of LR pairs specified in the ppi_data
+ to compute the scores.
+verbose : boolean, default=True
+ Whether printing or not steps of the analysis.
+Returns
+self.interaction_elements['cci_matrix'] : pandas.DataFrame
+ Contains CCI scores for each pair of cells
-
-
-
-
-
-
-
- Parameters:
-
-
- cell1 (cell2cell.core.cell.Cell
) – First cell-type/tissue/sample to compute the communication
-score. In a directed interaction, this is the sender.
- cell2 (cell2cell.core.cell.Cell
) – Second cell-type/tissue/sample to compute the communication
-score. In a directed interaction, this is the receiver.
- cci_score (str, default='bray_curtis'
) – Scoring function to aggregate the communication scores between
-a pair of cells. It computes an overall potential of cell-cell
-interactions. If None, it will use the one stored in the
-attribute analysis_setup of this object.
-Options:
-
-- 'bray_curtis' : Bray-Curtis-like score
-- 'jaccard' : Jaccard-like score
-- 'count' : Number of LR pairs that the pair of cells uses
-- 'icellnet' : Sum of the L-R expression product of a pair of cells
-
- use_ppi_score (boolean, default=False
) – Whether using a weight of LR pairs specified in the ppi_data
-to compute the scores.
- verbose (boolean, default=True
) – Whether printing or not steps of the analysis.
-
-
-
-
-
-
-
-
-
-
-
-
- Returns:
-
-
- float
– Overall score for the interaction between a pair of
-cell-types/tissues/samples. In this case it is a
-Jaccard-like score.
-
-
-
-
-
Source code in cell2cell/core/interaction_space.py
- def pair_cci_score(self, cell1, cell2, cci_score='bray_curtis', use_ppi_score=False, verbose=True):
- '''
- Computes a CCI score for a pair of cells.
-
- Parameters
- ----------
- cell1 : cell2cell.core.cell.Cell
- First cell-type/tissue/sample to compute the communication
- score. In a directed interaction, this is the sender.
-
- cell2 : cell2cell.core.cell.Cell
- Second cell-type/tissue/sample to compute the communication
- score. In a directed interaction, this is the receiver.
-
- cci_score : str, default='bray_curtis'
- Scoring function to aggregate the communication scores between
- a pair of cells. It computes an overall potential of cell-cell
- interactions. If None, it will use the one stored in the
- attribute analysis_setup of this object.
- Options:
-
- - 'bray_curtis' : Bray-Curtis-like score
- - 'jaccard' : Jaccard-like score
- - 'count' : Number of LR pairs that the pair of cells uses
- - 'icellnet' : Sum of the L-R expression product of a pair of cells
-
- use_ppi_score : boolean, default=False
- Whether using a weight of LR pairs specified in the ppi_data
- to compute the scores.
-
- verbose : boolean, default=True
- Whether printing or not steps of the analysis.
-
- Returns
- -------
- cci_score : float
- Overall score for the interaction between a pair of
- cell-types/tissues/samples. In this case it is a
- Jaccard-like score.
- '''
-
- if verbose:
- print("Computing interaction score between {} and {}".format(cell1.type, cell2.type))
-
- if use_ppi_score:
- ppi_score = self.ppi_data['score'].values
- else:
- ppi_score = None
- # Calculate cell-cell interaction score
- if cci_score == 'bray_curtis':
- cci_value = cci_scores.compute_braycurtis_like_cci_score(cell1, cell2, ppi_score=ppi_score)
- elif cci_score == 'jaccard':
- cci_value = cci_scores.compute_jaccard_like_cci_score(cell1, cell2, ppi_score=ppi_score)
- elif cci_score == 'count':
- cci_value = cci_scores.compute_count_score(cell1, cell2, ppi_score=ppi_score)
- elif cci_score == 'icellnet':
- cci_value = cci_scores.compute_icellnet_score(cell1, cell2, ppi_score=ppi_score)
- else:
- raise NotImplementedError("CCI score {} to compute pairwise cell-interactions is not implemented".format(cci_score))
- return cci_value
-
-
-
-
-
-
-
-
-
-
-
-
-
-compute_pairwise_cci_scores(self, cci_score=None, use_ppi_score=False, verbose=True)
-
-
-
-
-
-
- Computes overall CCI scores for each pair of cells.
-
-
-
-
-
-
-
-
- Parameters:
-
-
- cci_score (str, default=None
) – Scoring function to aggregate the communication scores between
-a pair of cells. It computes an overall potential of cell-cell
-interactions. If None, it will use the one stored in the
-attribute analysis_setup of this object.
-Options:
-
-- 'bray_curtis' : Bray-Curtis-like score
-- 'jaccard' : Jaccard-like score
-- 'count' : Number of LR pairs that the pair of cells uses
-- 'icellnet' : Sum of the L-R expression product of a pair of cells
-
- use_ppi_score (boolean, default=False
) – Whether using a weight of LR pairs specified in the ppi_data
-to compute the scores.
- verbose (boolean, default=True
) – Whether printing or not steps of the analysis.
-
-
-
-
-
-
-
-
-
-
-
-
- Returns:
-
-
- pandas.DataFrame
– Contains CCI scores for each pair of cells
-
-
-
-
-
-
- Source code in cell2cell/core/interaction_space.py
- def compute_pairwise_cci_scores(self, cci_score=None, use_ppi_score=False, verbose=True):
- '''Computes overall CCI scores for each pair of cells.
-
- Parameters
- ----------
- cci_score : str, default=None
- Scoring function to aggregate the communication scores between
- a pair of cells. It computes an overall potential of cell-cell
- interactions. If None, it will use the one stored in the
- attribute analysis_setup of this object.
- Options:
-
- - 'bray_curtis' : Bray-Curtis-like score
- - 'jaccard' : Jaccard-like score
- - 'count' : Number of LR pairs that the pair of cells uses
- - 'icellnet' : Sum of the L-R expression product of a pair of cells
-
- use_ppi_score : boolean, default=False
- Whether using a weight of LR pairs specified in the ppi_data
- to compute the scores.
+ def compute_pairwise_cci_scores(self, cci_score=None, use_ppi_score=False, verbose=True):
+ '''Computes overall CCI scores for each pair of cells.
+
+ Parameters
+ ----------
+ cci_score : str, default=None
+ Scoring function to aggregate the communication scores between
+ a pair of cells. It computes an overall potential of cell-cell
+ interactions. If None, it will use the one stored in the
+ attribute analysis_setup of this object.
+ Options:
+
+ - 'bray_curtis' : Bray-Curtis-like score
+ - 'jaccard' : Jaccard-like score
+ - 'count' : Number of LR pairs that the pair of cells uses
+ - 'icellnet' : Sum of the L-R expression product of a pair of cells
+
+ use_ppi_score : boolean, default=False
+ Whether using a weight of LR pairs specified in the ppi_data
+ to compute the scores.
verbose : boolean, default=True
Whether printing or not steps of the analysis.
@@ -7095,257 +6617,74 @@
-pair_communication_score(self, cell1, cell2, communication_score='expression_thresholding', use_ppi_score=False, verbose=True)
-
-
-
-
-
-
- Computes a communication score for each protein-protein interaction
-between a pair of cells.
-
-
-
-
-
-
-
-
- Parameters:
-
-
- cell1 (cell2cell.core.cell.Cell
) – First cell-type/tissue/sample to compute the communication
-score. In a directed interaction, this is the sender.
- cell2 (cell2cell.core.cell.Cell
) – Second cell-type/tissue/sample to compute the communication
-score. In a directed interaction, this is the receiver.
- communication_score (str, default=None
) – Type of communication score to infer the potential use of
-a given ligand-receptor pair by a pair of cells/tissues/samples.
-If None, the score stored in the attribute analysis_setup
-will be used.
-Available communication_scores are:
-
-- 'expression_thresholding' : Computes the joint presence of a
- ligand from a sender cell and of
- a receptor on a receiver cell from
- binarizing their gene expression levels.
-- 'expression_mean' : Computes the average between the expression
- of a ligand from a sender cell and the
- expression of a receptor on a receiver cell.
-- 'expression_product' : Computes the product between the expression
- of a ligand from a sender cell and the
- expression of a receptor on a receiver cell.
-- 'expression_gmean' : Computes the geometric mean between the expression
- of a ligand from a sender cell and the
- expression of a receptor on a receiver cell.
-
- use_ppi_score (boolean, default=False
) – Whether using a weight of LR pairs specified in the ppi_data
-to compute the scores.
- verbose (boolean, default=True
) – Whether printing or not steps of the analysis.
-
-
-
-
-
-
-
-
-
-
-
-
- Returns:
-
-
- numpy.array
– An array with the communication scores for each intercellular
-PPI.
-
-
-
-
-
-
- Source code in cell2cell/core/interaction_space.py
- def pair_communication_score(self, cell1, cell2, communication_score='expression_thresholding',
- use_ppi_score=False, verbose=True):
- '''Computes a communication score for each protein-protein interaction
- between a pair of cells.
-
- Parameters
- ----------
- cell1 : cell2cell.core.cell.Cell
- First cell-type/tissue/sample to compute the communication
- score. In a directed interaction, this is the sender.
-
- cell2 : cell2cell.core.cell.Cell
- Second cell-type/tissue/sample to compute the communication
- score. In a directed interaction, this is the receiver.
-
- communication_score : str, default=None
- Type of communication score to infer the potential use of
- a given ligand-receptor pair by a pair of cells/tissues/samples.
- If None, the score stored in the attribute analysis_setup
- will be used.
- Available communication_scores are:
-
- - 'expression_thresholding' : Computes the joint presence of a
- ligand from a sender cell and of
- a receptor on a receiver cell from
- binarizing their gene expression levels.
- - 'expression_mean' : Computes the average between the expression
- of a ligand from a sender cell and the
- expression of a receptor on a receiver cell.
- - 'expression_product' : Computes the product between the expression
- of a ligand from a sender cell and the
- expression of a receptor on a receiver cell.
- - 'expression_gmean' : Computes the geometric mean between the expression
- of a ligand from a sender cell and the
- expression of a receptor on a receiver cell.
-
- use_ppi_score : boolean, default=False
- Whether using a weight of LR pairs specified in the ppi_data
- to compute the scores.
-
- verbose : boolean, default=True
- Whether printing or not steps of the analysis.
-
- Returns
- -------
- communication_scores : numpy.array
- An array with the communication scores for each intercellular
- PPI.
- '''
- # TODO: Implement communication scores
- if verbose:
- print("Computing communication score between {} and {}".format(cell1.type, cell2.type))
-
- # Check that new score is the same type as score used to build interaction space (binary or continuous)
- if (communication_score in ['expression_product', 'expression_correlation', 'expression_mean', 'expression_gmean']) \
- & (self.communication_score in ['expression_thresholding', 'differential_combinations']):
- raise ValueError('Cannot use {} for this interaction space'.format(communication_score))
- if (communication_score in ['expression_thresholding', 'differential_combinations']) \
- & (self.communication_score in ['expression_product', 'expression_correlation', 'expression_mean', 'expression_gmean']):
- raise ValueError('Cannot use {} for this interaction space'.format(communication_score))
-
- if use_ppi_score:
- ppi_score = self.ppi_data['score'].values
- else:
- ppi_score = None
-
- if communication_score in ['expression_thresholding', 'differential_combinations']:
- communication_value = communication_scores.get_binary_scores(cell1=cell1,
- cell2=cell2,
- ppi_score=ppi_score)
- elif communication_score in ['expression_product', 'expression_correlation', 'expression_mean', 'expression_gmean']:
- communication_value = communication_scores.get_continuous_scores(cell1=cell1,
- cell2=cell2,
- ppi_score=ppi_score,
- method=communication_score)
- else:
- raise NotImplementedError(
- "Communication score {} to compute pairwise cell-communication is not implemented".format(communication_score))
- return communication_value
-
-
-
-
-
-
-
-
-
-
-
-
-
+
compute_pairwise_communication_scores(self, communication_score=None, use_ppi_score=False, ref_ppi_data=None, interaction_columns=('A', 'B'), cells=None, cci_type=None, verbose=True)
-
+
- Computes the communication scores for each LR pairs in
-a given pair of sender-receiver cell
-
-
-
-
-
-
-
-
- Parameters:
-
-
- communication_score (str, default=None
) – Type of communication score to infer the potential use of
-a given ligand-receptor pair by a pair of cells/tissues/samples.
-If None, the score stored in the attribute analysis_setup
-will be used.
-Available communication_scores are:
-
-- 'expression_thresholding' : Computes the joint presence of a
+
Computes the communication scores for each LR pair in
+a given pair of sender-receiver cells
+Parameters
+communication_score : str, default=None
+ Type of communication score to infer the potential use of
+ a given ligand-receptor pair by a pair of cells/tissues/samples.
+ If None, the score stored in the attribute analysis_setup
+ will be used.
+ Available communication_scores are:
+- 'expression_thresholding' : Computes the joint presence of a
ligand from a sender cell and of
a receptor on a receiver cell from
- binarizing their gene expression levels.
-- 'expression_mean' : Computes the average between the expression
+ binarizing their gene expression levels.
+- 'expression_mean' : Computes the average between the expression
of a ligand from a sender cell and the
- expression of a receptor on a receiver cell.
-- 'expression_product' : Computes the product between the expression
+ expression of a receptor on a receiver cell.
+- 'expression_product' : Computes the product between the expression
of a ligand from a sender cell and the
- expression of a receptor on a receiver cell.
-- 'expression_gmean' : Computes the geometric mean between the expression
+ expression of a receptor on a receiver cell.
+- 'expression_gmean' : Computes the geometric mean between the expression
of a ligand from a sender cell and the
- expression of a receptor on a receiver cell.
-
- use_ppi_score (boolean, default=False
) – Whether using a weight of LR pairs specified in the ppi_data
-to compute the scores.
- ref_ppi_data (pandas.DataFrame, default=None
) – Reference list of protein-protein interactions (or
-ligand-receptor pairs) used for inferring the cell-cell
-interactions and communication. It could be the same as
-'ppi_data' if ppi_data is not bidirectional (that is,
-contains ProtA-ProtB interaction as well as ProtB-ProtA
-interaction). ref_ppi must be undirected (contains only
-ProtA-ProtB and not ProtB-ProtA interaction). If None
-the one stored in the attribute ref_ppi will be used.
- interaction_columns (tuple, default=None
) – Contains the names of the columns where to find the
-partners in a dataframe of protein-protein interactions.
-If the list is for ligand-receptor pairs, the first column
-is for the ligands and the second for the receptors. If
-None, the one stored in the attribute interaction_columns
-will be used
- cells (list=None
) – List of cells to consider.
- cci_type (str, default=None
) – Type of interaction between two cells. Used to specify
-if we want to consider a LR pair in both directions.
-It can be:
- - 'undirected'
- - 'directed'
-If None, the one stored in the attribute analysis_setup
-will be used.
- verbose (boolean, default=True
) – Whether printing or not steps of the analysis.
-
-
-
-
-
-
-
-
-
-
-
-
- Returns:
-
-
- pandas.DataFrame
– Contains communication scores for each LR pair in a
-given pair of sender-receiver cells.
-
-
-
-
-
+ expression of a receptor on a receiver cell.
+
+
+use_ppi_score : boolean, default=False
+ Whether using a weight of LR pairs specified in the ppi_data
+ to compute the scores.
+ref_ppi_data : pandas.DataFrame, default=None
+ Reference list of protein-protein interactions (or
+ ligand-receptor pairs) used for inferring the cell-cell
+ interactions and communication. It could be the same as
+ 'ppi_data' if ppi_data is not bidirectional (that is,
+ contains ProtA-ProtB interaction as well as ProtB-ProtA
+ interaction). ref_ppi must be undirected (contains only
+ ProtA-ProtB and not ProtB-ProtA interaction). If None
+ the one stored in the attribute ref_ppi will be used.
+interaction_columns : tuple, default=('A', 'B')
+ Contains the names of the columns where to find the
+ partners in a dataframe of protein-protein interactions.
+ If the list is for ligand-receptor pairs, the first column
+ is for the ligands and the second for the receptors. If
+ None, the one stored in the attribute interaction_columns
+ will be used.
+cells : list, default=None
+ List of cells to consider.
+cci_type : str, default=None
+ Type of interaction between two cells. Used to specify
+ if we want to consider a LR pair in both directions.
+ It can be:
+ - 'undirected'
+ - 'directed'
+ If None, the one stored in the attribute analysis_setup
+ will be used.
+verbose : boolean, default=True
+ Whether printing or not steps of the analysis.
+Returns
+self.interaction_elements['communication_matrix'] : pandas.DataFrame
+ Contains communication scores for each LR pair in a
+ given pair of sender-receiver cells.
+
Source code in cell2cell/core/interaction_space.py
def compute_pairwise_communication_scores(self, communication_score=None, use_ppi_score=False, ref_ppi_data=None,
@@ -7482,135 +6821,252 @@ Functions
-
-
+
-
-generate_pairs(cells, cci_type, self_interaction=True, remove_duplicates=True)
+
+pair_cci_score(self, cell1, cell2, cci_score='bray_curtis', use_ppi_score=False, verbose=True)
-
+
- Generates a list of pairs of interacting cell-types/tissues/samples.
+ Computes a CCI score for a pair of cells.
+Parameters
+cell1 : cell2cell.core.cell.Cell
+ First cell-type/tissue/sample to compute the communication
+ score. In a directed interaction, this is the sender.
+cell2 : cell2cell.core.cell.Cell
+ Second cell-type/tissue/sample to compute the communication
+ score. In a directed interaction, this is the receiver.
+cci_score : str, default='bray_curtis'
+ Scoring function to aggregate the communication scores between
+ a pair of cells. It computes an overall potential of cell-cell
+ interactions. None is not accepted by this method: a value outside
+ the options below raises NotImplementedError.
+ Options:
+- 'bray_curtis' : Bray-Curtis-like score
+- 'jaccard' : Jaccard-like score
+- 'count' : Number of LR pairs that the pair of cells uses
+- 'icellnet' : Sum of the L-R expression product of a pair of cells
+
+
+use_ppi_score : boolean, default=False
+ Whether using a weight of LR pairs specified in the ppi_data
+ to compute the scores.
+verbose : boolean, default=True
+ Whether printing or not steps of the analysis.
+Returns
+cci_score : float
+ Overall score for the interaction between a pair of
+ cell-types/tissues/samples, computed with the scoring
+ function selected through cci_score.
-
-
-
-
-
-
-
- Parameters:
-
-
- cells (list
) – A list of cell-type/tissue/sample names.
- cci_type (str,
) – Type of interactions.
-Options are:
-
-- 'directed' : Directed cell-cell interactions, so pair A-B is different
- to pair B-A and both are considered.
-- 'undirected' : Undirected cell-cell interactions, so pair A-B is equal
- to pair B-A and just one of them is considered.
-
- self_interaction (boolean, default=True
) – Whether considering autocrine interactions (pair A-A, B-B, etc).
- remove_duplicates (boolean, default=True
) – Whether removing duplicates when a list of cells is passed and names are
-duplicated. If False and a list [A, A, B] is passed, pairs could be
-[A-A, A-A, A-B, A-A, A-A, A-B, B-A, B-A, B-B] when self_interaction is True
-and cci_type is 'directed'. In the same scenario but when remove_duplicates
-is True, the resulting list would be [A-A, A-B, B-A, B-B].
-
-
-
-
-
-
-
-
-
-
-
-
- Returns:
-
-
- list
– List with pairs of interacting cell-types/tissues/samples.
-
-
-
-
-
Source code in cell2cell/core/interaction_space.py
- def generate_pairs(cells, cci_type, self_interaction=True, remove_duplicates=True):
- '''Generates a list of pairs of interacting cell-types/tissues/samples.
-
- Parameters
- ----------
- cells : list
- A list of cell-type/tissue/sample names.
-
- cci_type : str,
- Type of interactions.
- Options are:
-
- - 'directed' : Directed cell-cell interactions, so pair A-B is different
- to pair B-A and both are considered.
- - 'undirected' : Undirected cell-cell interactions, so pair A-B is equal
- to pair B-A and just one of them is considered.
-
- self_interaction : boolean, default=True
- Whether considering autocrine interactions (pair A-A, B-B, etc).
-
- remove_duplicates : boolean, default=True
- Whether removing duplicates when a list of cells is passed and names are
- duplicated. If False and a list [A, A, B] is passed, pairs could be
- [A-A, A-A, A-B, A-A, A-A, A-B, B-A, B-A, B-B] when self_interaction is True
- and cci_type is 'directed'. In the same scenario but when remove_duplicates
- is True, the resulting list would be [A-A, A-B, B-A, B-B].
-
- Returns
- -------
- pairs : list
- List with pairs of interacting cell-types/tissues/samples.
- '''
- if self_interaction:
- if cci_type == 'directed':
- pairs = list(itertools.product(cells, cells))
- #pairs = list(itertools.combinations(cells + cells, 2)) # Directed
- elif cci_type == 'undirected':
- pairs = list(itertools.combinations(cells, 2)) + [(c, c) for c in cells] # Undirected
- else:
- raise NotImplementedError("CCI type has to be directed or undirected")
- else:
- if cci_type == 'directed':
- pairs_ = list(itertools.product(cells, cells))
- pairs = []
- for p in pairs_:
- if p[0] == p[1]:
- continue
- else:
- pairs.append(p)
- elif cci_type == 'undirected':
- pairs = list(itertools.combinations(cells, 2))
- else:
- raise NotImplementedError("CCI type has to be directed or undirected")
- if remove_duplicates:
- pairs = list(set(pairs)) # Remove duplicates
- return pairs
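The pairing rules described for generate_pairs can be sketched in a few lines. This is a minimal illustration and not the library function: the name `sketch_generate_pairs` is hypothetical, and duplicate removal is left out for brevity.

```python
import itertools


def sketch_generate_pairs(cells, cci_type, self_interaction=True):
    """Illustrative pair generation following the directed/undirected rules."""
    if cci_type == 'directed':
        # Ordered pairs: (A, B) and (B, A) are both kept.
        pairs = list(itertools.product(cells, repeat=2))
        if not self_interaction:
            # Drop autocrine pairs such as (A, A).
            pairs = [p for p in pairs if p[0] != p[1]]
    elif cci_type == 'undirected':
        # Unordered pairs: only one of (A, B) / (B, A) is kept.
        pairs = list(itertools.combinations(cells, 2))
        if self_interaction:
            pairs += [(c, c) for c in cells]
    else:
        raise NotImplementedError("CCI type has to be directed or undirected")
    return pairs


cells = ['A', 'B', 'C']
directed = sketch_generate_pairs(cells, 'directed')
undirected = sketch_generate_pairs(cells, 'undirected')
no_self = sketch_generate_pairs(cells, 'directed', self_interaction=False)
print(len(directed), len(undirected), len(no_self))
```

For three cell types, the directed case yields every ordered pair, while the undirected case keeps one orientation per pair plus the autocrine pairs.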
+ def pair_cci_score(self, cell1, cell2, cci_score='bray_curtis', use_ppi_score=False, verbose=True):
+ '''
+ Computes a CCI score for a pair of cells.
+
+ Parameters
+ ----------
+ cell1 : cell2cell.core.cell.Cell
+ First cell-type/tissue/sample to compute the communication
+ score. In a directed interaction, this is the sender.
+
+ cell2 : cell2cell.core.cell.Cell
+ Second cell-type/tissue/sample to compute the communication
+ score. In a directed interaction, this is the receiver.
+
+ cci_score : str, default='bray_curtis'
+ Scoring function to aggregate the communication scores between
+ a pair of cells. It computes an overall potential of cell-cell
+ interactions. None is not accepted by this method: a value outside
+ the options below raises NotImplementedError.
+ Options:
+
+ - 'bray_curtis' : Bray-Curtis-like score
+ - 'jaccard' : Jaccard-like score
+ - 'count' : Number of LR pairs that the pair of cells uses
+ - 'icellnet' : Sum of the L-R expression product of a pair of cells
+
+ use_ppi_score : boolean, default=False
+ Whether using a weight of LR pairs specified in the ppi_data
+ to compute the scores.
+
+ verbose : boolean, default=True
+ Whether printing or not steps of the analysis.
+
+ Returns
+ -------
+ cci_score : float
+ Overall score for the interaction between a pair of
+ cell-types/tissues/samples, computed with the scoring
+ function selected through cci_score.
+ '''
+
+ if verbose:
+ print("Computing interaction score between {} and {}".format(cell1.type, cell2.type))
+
+ if use_ppi_score:
+ ppi_score = self.ppi_data['score'].values
+ else:
+ ppi_score = None
+ # Calculate cell-cell interaction score
+ if cci_score == 'bray_curtis':
+ cci_value = cci_scores.compute_braycurtis_like_cci_score(cell1, cell2, ppi_score=ppi_score)
+ elif cci_score == 'jaccard':
+ cci_value = cci_scores.compute_jaccard_like_cci_score(cell1, cell2, ppi_score=ppi_score)
+ elif cci_score == 'count':
+ cci_value = cci_scores.compute_count_score(cell1, cell2, ppi_score=ppi_score)
+ elif cci_score == 'icellnet':
+ cci_value = cci_scores.compute_icellnet_score(cell1, cell2, ppi_score=ppi_score)
+ else:
+ raise NotImplementedError("CCI score {} to compute pairwise cell-interactions is not implemented".format(cci_score))
+ return cci_value
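To make the cci_score options more concrete, the sketch below computes count, Bray-Curtis-like and Jaccard-like values from two binary vectors (ligand presence in one cell, receptor presence in the other, one entry per LR pair). The formulas are the textbook overlap indices and are an assumption about what the "-like" scores resemble; the function `sketch_cci_scores` is hypothetical and is not the code in cell2cell.core.cci_scores.

```python
import numpy as np


def sketch_cci_scores(ligands_cell1, receptors_cell2):
    """Overlap-style scores on binary presence vectors (illustrative only)."""
    l = np.asarray(ligands_cell1, dtype=float)
    r = np.asarray(receptors_cell2, dtype=float)
    shared = np.sum(l * r)                 # LR pairs active in both partners
    total = np.sum(l * l) + np.sum(r * r)  # ligands present + receptors present
    union = total - shared
    return {
        'count': shared,
        'bray_curtis': 2 * shared / total if total > 0 else 0.0,
        'jaccard': shared / union if union > 0 else 0.0,
    }


# Four LR pairs: cell 1 expresses three ligands, cell 2 expresses two receptors.
scores = sketch_cci_scores([1, 1, 0, 1], [1, 0, 0, 1])
print(scores)
```

The count grows with the number of LR pairs in the list, whereas the two normalized scores stay between 0 and 1 for binary inputs, which makes them easier to compare across cell pairs.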
+
+
+
+
+
+
+
+
+
+
+
+
+
+pair_communication_score(self, cell1, cell2, communication_score='expression_thresholding', use_ppi_score=False, verbose=True)
+
+
+
+
+
+
+ Computes a communication score for each protein-protein interaction
+between a pair of cells.
+Parameters
+cell1 : cell2cell.core.cell.Cell
+ First cell-type/tissue/sample to compute the communication
+ score. In a directed interaction, this is the sender.
+cell2 : cell2cell.core.cell.Cell
+ Second cell-type/tissue/sample to compute the communication
+ score. In a directed interaction, this is the receiver.
+communication_score : str, default='expression_thresholding'
+ Type of communication score to infer the potential use of
+ a given ligand-receptor pair by a pair of cells/tissues/samples.
+ It must be of the same kind (binary or continuous) as the score
+ used to build the interaction space; otherwise a ValueError is raised.
+ Available communication_scores are:
+- 'expression_thresholding' : Computes the joint presence of a
+ ligand from a sender cell and of
+ a receptor on a receiver cell from
+ binarizing their gene expression levels.
+- 'expression_mean' : Computes the average between the expression
+ of a ligand from a sender cell and the
+ expression of a receptor on a receiver cell.
+- 'expression_product' : Computes the product between the expression
+ of a ligand from a sender cell and the
+ expression of a receptor on a receiver cell.
+- 'expression_gmean' : Computes the geometric mean between the expression
+ of a ligand from a sender cell and the
+ expression of a receptor on a receiver cell.
+
+
+use_ppi_score : boolean, default=False
+ Whether using a weight of LR pairs specified in the ppi_data
+ to compute the scores.
+verbose : boolean, default=True
+ Whether printing or not steps of the analysis.
+Returns
+communication_scores : numpy.array
+ An array with the communication scores for each intercellular
+ PPI.
+
+
+ Source code in cell2cell/core/interaction_space.py
+ def pair_communication_score(self, cell1, cell2, communication_score='expression_thresholding',
+ use_ppi_score=False, verbose=True):
+ '''Computes a communication score for each protein-protein interaction
+ between a pair of cells.
+
+ Parameters
+ ----------
+ cell1 : cell2cell.core.cell.Cell
+ First cell-type/tissue/sample to compute the communication
+ score. In a directed interaction, this is the sender.
+
+ cell2 : cell2cell.core.cell.Cell
+ Second cell-type/tissue/sample to compute the communication
+ score. In a directed interaction, this is the receiver.
+
+ communication_score : str, default='expression_thresholding'
+ Type of communication score to infer the potential use of
+ a given ligand-receptor pair by a pair of cells/tissues/samples.
+ It must be of the same kind (binary or continuous) as the score
+ used to build the interaction space; otherwise a ValueError is raised.
+ Available communication_scores are:
+
+ - 'expression_thresholding' : Computes the joint presence of a
+ ligand from a sender cell and of
+ a receptor on a receiver cell from
+ binarizing their gene expression levels.
+ - 'expression_mean' : Computes the average between the expression
+ of a ligand from a sender cell and the
+ expression of a receptor on a receiver cell.
+ - 'expression_product' : Computes the product between the expression
+ of a ligand from a sender cell and the
+ expression of a receptor on a receiver cell.
+ - 'expression_gmean' : Computes the geometric mean between the expression
+ of a ligand from a sender cell and the
+ expression of a receptor on a receiver cell.
+
+ use_ppi_score : boolean, default=False
+ Whether using a weight of LR pairs specified in the ppi_data
+ to compute the scores.
+
+ verbose : boolean, default=True
+ Whether printing or not steps of the analysis.
+
+ Returns
+ -------
+ communication_scores : numpy.array
+ An array with the communication scores for each intercellular
+ PPI.
+ '''
+ # TODO: Implement communication scores
+ if verbose:
+ print("Computing communication score between {} and {}".format(cell1.type, cell2.type))
+
+ # Check that new score is the same type as score used to build interaction space (binary or continuous)
+ if (communication_score in ['expression_product', 'expression_correlation', 'expression_mean', 'expression_gmean']) \
+ & (self.communication_score in ['expression_thresholding', 'differential_combinations']):
+ raise ValueError('Cannot use {} for this interaction space'.format(communication_score))
+ if (communication_score in ['expression_thresholding', 'differential_combinations']) \
+ & (self.communication_score in ['expression_product', 'expression_correlation', 'expression_mean', 'expression_gmean']):
+ raise ValueError('Cannot use {} for this interaction space'.format(communication_score))
+
+ if use_ppi_score:
+ ppi_score = self.ppi_data['score'].values
+ else:
+ ppi_score = None
+
+ if communication_score in ['expression_thresholding', 'differential_combinations']:
+ communication_value = communication_scores.get_binary_scores(cell1=cell1,
+ cell2=cell2,
+ ppi_score=ppi_score)
+ elif communication_score in ['expression_product', 'expression_correlation', 'expression_mean', 'expression_gmean']:
+ communication_value = communication_scores.get_continuous_scores(cell1=cell1,
+ cell2=cell2,
+ ppi_score=ppi_score,
+ method=communication_score)
+ else:
+ raise NotImplementedError(
+ "Communication score {} to compute pairwise cell-communication is not implemented".format(communication_score))
+ return communication_value
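The communication_score options listed above can be illustrated with plain NumPy arrays. This is a sketch under stated assumptions: `sketch_communication_score` is a hypothetical helper, the inputs are per-LR-pair expression vectors (ligands in the sender, receptors in the receiver), and for 'expression_thresholding' the inputs are assumed to be already binarized. It does not reproduce get_binary_scores or get_continuous_scores.

```python
import numpy as np


def sketch_communication_score(ligand_expr, receptor_expr, method='expression_product'):
    """Per-LR-pair communication scores for one sender-receiver pair (illustrative)."""
    l = np.asarray(ligand_expr, dtype=float)
    r = np.asarray(receptor_expr, dtype=float)
    if method == 'expression_thresholding':
        # Assumes 0/1 inputs: the product is 1 only when both partners are present.
        return l * r
    if method == 'expression_product':
        return l * r
    if method == 'expression_mean':
        return (l + r) / 2.0
    if method == 'expression_gmean':
        return np.sqrt(l * r)
    raise NotImplementedError("Communication score {} is not implemented".format(method))


ligands = [4.0, 0.0, 1.0]    # sender cell
receptors = [1.0, 3.0, 1.0]  # receiver cell
product = sketch_communication_score(ligands, receptors, 'expression_product')
mean = sketch_communication_score(ligands, receptors, 'expression_mean')
gmean = sketch_communication_score(ligands, receptors, 'expression_gmean')
print(product, mean, gmean)
```

Note that the mean stays positive when only one partner is expressed (second LR pair), while the product and geometric mean drop to zero, which is one reason to prefer them when both partners are required.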
@@ -7619,89 +7075,81 @@
+
+
+
+
+
+
+
+
-
+
generate_interaction_elements(modified_rnaseq, ppi_data, cci_type='undirected', cci_matrix_template=None, complex_sep=None, complex_agg_method='min', interaction_columns=('A', 'B'), verbose=True)
-
+
- Create all elements needed to perform the analyses of pairwise
-cell-cell interactions/communication. Corresponds to the interaction
+
Create all elements needed to perform the analyses of pairwise
+cell-cell interactions/communication. Corresponds to the interaction
elements used by the class InteractionSpace.
+Parameters
+modified_rnaseq : pandas.DataFrame
+ Preprocessed gene expression data for a bulk or single-cell RNA-seq experiment.
+ Columns are cell-types/tissues/samples and rows are genes. The preprocessing
+ may correspond to scoring the gene expression as binary or continuous values
+ depending on the scoring function for cell-cell interactions/communication.
+ppi_data : pandas.DataFrame
+ List of protein-protein interactions (or ligand-receptor pairs) used for
+ inferring the cell-cell interactions and communication.
+cci_type : str, default='undirected'
+ Specifies whether the cci_score is computed in a directed or undirected
+ way. For a pair of cells A and B, directed means that the ligands are
+ considered only from cell A and receptors only from cell B, or vice versa.
+ Undirected simultaneously considers signaling from cell A to
+ cell B and from cell B to cell A.
+cci_matrix_template : pandas.DataFrame, default=None
+ A matrix of shape MxM where M is the number of cell-types/tissues/samples.
+ This is used as a template for storing CCI scores. It may be useful
+ for specifying which pairs of cells to consider.
+complex_sep : str, default=None
+ Symbol that separates the protein subunits in a multimeric complex.
+ For example, '&' is the complex_sep for a list of ligand-receptor pairs
+ where a protein partner could be "CD74&CD44".
+complex_agg_method : str, default='min'
+ Method to aggregate the expression value of multiple genes in a
+ complex.
+- 'min' : Minimum expression value among all genes.
+- 'mean' : Average expression value among all genes.
+- 'gmean' : Geometric mean expression value among all genes.
+
+
+interaction_columns : tuple, default=('A', 'B')
+ Contains the names of the columns where to find the partners in a
+ dataframe of protein-protein interactions. If the list is for
+ ligand-receptor pairs, the first column is for the ligands and the second
+ for the receptors.
+verbose : boolean, default=True
+ Whether to print the steps of the analysis.
+Returns
+interaction_elements : dict
+ Dictionary containing all the pairs of cells considered (under
+ the key of 'pairs'), Cell instances (under key 'cells')
+ which include all cells/tissues/organs with their associated datasets
+ (rna_seq, weighted_ppi, etc) and a Cell-Cell Interaction Matrix
+ to store CCI scores (under key 'cci_matrix'). A communication matrix
+ is also stored in this object when the communication scores are
+ computed in the InteractionSpace class (under key
+ 'communication_score')
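The three `complex_agg_method` options can be illustrated with a small sketch. The helper `aggregate_complex` below is hypothetical and is not the function cell2cell uses internally; it only shows what 'min', 'mean' and 'gmean' would mean for a complex such as "CD74&CD44", assuming strictly positive expression values.

```python
import numpy as np
import pandas as pd


def aggregate_complex(rnaseq, complex_name, complex_sep='&', method='min'):
    """Hypothetical helper: aggregate subunit expression for one complex."""
    subunits = complex_name.split(complex_sep)
    values = rnaseq.loc[subunits]  # rows are genes, columns are cells
    if method == 'min':
        return values.min(axis=0)
    if method == 'mean':
        return values.mean(axis=0)
    if method == 'gmean':
        # Geometric mean via logs; assumes strictly positive values
        return np.exp(np.log(values).mean(axis=0))
    raise ValueError("method must be 'min', 'mean' or 'gmean'")


# Toy expression table (illustrative values only)
rnaseq = pd.DataFrame({'C1': [4.0, 9.0], 'C2': [2.0, 8.0]},
                      index=['CD74', 'CD44'])

agg_min = aggregate_complex(rnaseq, 'CD74&CD44', method='min')
agg_mean = aggregate_complex(rnaseq, 'CD74&CD44', method='mean')
agg_gmean = aggregate_complex(rnaseq, 'CD74&CD44', method='gmean')
```

With 'min', a complex is only as expressed as its least expressed subunit, which is the most conservative of the three choices.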
-
-
-
-
-
-
-
- Parameters:
-
-
- modified_rnaseq (pandas.DataFrame
) – Preprocessed gene expression data for a bulk or single-cell RNA-seq experiment.
-Columns are are cell-types/tissues/samples and rows are genes. The preprocessing
-may correspond to scoring the gene expression as binary or continuous values
-depending on the scoring function for cell-cell interactions/communication.
- ppi_data (pandas.DataFrame
) – List of protein-protein interactions (or ligand-receptor pairs) used for
-inferring the cell-cell interactions and communication.
- cci_type (str, default='undirected'
) – Specifies whether computing the cci_score in a directed or undirected
-way. For a pair of cells A and B, directed means that the ligands are
-considered only from cell A and receptors only from cell B or viceversa.
-While undirected simultaneously considers signaling from cell A to
-cell B and from cell B to cell A.
- cci_matrix_template (pandas.DataFrame, default=None
) – A matrix of shape MxM where M are cell-types/tissues/samples. This
-is used as template for storing CCI scores. It may be useful
-for specifying which pairs of cells to consider.
- complex_sep (str, default=None
) – Symbol that separates the protein subunits in a multimeric complex.
-For example, '&' is the complex_sep for a list of ligand-receptor pairs
-where a protein partner could be "CD74&CD44".
- complex_agg_method (str, default='min'
) – Method to aggregate the expression value of multiple genes in a
-complex.
-
-- 'min' : Minimum expression value among all genes.
-- 'mean' : Average expression value among all genes.
-- 'gmean' : Geometric mean expression value among all genes.
-
- interaction_columns (tuple, default=('A', 'B')
) – Contains the names of the columns where to find the partners in a
-dataframe of protein-protein interactions. If the list is for
-ligand-receptor pairs, the first column is for the ligands and the second
-for the receptors.
- verbose (boolean, default=True
) – Whether printing or not steps of the analysis.
-
-
-
-
-
-
-
-
-
-
-
-
- Returns:
-
-
- dict
– Dictionary containing all the pairs of cells considered (under
-the key of 'pairs'), Cell instances (under key 'cells')
-which include all cells/tissues/organs with their associated datasets
-(rna_seq, weighted_ppi, etc) and a Cell-Cell Interaction Matrix
-to store CCI scores(under key 'cci_matrix'). A communication matrix
-is also stored in this object when the communication scores are
-computed in the InteractionSpace class (under key
-'communication_score')
- cell2cell.datasets
-
-
-
- special
-
-
-
-
-
-
-
-
-
-
-
-
-
-
+
+generate_pairs(cells, cci_type, self_interaction=True, remove_duplicates=True)
+
+
-Modules
+ Generates a list of pairs of interacting cell-types/tissues/samples.
+Parameters
+cells : list
+ A list of cell-type/tissue/sample names.
+cci_type : str
+ Type of interactions.
+ Options are:
+- 'directed' : Directed cell-cell interactions, so pair A-B is different
+ to pair B-A and both are considered.
+- 'undirected' : Undirected cell-cell interactions, so pair A-B is equal
+ to pair B-A and just one of them is considered.
+
-
+self_interaction : boolean, default=True
+ Whether to consider autocrine interactions (pairs A-A, B-B, etc.).
+remove_duplicates : boolean, default=True
+ Whether to remove duplicates when the list of cells contains repeated
+ names. If False and a list [A, A, B] is passed, pairs could be
+ [A-A, A-A, A-B, A-A, A-A, A-B, B-A, B-A, B-B] when self_interaction is True
+ and cci_type is 'directed'. In the same scenario but with remove_duplicates
+ set to True, the resulting list would be [A-A, A-B, B-A, B-B].
+Returns
+pairs : list
+ List with pairs of interacting cell-types/tissues/samples.
+
+
+ Source code in cell2cell/core/interaction_space.py
+ def generate_pairs(cells, cci_type, self_interaction=True, remove_duplicates=True):
+ '''Generates a list of pairs of interacting cell-types/tissues/samples.
+
+ Parameters
+ ----------
+ cells : list
+ A list of cell-type/tissue/sample names.
+
+ cci_type : str
+ Type of interactions.
+ Options are:
+
+ - 'directed' : Directed cell-cell interactions, so pair A-B is different
+ to pair B-A and both are considered.
+ - 'undirected' : Undirected cell-cell interactions, so pair A-B is equal
+ to pair B-A and just one of them is considered.
+
+ self_interaction : boolean, default=True
+ Whether considering autocrine interactions (pair A-A, B-B, etc).
+
+ remove_duplicates : boolean, default=True
+ Whether removing duplicates when a list of cells is passed and names are
+ duplicated. If False and a list [A, A, B] is passed, pairs could be
+ [A-A, A-A, A-B, A-A, A-A, A-B, B-A, B-A, B-B] when self_interaction is True
+ and cci_type is 'directed'. In the same scenario but when remove_duplicates
+ is True, the resulting list would be [A-A, A-B, B-A, B-B].
+
+ Returns
+ -------
+ pairs : list
+ List with pairs of interacting cell-types/tissues/samples.
+ '''
+ if self_interaction:
+ if cci_type == 'directed':
+ pairs = list(itertools.product(cells, cells))
+ #pairs = list(itertools.combinations(cells + cells, 2)) # Directed
+ elif cci_type == 'undirected':
+ pairs = list(itertools.combinations(cells, 2)) + [(c, c) for c in cells] # Undirected
+ else:
+ raise NotImplementedError("CCI type has to be directed or undirected")
+ else:
+ if cci_type == 'directed':
+ pairs_ = list(itertools.product(cells, cells))
+ pairs = []
+ for p in pairs_:
+ if p[0] == p[1]:
+ continue
+ else:
+ pairs.append(p)
+ elif cci_type == 'undirected':
+ pairs = list(itertools.combinations(cells, 2))
+ else:
+ raise NotImplementedError("CCI type has to be directed or undirected")
+ if remove_duplicates:
+ pairs = list(set(pairs)) # Remove duplicates
+ return pairs
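The four branches above can be checked with plain `itertools`, without importing cell2cell. This is a minimal sketch that mirrors the listing for three cells; the variable names are illustrative only. Because deduplication goes through `set`, the order of the returned pairs is not guaranteed, so the sketch sorts the result to make it deterministic.

```python
import itertools

cells = ['A', 'B', 'C']

# self_interaction=True
directed_self = list(itertools.product(cells, cells))
undirected_self = list(itertools.combinations(cells, 2)) + [(c, c) for c in cells]

# self_interaction=False
directed_no_self = [p for p in itertools.product(cells, cells) if p[0] != p[1]]
undirected_no_self = list(itertools.combinations(cells, 2))

# Repeated names produce repeated pairs unless duplicates are removed
with_duplicates = list(itertools.product(['A', 'A', 'B'], repeat=2))
deduplicated = sorted(set(with_duplicates))

print(len(directed_self), len(undirected_self),
      len(directed_no_self), len(undirected_no_self))
print(deduplicated)
```

For n distinct cells, the counts are n², n(n+1)/2, n(n-1) and n(n-1)/2, respectively.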
+ datasets
+
+
+
+ special
anndata
-
+
@@ -7901,22 +7452,22 @@
-Functions
+
-
+
balf_covid(filename='BALF-COVID19-Liao_et_al-NatMed-2020.h5ad')
-
+
- BALF samples from COVID-19 patients
-The data consists in 63k immune and epithelial cells in lungs
+
BALF samples from COVID-19 patients
+The data consists of 63k immune and epithelial cells in lungs
from 3 control, 3 moderate COVID-19, and 6 severe COVID-19 patients.
This dataset was previously published in [1], and this object contains
the raw counts for the annotated cell types available in:
@@ -7987,12 +7538,12 @@
Returns
-
+
gsea_data
-
+
@@ -8006,60 +7557,38 @@
-Functions
+
-
+
gsea_msig(organism='human', pathwaydb='GOBP', readable_name=False)
-
+
Load a MSigDB from a gmt file
+Parameters
+organism : str, default='human'
+ Organism for which the DB will be loaded.
+ Available options are {'human', 'mouse'}.
+
+pathwaydb : str, default='GOBP'
+ Molecular Signature Database to load.
+ Available options are {'GOBP', 'KEGG', 'Reactome'}.
+
+readable_name : boolean, default=False
+ If True, the pathway names are transformed to a more readable format.
+ That is, removing underscores and pathway DB name at the beginning.
+Returns
+pathway_per_gene : defaultdict
+ Dictionary containing all genes in the DB as keys, and
+ their values are lists with their pathway annotations.
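The shape of the returned dictionary can be sketched with a tiny in-memory GMT file. This assumes the usual GMT layout (pathway name, description, then gene symbols, all tab-separated); `pathways_per_gene` and the pathway names are hypothetical and this is not the body of `gsea_msig`, which is not shown here.

```python
import io
from collections import defaultdict


def pathways_per_gene(gmt_handle):
    """Hypothetical parser: map each gene to the pathways that list it."""
    pathway_per_gene = defaultdict(list)
    for line in gmt_handle:
        fields = line.rstrip('\n').split('\t')
        if len(fields) < 3:
            continue  # skip malformed or empty lines
        pathway, genes = fields[0], fields[2:]
        for gene in genes:
            pathway_per_gene[gene].append(pathway)
    return pathway_per_gene


# Two made-up pathways sharing one gene
gmt_text = ("GOBP_PATHWAY_ONE\thttp://example.org/1\tGENE1\tGENE2\n"
            "GOBP_PATHWAY_TWO\thttp://example.org/2\tGENE2\n")

result = pathways_per_gene(io.StringIO(gmt_text))
```

Because the result is a `defaultdict`, looking up a gene that is absent from the database yields an empty list rather than raising a `KeyError`.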
-
-
-
-
-
-
-
- Parameters:
-
-
- organism (str, default='human'
) – Organism for whom the DB will be loaded.
-Available options are {'human', 'mouse'}.
- pathwaydb (str, default='GOBP'
) – Molecular Signature Database to load.
-Available options are {'GOBP', 'KEGG', 'Reactome'}
- readable_name (boolean, default=False
) – If True, the pathway names are transformed to a more readable format.
-That is, removing underscores and pathway DB name at the beginning.
-
-
-
-
-
-
-
-
-
-
-
-
- Returns:
-
-
- defaultdict
– Dictionary containing all genes in the DB as keys, and
-their values are lists with their pathway annotations.
-
-
-
-
-
Source code in cell2cell/datasets/gsea_data.py
def gsea_msig(organism='human', pathwaydb='GOBP', readable_name=False):
@@ -8112,12 +7641,12 @@
-
+
heuristic_data
-
+
@@ -8130,50 +7659,33 @@
-Classes
+
-
+
HeuristicGOTerms
-
+
GO terms for contact and secreted proteins.
+Attributes
+contact_go_terms : list
+ List of GO terms associated with proteins that
+ participate in contact interactions (usually
+ on the surface of cells).
+mediator_go_terms : list
+ List of GO terms associated with secreted
+ proteins that mediate intercellular interactions
+ or communication.
-Attributes:
-
-
-
- Name
- Type
- Description
-
-
-
-
- contact_go_terms
- list
- List of GO terms associated with proteins that
-participate in contact interactions (usually
-on the surface of cells).
-
-
- mediator_go_terms
- list
- List of GO terms associated with secreted
-proteins that mediate intercellular interactions
-or communication.
-
-
-
@@ -8240,12 +7768,12 @@
+
random_data
-
+
@@ -8259,102 +7787,77 @@
-Functions
+
-
-generate_random_rnaseq(size, row_names, random_state=None, verbose=True)
+
+generate_random_cci_scores(cell_number, labels=None, symmetric=True, random_state=None)
-
+
- Generates a RNA-seq dataset that is normally distributed gene-wise and size
-normalized (each column sums up to a million).
+ Generates a square cell-cell interaction
+matrix with random scores.
+Parameters
+cell_number : int
+ Number of cells.
+labels : list, default=None
+ List containing a label for each cell. The length of
+ this list must match cell_number.
+symmetric : boolean, default=True
+ Whether to generate a symmetric CCI matrix.
+random_state : int, default=None
+ Seed for randomization.
+Returns
+cci_matrix : pandas.DataFrame
+ Matrix with rows and columns as cells. Values
+ represent a random CCI score between 0 and 1.
-
-
-
-
-
-
-
- Parameters:
-
-
- size (int
) – Number of cell-types/tissues/samples (columns).
- row_names (array-like
) – List containing the name of genes (rows).
- random_state (int, default=None
) – Seed for randomization.
- verbose (boolean, default=True
) – Whether printing or not steps of the analysis.
-
-
-
-
-
-
-
-
-
-
-
-
- Returns:
-
-
- pandas.DataFrame
– Dataframe containing gene expression given the list
-of genes for each cell-type/tissue/sample.
-
-
-
-
-
Source code in cell2cell/datasets/random_data.py
- def generate_random_rnaseq(size, row_names, random_state=None, verbose=True):
- '''
- Generates a RNA-seq dataset that is normally distributed gene-wise and size
- normalized (each column sums up to a million).
-
- Parameters
- ----------
- size : int
- Number of cell-types/tissues/samples (columns).
-
- row_names : array-like
- List containing the name of genes (rows).
+ def generate_random_cci_scores(cell_number, labels=None, symmetric=True, random_state=None):
+ '''Generates a square cell-cell interaction
+ matrix with random scores.
+
+ Parameters
+ ----------
+ cell_number : int
+ Number of cells.
+
+ labels : list, default=None
+ List containing labels for each cells. Length of
+ this list must match the cell_number.
- random_state : int, default=None
- Seed for randomization.
+ symmetric : boolean, default=True
+ Whether generating a symmetric CCI matrix.
- verbose : boolean, default=True
- Whether printing or not steps of the analysis.
+ random_state : int, default=None
+ Seed for randomization.
Returns
-------
- df : pandas.DataFrame
- Dataframe containing gene expression given the list
- of genes for each cell-type/tissue/sample.
+ cci_matrix : pandas.DataFrame
+ Matrix with rows and columns as cells. Values
+ represent a random CCI score between 0 and 1.
'''
- if verbose:
- print('Generating random RNA-seq dataset.')
- columns = ['Cell-{}'.format(c) for c in range(1, size+1)]
-
- if random_state is not None:
- np.random.seed(random_state)
- data = np.random.randn(len(row_names), len(columns)) # Normal distribution
- min = np.abs(np.amin(data, axis=1))
- min = min.reshape((len(min), 1))
-
- data = data + min
- df = pd.DataFrame(data, index=row_names, columns=columns)
- if verbose:
- print('Normalizing random RNA-seq dataset (into TPM)')
- df = rnaseq.scale_expression_by_sum(df, axis=0, sum_value=1e6)
- return df
+ if labels is not None:
+ assert len(labels) == cell_number, "Length of labels must match cell_number"
+ else:
+ labels = ['Cell-{}'.format(n) for n in range(1, cell_number+1)]
+
+ if random_state is not None:
+ np.random.seed(random_state)
+ cci_scores = np.random.random((cell_number, cell_number))
+ if symmetric:
+ cci_scores = (cci_scores + cci_scores.T) / 2.
+ cci_matrix = pd.DataFrame(cci_scores, index=labels, columns=labels)
+
+ return cci_matrix
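The symmetrization step can be tried in isolation. The sketch below follows the same idea as the listing but swaps the global `np.random.seed` call for a local `np.random.default_rng` generator, which is an assumption made here for reproducibility and is not what the function itself does.

```python
import numpy as np
import pandas as pd

cell_number = 3
labels = ['Cell-{}'.format(n) for n in range(1, cell_number + 1)]

rng = np.random.default_rng(0)  # local generator instead of a global seed
scores = rng.random((cell_number, cell_number))

# Averaging a matrix with its transpose makes it symmetric
symmetric_scores = (scores + scores.T) / 2.
cci_matrix = pd.DataFrame(symmetric_scores, index=labels, columns=labels)
```

Since every entry is the average of two values in [0, 1), the symmetric scores stay in that same range.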
@@ -8367,94 +7870,130 @@
-generate_random_ppi(max_size, interactors_A, interactors_B=None, random_state=None, verbose=True)
+
+generate_random_metadata(cell_labels, group_number)
-
+
- Generates a random list of protein-protein interactions.
+ Randomly assigns groups to cell labels.
+Parameters
+cell_labels : list
+ A list of cell labels.
+group_number : int
+ Number of major groups of cells.
+Returns
+metadata : pandas.DataFrame
+ DataFrame containing the major groups that each cell
+ received randomly (under column 'Group'). Cells are
+ under the column 'Cell'.
-
-
-
-
-
-
-
- Parameters:
-
-
- max_size (int
) – Maximum size of interactions to obtain. Since the PPIs
-are obtained by independently resampling interactors A and B
-rather than creating all possible combinations (it may demand too much
-memory), some PPIs can be duplicated and when dropping them
-results into a smaller number of PPIs than the max_size.
- interactors_A (list
) – A list of protein names to include in the first column of
-the PPIs.
- interactors_B (list, default=None
) – A list of protein names to include in the second columns
-of the PPIs. If None, interactors_A will be used as
-interactors_B too.
- random_state (int, default=None
) – Seed for randomization.
- verbose (boolean, default=True
) – Whether printing or not steps of the analysis.
-
-
-
-
-
-
-
-
-
-
-
-
- Returns:
-
-
- pandas.DataFrame
– DataFrame containing a list of protein-protein interactions.
-It has three columns: 'A', 'B', and 'score' for interactors
-A, B and weights of interactions, respectively.
-
-
-
-
-
Source code in cell2cell/datasets/random_data.py
- def generate_random_ppi(max_size, interactors_A, interactors_B=None, random_state=None, verbose=True):
- '''Generates a random list of protein-protein interactions.
+ def generate_random_metadata(cell_labels, group_number):
+ '''Randomly assigns groups to cell labels.
Parameters
----------
- max_size : int
- Maximum size of interactions to obtain. Since the PPIs
- are obtained by independently resampling interactors A and B
- rather than creating all possible combinations (it may demand too much
- memory), some PPIs can be duplicated and when dropping them
- results into a smaller number of PPIs than the max_size.
-
- interactors_A : list
- A list of protein names to include in the first column of
- the PPIs.
-
- interactors_B : list, default=None
- A list of protein names to include in the second columns
- of the PPIs. If None, interactors_A will be used as
- interactors_B too.
-
- random_state : int, default=None
- Seed for randomization.
-
- verbose : boolean, default=True
- Whether printing or not steps of the analysis.
-
- Returns
- -------
- ppi_data : pandas.DataFrame
- DataFrame containing a list of protein-protein interactions.
+ cell_labels : list
+ A list of cell labels.
+
+ group_number : int
+ Number of major groups of cells.
+
+ Returns
+ -------
+ metadata : pandas.DataFrame
+ DataFrame containing the major groups that each cell
+ received randomly (under column 'Group'). Cells are
+ under the column 'Cell'.
+ '''
+ metadata = pd.DataFrame()
+ metadata['Cell'] = cell_labels
+
+ groups = list(range(1, group_number+1))
+ metadata['Group'] = metadata['Cell'].apply(lambda x: np.random.choice(groups, 1)[0])
+ return metadata
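Note that the function above takes no `random_state` argument, so its output depends on the global NumPy random state. A reproducible variant could look like the following sketch, which draws all groups in one vectorized call from a seeded generator; the seed and variable names are illustrative assumptions, not part of cell2cell.

```python
import numpy as np
import pandas as pd

cell_labels = ['Cell-1', 'Cell-2', 'Cell-3', 'Cell-4']
group_number = 2

rng = np.random.default_rng(42)  # assumed seed, for reproducibility only
groups = np.arange(1, group_number + 1)

# One random group per cell, drawn in a single call
metadata = pd.DataFrame({'Cell': cell_labels,
                         'Group': rng.choice(groups, size=len(cell_labels))})
```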
+
+
+
+
+
+
+
+
+
+
+
+
+
+generate_random_ppi(max_size, interactors_A, interactors_B=None, random_state=None, verbose=True)
+
+
+
+
+
+
+ Generates a random list of protein-protein interactions.
+Parameters
+max_size : int
+ Maximum number of interactions to obtain. Since the PPIs
+ are obtained by independently resampling interactors A and B
+ rather than creating all possible combinations (which may demand too much
+ memory), some PPIs can be duplicated, and dropping the duplicates
+ results in fewer PPIs than max_size.
+interactors_A : list
+ A list of protein names to include in the first column of
+ the PPIs.
+interactors_B : list, default=None
+ A list of protein names to include in the second column
+ of the PPIs. If None, interactors_A will be used as
+ interactors_B too.
+random_state : int, default=None
+ Seed for randomization.
+verbose : boolean, default=True
+ Whether to print the steps of the analysis.
+Returns
+ppi_data : pandas.DataFrame
+ DataFrame containing a list of protein-protein interactions.
+ It has three columns: 'A', 'B', and 'score' for interactors
+ A, B and weights of interactions, respectively.
+
+
+ Source code in cell2cell/datasets/random_data.py
+ def generate_random_ppi(max_size, interactors_A, interactors_B=None, random_state=None, verbose=True):
+ '''Generates a random list of protein-protein interactions.
+
+ Parameters
+ ----------
+ max_size : int
+ Maximum size of interactions to obtain. Since the PPIs
+ are obtained by independently resampling interactors A and B
+ rather than creating all possible combinations (it may demand too much
+ memory), some PPIs can be duplicated and when dropping them
+ results into a smaller number of PPIs than the max_size.
+
+ interactors_A : list
+ A list of protein names to include in the first column of
+ the PPIs.
+
+ interactors_B : list, default=None
+ A list of protein names to include in the second columns
+ of the PPIs. If None, interactors_A will be used as
+ interactors_B too.
+
+ random_state : int, default=None
+ Seed for randomization.
+
+ verbose : boolean, default=True
+ Whether printing or not steps of the analysis.
+
+ Returns
+ -------
+ ppi_data : pandas.DataFrame
+ DataFrame containing a list of protein-protein interactions.
It has three columns: 'A', 'B', and 'score' for interactors
A, B and weights of interactions, respectively.
'''
@@ -8517,177 +8056,73 @@
-generate_random_cci_scores(cell_number, labels=None, symmetric=True, random_state=None)
+
+generate_random_rnaseq(size, row_names, random_state=None, verbose=True)
-
+
- Generates a square cell-cell interaction
-matrix with random scores.
+ Generates an RNA-seq dataset that is normally distributed gene-wise and size
+normalized (each column sums up to a million).
+Parameters
+size : int
+ Number of cell-types/tissues/samples (columns).
+row_names : array-like
+ List containing the name of genes (rows).
+random_state : int, default=None
+ Seed for randomization.
+verbose : boolean, default=True
+ Whether to print the steps of the analysis.
+Returns
+df : pandas.DataFrame
+ Dataframe containing gene expression given the list
+ of genes for each cell-type/tissue/sample.
-
-
-
-
-
-
-
- Parameters:
-
-
- cell_number (int
) – Number of cells.
- labels (list, default=None
) – List containing labels for each cells. Length of
-this list must match the cell_number.
- symmetric (boolean, default=True
) – Whether generating a symmetric CCI matrix.
- random_state (int, default=None
) – Seed for randomization.
-
-
-
-
-
-
-
-
-
-
-
-
- Returns:
-
-
- pandas.DataFrame
– Matrix with rows and columns as cells. Values
-represent a random CCI score between 0 and 1.
-
-
-
-
-
Source code in cell2cell/datasets/random_data.py
- def generate_random_cci_scores(cell_number, labels=None, symmetric=True, random_state=None):
- '''Generates a square cell-cell interaction
- matrix with random scores.
-
- Parameters
- ----------
- cell_number : int
- Number of cells.
-
- labels : list, default=None
- List containing labels for each cells. Length of
- this list must match the cell_number.
+ def generate_random_rnaseq(size, row_names, random_state=None, verbose=True):
+ '''
+ Generates a RNA-seq dataset that is normally distributed gene-wise and size
+ normalized (each column sums up to a million).
+
+ Parameters
+ ----------
+ size : int
+ Number of cell-types/tissues/samples (columns).
+
+ row_names : array-like
+ List containing the name of genes (rows).
- symmetric : boolean, default=True
- Whether generating a symmetric CCI matrix.
+ random_state : int, default=None
+ Seed for randomization.
- random_state : int, default=None
- Seed for randomization.
+ verbose : boolean, default=True
+ Whether printing or not steps of the analysis.
Returns
-------
- cci_matrix : pandas.DataFrame
- Matrix with rows and columns as cells. Values
- represent a random CCI score between 0 and 1.
+ df : pandas.DataFrame
+ Dataframe containing gene expression given the list
+ of genes for each cell-type/tissue/sample.
'''
- if labels is not None:
- assert len(labels) == cell_number, "Lenght of labels must match cell_number"
- else:
- labels = ['Cell-{}'.format(n) for n in range(1, cell_number+1)]
-
- if random_state is not None:
- np.random.seed(random_state)
- cci_scores = np.random.random((cell_number, cell_number))
- if symmetric:
- cci_scores = (cci_scores + cci_scores.T) / 2.
- cci_matrix = pd.DataFrame(cci_scores, index=labels, columns=labels)
-
- return cci_matrix
-
-
-
-
-
-
-
-
-
-
-
-
-
-generate_random_metadata(cell_labels, group_number)
-
-
-
-
-
-
- Randomly assigns groups to cell labels.
-
-
-
-
-
-
-
-
- Parameters:
-
-
- cell_labels (list
) – A list of cell labels.
- group_number (int
) – Number of major groups of cells.
-
-
-
-
-
-
-
-
-
-
-
-
- Returns:
-
-
- pandas.DataFrame
– DataFrame containing the major groups that each cell
-received randomly (under column 'Group'). Cells are
-under the column 'Cell'.
-
-
-
-
-
-
- Source code in cell2cell/datasets/random_data.py
- def generate_random_metadata(cell_labels, group_number):
- '''Randomly assigns groups to cell labels.
-
- Parameters
- ----------
- cell_labels : list
- A list of cell labels.
-
- group_number : int
- Number of major groups of cells.
-
- Returns
- -------
- metadata : pandas.DataFrame
- DataFrame containing the major groups that each cell
- received randomly (under column 'Group'). Cells are
- under the column 'Cell'.
- '''
- metadata = pd.DataFrame()
- metadata['Cell'] = cell_labels
-
- groups = list(range(1, group_number+1))
- metadata['Group'] = metadata['Cell'].apply(lambda x: np.random.choice(groups, 1)[0])
- return metadata
+ if verbose:
+ print('Generating random RNA-seq dataset.')
+ columns = ['Cell-{}'.format(c) for c in range(1, size+1)]
+
+ if random_state is not None:
+ np.random.seed(random_state)
+ data = np.random.randn(len(row_names), len(columns)) # Normal distribution
+ min = np.abs(np.amin(data, axis=1))
+ min = min.reshape((len(min), 1))
+
+ data = data + min
+ df = pd.DataFrame(data, index=row_names, columns=columns)
+ if verbose:
+ print('Normalizing random RNA-seq dataset (into TPM)')
+ df = rnaseq.scale_expression_by_sum(df, axis=0, sum_value=1e6)
+ return df
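The shift-and-scale logic can be followed on a small fixed array. In this sketch, hand-picked values stand in for the output of `np.random.randn`, and a plain pandas division stands in for `rnaseq.scale_expression_by_sum`, whose implementation is not shown here; both substitutions are assumptions made for illustration.

```python
import numpy as np
import pandas as pd

genes = ['Gene-1', 'Gene-2', 'Gene-3']
columns = ['Cell-1', 'Cell-2', 'Cell-3']

# Fixed values standing in for normally distributed draws
data = np.array([[-1.0, 0.5, 2.0],
                 [0.3, -0.7, 1.2],
                 [1.5, 0.2, -0.4]])

# Shift each gene (row) by the absolute value of its minimum,
# so that no expression value is negative
row_min = np.abs(data.min(axis=1)).reshape(-1, 1)
shifted = data + row_min

df = pd.DataFrame(shifted, index=genes, columns=columns)

# Scale every column (sample) so that it sums to one million
scaled = df / df.sum(axis=0) * 1e6
```

The sketch uses the name `row_min` rather than `min`, which avoids shadowing the Python built-in.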
@@ -8711,12 +8146,12 @@
+
toy_data
-
+
@@ -8730,64 +8165,48 @@
-Functions
+
-
-generate_toy_rnaseq()
+
+generate_toy_distance()
-
+
- Generates a toy RNA-seq dataset
+ Generates a square matrix with cell-cell distance.
+Returns
+distance : pandas.DataFrame
+ DataFrame with Euclidean-like distance between each
+ pair of cells in the toy RNA-seq dataset.
-
-
-
-
-
-
-
- Returns:
-
-
- pandas.DataFrame
– DataFrame contianing the toy RNA-seq dataset. Columns
-are cells and rows are genes.
-
-
-
-
-
Source code in cell2cell/datasets/toy_data.py
- def generate_toy_rnaseq():
- '''Generates a toy RNA-seq dataset
+ def generate_toy_distance():
+ '''Generates a square matrix with cell-cell distance.
Returns
-------
- rnaseq : pandas.DataFrame
- DataFrame contianing the toy RNA-seq dataset. Columns
- are cells and rows are genes.
+ distance : pandas.DataFrame
+ DataFrame with Euclidean-like distance between each
+ pair of cells in the toy RNA-seq dataset.
'''
- data = np.asarray([[5, 10, 8, 15, 2],
- [15, 5, 20, 1, 30],
- [18, 12, 5, 40, 20],
- [9, 30, 22, 5, 2],
- [2, 1, 1, 27, 15],
- [30, 11, 16, 5, 12],
- ])
-
- rnaseq = pd.DataFrame(data,
- index=['Protein-A', 'Protein-B', 'Protein-C', 'Protein-D', 'Protein-E', 'Protein-F'],
- columns=['C1', 'C2', 'C3', 'C4', 'C5']
- )
- rnaseq.index.name = 'gene_id'
- return rnaseq
+ data = np.asarray([[0.0, 10.0, 12.0, 5.0, 3.0],
+ [10.0, 0.0, 15.0, 8.0, 9.0],
+ [12.0, 15.0, 0.0, 4.5, 7.5],
+ [5.0, 8.0, 4.5, 0.0, 6.5],
+ [3.0, 9.0, 7.5, 6.5, 0.0],
+ ])
+ distance = pd.DataFrame(data,
+ index=['C1', 'C2', 'C3', 'C4', 'C5'],
+ columns=['C1', 'C2', 'C3', 'C4', 'C5']
+ )
+ return distance
@@ -8800,60 +8219,80 @@
-generate_toy_ppi(prot_complex=False)
+
+generate_toy_metadata()
-
+
- Generates a toy list of protein-protein interactions.
+ Generates metadata for cells in the toy RNA-seq dataset.
+Returns
+metadata : pandas.DataFrame
+ DataFrame with metadata for each cell. Metadata contains the
+ major groups of those cells.
-
-
-
-
-
-
-
- Parameters:
-
-
- prot_complex (boolean, default=False
) – Whether including PPIs where interactors could contain
-multimeric complexes.
-
-
-
-
-
-
-
-
-
-
-
-
- Returns:
-
-
- pandas.DataFrame
– Dataframe containing PPIs. Columns are 'A' (first interacting
-partners), 'B' (second interacting partners) and 'score'
-for weighting each PPI.
-
-
-
-
-
Source code in cell2cell/datasets/toy_data.py
- def generate_toy_ppi(prot_complex=False):
- '''Generates a toy list of protein-protein interactions.
+ def generate_toy_metadata():
+ '''Generates metadata for cells in the toy RNA-seq dataset.
- Parameters
- ----------
- prot_complex : boolean, default=False
- Whether including PPIs where interactors could contain
+ Returns
+ -------
+ metadata : pandas.DataFrame
+ DataFrame with metadata for each cell. Metadata contains the
+ major groups of those cells.
+ '''
+ data = np.asarray([['C1', 'G1'],
+ ['C2', 'G2'],
+ ['C3', 'G3'],
+ ['C4', 'G3'],
+ ['C5', 'G1']
+ ])
+
+ metadata = pd.DataFrame(data, columns=['#SampleID', 'Groups'])
+ return metadata
+
+
+
+
+
+
+
+
+
+
+
+
+
+generate_toy_ppi(prot_complex=False)
+
+
+
+
+
+
+ Generates a toy list of protein-protein interactions.
+Parameters
+prot_complex : boolean, default=False
+ Whether including PPIs where interactors could contain
+ multimeric complexes.
+Returns
+ppi : pandas.DataFrame
+ Dataframe containing PPIs. Columns are 'A' (first interacting
+ partners), 'B' (second interacting partners) and 'score'
+ for weighting each PPI.
+
+
+ Source code in cell2cell/datasets/toy_data.py
+ def generate_toy_ppi(prot_complex=False):
+ '''Generates a toy list of protein-protein interactions.
+
+ Parameters
+ ----------
+ prot_complex : boolean, default=False
+ Whether including PPIs where interactors could contain
multimeric complexes.
Returns
@@ -8902,114 +8341,45 @@
-
-generate_toy_metadata()
-
-
-
-
-
-
- Generates metadata for cells in the toy RNA-seq dataset.
-
-
-
-
-
-
-
-
- Returns:
-
-
- pandas.DataFrame
– DataFrame with metadata for each cell. Metadata contains the
-major groups of those cells.
-
-
-
-
-
-
- Source code in cell2cell/datasets/toy_data.py
- def generate_toy_metadata():
- '''Generates metadata for cells in the toy RNA-seq dataset.
-
- Returns
- -------
- metadata : pandas.DataFrame
- DataFrame with metadata for each cell. Metadata contains the
- major groups of those cells.
- '''
- data = np.asarray([['C1', 'G1'],
- ['C2', 'G2'],
- ['C3', 'G3'],
- ['C4', 'G3'],
- ['C5', 'G1']
- ])
-
- metadata = pd.DataFrame(data, columns=['#SampleID', 'Groups'])
- return metadata
-
-
-
-
-
-
-
-
-
-
-
-
-
-generate_toy_distance()
+
+generate_toy_rnaseq()
-
+
- Generates a square matrix with cell-cell distance.
+ Generates a toy RNA-seq dataset
+Returns
+rnaseq : pandas.DataFrame
+ DataFrame containing the toy RNA-seq dataset. Columns
+ are cells and rows are genes.
-
-
-
-
-
-
-
- Returns:
-
-
- pandas.DataFrame
– DataFrame with Euclidean-like distance between each
-pair of cells in the toy RNA-seq dataset.
-
-
-
-
-
Source code in cell2cell/datasets/toy_data.py
- def generate_toy_distance():
- '''Generates a square matrix with cell-cell distance.
+ def generate_toy_rnaseq():
+ '''Generates a toy RNA-seq dataset
Returns
-------
- distance : pandas.DataFrame
- DataFrame with Euclidean-like distance between each
- pair of cells in the toy RNA-seq dataset.
+ rnaseq : pandas.DataFrame
+ DataFrame containing the toy RNA-seq dataset. Columns
+ are cells and rows are genes.
'''
- data = np.asarray([[0.0, 10.0, 12.0, 5.0, 3.0],
- [10.0, 0.0, 15.0, 8.0, 9.0],
- [12.0, 15.0, 0.0, 4.5, 7.5],
- [5.0, 8.0, 4.5, 0.0, 6.5],
- [3.0, 9.0, 7.5, 6.5, 0.0],
- ])
- distance = pd.DataFrame(data,
- index=['C1', 'C2', 'C3', 'C4', 'C5'],
- columns=['C1', 'C2', 'C3', 'C4', 'C5']
- )
- return distance
+ data = np.asarray([[5, 10, 8, 15, 2],
+ [15, 5, 20, 1, 30],
+ [18, 12, 5, 40, 20],
+ [9, 30, 22, 5, 2],
+ [2, 1, 1, 27, 15],
+ [30, 11, 16, 5, 12],
+ ])
+
+ rnaseq = pd.DataFrame(data,
+ index=['Protein-A', 'Protein-B', 'Protein-C', 'Protein-D', 'Protein-E', 'Protein-F'],
+ columns=['C1', 'C2', 'C3', 'C4', 'C5']
+ )
+ rnaseq.index.name = 'gene_id'
+ return rnaseq
@@ -9043,7 +8413,7 @@
- cell2cell.external
+ external
@@ -9052,7 +8422,7 @@
-
+
@@ -9066,18 +8436,18 @@
-Modules
+
-
+
goenrich
-
+
@@ -9091,92 +8461,119 @@
-
-
-
-
-
-EXPERIMENTAL_EVIDENCE
-
-
-
-
-
-
-
-
+
-
-GENE2GO_COLUMNS
+
+gene2go(filename, experimental=False, tax_id=9606, **kwds)
-
+
+ read go-annotation file
+:param filename: protein or gene identifier column
+:param experimental: use only experimentally validated annotations
+:param tax_id: filter according to taxon
+
+
+ Source code in cell2cell/external/goenrich.py
+ def gene2go(filename, experimental=False, tax_id=9606, **kwds):
+ """ read go-annotation file
+
+ :param filename: protein or gene identifier column
+ :param experimental: use only experimentally validated annotations
+ :param tax_id: filter according to taxon
+ """
+ defaults = {'comment': '#',
+ 'names': GENE2GO_COLUMNS}
+ defaults.update(kwds)
+ result = pd.read_csv(filename, sep='\t', **defaults)
+
+ retain_mask = result.tax_id == tax_id
+ result.drop(result.index[~retain_mask], inplace=True)
+
+ if experimental:
+ retain_mask = result.Evidence.isin(EXPERIMENTAL_EVIDENCE)
+ result.drop(result.index[~retain_mask], inplace=True)
+
+ return result
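The two filtering steps in gene2go (keep one taxon, then optionally keep only experimentally supported annotations) can be sketched on an in-memory table. This is a minimal illustration, not the library's implementation: the reduced column list, the toy rows, and the evidence codes in `EXPERIMENTAL_CODES` are assumptions made for the example, and it uses boolean indexing instead of the in-place `drop` shown above.

```python
import io

import pandas as pd

# Assumed set of experimental GO evidence codes (illustrative only).
EXPERIMENTAL_CODES = ('EXP', 'IDA', 'IPI', 'IMP', 'IGI', 'IEP')
# Hypothetical, reduced column list; the real gene2go file has more columns.
COLUMNS = ['tax_id', 'GeneID', 'GO_ID', 'Evidence']

# Toy tab-separated content; the header starts with '#', so comment='#' skips it.
toy_file = io.StringIO(
    "#tax_id\tGeneID\tGO_ID\tEvidence\n"
    "9606\t1\tGO:0005576\tIDA\n"
    "9606\t2\tGO:0005515\tIEA\n"
    "10090\t3\tGO:0005576\tIDA\n"
)
annotations = pd.read_csv(toy_file, sep='\t', comment='#', names=COLUMNS)

# Step 1: keep only rows for the requested taxon (9606 is human).
human = annotations[annotations['tax_id'] == 9606]

# Step 2: keep only rows whose evidence code is in the experimental set.
human_experimental = human[human['Evidence'].isin(EXPERIMENTAL_CODES)]
print(human_experimental)
```

Boolean indexing returns a filtered copy, which avoids mutating the table that was read from disk.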
+
+
-
+
-
-GENE_ASSOCIATION_COLUMNS
+
+goa(filename, experimental=True, **kwds)
-
+
+ read go-annotation file
+:param filename: protein or gene identifier column
+:param experimental: use only experimentally validated annotations
+
+
+ Source code in cell2cell/external/goenrich.py
+ def goa(filename, experimental=True, **kwds):
+ """ read go-annotation file
+
+ :param filename: protein or gene identifier column
+ :param experimental: use only experimentally validated annotations
+ """
+ defaults = {'comment': '!',
+ 'names': GENE_ASSOCIATION_COLUMNS}
+
+ if experimental and 'usecols' in kwds:
+ kwds['usecols'] += ('evidence_code',)
+
+ defaults.update(kwds)
+ result = pd.read_csv(filename, sep='\t', **defaults)
+
+ if experimental:
+ retain_mask = result.evidence_code.isin(EXPERIMENTAL_EVIDENCE)
+ result.drop(result.index[~retain_mask], inplace=True)
+
+ return result
+
+
-Functions
-
-
+
ontology(file)
-
+
- read ontology from file
+ read ontology from file
+:param file: file path or file handle
-
-
-
-
-
-
-
- Parameters:
-
-
- file (None
) – file path of file handle
-
-
-
-
-
Source code in cell2cell/external/goenrich.py
def ontology(file):
@@ -9222,94 +8619,18 @@
-
-goa(filename, experimental=True, **kwds)
-
-
-
-
-
-
- read go-annotation file
-
-
-
-
-
-
-
-
- Parameters:
-
-
- filename (None
) – protein or gene identifier column
- experimental (None
) – use only experimentally validated annotations
-
-
-
-
-
-
- Source code in cell2cell/external/goenrich.py
- def goa(filename, experimental=True, **kwds):
- """ read go-annotation file
-
- :param filename: protein or gene identifier column
- :param experimental: use only experimentally validated annotations
- """
- defaults = {'comment': '!',
- 'names': GENE_ASSOCIATION_COLUMNS}
-
- if experimental and 'usecols' in kwds:
- kwds['usecols'] += ('evidence_code',)
-
- defaults.update(kwds)
- result = pd.read_csv(filename, sep='\t', **defaults)
-
- if experimental:
- retain_mask = result.evidence_code.isin(EXPERIMENTAL_EVIDENCE)
- result.drop(result.index[~retain_mask], inplace=True)
-
- return result
-
-
-
-
-
-
-
-
-
-
-
-
-
+
sgd(filename, experimental=False, **kwds)
-
+
- read yeast genome database go-annotation file
+ read yeast genome database go-annotation file
+:param filename: protein or gene identifier column
+:param experimental: use only experimentally validated annotations
-
-
-
-
-
-
-
- Parameters:
-
-
- filename (None
) – protein or gene identifier column
- experimental (None
) – use only experimentally validated annotations
-
-
-
-
-
Source code in cell2cell/external/goenrich.py
def sgd(filename, experimental=False, **kwds):
@@ -9326,76 +8647,14 @@
-
-
-gene2go(filename, experimental=False, tax_id=9606, **kwds)
+
+
-
-
-
-
- read go-annotation file
-
-
-
-
-
-
-
-
- Parameters:
-
-
- filename (None
) – protein or gene identifier column
- experimental (None
) – use only experimentally validated annotations
- tax_id (None
) – filter according to taxon
-
-
-
-
-
-
- Source code in cell2cell/external/goenrich.py
- def gene2go(filename, experimental=False, tax_id=9606, **kwds):
- """ read go-annotation file
-
- :param filename: protein or gene identifier column
- :param experimental: use only experimentally validated annotations
- :param tax_id: filter according to taxon
- """
- defaults = {'comment': '#',
- 'names': GENE2GO_COLUMNS}
- defaults.update(kwds)
- result = pd.read_csv(filename, sep='\t', **defaults)
-
- retain_mask = result.tax_id == tax_id
- result.drop(result.index[~retain_mask], inplace=True)
-
- if experimental:
- retain_mask = result.Evidence.isin(EXPERIMENTAL_EVIDENCE)
- result.drop(result.index[~retain_mask], inplace=True)
-
- return result
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
+
@@ -9403,12 +8662,12 @@
-
+
gseapy
-
+
@@ -9422,129 +8681,6 @@
-
-
-
-
-
-PATHWAY_DATA
-
-
-
-
-
-
-
-
-
-
-
-
-Functions
-
-
-
-
-
-
-load_gmt(filename, backup_url=None, readable_name=False)
-
-
-
-
-
-
- Load a GMT file.
-
-
-
-
-
-
-
-
- Parameters:
-
-
- filename (str
) – Path to the GMT file.
- backup_url (str, default=None
) – URL to download the GMT file from if not present locally.
- readable_name (boolean, default=False
) – If True, the pathway names are transformed to a more readable format.
-That is, removing underscores and pathway DB name at the beginning.
-
-
-
-
-
-
-
-
-
-
-
-
- Returns:
-
-
- dict
– Dictionary with genes as keys and pathways as values.
-
-
-
-
-
-
- Source code in cell2cell/external/gseapy.py
- def load_gmt(filename, backup_url=None, readable_name=False):
- '''Load a GMT file.
-
- Parameters
- ----------
- filename : str
- Path to the GMT file.
-
- backup_url : str, default=None
- URL to download the GMT file from if not present locally.
-
- readable_name : boolean, default=False
- If True, the pathway names are transformed to a more readable format.
- That is, removing underscores and pathway DB name at the beginning.
-
- Returns
- -------
- pathway_per_gene : dict
- Dictionary with genes as keys and pathways as values.
- '''
- from pathlib import Path
-
- path = Path(filename)
- if path.is_file():
- f = open(path, 'rb')
- else:
- if backup_url is not None:
- try:
- _download(backup_url, path)
- except ValueError: # invalid URL
- print('Invalid filename or URL')
- f = open(path, 'rb')
- else:
- print('Invalid filename')
-
- pathway_per_gene = defaultdict(set)
- with f:
- for i, line in enumerate(f):
- l = line.decode("utf-8").split('\t')
- l[-1] = l[-1].replace('\n', '')
- l = [pw for pw in l if ('http' not in pw)] # Remove website info
- pathway_name = l[0]
- if readable_name:
- pathway_name = ' '.join(pathway_name.split('_')[1:])
- for gene in l[1:]:
- pathway_per_gene[gene] = pathway_per_gene[gene].union(set([pathway_name]))
- return pathway_per_gene
-
-
-
-
-
@@ -9552,62 +8688,47 @@
-
+
generate_lr_geneset(lr_list, complex_sep=None, lr_sep='^', pathway_per_gene=None, organism='human', pathwaydb='GOBP', min_pathways=15, max_pathways=10000, readable_name=False, output_folder=None)
-
+
Generate a gene set from a list of LR pairs.
+Parameters
+lr_list : list
+ List of LR pairs.
+complex_sep : str, default=None
+ Separator of the members of a complex. If None, the ligand and receptor are assumed to be single genes.
+lr_sep : str, default='^'
+ Separator of the ligand and receptor in the LR pair.
+pathway_per_gene : dict, default=None
+ Dictionary with genes as keys and pathways as values.
+ You can pass this if you are using annotations other than the
+ resources available in cell2cell.datasets.gsea_data.gsea_msig().
+organism : str, default='human'
+ Organism for which the DB will be loaded.
+ Available options are {'human', 'mouse'}.
+pathwaydb : str, default='GOBP'
+ Molecular Signature Database to load.
+ Available options are {'GOBP', 'KEGG', 'Reactome'}
+min_pathways : int, default=15
+ Minimum number of pathways that a LR pair can be annotated to.
+max_pathways : int, default=10000
+ Maximum number of pathways that a LR pair can be annotated to.
+readable_name : boolean, default=False
+ If True, the pathway names are transformed to a more readable format.
+output_folder : str, default=None
+ Path to store the GMT file. If None, it stores the gmt file in the
+ current directory.
+Returns
+lr_set : dict
+ Dictionary with pathways as keys and LR pairs as values.
-
-
-
-
-
-
-
- Parameters:
-
-
- lr_list (list
) – List of LR pairs.
- complex_sep (str, default=None
) – Separator of the members of a complex. If None, the ligand and receptor are assumed to be single genes.
- lr_sep (str, default='^'
) – Separator of the ligand and receptor in the LR pair.
- pathway_per_gene (dict, default=None
) – Dictionary with genes as keys and pathways as values.
-You can pass this if you are using different annotations than those
-available resources in cell2cell.datasets.gsea_data.gsea_msig()
.
- organism (str, default='human'
) – Organism for whom the DB will be loaded.
-Available options are {'human', 'mouse'}.
- pathwaydb (str, default='GOBP'
) – Molecular Signature Database to load.
-Available options are {'GOBP', 'KEGG', 'Reactome'}
- min_pathways (int, default=15
) – Minimum number of pathways that a LR pair can be annotated to.
- max_pathways (int, default=10000
) – Maximum number of pathways that a LR pair can be annotated to.
- readable_name (boolean, default=False
) – If True, the pathway names are transformed to a more readable format.
- output_folder (str, default=None
) – Path to store the GMT file. If None, it stores the gmt file in the
-current directory.
-
-
-
-
-
-
-
-
-
-
-
-
- Returns:
-
-
- dict
– Dictionary with pathways as keys and LR pairs as values.
-
-
-
-
-
Source code in cell2cell/external/gseapy.py
def generate_lr_geneset(lr_list, complex_sep=None, lr_sep='^', pathway_per_gene=None, organism='human', pathwaydb='GOBP',
@@ -9718,59 +8839,128 @@
-
+
+load_gmt(filename, backup_url=None, readable_name=False)
+
+
+
+
+
+
+ Load a GMT file.
+Parameters
+filename : str
+ Path to the GMT file.
+backup_url : str, default=None
+ URL to download the GMT file from if not present locally.
+readable_name : boolean, default=False
+ If True, the pathway names are transformed to a more readable format.
+ That is, removing underscores and pathway DB name at the beginning.
+Returns
+pathway_per_gene : dict
+ Dictionary with genes as keys and pathways as values.
+
+
+ Source code in cell2cell/external/gseapy.py
+ def load_gmt(filename, backup_url=None, readable_name=False):
+ '''Load a GMT file.
+
+ Parameters
+ ----------
+ filename : str
+ Path to the GMT file.
+
+ backup_url : str, default=None
+ URL to download the GMT file from if not present locally.
+
+ readable_name : boolean, default=False
+ If True, the pathway names are transformed to a more readable format.
+ That is, removing underscores and pathway DB name at the beginning.
+
+ Returns
+ -------
+ pathway_per_gene : dict
+ Dictionary with genes as keys and pathways as values.
+ '''
+ from pathlib import Path
+
+ path = Path(filename)
+ if path.is_file():
+ f = open(path, 'rb')
+ else:
+ if backup_url is not None:
+ try:
+ _download(backup_url, path)
+ except ValueError: # invalid URL
+ print('Invalid filename or URL')
+ f = open(path, 'rb')
+ else:
+ print('Invalid filename')
+
+ pathway_per_gene = defaultdict(set)
+ with f:
+ for i, line in enumerate(f):
+ l = line.decode("utf-8").split('\t')
+ l[-1] = l[-1].replace('\n', '')
+ l = [pw for pw in l if ('http' not in pw)] # Remove website info
+ pathway_name = l[0]
+ if readable_name:
+ pathway_name = ' '.join(pathway_name.split('_')[1:])
+ for gene in l[1:]:
+ pathway_per_gene[gene] = pathway_per_gene[gene].union(set([pathway_name]))
+ return pathway_per_gene
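The parsing loop of load_gmt can be sketched on in-memory lines, without file or download handling. This is a minimal sketch under stated assumptions: `parse_gmt_lines` is a hypothetical helper name, the pathway names and URLs are made up, and the input is assumed to be already-decoded text lines.

```python
from collections import defaultdict


def parse_gmt_lines(lines, readable_name=False):
    """Map each gene to the set of pathways that list it (hypothetical helper)."""
    pathway_per_gene = defaultdict(set)
    for line in lines:
        fields = line.rstrip('\n').split('\t')
        # Drop the description field when it holds a URL.
        fields = [field for field in fields if 'http' not in field]
        if not fields or not fields[0]:
            continue  # skip empty lines
        pathway_name = fields[0]
        if readable_name:
            # Remove the DB prefix and replace underscores with spaces.
            pathway_name = ' '.join(pathway_name.split('_')[1:])
        for gene in fields[1:]:
            pathway_per_gene[gene].add(pathway_name)
    return pathway_per_gene


# Toy GMT content: pathway name, description (a URL), then member genes.
toy_lines = [
    "GOBP_CELL_ADHESION\thttp://example.org/a\tCD44\tCD74\n",
    "GOBP_IMMUNE_RESPONSE\thttp://example.org/b\tCD74\n",
]
result = parse_gmt_lines(toy_lines, readable_name=True)
print(dict(result))
```

Using `set.add` updates each gene's set in place, which is equivalent to the `union` call in the listing above for this purpose.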
+
+
+
+
+
+
+
+
+
+
+
+
+
run_gsea(loadings, lr_set, output_folder, weight=1, min_size=15, permutations=999, processes=6, random_state=6, significance_threshold=0.05)
-
+
Run GSEA using the LR gene set.
+Parameters
+loadings : pandas.DataFrame
+ Dataframe with the loadings of the LR pairs for each factor.
+lr_set : dict
+ Dictionary with pathways as keys and LR pairs as values.
+ LR pairs must match the indexes in the loadings dataframe.
+output_folder : str
+ Path to the output folder.
+weight : int, default=1
+ Weight to use for score underlying the GSEA (parameter p).
+min_size : int, default=15
+ Minimum number of LR pairs that a pathway must contain.
+permutations : int, default=999
+ Number of permutations to use for the GSEA. The total permutations
+ will be this number plus 1 (this extra case is the unpermuted one).
+processes : int, default=6
+ Number of processes to use for the GSEA.
+random_state : int, default=6
+ Random seed to use for the GSEA.
+significance_threshold : float, default=0.05
+ Significance threshold to use for the FDR correction.
+Returns
+pvals : pandas.DataFrame
+ Dataframe containing the P-values for each pathway (rows)
+ in each of the factors (columns).
+score : pandas.DataFrame
+ Dataframe containing the Normalized Enrichment Scores (NES)
+ for each pathway (rows) in each of the factors (columns).
+gsea_df : pandas.DataFrame
+ Dataframe with the detailed GSEA results.
-
-
-
-
-
-
-
- Parameters:
-
-
- loadings (pandas.DataFrame
) – Dataframe with the loadings of the LR pairs for each factor.
- lr_set (dict
) – Dictionary with pathways as keys and LR pairs as values.
-LR pairs must match the indexes in the loadings dataframe.
- output_folder (str
) – Path to the output folder.
- weight (int, default=1
) – Weight to use for score underlying the GSEA (parameter p).
- min_size (int, default=15
) – Minimum number of LR pairs that a pathway must contain.
- permutations (int, default=999
) – Number of permutations to use for the GSEA. The total permutations
-will be this number plus 1 (this extra case is the unpermuted one).
- processes (int, default=6
) – Number of processes to use for the GSEA.
- random_state (int, default=6
) – Random seed to use for the GSEA.
- significance_threshold (float, default=0.05
) – Significance threshold to use for the FDR correction.
-
-
-
-
-
-
-
-
-
-
-
-
- Returns:
-
-
- pandas.DataFrame
– Dataframe containing the P-values for each pathway (rows)
-in each of the factors (columns).
-
-
-
-
-
Source code in cell2cell/external/gseapy.py
def run_gsea(loadings, lr_set, output_folder, weight=1, min_size=15, permutations=999, processes=6,
@@ -9901,12 +9091,12 @@
-
+
pcoa
-
+
@@ -9920,22 +9110,22 @@
-Functions
+
-
+
pcoa(distance_matrix, method='eigh', number_of_dimensions=0, inplace=False)
-
+
- Perform Principal Coordinate Analysis.
-Principal Coordinate Analysis (PCoA) is a method similar
+
Perform Principal Coordinate Analysis.
+Principal Coordinate Analysis (PCoA) is a method similar
to Principal Components Analysis (PCA) with the difference that PCoA
operates on distance matrices, typically with non-euclidean and thus
ecologically meaningful distances like UniFrac in microbiome research.
@@ -9948,63 +9138,55 @@
the other, or too low in both, etc. On the other hand, if an
species is present in two sites, that means that the sites are
similar.).
-Note that the returned eigenvectors are not normalized to unit length.
-
-
-
-
-
-
-
-
- Parameters:
-
-
- distance_matrix (pandas.DataFrame
) – A distance matrix.
- method (str
) – Eigendecomposition method to use in performing PCoA.
-By default, uses SciPy's eigh
, which computes exact
-eigenvectors and eigenvalues for all dimensions. The alternate
-method, fsvd
, uses faster heuristic eigendecomposition but loses
-accuracy. The magnitude of accuracy lost is dependent on dataset.
- number_of_dimensions (int
) – Dimensions to reduce the distance matrix to. This number determines
-how many eigenvectors and eigenvalues will be returned.
-By default, equal to the number of dimensions of the distance matrix,
-as default eigendecomposition using SciPy's eigh
method computes
-all eigenvectors and eigenvalues. If using fast heuristic
-eigendecomposition through fsvd
, a desired number of dimensions
-should be specified. Note that the default eigendecomposition
-method eigh
does not natively support a specifying number of
-dimensions to reduce a matrix to, so if this parameter is specified,
-all eigenvectors and eigenvalues will be simply be computed with no
-speed gain, and only the number specified by number_of_dimensions
-will be returned. Specifying a value of 0
, the default, will
-set number_of_dimensions
equal to the number of dimensions of the
-specified distance_matrix
.
- inplace (bool
) – If true, centers a distance matrix in-place in a manner that reduces
-memory consumption.
-
-
-
-
-
-
-
-
-
-
-
-
- Returns:
-
-
- OrdinationResults
– Object that stores the PCoA results, including eigenvalues, the
-proportion explained by each of them, and transformed sample
-coordinates.
-
-
-
-
-
+Note that the returned eigenvectors are not normalized to unit length.
+Parameters
+
+distance_matrix : pandas.DataFrame
+ A distance matrix.
+method : str, optional
+ Eigendecomposition method to use in performing PCoA.
+ By default, uses SciPy's eigh
, which computes exact
+ eigenvectors and eigenvalues for all dimensions. The alternate
+ method, fsvd
, uses faster heuristic eigendecomposition but loses
+ accuracy. The magnitude of accuracy lost is dependent on dataset.
+number_of_dimensions : int, optional
+ Dimensions to reduce the distance matrix to. This number determines
+ how many eigenvectors and eigenvalues will be returned.
+ By default, equal to the number of dimensions of the distance matrix,
+ as default eigendecomposition using SciPy's eigh
method computes
+ all eigenvectors and eigenvalues. If using fast heuristic
+ eigendecomposition through fsvd
, a desired number of dimensions
+ should be specified. Note that the default eigendecomposition
+ method eigh
does not natively support specifying a number of
+ dimensions to reduce a matrix to, so if this parameter is specified,
+ all eigenvectors and eigenvalues will simply be computed with no
+ speed gain, and only the number specified by number_of_dimensions
+ will be returned. Specifying a value of 0
, the default, will
+ set number_of_dimensions
equal to the number of dimensions of the
+ specified distance_matrix
.
+inplace : bool, optional
+ If true, centers a distance matrix in-place in a manner that reduces
+ memory consumption.
+Returns
+
+OrdinationResults
+ Object that stores the PCoA results, including eigenvalues, the
+ proportion explained by each of them, and transformed sample
+ coordinates.
+See Also
+
+OrdinationResults
+Notes
+
+.. note:: If the distance is not euclidean (for example if it is a
+ semimetric and the triangle inequality doesn't hold),
+ negative eigenvalues can appear. There are different ways
+ to deal with that problem (see Legendre & Legendre 1998,
+ Section 9.2.3), but none are currently implemented here.
+ However, a warning is raised whenever negative eigenvalues
+ appear, allowing the user to decide if they can be safely
+ ignored.
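The note about negative eigenvalues can be reproduced with a small semimetric. The sketch below is illustrative only and does not call cell2cell's pcoa: the 3x3 matrix is a made-up example in which d(A, C) = 3 exceeds d(A, B) + d(B, C) = 2, so the triangle inequality fails, and the centering follows the E and F matrix definitions quoted elsewhere on this page.

```python
import numpy as np

# Made-up semimetric: d(A, C) = 3 > d(A, B) + d(B, C) = 2.
d = np.array([[0.0, 1.0, 3.0],
              [1.0, 0.0, 1.0],
              [3.0, 1.0, 0.0]])

# E matrix: square elementwise and divide by -2.
e = -0.5 * d ** 2

# F matrix: subtract row and column means, add back the grand mean.
f = e - e.mean(axis=1, keepdims=True) - e.mean(axis=0, keepdims=True) + e.mean()

# F is symmetric, so eigvalsh applies; eigenvalues come back in ascending order.
eigenvalues = np.linalg.eigvalsh(f)
has_negative = bool(eigenvalues.min() < -1e-9)
print(eigenvalues, has_negative)
```

With a Euclidean distance matrix the same computation yields only non-negative eigenvalues (up to rounding), which is why a negative value signals a non-Euclidean input.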
+
Source code in cell2cell/external/pcoa.py
def pcoa(distance_matrix, method="eigh", number_of_dimensions=0,
@@ -10203,55 +9385,35 @@
-
+
pcoa_biplot(ordination, y)
-
+
- Compute the projection of descriptors into a PCoA matrix
-This implementation is as described in Chapter 9 of Legendre & Legendre,
-Numerical Ecology 3rd edition.
-
-
-
-
-
-
-
-
- Parameters:
-
-
- ordination (OrdinationResults
) – The computed principal coordinates analysis of dimensions (n, c) where
-the matrix y
will be projected onto.
- y (DataFrame
) – Samples by features table of dimensions (n, m). These can be
+
Compute the projection of descriptors into a PCoA matrix
+This implementation is as described in Chapter 9 of Legendre & Legendre,
+Numerical Ecology 3rd edition.
+Parameters
+
+ordination : OrdinationResults
+ The computed principal coordinates analysis of dimensions (n, c) where
+ the matrix y will be projected onto.
+y : DataFrame
+ Samples by features table of dimensions (n, m). These can be
environmental features or abundance counts. This table should be
-normalized in cases of dimensionally heterogenous physical variables.
-
-
-
-
-
-
-
-
-
-
-
-
- Returns:
-
-
- OrdinationResults
– The modified input object that includes projected features onto the
-ordination space in the features
attribute.
-
-
-
-
-
+normalized in cases of dimensionally heterogenous physical variables.
+
+Returns
+
OrdinationResults
+ The modified input object that includes projected features onto the
+ ordination space in the features
attribute.
+
Source code in cell2cell/external/pcoa.py
def pcoa_biplot(ordination, y):
@@ -10332,12 +9494,12 @@
-
+
pcoa_utils
-
+
@@ -10351,89 +9513,269 @@
-Functions
+
-
-mean_and_std(a, axis=None, weights=None, with_mean=True, with_std=True, ddof=0)
+
+center_distance_matrix(distance_matrix, inplace=False)
-
-
-
-
- Compute the weighted average and standard deviation along the
-specified axis.
-
-
-
-
-
-
-
-
- Parameters:
-
-
- a (array_like
) – Calculate average and standard deviation of these values.
- axis (int
) – Axis along which the statistics are computed. The default is
-to compute them on the flattened array.
- weights (array_like
) – An array of weights associated with the values in a
. Each
-value in a
contributes to the average according to its
-associated weight. The weights array can either be 1-D (in
-which case its length must be the size of a
along the given
-axis) or of the same shape as a
. If weights=None
, then all
-data in a
are assumed to have a weight equal to one.
- with_mean (bool, optional, defaults to True
) – Compute average if True.
- with_std (bool, optional, defaults to True
) – Compute standard deviation if True.
- ddof (int, optional, defaults to 0
) – It means delta degrees of freedom. Variance is calculated by
-dividing by n - ddof
(where n
is the number of
-elements). By default it computes the maximum likelyhood
-estimator.
-
-
-
-
-
-
-
-
-
-
-
-
- Returns:
-
-
- average, std
– Return the average and standard deviation along the specified
-axis. If any of them was not required, returns None
instead
-
-
-
-
-
+
+
+
+
+ Centers a distance matrix.
+Note: If the used distance was euclidean, pairwise distances
+needn't be computed from the data table Y because F_matrix =
+Y.dot(Y.T) (if Y has been centered).
+But since we're expecting distance_matrix to be non-euclidean,
+we do the following computation as per
+Numerical Ecology (Legendre & Legendre 1998).
+Parameters
+
+distance_matrix : 2D array_like
+ Distance matrix.
+inplace : bool, optional
+ Whether or not to center the given distance matrix in-place, which
+ is more efficient in terms of memory and computation.
+
Source code in cell2cell/external/pcoa_utils.py
- def mean_and_std(a, axis=None, weights=None, with_mean=True, with_std=True,
- ddof=0):
- """Compute the weighted average and standard deviation along the
- specified axis.
- Parameters
- ----------
- a : array_like
- Calculate average and standard deviation of these values.
- axis : int, optional
- Axis along which the statistics are computed. The default is
- to compute them on the flattened array.
- weights : array_like, optional
- An array of weights associated with the values in `a`. Each
- value in `a` contributes to the average according to its
- associated weight. The weights array can either be 1-D (in
- which case its length must be the size of `a` along the given
- axis) or of the same shape as `a`. If `weights=None`, then all
+ def center_distance_matrix(distance_matrix, inplace=False):
+ """
+ Centers a distance matrix.
+ Note: If the used distance was euclidean, pairwise distances
+ needn't be computed from the data table Y because F_matrix =
+ Y.dot(Y.T) (if Y has been centered).
+ But since we're expecting distance_matrix to be non-euclidean,
+ we do the following computation as per
+ Numerical Ecology (Legendre & Legendre 1998).
+ Parameters
+ ----------
+ distance_matrix : 2D array_like
+ Distance matrix.
+ inplace : bool, optional
+ Whether or not to center the given distance matrix in-place, which
+ is more efficient in terms of memory and computation.
+ """
+ if inplace:
+ return _f_matrix_inplace(_e_matrix_inplace(distance_matrix))
+ else:
+ return f_matrix(e_matrix(distance_matrix))
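The remark that a euclidean distance matrix centers to Y.dot(Y.T) can be checked numerically. This is a minimal sketch, not the library code: `e_matrix_sketch` and `f_matrix_sketch` are hypothetical re-implementations of the non-inplace path, and the three 2D points are made up for the example.

```python
import numpy as np


def e_matrix_sketch(distance_matrix):
    # Square elementwise and divide by -2.
    return distance_matrix * distance_matrix / -2.0


def f_matrix_sketch(e):
    # Subtract row and column means, then add the grand mean.
    row_means = e.mean(axis=1, keepdims=True)
    col_means = e.mean(axis=0, keepdims=True)
    return e - row_means - col_means + e.mean()


# Made-up data table Y with three samples in two dimensions.
points = np.array([[0.0, 0.0], [3.0, 0.0], [0.0, 4.0]])

# Pairwise euclidean distances between the rows of Y.
diff = points[:, None, :] - points[None, :, :]
distances = np.sqrt((diff ** 2).sum(axis=-1))

f = f_matrix_sketch(e_matrix_sketch(distances))

# Gram matrix of the column-centered data table.
centered = points - points.mean(axis=0)
gram = centered @ centered.T
print(np.allclose(f, gram))
```

The rows and columns of the resulting matrix also sum to zero, which is the property the centering step is meant to enforce.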
+
+
+
+
+
+
+
+
+
+
+
+
+
+corr(x, y=None)
+
+
+
+
+
+
+ Computes correlation between columns of x
, or x
and y
.
+Correlation is covariance of (columnwise) standardized matrices,
+so each matrix is first centered and scaled to have variance one,
+and then their covariance is computed.
+Parameters
+
+x : 2D array_like
+ Matrix of shape (n, p). Correlation between its columns will
+ be computed.
+y : 2D array_like, optional
+ Matrix of shape (n, q). If provided, the correlation is
+ computed between the columns of x
and the columns of
+ y
. Else, it's computed between the columns of x
.
+Returns
+
+correlation
+ Matrix of computed correlations. Has shape (p, p) if y
is
+ not provided, else has shape (p, q).
+
+
+ Source code in cell2cell/external/pcoa_utils.py
+ def corr(x, y=None):
+ """Computes correlation between columns of `x`, or `x` and `y`.
+ Correlation is covariance of (columnwise) standardized matrices,
+ so each matrix is first centered and scaled to have variance one,
+ and then their covariance is computed.
+ Parameters
+ ----------
+ x : 2D array_like
+ Matrix of shape (n, p). Correlation between its columns will
+ be computed.
+ y : 2D array_like, optional
+ Matrix of shape (n, q). If provided, the correlation is
+ computed between the columns of `x` and the columns of
+ `y`. Else, it's computed between the columns of `x`.
+ Returns
+ -------
+ correlation
+ Matrix of computed correlations. Has shape (p, p) if `y` is
+ not provided, else has shape (p, q).
+ """
+ x = np.asarray(x)
+ if y is not None:
+ y = np.asarray(y)
+ if y.shape[0] != x.shape[0]:
+ raise ValueError("Both matrices must have the same number of rows")
+ x, y = scale(x), scale(y)
+ else:
+ x = scale(x)
+ y = x
+ # Notice that scaling was performed with ddof=0 (dividing by n,
+ # the default), so now we need to remove it by also using ddof=0
+ # (dividing by n)
+ return x.T.dot(y) / x.shape[0]
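The "standardize, then take the covariance" description can be compared against NumPy's own correlation routine. This is a minimal unweighted sketch: `corr_sketch` is a hypothetical stand-in for the function above, the random data is made up, and unlike `scale` it does not guard against zero-variance columns.

```python
import numpy as np


def corr_sketch(x, y=None):
    """Column-wise correlation via standardization with ddof=0 (hypothetical helper)."""
    x = np.asarray(x, dtype=float)
    x_std = (x - x.mean(axis=0)) / x.std(axis=0)
    if y is None:
        y_std = x_std
    else:
        y = np.asarray(y, dtype=float)
        if y.shape[0] != x.shape[0]:
            raise ValueError("Both matrices must have the same number of rows")
        y_std = (y - y.mean(axis=0)) / y.std(axis=0)
    # Standardization divided by n (ddof=0), so the covariance divides by n too.
    return x_std.T.dot(y_std) / x.shape[0]


rng = np.random.default_rng(0)
data = rng.normal(size=(20, 3))   # shape (n, p)
other = rng.normal(size=(20, 2))  # shape (n, q)

within = corr_sketch(data)          # shape (p, p)
between = corr_sketch(data, other)  # shape (p, q)
reference = np.corrcoef(data, rowvar=False)
print(np.allclose(within, reference))
```

Because both the standard deviation and the final division use n, the ddof choice cancels and the result matches the Pearson correlation.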
+
+
+
+
+
+
+
+
+
+
+
+
+
+e_matrix(distance_matrix)
+
+
+
+
+
+
+ Compute E matrix from a distance matrix.
+Squares and divides by -2 the input elementwise. Eq. 9.20 in
+Legendre & Legendre 1998.
+
+
+
+
+
+
+
+
+
+
+
+
+
+f_matrix(E_matrix)
+
+
+
+
+
+
+ Compute F matrix from E matrix.
+Centring step: for each element, the mean of the corresponding
+row and column are subtracted, and the mean of the whole
+matrix is added. Eq. 9.21 in Legendre & Legendre 1998.
+
+
+ Source code in cell2cell/external/pcoa_utils.py
+ def f_matrix(E_matrix):
+ """Compute F matrix from E matrix.
+ Centring step: for each element, the mean of the corresponding
+ row and column are subtracted, and the mean of the whole
+ matrix is added. Eq. 9.21 in Legendre & Legendre 1998."""
+ row_means = E_matrix.mean(axis=1, keepdims=True)
+ col_means = E_matrix.mean(axis=0, keepdims=True)
+ matrix_mean = E_matrix.mean()
+ return E_matrix - row_means - col_means + matrix_mean
+
+
+
+
+
+
+
+
+
+
+
+
+
+mean_and_std(a, axis=None, weights=None, with_mean=True, with_std=True, ddof=0)
+
+
+
+
+
+
+ Compute the weighted average and standard deviation along the
+specified axis.
+Parameters
+
+a : array_like
+ Calculate average and standard deviation of these values.
+axis : int, optional
+ Axis along which the statistics are computed. The default is
+ to compute them on the flattened array.
+weights : array_like, optional
+ An array of weights associated with the values in a
. Each
+ value in a
contributes to the average according to its
+ associated weight. The weights array can either be 1-D (in
+ which case its length must be the size of a
along the given
+ axis) or of the same shape as a
. If weights=None
, then all
+ data in a
are assumed to have a weight equal to one.
+with_mean : bool, optional, defaults to True
+ Compute average if True.
+with_std : bool, optional, defaults to True
+ Compute standard deviation if True.
+ddof : int, optional, defaults to 0
+ Delta degrees of freedom. Variance is calculated by
+ dividing by n - ddof (where n is the number of
+ elements). By default it computes the maximum likelihood
+ estimator.
+Returns
+
+average, std
+ Return the average and standard deviation along the specified
+ axis. If any of them was not required, returns None
instead
+
+
+ Source code in cell2cell/external/pcoa_utils.py
+ def mean_and_std(a, axis=None, weights=None, with_mean=True, with_std=True,
+ ddof=0):
+ """Compute the weighted average and standard deviation along the
+ specified axis.
+ Parameters
+ ----------
+ a : array_like
+ Calculate average and standard deviation of these values.
+ axis : int, optional
+ Axis along which the statistics are computed. The default is
+ to compute them on the flattened array.
+ weights : array_like, optional
+ An array of weights associated with the values in `a`. Each
+ value in `a` contributes to the average according to its
+ associated weight. The weights array can either be 1-D (in
+ which case its length must be the size of `a` along the given
+ axis) or of the same shape as `a`. If `weights=None`, then all
data in `a` are assumed to have a weight equal to one.
with_mean : bool, optional, defaults to True
Compute average if True.
@@ -10492,59 +9834,44 @@
-
+
scale(a, weights=None, with_mean=True, with_std=True, ddof=0, copy=True)
-
-
-
-
- Scale array by columns to have weighted average 0 and standard
-deviation 1.
-
-
-
-
-
-
-
-
- Parameters:
-
-
- a (array_like
) – 2D array whose columns are standardized according to the
-weights.
- weights (array_like
) – Array of weights associated with the columns of a
. By
-default, the scaling is unweighted.
- with_mean (bool, optional, defaults to True
) – Center columns to have 0 weighted mean.
- with_std (bool, optional, defaults to True
) – Scale columns to have unit weighted std.
- ddof (int, optional, defaults to 0
) – If with_std is True, variance is calculated by dividing by n
-- ddof
(where n
is the number of elements). By default it
-computes the maximum likelyhood stimator.
- copy (bool, optional, defaults to True
) – Whether to perform the standardization in place, or return a
-new copy of a
.
-
-
-
-
-
-
-
-
-
-
-
-
- Returns:
-
-
- 2D ndarray
– Scaled array.
-
-
-
-
-
+
+
+
+
+ Scale array by columns to have weighted average 0 and standard
+deviation 1.
+Parameters
+
+a : array_like
+ 2D array whose columns are standardized according to the
+ weights.
+weights : array_like, optional
+ Array of weights associated with the columns of a
. By
+ default, the scaling is unweighted.
+with_mean : bool, optional, defaults to True
+ Center columns to have 0 weighted mean.
+with_std : bool, optional, defaults to True
+ Scale columns to have unit weighted std.
+ddof : int, optional, defaults to 0
+ If with_std is True, variance is calculated by dividing by
+ n - ddof (where n is the number of elements). By default it
+ computes the maximum likelihood estimator.
+copy : bool, optional, defaults to True
+ If True, standardize and return a new copy of a; if False,
+ perform the standardization in place.
+Returns
+
+2D ndarray
+ Scaled array.
+Notes
+
+Wherever std equals 0, it is replaced by 1 in order to avoid
+division by zero.
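The zero-variance rule in the note above can be illustrated with an unweighted sketch. `scale_sketch` is a hypothetical helper that covers only the default case (no weights, centering and scaling both on); the toy array is made up.

```python
import numpy as np


def scale_sketch(a, ddof=0):
    """Standardize columns; constant columns keep a divisor of 1 (hypothetical helper)."""
    a = np.asarray(a, dtype=float)
    centered = a - a.mean(axis=0)
    std = a.std(axis=0, ddof=ddof)
    # Wherever std equals 0, replace it by 1 to avoid division by zero.
    std[std == 0] = 1.0
    return centered / std


toy = np.array([[1.0, 5.0],
                [2.0, 5.0],
                [3.0, 5.0]])
scaled = scale_sketch(toy)
print(scaled)
```

The constant second column becomes all zeros instead of NaN, while the first column ends up with mean 0 and standard deviation 1.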
+
Source code in cell2cell/external/pcoa_utils.py
def scale(a, weights=None, with_mean=True, with_std=True, ddof=0, copy=True):
@@ -10601,16 +9928,16 @@
-
+
svd_rank(M_shape, S, tol=None)
-
+
- Matrix rank of M
given its singular values S
.
-See np.linalg.matrix_rank
for a rationale on the tolerance
+
Matrix rank of M
given its singular values S
.
+See np.linalg.matrix_rank
for a rationale on the tolerance
(we're not using that function because it doesn't let us reuse a
precomputed SVD).
@@ -10632,167 +9959,39 @@
-
-
-
-
-
-corr(x, y=None)
-
-
+
- Computes correlation between columns of x
, or x
and y
.
-Correlation is covariance of (columnwise) standardized matrices,
-so each matrix is first centered and scaled to have variance one,
-and then their covariance is computed.
-
-
-
-
-
-
-
-
- Parameters:
-
-
- x (2D array_like
) – Matrix of shape (n, p). Correlation between its columns will
-be computed.
- y (2D array_like
) – Matrix of shape (n, q). If provided, the correlation is
-computed between the columns of x
and the columns of
-y
. Else, it's computed between the columns of x
.
-
-
-
-
-
-
-
-
-
-
-
-
- Returns:
-
-
- correlation
– Matrix of computed correlations. Has shape (p, p) if y
is
-not provided, else has shape (p, q).
-
-
-
-
-
-
- Source code in cell2cell/external/pcoa_utils.py
- def corr(x, y=None):
- """Computes correlation between columns of `x`, or `x` and `y`.
- Correlation is covariance of (columnwise) standardized matrices,
- so each matrix is first centered and scaled to have variance one,
- and then their covariance is computed.
- Parameters
- ----------
- x : 2D array_like
- Matrix of shape (n, p). Correlation between its columns will
- be computed.
- y : 2D array_like, optional
- Matrix of shape (n, q). If provided, the correlation is
- computed between the columns of `x` and the columns of
- `y`. Else, it's computed between the columns of `x`.
- Returns
- -------
- correlation
- Matrix of computed correlations. Has shape (p, p) if `y` is
- not provided, else has shape (p, q).
- """
- x = np.asarray(x)
- if y is not None:
- y = np.asarray(y)
- if y.shape[0] != x.shape[0]:
- raise ValueError("Both matrices must have the same number of rows")
- x, y = scale(x), scale(y)
- else:
- x = scale(x)
- y = x
- # Notice that scaling was performed with ddof=0 (dividing by n,
- # the default), so now we need to remove it by also using ddof=0
- # (dividing by n)
- return x.T.dot(y) / x.shape[0]
-
-
-
+
-
-e_matrix(distance_matrix)
+
+ umap
+
-
+
- Compute E matrix from a distance matrix.
-Squares and divides by -2 the input elementwise. Eq. 9.20 in
-Legendre & Legendre 1998.
-
-
-
-
-
-
-
-
-
-
-
-f_matrix(E_matrix)
+
-
-
- Compute F matrix from E matrix.
-Centring step: for each element, the mean of the corresponding
-row and column are substracted, and the mean of the whole
-matrix is added. Eq. 9.21 in Legendre & Legendre 1998.
-
- Source code in cell2cell/external/pcoa_utils.py
- def f_matrix(E_matrix):
- """Compute F matrix from E matrix.
- Centring step: for each element, the mean of the corresponding
- row and column are substracted, and the mean of the whole
- matrix is added. Eq. 9.21 in Legendre & Legendre 1998."""
- row_means = E_matrix.mean(axis=1, keepdims=True)
- col_means = E_matrix.mean(axis=0, keepdims=True)
- matrix_mean = E_matrix.mean()
- return E_matrix - row_means - col_means + matrix_mean
-
-
-
-
@@ -10800,177 +9999,57 @@
-
-center_distance_matrix(distance_matrix, inplace=False)
-
-
-
-
-
-
- Centers a distance matrix.
-Note: If the used distance was euclidean, pairwise distances
-needn't be computed from the data table Y because F_matrix =
-Y.dot(Y.T) (if Y has been centered).
-But since we're expecting distance_matrix to be non-euclidian,
-we do the following computation as per
-Numerical Ecology (Legendre & Legendre 1998).
-
-
-
-
-
-
-
-
- Parameters:
-
-
- distance_matrix (2D array_like
) – Distance matrix.
- inplace (bool
) – Whether or not to center the given distance matrix in-place, which
-is more efficient in terms of memory and computation.
-
-
-
-
-
-
- Source code in cell2cell/external/pcoa_utils.py
- def center_distance_matrix(distance_matrix, inplace=False):
- """
- Centers a distance matrix.
- Note: If the used distance was euclidean, pairwise distances
- needn't be computed from the data table Y because F_matrix =
- Y.dot(Y.T) (if Y has been centered).
- But since we're expecting distance_matrix to be non-euclidian,
- we do the following computation as per
- Numerical Ecology (Legendre & Legendre 1998).
- Parameters
- ----------
- distance_matrix : 2D array_like
- Distance matrix.
- inplace : bool, optional
- Whether or not to center the given distance matrix in-place, which
- is more efficient in terms of memory and computation.
- """
- if inplace:
- return _f_matrix_inplace(_e_matrix_inplace(distance_matrix))
- else:
- return f_matrix(e_matrix(distance_matrix))
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
- umap
-
+
+run_umap(rnaseq_data, axis=1, metric='euclidean', min_dist=0.4, n_neighbors=8, random_state=None, **kwargs)
-
-
-
-
-
-
-
-
-
-
-Functions
-
-
-
-
-
-
-run_umap(rnaseq_data, axis=1, metric='euclidean', min_dist=0.4, n_neighbors=8, random_state=None, **kwargs)
-
-
-
-
-
-
- Runs UMAP on a expression matrix.
-
-
-
-
-
-
-
-
- Parameters:
-
-
- rnaseq_data (pandas.DataFrame
) – A dataframe of gene expression values wherein the rows are the genes or
-embeddings of a dimensionality reduction method and columns the cells,
-tissues or samples.
- axis (int, default=0
) – An axis of the dataframe (0 across rows, 1 across columns).
-Across rows means that the UMAP is to compare genes, while
-across columns is to compare cells, tissues or samples.
- metric (str, default='euclidean'
) – The distance metric to use. The distance function can be 'braycurtis',
-'canberra', 'chebyshev', 'cityblock', 'correlation', 'cosine', 'dice',
-'euclidean', 'hamming', 'jaccard', 'jensenshannon', 'kulsinski',
-'mahalanobis', 'matching', 'minkowski', 'rogerstanimoto', 'russellrao',
-'seuclidean', 'sokalmichener', 'sokalsneath', 'sqeuclidean', 'yule'.
- min_dist (float, default=0.4
) – The effective minimum distance between embedded points. Smaller values
+
 Runs UMAP on an expression matrix.
+Parameters
+
+rnaseq_data : pandas.DataFrame
+ A dataframe of gene expression values wherein the rows are the genes or
+ embeddings of a dimensionality reduction method and columns the cells,
+ tissues or samples.
+axis : int, default=1
+ An axis of the dataframe (0 across rows, 1 across columns).
+ Across rows means that the UMAP is to compare genes, while
+ across columns is to compare cells, tissues or samples.
+metric : str, default='euclidean'
+ The distance metric to use. The distance function can be 'braycurtis',
+ 'canberra', 'chebyshev', 'cityblock', 'correlation', 'cosine', 'dice',
+ 'euclidean', 'hamming', 'jaccard', 'jensenshannon', 'kulsinski',
+ 'mahalanobis', 'matching', 'minkowski', 'rogerstanimoto', 'russellrao',
+ 'seuclidean', 'sokalmichener', 'sokalsneath', 'sqeuclidean', 'yule'.
+
+min_dist : float, default=0.4
+The effective minimum distance between embedded points. Smaller values
will result in a more clustered/clumped embedding where nearby points
on the manifold are drawn closer together, while larger values will
result in a more even dispersal of points. The value should be set
relative to the spread
value, which determines the scale at which
-embedded points will be spread out.
- n_neighbors (float, default=8
) – The size of local neighborhood (in terms of number of neighboring
+embedded points will be spread out.
+
+
+n_neighbors : float, default=8
+The size of local neighborhood (in terms of number of neighboring
sample points) used for manifold approximation. Larger values
result in more global views of the manifold, while smaller
values result in more local data being preserved. In general
-values should be in the range 2 to 100.
- random_state (int, default=None
) – Seed for randomization.
- *kwargs* (dict
) – Extra arguments for UMAP as defined in umap.UMAP.
-
-
-
-
-
-
-
-
-
-
-
-
- Returns:
-
-
- pandas.DataFrame
– Dataframe containing the UMAP embeddings for the axis analyzed.
-Contains columns 'umap1 and 'umap2'.
-
-
-
-
-
+values should be in the range 2 to 100.
+
+random_state : int, default=None
+ Seed for randomization.
+**kwargs : dict
+ Extra arguments for UMAP as defined in umap.UMAP.
+Returns
+umap_df : pandas.DataFrame
+ Dataframe containing the UMAP embeddings for the axis analyzed.
+ Contains columns 'umap1' and 'umap2'.
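
The sketch below illustrates only the input/output contract described above: how `axis` selects what gets embedded and what the returned dataframe looks like. It does not call UMAP. A two-component PCA (via SVD) is used as a stand-in so the example runs without `umap-learn`, and the name `embed_axis_sketch` and the toy data are hypothetical.

```python
import numpy as np
import pandas as pd

def embed_axis_sketch(rnaseq_data, axis=1):
    """Mimic the documented shape of the output; PCA replaces UMAP here."""
    # axis=0 compares genes (rows); axis=1 compares cells/samples (columns).
    if axis == 0:
        data = rnaseq_data
    elif axis == 1:
        data = rnaseq_data.T
    else:
        raise ValueError("axis must be 0 or 1")
    # Stand-in for UMAP: project onto the first two principal components.
    centered = data.values - data.values.mean(axis=0)
    U, S, _ = np.linalg.svd(centered, full_matrices=False)
    coords = U[:, :2] * S[:2]
    return pd.DataFrame(coords, index=data.index, columns=["umap1", "umap2"])

# Toy expression matrix: genes as rows, samples as columns.
expr = pd.DataFrame(
    np.random.default_rng(0).random((5, 3)),
    index=[f"gene{i}" for i in range(5)],
    columns=["sampleA", "sampleB", "sampleC"],
)
by_sample = embed_axis_sketch(expr, axis=1)  # one row per sample
by_gene = embed_axis_sketch(expr, axis=0)    # one row per gene
```

With `axis=1` the result has one row per sample, and with `axis=0` one row per gene; in both cases the two columns are named 'umap1' and 'umap2', matching the documented return value.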
+
Source code in cell2cell/external/umap.py