Need help getting DEGs between case group and control group for each cluster #371
jeanne09jiang asked this question in Q&A (unanswered)
Hi,
I'm trying to do a DEG analysis between my two mutant samples and two control samples for each cluster. The Stereopy tutorial doesn't make it clear how to do this across multiple samples. Could anyone help me with a solution?
My approach: after clustering, I labeled the samples as control or mutant, added that information to the metadata of ms_data, and then used the function "ms_data.tl.find_marker_genes" to try to find the DEGs between the control and mutant (case) groups for each cluster.
Here are the preprocessing, normalization and clustering steps.
ms_data.integrate()
ms_data.tl.normalize_total(scope=slice_generator[:], mode='integrate')
ms_data.tl.log1p(scope=slice_generator[:], mode='integrate')
ms_data.tl.highly_variable_genes(scope=slice_generator[:], mode='integrate', min_mean=0.0125, max_mean=3, min_disp=0.5, n_top_genes=2000, res_key='highly_variable_genes')
ms_data.tl.scale(scope=slice_generator[:],mode='integrate', zero_center=False, max_value=10)
ms_data.tl.pca(scope=slice_generator[:], mode='integrate', use_highly_genes=False, n_pcs=30, res_key='pca')
ms_data.tl.batches_integrate(scope=slice_generator[:], mode='integrate', pca_res_key='pca', res_key='pca_integrated')
ms_data.tl.neighbors(scope=slice_generator[:], mode='integrate', pca_res_key='pca_integrated', n_pcs=30, res_key='neighbors_integrated', n_jobs=-1)
ms_data.tl.umap(scope=slice_generator[:], mode='integrate', pca_res_key='pca_integrated', neighbors_res_key='neighbors_integrated', res_key='umap_integrated')
ms_data.plt.batches_umap(scope=slice_generator[:], mode='integrate', res_key='umap_integrated')
ms_data.tl.leiden(scope=slice_generator[:], mode='integrate', neighbors_res_key='neighbors_integrated',
                  resolution=0.4, res_key='leiden0.4')
ms_data.tl.find_marker_genes(cluster_res_key='leiden0.4', use_highly_genes=False, use_raw=False,
                             res_key='marker_genes')
# Define sample groups
control_samples = ['1', '2']
mutant_samples = ['0', '3']

# Create a metadata label for sample groups
sample_labels = ms_data.obs['batch'].copy()
print(sample_labels)
sample_labels = sample_labels.replace(control_samples, 'control').replace(mutant_samples, 'mutant')
print(sample_labels)

# Add the sample group label to ms_data
ms_data.obs['condition'] = sample_labels

# Perform Differential Expression Analysis (DEG)
ms_data.tl.find_marker_genes(
    cluster_res_key='condition',  # Compare "control" vs "mutant"
    method='wilcoxon',            # Use Wilcoxon test
    case_groups=['mutant'],
    control_groups=['control'],
)
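Since what I ultimately want is the mutant-vs-control comparison within each cluster (not across all cells at once), I was also thinking of building a combined grouping key (cluster plus condition) and looping over the clusters. This is only a rough, untested sketch of the idea, not something from the tutorial; the 'cluster_condition' column name is made up, and I'm assuming find_marker_genes accepts an obs column name as cluster_res_key, the same way my attempt above passes 'condition':

```python
# Rough sketch (untested): one group per (cluster, condition) pair, e.g. "3_mutant" vs "3_control".
# Assumes 'leiden0.4' and 'condition' are both columns of ms_data.obs;
# 'cluster_condition' is just a name I made up for the combined key.
ms_data.obs['cluster_condition'] = (
    ms_data.obs['leiden0.4'].astype(str) + '_' + ms_data.obs['condition'].astype(str)
)

for cluster in ms_data.obs['leiden0.4'].astype(str).unique():
    # Compare the mutant cells of this cluster against the control cells of the same cluster.
    ms_data.tl.find_marker_genes(
        cluster_res_key='cluster_condition',
        method='wilcoxon',
        case_groups=[f'{cluster}_mutant'],
        control_groups=[f'{cluster}_control'],
        res_key=f'marker_genes_condition_{cluster}',
    )
```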
This is what my ms_data looks like after adding the 'condition' column (containing control or mutant for each cell) to ms_data:
ms_data: {'mutant_10': (20961, 25679), 'control_6': (21836, 26539), 'mutant_11': (26460, 25970), 'control_5': (19404, 26042)}
num_slice: 4
names: ['mutant_10', 'control_6', 'mutant_11', 'control_5']
merged_data: id(140144293829840)
obs: ['batch', 'total_counts', 'n_genes_by_counts', 'pct_counts_mt', 'leiden0.4', 'condition']
var: ['n_cells', 'n_counts', 'mean_umi', 'means', 'dispersions', 'mean_bin', 'dispersions_norm', 'highly_variable']
relationship: other
var_type: intersect to 20474
current_mode: integrate
current_scope: scope_[0,1,2,3]
scopes_data: ['scope_[0,1,2,3]:id(140144293829840)']
mss: ["scope_[0,1,2,3]:['highly_variable_genes', 'pca', 'pca_variance_ratio', 'pca_integrated', 'neighbors_integrated', 'umap_integrated', 'leiden0.4', 'gene_exp_leiden0.4']"]
But it failed with the following error message:
_RemoteTraceback Traceback (most recent call last)
_RemoteTraceback:
"""
Traceback (most recent call last):
File "/home/zhj869/.conda/envs/stereopy/lib/python3.8/site-packages/joblib/_utils.py", line 72, in call
return self.func(**kwargs)
File "/home/zhj869/.conda/envs/stereopy/lib/python3.8/site-packages/joblib/parallel.py", line 598, in call
return [func(*args, **kwargs)
File "/home/zhj869/.conda/envs/stereopy/lib/python3.8/site-packages/joblib/parallel.py", line 598, in
return [func(*args, **kwargs)
File "/home/zhj869/.conda/envs/stereopy/lib/python3.8/site-packages/stereo/tools/find_markers.py", line 169, in handle_result
result = statistics.wilcoxon(
File "/home/zhj869/.conda/envs/stereopy/lib/python3.8/site-packages/stereo/algorithm/statistics.py", line 69, in wilcoxon
s, p = mannwhitneyu(group, other_group, ranks=ranks, tie_term=tie_term, x_mask=x_mask)
File "/home/zhj869/.conda/envs/stereopy/lib/python3.8/site-packages/stereo/algorithm/mannwhitneyu.py", line 251, in mannwhitneyu
_mwu_input_validation(x, y, use_continuity, alternative, axis, method))
File "/home/zhj869/.conda/envs/stereopy/lib/python3.8/site-packages/stereo/algorithm/mannwhitneyu.py", line 140, in _mwu_input_validation
if np.isnan(x).any() or np.isnan(y).any():
TypeError: ufunc 'isnan' not supported for the input types, and the inputs could not be safely coerced to any supported types according to the casting rule ''safe''
"""
The above exception was the direct cause of the following exception:
TypeError Traceback (most recent call last)
Cell In[36], line 2
1 # Perform Differential Expression Analysis (DEG)
----> 2 ms_data.tl.find_marker_genes(
3 cluster_res_key='condition', # Compare "control" vs "mutant"
4 method='wilcoxon', # Use Wilcoxon test
5 case_groups=['mutant'],
6 control_groups=['control'],
7 )
File ~/.conda/envs/stereopy/lib/python3.8/site-packages/stereo/core/ms_pipeline.py:223, in MSDataPipeLine.__getattr__.<locals>.temp(*args, **kwargs)
220 kwargs["mode"] = self.__mode
222 if kwargs["mode"] == "integrate":
--> 223 return self._use_integrate_method(item, *args, **kwargs)
224 elif kwargs["mode"] == "isolated":
225 self._run_isolated_method(item, *args, **kwargs)
File ~/.conda/envs/stereopy/lib/python3.8/site-packages/stereo/core/ms_pipeline.py:136, in MSDataPipeLine._use_integrate_method(self, item, *args, **kwargs)
133 return new_attr(*args, **kwargs)
135 logger.info(f'data_obj(idx=0) in ms_data start to run {item}')
--> 136 return new_attr(
137 ms_data_view.merged_data.__getattribute__(self.__class__.ATTR_NAME),
138 *args,
139 **kwargs
140 )
File ~/.conda/envs/stereopy/lib/python3.8/site-packages/stereo/core/st_pipeline.py:43, in logit.<locals>.wrapped(*args, **kwargs)
41 logger.info('start to run {}...'.format(func.__name__))
42 tk = tc.start()
---> 43 res = func(*args, **kwargs)
44 logger.info('{} end, consume time {:.4f}s.'.format(func.__name__, tc.get_time_consumed(key=tk, restart=False)))
45 return res
File ~/.conda/envs/stereopy/lib/python3.8/site-packages/stereo/core/st_pipeline.py:1129, in StPipeline.find_marker_genes(self, cluster_res_key, method, case_groups, control_groups, corr_method, use_raw, use_highly_genes, hvg_res_key, res_key, output, sort_by, n_genes, ascending, n_jobs)
1126 pct, pct_rest = calc_pct_and_pct_rest(self.data, cluster_res_key, filter_raw=False)
1127 mean_count_in_cluster = cell_cluster_to_gene_exp_cluster(self.data, cluster_res_key, kind='mean', filter_raw=False)
-> 1129 tool = FindMarker(data=data, groups=self.result[cluster_res_key], method=method, case_groups=case_groups,
1130 control_groups=control_groups, corr_method=corr_method, sort_by=sort_by,
1131 n_genes=n_genes, ascending=ascending, n_jobs=n_jobs, pct=pct, pct_rest=pct_rest, mean_count=mean_count_in_cluster)
1132 result = tool.result
1133 result['parameters'] = {
1134 'cluster_res_key': cluster_res_key,
1135 'method': method,
(...)
1138 'use_raw': use_raw
1139 }
File ~/.conda/envs/stereopy/lib/python3.8/site-packages/stereo/tools/find_markers.py:82, in FindMarker.__init__(self, data, groups, method, case_groups, control_groups, corr_method, tie_term, sort_by, n_genes, ascending, n_jobs, pct, pct_rest, mean_count)
80 self.result['pct_rest'] = pct_rest
81 self.result['mean_count'] = mean_count
---> 82 self.fit()
File ~/.conda/envs/stereopy/lib/python3.8/site-packages/stereo/core/tool_base.py:152, in ToolBase.fit_log.<locals>.wrapper(*args, **kwargs)
149 @functools.wraps(func)
150 def wrapper(*args, **kwargs):
151 logger.info('start to run...')
--> 152 func(*args, **kwargs)
153 logger.info('end to run.')
File ~/.conda/envs/stereopy/lib/python3.8/site-packages/stereo/tools/find_markers.py:238, in FindMarker.fit(self)
236 self.len_case_groups = len(case_groups)
237 n_jobs = min(cpu_count(), self.n_jobs)
--> 238 Parallel(n_jobs=n_jobs, backend='threading')(
239 delayed(self.handle_result)(g, group_info, all_groups, ranks, tie_term, control_str)
240 for g in case_groups
241 )
242 del self.temp_logres_score
File ~/.conda/envs/stereopy/lib/python3.8/site-packages/joblib/parallel.py:2007, in Parallel.__call__(self, iterable)
2001 # The first item from the output is blank, but it makes the interpreter
2002 # progress until it enters the Try/Except block of the generator and
2003 # reaches the first `yield` statement. This starts the asynchronous
2004 # dispatch of the tasks to the workers.
2005 next(output)
-> 2007 return output if self.return_generator else list(output)
File ~/.conda/envs/stereopy/lib/python3.8/site-packages/joblib/parallel.py:1650, in Parallel._get_outputs(self, iterator, pre_dispatch)
1647 yield
1649 with self._backend.retrieval_context():
-> 1650 yield from self._retrieve()
1652 except GeneratorExit:
1653 # The generator has been garbage collected before being fully
1654 # consumed. This aborts the remaining tasks if possible and warn
1655 # the user if necessary.
1656 self._exception = True
File ~/.conda/envs/stereopy/lib/python3.8/site-packages/joblib/parallel.py:1754, in Parallel._retrieve(self)
1747 while self._wait_retrieval():
1748
1749 # If the callback thread of a worker has signaled that its task
1750 # triggered an exception, or if the retrieval loop has raised an
1751 # exception (e.g. `GeneratorExit`), exit the loop and surface the
1752 # worker traceback.
1753 if self._aborting:
-> 1754 self._raise_error_fast()
1755 break
1757 # If the next job is not ready for retrieval yet, we just wait for
1758 # async callbacks to progress.
File ~/.conda/envs/stereopy/lib/python3.8/site-packages/joblib/parallel.py:1789, in Parallel._raise_error_fast(self)
1785 # If this error job exists, immediately raise the error by
1786 # calling get_result. This job might not exists if abort has been
1787 # called directly or if the generator is gc'ed.
1788 if error_job is not None:
-> 1789 error_job.get_result(self.timeout)
File ~/.conda/envs/stereopy/lib/python3.8/site-packages/joblib/parallel.py:745, in BatchCompletionCallBack.get_result(self, timeout)
739 backend = self.parallel._backend
741 if backend.supports_retrieve_callback:
742 # We assume that the result has already been retrieved by the
743 # callback thread, and is stored internally. It's just waiting to
744 # be returned.
--> 745 return self._return_or_raise()
747 # For other backends, the main thread needs to run the retrieval step.
748 try:
File ~/.conda/envs/stereopy/lib/python3.8/site-packages/joblib/parallel.py:763, in BatchCompletionCallBack._return_or_raise(self)
761 try:
762 if self.status == TASK_ERROR:
--> 763 raise self._result
764 return self._result
765 finally:
TypeError: ufunc 'isnan' not supported for the input types, and the inputs could not be safely coerced to any supported types according to the casting rule ''safe''
It seems like the input is not being recognized as numeric, so the Wilcoxon test cannot be run. How can I do such a DEG analysis correctly? The comparative analysis tutorial on Stereopy (https://stereopy.readthedocs.io/en/latest/Tutorials%28Multi-sample%29/Comparative_Analysis.html) shows a cell co-occurrence analysis and cell community detection first, but no straightforward analysis to get the differential markers between multiple control and case groups.
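To check whether the expression matrix itself is what the Wilcoxon test is choking on, this is the kind of quick sanity check I had in mind (ms_data.merged_data.exp_matrix is my guess at where the merged matrix lives, please correct me if that attribute is wrong):

```python
import numpy as np
from scipy import sparse

# Quick sanity check on the inputs the Wilcoxon test will see.
# Assumption: the merged expression matrix is at ms_data.merged_data.exp_matrix.
exp = ms_data.merged_data.exp_matrix
print(type(exp), getattr(exp, 'dtype', None))    # expecting a numeric dtype (float32/float64)
print(ms_data.obs['condition'].dtype)            # the grouping column I added
print(ms_data.obs['condition'].value_counts())   # both groups should be non-empty

# A non-numeric (e.g. object) dtype makes np.isnan() fail exactly like in the traceback above.
if sparse.issparse(exp):
    print('sparse matrix, dtype:', exp.dtype)
else:
    print('dense matrix, dtype:', np.asarray(exp).dtype)
```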