
Error on using 'groupby' covariate with 'stochastic' mode within scv.tl.velocity() #874

Open

JesseRop opened this issue May 4, 2022 · 4 comments

Labels: bug (Something isn't working)

JesseRop commented May 4, 2022

Hi @WeilerP , thanks for the great package and all your efforts in maintaining it!

I was able to reproduce the error in the title with the script below.
I was wondering whether there is a way to calculate RNA velocities and velocity streams independently per cluster (or batch) in the stochastic/deterministic mode, so as to account for different dynamics per cluster (or batch). I'm aware that velocities can be recalculated while accounting for differential kinetics per cluster in the dynamical mode, but that mode doesn't work well for my data.
Do I have to subset the different clusters into separate AnnData objects, run RNA velocity on each, and then merge the objects back together?
I was thinking 'groupby' could be used to condition on the different clusters, but it seems it has to be used in combination with 'groups' to restrict the RNA velocity calculation to just one (or more) cluster, rather than performing RNA velocity for all cells while conditioning on each cluster.
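For what it's worth, the "fit per cluster, then merge back" pattern being asked about can be sketched with plain numpy. This is only an illustration of the idea (all names below are hypothetical, not scVelo API): in the steady-state model each gene's velocity depends on a ratio gamma obtained by regressing unspliced on spliced counts, so a per-cluster variant fits a separate gamma for each cluster label and writes the results back into one array.

```python
import numpy as np

# Toy sketch of the "fit per cluster, then merge" pattern from the
# question above. Names are hypothetical; this is not scVelo code.

def per_cluster_gamma(Ms, Mu, clusters):
    """Fit gamma = argmin ||Mu - gamma * Ms||^2 separately per cluster."""
    gamma_per_cell = np.empty(len(clusters), dtype=float)
    for c in np.unique(clusters):
        idx = clusters == c
        x, y = Ms[idx], Mu[idx]
        # closed-form least squares without intercept: gamma = <x, y> / <x, x>
        gamma_per_cell[idx] = x.dot(y) / x.dot(x)
    return gamma_per_cell

# Two clusters with different true ratios (2.0 and 0.5).
Ms = np.array([1.0, 2.0, 3.0, 1.0, 2.0, 3.0])
Mu = np.array([2.0, 4.0, 6.0, 0.5, 1.0, 1.5])
clusters = np.array(["A", "A", "A", "B", "B", "B"])
gamma = per_cluster_gamma(Ms, Mu, clusters)
# gamma -> [2.0, 2.0, 2.0, 0.5, 0.5, 0.5]
```

Subsetting the AnnData object per cluster and running `scv.tl.velocity` on each subset would be the analogous workflow at the scVelo level.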

adata = scv.datasets.dentategyrus()

scv.pp.filter_and_normalize(adata, min_shared_counts=30, n_top_genes=2000)

scv.pp.moments(adata, n_pcs=30, n_neighbors=30)

scv.tl.velocity(adata, mode='stochastic', min_r2=0.1, groupby='clusters')
Error output
computing velocities

---------------------------------------------------------------------------
LinAlgError                               Traceback (most recent call last)
Input In [408], in <cell line: 1>()
----> 1 scv.tl.velocity(adata, mode='stochastic', min_r2=0.1, groupby='clusters')

File ~/envs/scv_jupcon/lib/python3.9/site-packages/scvelo/tools/velocity.py:428, in velocity(data, vkey, mode, fit_offset, fit_offset2, filter_genes, groups, groupby, groups_for_fit, constrain_ratio, use_raw, use_latent_time, perc, min_r2, min_likelihood, r2_adjusted, use_highly_variable, diff_kinetics, copy, **kwargs)
    419         _adata = adata if groups is None else adata[cell_subset]
    420         velo = Velocity(
    421             _adata,
    422             residual=residual,
   (...)
    426             use_highly_variable=use_highly_variable,
    427         )
--> 428     velo.compute_stochastic(fit_offset, fit_offset2, mode, perc=perc)
    430 write_residuals(adata, vkey, velo._residual, cell_subset)
    431 write_residuals(adata, f"variance_{vkey}", velo._residual2, cell_subset)

File ~/envs/scv_jupcon/lib/python3.9/site-packages/scvelo/tools/velocity.py:147, in Velocity.compute_stochastic(self, fit_offset, fit_offset2, mode, perc)
    141 res2_std = (cov_us - _gamma2 * var_ss - _offset2).std(0)
    143 # solve multiple regression
    144 self._offset[idx], self._offset2[idx], self._gamma[idx] = (
    145     maximum_likelihood(_Ms, _Mu, _Mus, _Mss, fit_offset, fit_offset2)
    146     if mode == "bayes"
--> 147     else leastsq_generalized(
    148         _Ms,
    149         _Mu,
    150         var_ss,
    151         cov_us,
    152         res_std,
    153         res2_std,
    154         fit_offset,
    155         fit_offset2,
    156         perc,
    157     )
    158 )
    160 self._residual = self._Mu - self._gamma * self._Ms
    161 if fit_offset:

File ~/envs/scv_jupcon/lib/python3.9/site-packages/scvelo/tools/optimization.py:173, in leastsq_generalized(x, y, x2, y2, res_std, res2_std, fit_offset, fit_offset2, perc)
    171     for i in range(n_var):
    172         A = np.c_[x[:, i]]
--> 173         gamma[i] = np.linalg.pinv(A.T.dot(A)).dot(A.T.dot(y[:, i]))
    175 offset[np.isnan(offset)] = 0
    176 offset_ss[np.isnan(offset_ss)] = 0

File <__array_function__ internals>:5, in pinv(*args, **kwargs)

File ~/envs/scv_jupcon/lib/python3.9/site-packages/numpy/linalg/linalg.py:2002, in pinv(a, rcond, hermitian)
   2000     return wrap(res)
   2001 a = a.conjugate()
-> 2002 u, s, vt = svd(a, full_matrices=False, hermitian=hermitian)
   2004 # discard small singular values
   2005 cutoff = rcond[..., newaxis] * amax(s, axis=-1, keepdims=True)

File <__array_function__ internals>:5, in svd(*args, **kwargs)

File ~/envs/scv_jupcon/lib/python3.9/site-packages/numpy/linalg/linalg.py:1660, in svd(a, full_matrices, compute_uv, hermitian)
   1657         gufunc = _umath_linalg.svd_n_s
   1659 signature = 'D->DdD' if isComplexType(t) else 'd->ddd'
-> 1660 u, s, vh = gufunc(a, signature=signature, extobj=extobj)
   1661 u = u.astype(result_t, copy=False)
   1662 s = s.astype(_realType(result_t), copy=False)

File ~/envs/scv_jupcon/lib/python3.9/site-packages/numpy/linalg/linalg.py:97, in _raise_linalgerror_svd_nonconvergence(err, flag)
     96 def _raise_linalgerror_svd_nonconvergence(err, flag):
---> 97     raise LinAlgError("SVD did not converge")

LinAlgError: SVD did not converge
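For context, the failing line (optimization.py:173 in the traceback) solves a one-parameter least-squares problem per gene via a pseudoinverse of the normal equations. A minimal standalone version of that computation, with well-behaved data where it succeeds, looks like this; the hedged assumption is that non-finite or degenerate values in the input are what makes the SVD inside `pinv` fail to converge.

```python
import numpy as np

# Standalone reproduction of the least-squares step from the traceback:
# for each gene i, gamma[i] = pinv(A.T A) A.T y, with A a single column.
x = np.array([[1.0], [2.0], [3.0]])   # spliced moments, one gene
y = np.array([2.0, 4.0, 6.0])         # unspliced moments, same gene

A = np.c_[x[:, 0]]                    # design matrix, shape (n_obs, 1)
gamma = np.linalg.pinv(A.T.dot(A)).dot(A.T.dot(y))
# gamma -> array([2.]) here; if A or y contains NaN/inf entries, the
# SVD inside pinv can raise "LinAlgError: SVD did not converge".
```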
Versions
scvelo==0.2.5.dev71+g85295fc  scanpy==1.9.1  anndata==0.7.8  loompy==3.0.7  numpy==1.21.5  scipy==1.8.0  matplotlib==3.5.1  sklearn==1.0.2  pandas==1.4.2  

Numba: Attempted to fork from a non-main thread, the TBB library may be in an invalid state in the child process.
ERROR: XMLRPC request failed [code: -32500]
RuntimeError: PyPI's XMLRPC API is currently disabled due to unmanageable load and will be deprecated in the near future. See https://status.python.org/ for more information.
WeilerP (Member) commented May 4, 2022

@JesseRop, thanks for reporting this. Looks similar to #856. I'll check if I can reproduce the error on my side and get back to you ASAP.
In general: If the dynamical mode does not work on your data, you shouldn't simply switch to another (less sophisticated) one. Both approaches rely on a specific shape of phase portraits and cell types aligned along it. See e.g. #216 or #462 for possible reasons for failure.

WeilerP (Member) commented May 4, 2022

@JesseRop, the numpy version is the issue. Not sure why, though. When downgrading to numpy<1.21, the error is no longer thrown.
@michalk8, @ivirshup did you encounter something similar before by chance, or are aware of anything introduced in numpy>1.21 causing this?

monica-nicolau commented May 24, 2023

Hi everyone!
@WeilerP are you sure the numpy version is the only issue? I've been running scv.tl.velocity on multiple (5 other) samples and it raises this error for only one sample; how could it work for the other samples if the numpy version is the same?
It's not a memory issue either, because I tried running the command several times and it always raises the same error. Could you help me understand what the problem with this object is?

My current np.__version__ is '1.21.1'.

UPDATE: I managed to avoid the error by specifying enforce=True in the normalization step carried out by scv.pp.filter_and_normalize().
After setting enforce=True, scv.tl.velocity() no longer raised the "SVD did not converge" error :)

h4rvey-g commented

Another potential cause of this error arises when using an adata object exported from Seurat. Ensure that the primary layer exported to adata is the counts layer of the Seurat object, which contains the raw counts, and then run scv.pp.filter_and_normalize() on that layer. Avoid exporting the data layer from Seurat and running scv.pp.filter_and_normalize() on it, as that layer is already a log-normalized version of the counts.
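A quick way to catch the pitfall described above is to check whether the exported matrix still looks like raw counts before normalizing. The helper below is a hypothetical heuristic (not part of scVelo or Seurat): raw counts are non-negative integers, while a log-normalized layer contains fractional values.

```python
import numpy as np

# Hypothetical sanity check for the Seurat-export pitfall: raw counts
# should be non-negative and (near-)integer valued, whereas an already
# log-normalized layer will contain fractional entries.

def looks_like_raw_counts(X):
    """Heuristic: True if X is non-negative and (near-)integer valued."""
    X = np.asarray(X, dtype=float)
    return bool(np.all(X >= 0) and np.allclose(X, np.round(X)))

counts = np.array([[0.0, 3.0], [12.0, 1.0]])
lognorm = np.log1p(counts / counts.sum(axis=1, keepdims=True) * 1e4)
# looks_like_raw_counts(counts)  -> True
# looks_like_raw_counts(lognorm) -> False
```

Running this on `adata.X` (or the relevant layer) before scv.pp.filter_and_normalize() would flag a matrix that has already been normalized.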

4 participants