Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Integration testing for LOBPCG #11

Draft
wants to merge 9 commits into
base: main
Choose a base branch
from
Draft

Integration testing for LOBPCG #11

wants to merge 9 commits into from

Conversation

YuhanLiin
Copy link
Collaborator

I saw some interesting failures while property testing LOBPCG. The failure cases are covered in the tests problematic_eig_matrix and problematic_svd_matrix. The eig test straight-up yields the wrong eigenvalues and the SVD test yields the correct singular values but without the last value. @bytesnake any ideas on what's wrong?

@YuhanLiin YuhanLiin requested a review from bytesnake May 25, 2022 05:41
@YuhanLiin YuhanLiin marked this pull request as draft May 25, 2022 05:41
@bytesnake
Copy link
Member

Hi YuhanLii, LOBPCG should not be used for full decompositions, the python source bails out when you want more than 1/5 of the eigenvectors/vals. https://github.com/scipy/scipy/blob/v1.8.1/scipy/sparse/linalg/_eigen/lobpcg/lobpcg.py#L339 I think it should be relatively easy to find problems where this fails. The missing eigenvalue should not occur though and requires more debugging. Don't have time today, but can give it a shot tomorrow

@YuhanLiin
Copy link
Collaborator Author

Does the algorithm also fail if you compute the full decomp one eigenvalue at a time using the iterator? Does it also apply to SVD (meaning that you can't use it to compute full SVD)? Also, we should add the check from the Python code to our impl as well, and return an error if the user requests too many eigvals.

For the missing value, I think it may be because the rank of the matrix is lower than the number of eigvals requested, in which case the behaviour is correct.

@bytesnake
Copy link
Member

For the missing value, I think it may be because the rank of the matrix is lower than the number of eigvals requested, in which case the behaviour is correct.

yes that would be an explanation

Does the algorithm also fail if you compute the full decomp one eigenvalue at a time using the iterator? Does it also apply to SVD (meaning that you can't use it to compute full SVD)? Also, we should add the check from the Python code to our impl as well, and return an error if the user requests too many eigvals.

yes it applies to any solver derived from LOBPCG, I would add brakes to SVD and eigenvalue decomposition. The question is how many, I observed in application that 1/5 is a conservative bound but for most usecases sufficient

@YuhanLiin
Copy link
Collaborator Author

YuhanLiin commented May 25, 2022

Actually it can't be due to the rank because none of the single values of the input matrix in problematic_svd_matrix are 0

Also, the Marchenko-Pastur test uses a 1000x500 matrix but obtains 500 singular values. Doesn't that violate the 1/5 rule?

@lobpcg
Copy link

lobpcg commented Jun 29, 2022

Actually it can't be due to the rank because none of the single values of the input matrix in problematic_svd_matrix are 0

Also, the Marchenko-Pastur test uses a 1000x500 matrix but obtains 500 singular values. Doesn't that violate the 1/5 rule?

Yes, of course it does.

The 1/5 rule is there not only to prevent lobpcg possible failures, but also to enforce using a standard dense solver in situations where it is likely much more efficient compared to lobpcg.

@bytesnake
Copy link
Member

sorry for the long silence, just to add an explanation: I have never seen any failure for the Marchenko-Pastur distributed random matrices and therefore kept comparing the whole spectrum instead of first fifth.

@YuhanLiin
Copy link
Collaborator Author

Then maybe we can have the internal algorithm solve the whole spectrum for testing but have the public API limit it to 1/5th. Or just change the Marchenko-Pastur test to do 1/5th so it runs faster.

@bytesnake
Copy link
Member

Then maybe we can have the internal algorithm solve the whole spectrum for testing but have the public API limit it to 1/5th. Or just change the Marchenko-Pastur test to do 1/5th so it runs faster.

here is a proposal a780910 add branch for dense eigenproblem solver for TruncatedSvd and TruncatedEig

@YuhanLiin
Copy link
Collaborator Author

@bytesnake I've implemented your change and it works on the previous failing cases. However, I'm getting another failing matrix on the truncated eigenvalues, which I've put into the problematic_eig test. The issue is that the eigenvector is wrong, even when extracting just 1 eigenvalue.

@YuhanLiin
Copy link
Collaborator Author

Update: When I run the linfa PCA tests against this branch, I get the following failure:

thread 'pca::tests::test_explained_variance_diag' panicked at 'ndarray: inputs 4 × 4 and 3 × 3 are not compatible for matrix multiplication', /home/yuhan/.cargo/registry/src/github.com-1ecc6299db9ec823/ndarray-0.15.4/src/linalg/impl_linalg.rs:299:5
stack backtrace:
   0: rust_begin_unwind
             at /rustc/9d1b2106e23b1abd32fce1f17267604a5102f57a/library/std/src/panicking.rs:498:5
   1: core::panicking::panic_fmt
             at /rustc/9d1b2106e23b1abd32fce1f17267604a5102f57a/library/core/src/panicking.rs:116:14
   2: ndarray::linalg::impl_linalg::dot_shape_error
             at /home/yuhan/.cargo/registry/src/github.com-1ecc6299db9ec823/ndarray-0.15.4/src/linalg/impl_linalg.rs:299:5
   3: <ndarray::ArrayBase<S,ndarray::dimension::dim::Dim<[usize; 2]>> as ndarray::linalg::impl_linalg::Dot<ndarray::ArrayBase<S2,ndarray::dimension::dim::Dim<[usize; 2]>>>>::dot
             at /home/yuhan/.cargo/registry/src/github.com-1ecc6299db9ec823/ndarray-0.15.4/src/linalg/impl_linalg.rs:273:13
   4: ndarray::linalg::impl_linalg::<impl ndarray::ArrayBase<S,ndarray::dimension::dim::Dim<[usize; 2]>>>::dot
             at /home/yuhan/.cargo/registry/src/github.com-1ecc6299db9ec823/ndarray-0.15.4/src/linalg/impl_linalg.rs:257:9
   5: linfa_linalg::lobpcg::svd::TruncatedSvdResult<A>::values_vectors
             at /home/yuhan/projects/rust/ml/linfa-linalg/src/lobpcg/svd.rs:89:30
   6: <linfa_reduction::pca::PcaParams<f32> as linfa::traits::Fit<ndarray::ArrayBase<D,ndarray::dimension::dim::Dim<[usize; 2]>>,T,linfa_reduction::error::ReductionError>>::fit
             at ./src/pca.rs:138:43
   7: linfa_reduction::pca::tests::test_explained_variance_diag
             at ./src/pca.rs:390:21
   8: linfa_reduction::pca::tests::test_explained_variance_diag::{{closure}}
             at ./src/pca.rs:388:5
   9: core::ops::function::FnOnce::call_once
             at /rustc/9d1b2106e23b1abd32fce1f17267604a5102f57a/library/core/src/ops/function.rs:227:5
  10: core::ops::function::FnOnce::call_once
             at /rustc/9d1b2106e23b1abd32fce1f17267604a5102f57a/library/core/src/ops/function.rs:227:5

@YuhanLiin
Copy link
Collaborator Author

@bytesnake any updates?

@bytesnake bytesnake self-assigned this Nov 8, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants