Improve documentation #79

Closed
shayandavoodii opened this issue Mar 5, 2024 · 7 comments

Comments

@shayandavoodii

shayandavoodii commented Mar 5, 2024

In the Lasso.md, a method of fit is introduced as follows:

`fit(GammaLassoPath, X, y, d=Normal(), l=canonicallink(d); ...)`
fits a linear or generalized linear (concave) gamma lasso path given the design matrix `X` and response `y`.

Is it possible to add an example to the documentation, or to state the expected shapes of X and y? E.g., X should be of size $n\times m$ and y a vector of length $m$. I'm having trouble using the method because I don't know the expected sizes of these two arguments. I believe they must share a dimension; for example, length(y) should equal size(X, 1) or size(X, 2).

P.S.: In my case study, I have an X of size $d\times w$ and a y of length $d$. I don't know whether I should pass X or X' as the second argument. I expect to get a matrix of coefficients of size $n\times d$ or $d\times n$.

@gdalle

gdalle commented Mar 6, 2024

In the meantime maybe try both and see which one fails due to a shape mismatch?
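Instead of relying on which call errors, the orientation can be checked up front. Under the convention stated later in this thread (rows of X are observations, so `length(y)` must equal the number of rows), a minimal sketch could look like this. `orient_design` is a hypothetical helper written for illustration; it is not part of Lasso.jl.

```julia
# Hypothetical helper (not part of Lasso.jl): return X oriented so that
# rows are observations, i.e. size(X, 1) == length(y).
function orient_design(X::AbstractMatrix, y::AbstractVector)
    n = length(y)
    if size(X, 1) == n
        return X                  # already n × p: use as-is
    elseif size(X, 2) == n
        return permutedims(X)     # p × n: transpose to n × p
    else
        throw(DimensionMismatch("X is $(size(X)) but y has length $n"))
    end
end

X = rand(8, 3)   # 8 observations, 3 features
y = rand(8)
size(orient_design(X, y))               # (8, 3)
size(orient_design(permutedims(X), y))  # (8, 3)
```

Note that a square X is ambiguous and the first branch simply keeps it as-is. For the X of size $d\times w$ with y of length $d$ described in the question, the first branch applies, so X would be passed without transposing.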

@shayandavoodii
Author

shayandavoodii commented Mar 6, 2024

> In the meantime maybe try both and see which one fails due to a shape mismatch?

I did try, but the result does not match my expectation:

```julia
julia> using Lasso

julia> x = rand(5, 1); y = rand(5);

julia> m = fit(GammaLassoPath, x, y);

julia> coef(m)
2×50 SparseArrays.SparseMatrixCSC{Float64, Int64} with 100 stored entries:
⎡⠉⠉⠉⠉⠉⠉⠉⠉⠉⠉⠉⠉⠉⠉⠉⠉⠉⠉⠉⠉⠉⠉⠉⠉⠉⎤
⎣⣀⣀⣀⣀⣀⣀⣀⣀⣀⣀⣀⣀⣀⣀⣀⣀⣀⣀⣀⣀⣀⣀⣀⣀⣀⎦
```

I expect a 5×50 or 50×5 sparse matrix in this case! This is why I asked for further elaboration on the size of X and y in the docs.

@gdalle

gdalle commented Mar 6, 2024

Conventionally, for regression, if you have n samples of dimension d each, X is expected to be a matrix with n rows and d columns, and y a vector of length n.
From what I understand, the 50 columns of your lasso path here correspond to 50 different values of the regularization parameter lambda. Each one gives rise to 2 coefficients: one for the only feature in X (because d = 1) and one for the intercept. Does that help?
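The shape rule above can be checked with a small synthetic problem. This is a sketch with made-up data (the sizes, seed and true coefficients are arbitrary choices for illustration): with the default `intercept = true`, `coef(m)` should have d + 1 rows (one extra row for the intercept) and one column per λ value actually fitted. The path can stop early, which is why this thread shows 50 and 76 columns rather than a fixed count.

```julia
using Lasso, Random

Random.seed!(1)
n, d = 100, 3                   # n observations, d features (arbitrary)
X = randn(n, d)                 # rows = observations, columns = features
y = X * [1.0, 0.0, -2.0] .+ 0.1 .* randn(n)   # length(y) == size(X, 1)

m = fit(GammaLassoPath, X, y)   # default Normal() response, with intercept
B = coef(m)                     # sparse (d + 1) × (number of fitted λ values)

size(B, 1)                      # 4: intercept row plus one row per feature
size(B, 2)                      # number of λ values on the fitted path
```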

@shayandavoodii
Author

shayandavoodii commented Mar 6, 2024

> From what I understand, the coefficients of your lasso path here correspond to 50 different values of the regularization lambda. Each one gives rise to 2 coefficients, one for the only feature in X (cause d = 1) and one for the intercept.

I think I got my answer. So, in my example, I should use x = rand(1, 5) and y=[rand()], because I have one sample with five features. Then:

```julia
julia> m = fit(GammaLassoPath, x, y)
┌ Warning: One of the predicators (columns of X) is a constant, so it can not be standardized.
│ To include a constant predicator set standardize = false and intercept = false
```

So, I should follow the instructions:

```julia
julia> m = fit(GammaLassoPath, x, y, standardize=false, intercept=false);

julia> coef(m)
5×76 SparseArrays.SparseMatrixCSC{Float64, Int64} with 75 stored entries:
⎡⠠⠤⠤⠤⠤⠤⠤⠤⠤⠤⠤⠤⠤⠤⠤⠤⠤⠤⠤⠤⠤⠤⠤⠤⠤⠤⠤⠤⠤⠤⠤⠤⠤⠤⠤⠤⠤⠤⎤
⎣⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⎦
```

That is aligned with what I expect. Now, how can I choose one of the coefficient series in the returned sparse matrix?

```julia
julia> coef(m)[:, 1]
5-element SparseArrays.SparseVector{Float64, Int64} with 0 stored entries

julia> coef(m)[:, 2]
5-element SparseArrays.SparseVector{Float64, Int64} with 1 stored entry:
  [2]  =  0.0231384
```

I expect a vector of length 5 in each. However, it returns a scalar with weird indexing.
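One way to pull out a single coefficient vector is to index one column of `coef(m)` and, if a dense vector is wanted, convert it with `Vector`. The sketch below uses made-up data (sizes, seed and the choice of column `j` are arbitrary and purely illustrative) and the same `standardize=false, intercept=false` keywords used in this thread, so there is no intercept row and each column has one entry per feature. In the output shown in this thread, the first column (the first, largest λ on the path) has no stored entries, i.e. all coefficients are zero there.

```julia
using Lasso, Random

Random.seed!(2)
n, p = 50, 5                       # arbitrary example sizes
X = randn(n, p)                    # rows = observations
y = 0.5 .* X[:, 2] .+ 0.1 .* randn(n)

# No intercept row: coef(m) is p × (number of fitted λ values).
m = fit(GammaLassoPath, X, y; standardize=false, intercept=false)
B = coef(m)

j = 2                   # hypothetical choice: the second λ on the path
βj = B[:, j]            # SparseVector of length p (one entry per feature)
dense_βj = Vector(βj)   # same values with the zeros written out explicitly
```

Converting only the selected column avoids densifying the whole path, which is what `coef(m) |> Matrix` does.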

@shayandavoodii
Author

I think I got it:

```julia
julia> coef(m) |> Matrix
5×76 Matrix{Float64}:
 0.0  0.0        0.0        0.0          0.0       0.0       0.0
 0.0  0.0231384  0.0452252  0.0663081     0.492017  0.492793  0.493533
 0.0  0.0        0.0        0.0           0.0       0.0       0.0
 0.0  0.0        0.0        0.0           0.0       0.0       0.0
 0.0  0.0        0.0        0.0           0.0       0.0       0.0
```

@gdalle

gdalle commented Mar 6, 2024

> I expect a vector of length 5 in each. However, it returns a scalar with weird indexing.

This is not a scalar; it is a sparse vector of length 5 with only one nonzero entry. The reason for this behavior is that lasso coefficients are meant to be sparse, i.e., to have few nonzero entries.
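To see the difference between a vector's length and its stored entries, here is a short sketch using only the `SparseArrays` standard library. The values mirror the `coef(m)[:, 2]` output shown earlier in the thread; the vector is built by hand rather than taken from a fitted path.

```julia
using SparseArrays

# Length-5 sparse vector with one stored (nonzero) entry at index 2.
v = sparsevec([2], [0.0231384], 5)

length(v)    # 5: it is still a vector of length 5
nnz(v)       # 1: only one entry is stored
v[1]         # 0.0: unstored entries read as zero
Vector(v)    # [0.0, 0.0231384, 0.0, 0.0, 0.0]
```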

@shayandavoodii
Author

Thank you. I think I have the answer to my question. Thanks for your help and the explanation.

Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants