Improve documentation #79

shayandavoodii · 2024-03-05T18:25:09Z

In the Lasso.md, a method of fit is introduced as follows:

fit(GammaLassoPath, X, y, d=Normal(), l=canonicallink(d); ...)
fits a linear or generalized linear (concave) gamma lasso path given the design matrix X and response y.

Is it possible to provide an example in the documentation, or to state the expected shapes of X and y? For example, should X be of size $n\times m$ and y a vector of length $m$? I am having trouble using the method because I don't know what sizes these two arguments should have. I assume they must share a dimension: the length of y should equal either the number of rows or the number of columns of X.

P.S.: In my case study, I have an X of size $d\times w$ and a y of length $d$. I don't know whether I should pass X or X' as the second argument. I expect to get a coefficient matrix of size $n\times d$ or $d\times n$.


gdalle · 2024-03-06T08:00:13Z

In the meantime maybe try both and see which one fails due to a shape mismatch?

shayandavoodii · 2024-03-06T08:05:27Z

> In the meantime maybe try both and see which one fails due to a shape mismatch?

I did try both, but the result does not match my expectation:

julia> using Lasso

julia> x = rand(5, 1); y = rand(5);

julia> m = fit(GammaLassoPath, x, y);

julia> coef(m)
2×50 SparseArrays.SparseMatrixCSC{Float64, Int64} with 100 stored entries:
⎡⠉⠉⠉⠉⠉⠉⠉⠉⠉⠉⠉⠉⠉⠉⠉⠉⠉⠉⠉⠉⠉⠉⠉⠉⠉⎤
⎣⣀⣀⣀⣀⣀⣀⣀⣀⣀⣀⣀⣀⣀⣀⣀⣀⣀⣀⣀⣀⣀⣀⣀⣀⣀⎦

I expect a 5×50 or 50×5 sparse matrix in this case! This is why I asked for further elaboration on the size of X and y in the docs.

gdalle · 2024-03-06T08:45:18Z

Classically for regression purposes, if you have n samples of dimension d each, X is expected to be a matrix with n rows and d columns, while y is expected to be a vector of length n.
From what I understand, the coefficients of your lasso path here correspond to 50 different values of the regularization parameter lambda. Each one gives rise to 2 coefficients: one for the only feature in X (because d = 1) and one for the intercept. Does that help?
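To make the shape convention concrete, here is a minimal sketch, assuming Lasso.jl is installed and behaves as in the session above. The sizes n = 100 and d = 3 and the synthetic data are made up for illustration only.

```julia
using Lasso

n, d = 100, 3                 # hypothetical sizes: 100 samples, 3 features
X = randn(n, d)               # one row per sample, one column per feature
y = X * [1.0, 0.0, -2.0] + 0.1 * randn(n)   # one response per sample, length n

m = fit(GammaLassoPath, X, y)
B = coef(m)                   # sparse matrix of coefficients along the path

# Rows: the intercept plus one row per feature.
# Columns: one per value of lambda on the path (the count can vary between runs).
println(size(B))
```

With the default intercept, the number of rows of B is d + 1, which matches the 2×50 result reported earlier for d = 1.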

shayandavoodii · 2024-03-06T14:55:52Z

> From what I understand, the coefficients of your lasso path here correspond to 50 different values of the regularization lambda. Each one gives rise to 2 coefficients, one for the only feature in X (cause d = 1) and one for the intercept.

I think I got my answer. So, in my example, I should use x = rand(1, 5) and y=[rand()], because I have one sample with five features. Then:

julia> m = fit(GammaLassoPath, x, y)
┌ Warning: One of the predicators (columns of X) is a constant, so it can not be standardized.
│ To include a constant predicator set standardize = false and intercept = false

So, I should follow the instructions:

julia> m = fit(GammaLassoPath, x, y, standardize=false, intercept=false);

julia> coef(m)
5×76 SparseArrays.SparseMatrixCSC{Float64, Int64} with 75 stored entries:
⎡⠠⠤⠤⠤⠤⠤⠤⠤⠤⠤⠤⠤⠤⠤⠤⠤⠤⠤⠤⠤⠤⠤⠤⠤⠤⠤⠤⠤⠤⠤⠤⠤⠤⠤⠤⠤⠤⠤⎤
⎣⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⎦

That is aligned with what I expect. Now, how can I choose one of the coefficient series in the returned sparse matrix?

julia> coef(m)[:, 1]
5-element SparseArrays.SparseVector{Float64, Int64} with 0 stored entries

julia> coef(m)[:, 2]
5-element SparseArrays.SparseVector{Float64, Int64} with 1 stored entry:
  [2]  =  0.0231384

I expect a vector of length 5 in each case. However, it returns a scalar with weird indexing.

shayandavoodii · 2024-03-06T15:04:04Z

I think I got it:

julia> coef(m) |> Matrix
5×76 Matrix{Float64}:
 0.0  0.0        0.0        0.0        …  0.0       0.0       0.0
 0.0  0.0231384  0.0452252  0.0663081     0.492017  0.492793  0.493533       
 0.0  0.0        0.0        0.0           0.0       0.0       0.0
 0.0  0.0        0.0        0.0           0.0       0.0       0.0
 0.0  0.0        0.0        0.0           0.0       0.0       0.0

gdalle · 2024-03-06T15:13:22Z

> I expect a vector of length 5 in each. However, it returns a scalar with weird indexing.

This is not a scalar; it is a sparse vector of length 5 with only one nonzero entry. The reason for this behavior is that Lasso coefficients are meant to be sparse, i.e., to have few nonzero entries, so only the nonzero ones are stored and printed.
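As a small illustration that uses only the SparseArrays standard library, the vector below is built by hand to mimic the `coef(m)[:, 2]` output shown above. It is a stand-in for that column, not something returned by Lasso.jl.

```julia
using SparseArrays

# Hand-built stand-in for coef(m)[:, 2]: length 5, one stored entry at index 2.
v = sparsevec([2], [0.0231384], 5)

println(length(v))   # logical length of the vector: 5
println(nnz(v))      # number of stored (nonzero) entries: 1
println(v[1])        # entries that are not stored read as 0.0
println(v[2])        # the single stored value

dense = Vector(v)    # convert to an ordinary dense vector of length 5
println(dense)
```

Indexing and `length` work as for any other vector. Converting with `Vector` (or `Matrix` for the whole coefficient matrix) is only needed when a dense representation is required.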

shayandavoodii · 2024-03-06T15:15:58Z

Thank you. It seems that I reached the answer to my question. Thank you for your help and elaboration.

shayandavoodii closed this as completed Mar 6, 2024
