add Penalties and Regularized Least Squares
format

use :cholesky and dropcollinear like in GLM

correctly use QR

test method :cholesky

change types

clean after rebase
getzze committed Sep 26, 2024
1 parent af96ef1 commit d3207a9
Showing 17 changed files with 4,124 additions and 21 deletions.
97 changes: 81 additions & 16 deletions README.md
@@ -40,6 +40,8 @@ This package implements:
* MQuantile regression (e.g. Expectile regression)
* Robust Ridge regression (using any of the previous estimator)
* Quantile regression using interior point method
* Regularized Least Squares regression
* Θ-IPOD regression, possibly with a penalty term

## Installation

@@ -55,14 +57,23 @@ julia>] add RobustModels#main

## Usage

The preferred way of performing robust regression is by calling the `rlm` function:

`m = rlm(X, y, MEstimator{TukeyLoss}(); initial_scale=:mad)`

For quantile regression, use `quantreg`:

`m = quantreg(X, y; quantile=0.5)`

For Regularized Least Squares with a penalty term, use `rlm`:

`m = rlm(X, y, L1Penalty(); method=:cgd)`

For Θ-IPOD regression with outlier detection and a penalty term, use `ipod`:

`m = ipod(X, y, L2Loss(), SquaredL2Penalty(); method=:auto)`


For robust versions of the `mean`, `std`, `var` and `sem` statistics, specify the estimator as the first argument.
Use the `dims` keyword for computing the statistics along specific dimensions.
The following functions are also implemented: `mean_and_std`, `mean_and_var` and `mean_and_sem`.
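
As a short sketch of this interface (the estimator object `MEstimator{TukeyLoss}()` mirrors the `rlm` examples; the exact defaults are an assumption, check the docstrings):

```julia
using RobustModels

x = randn(1000)
x[1:10] .+= 50   # contaminate with a few gross outliers

# Pass the estimator as the first argument
μ = mean(MEstimator{TukeyLoss}(), x)
σ = std(MEstimator{TukeyLoss}(), x)

# Compute along a specific dimension of a matrix with the `dims` keyword
X = randn(100, 3)
col_means = mean(MEstimator{TukeyLoss}(), X; dims=1)
```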
@@ -119,6 +130,16 @@ m9 = rlm(form, data, MEstimator{TukeyLoss}(); initial_scale=:L1, ridgeλ=1.0)
## Quantile regression solved by linear programming interior point method
m10 = quantreg(form, data; quantile=0.2)
refit!(m10; quantile=0.8)

## Penalized regression
m11 = rlm(form, data, SquaredL2Penalty(); method=:auto)

## Θ-IPOD regression with outlier detection
m12 = ipod(form, data, TukeyLoss(); method=:auto)

## Θ-IPOD regression with outlier detection and a penalty term
m13 = ipod(form, data, L2Loss(), L1Penalty(); method=:ama)

;

# output
@@ -136,7 +157,7 @@ With ordinary least square (OLS), the objective function is, from maximum likeli
where `yᵢ` are the values of the response variable, `𝒙ᵢ` are the covectors of individual covariates
(rows of the model matrix `X`), `𝜷` is the vector of fitted coefficients and `rᵢ` are the individual residuals.

A `RobustLinearModel` solves instead for the following loss function [1]: `L' = Σᵢ ρ(rᵢ)`
(more precisely `L' = Σᵢ ρ(rᵢ/σ)`, where `σ` is a (robust) estimate of the standard deviation of the residuals).
Several loss functions are implemented:

@@ -150,7 +171,7 @@ Several loss functions are implemented:
- `CauchyLoss`: `ρ(r) = log(1+(r/c)²)`, non-convex estimator, which also corresponds to a Student's-t distribution (with fixed degrees of freedom). It suppresses outliers more strongly, but convergence is not guaranteed.
- `GemanLoss`: `ρ(r) = ½ (r/c)²/(1 + (r/c)²)`, non-convex and bounded estimator, it suppresses outliers more strongly.
- `WelschLoss`: `ρ(r) = ½ (1 - exp(-(r/c)²))`, non-convex and bounded estimator, it suppresses outliers more strongly.
- `TukeyLoss`: `ρ(r) = if r<c; ⅙(1 - (1-(r/c)²)³) else ⅙ end`, non-convex and bounded estimator, it suppresses outliers more strongly and it is the preferred estimator for most cases.
- `YohaiZamarLoss`: `ρ(r)` is quadratic for `r/c < 2/3` and is bounded by 1; non-convex estimator, it is optimized to have the lowest bias for a given efficiency.

The values of the tuning constant `c` are optimized for each estimator so that the M-estimators have a high efficiency of 0.95. However, these estimators have a low breakdown point.
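
As an illustration, the loss objects can be evaluated directly with `RobustModels.rho` and its derivative `RobustModels.psi` (both listed in the API documentation); the exact normalization of the returned values is an implementation detail and not assumed here:

```julia
using RobustModels

loss = TukeyLoss()

# ρ is roughly quadratic for small residuals and bounded for large ones,
# which is what limits the influence of outliers.
RobustModels.rho(loss, 0.1)
RobustModels.rho(loss, 100.0)   # saturates at its maximum value

# ψ = ρ' is the influence function; for a redescending loss like Tukey
# it returns to zero for large residuals, so gross outliers get zero weight.
RobustModels.psi(loss, 100.0)
```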
@@ -178,9 +199,20 @@ the two loss functions should be the same but with different tuning constants.
### MQuantile-estimators

Using an asymmetric variant of the `L1Estimator`, quantile regression is performed
(although the `QuantileRegression` solver should be preferred because it gives an exact solution).
Identically, with an M-estimator using an asymmetric version of the loss function,
a generalization of quantiles is obtained. For instance, using an asymmetric `L2Loss` results in _Expectile Regression_.
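
As a sketch of the expectile case (assuming the exported `ExpectileEstimator` accepts the same `quantile` keyword as `quantreg`; check the docstring):

```julia
using RobustModels

# Expectile regression at the 0.8 level: an asymmetric L2 loss,
# the M-quantile generalization of the mean.
m = rlm(X, y, ExpectileEstimator(); quantile=0.8)
```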

### Quantile regression

_Quantile regression_ results from minimizing the following objective function:
`L = Σᵢ wᵢ|yᵢ - 𝒙ᵢ 𝜷| = Σᵢ wᵢ(rᵢ) |rᵢ|`,
where `wᵢ = ifelse(rᵢ>0, τ, 1-τ)` and `τ` is the quantile of interest. `τ=0.5` corresponds to _Least Absolute Deviations_.

This problem is solved exactly using linear programming techniques,
specifically, interior point methods using the internal API of [Tulip](https://github.com/ds4dm/Tulip.jl).
The internal API is considered unstable, but it results in a much lighter dependency than
including the [JuMP](https://github.com/JuliaOpt/JuMP.jl) package with Tulip backend.
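
For example, several conditional quantiles can be fitted with `quantreg` and `refit!`, mirroring the usage shown earlier:

```julia
using RobustModels

m = quantreg(X, y; quantile=0.1)   # conditional 10% quantile
β10 = coef(m)
refit!(m; quantile=0.5)            # τ = 0.5: Least Absolute Deviations
β50 = coef(m)
refit!(m; quantile=0.9)            # conditional 90% quantile
β90 = coef(m)
```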

### Robust Ridge regression

@@ -192,27 +224,60 @@ By default, all the coefficients (except the intercept) have the same penalty, w
all the feature variables have the same scale. If it is not the case, use a robust estimate of scale
to normalize every column of the model matrix `X` before fitting the regression.
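
One possible way to follow that advice (the use of `StatsBase.mad` for the robust column scale is an assumption, not part of this package's API):

```julia
using RobustModels
using StatsBase: mad

# Rescale each feature column by a robust estimate of its scale (the MAD),
# so that a single ridgeλ penalizes all features comparably.
scales = [mad(col; normalize=true) for col in eachcol(X)]
Xs = X ./ scales'
m = rlm(Xs, y, MEstimator{TukeyLoss}(); initial_scale=:mad, ridgeλ=1.0)
```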

### Regularized Least Squares

_Regularized Least Squares_ regression minimizes the following objective function:
`L = ½ Σᵢ |yᵢ - 𝒙ᵢ 𝜷|² + P(𝜷)`,
where `P(𝜷)` is a (sparse) penalty on the coefficients.

The following penalty functions are defined:
- `NoPenalty`: `cost(𝐱) = 0`, no penalty.
- `SquaredL2Penalty`: `cost(𝐱) = λ ½||𝐱||₂²`, also called Ridge.
- `L1Penalty`: `cost(𝐱) = λ||𝐱||₁`, also called LASSO.
- `ElasticNetPenalty`: `cost(𝐱) = λ · l1_ratio · ||𝐱||₁ + λ · (1 - l1_ratio) · ½||𝐱||₂²`.
- `EuclideanPenalty`: `cost(𝐱) = λ||𝐱||₂`, non-separable penalty used in Group LASSO.

Different penalties can be applied to different indices of the coefficients, using
`RangedPenalties(ranges, penalties)`. E.g., `RangedPenalties([2:5], [L1Penalty(1.0)])` defines
an L1 penalty on every coefficient except the first one, which can correspond to the intercept.
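
For instance, combining the `RangedPenalties` definition above with `rlm` (the solver choice is only indicative):

```julia
using RobustModels

# L1-penalize coefficients 2 to 5 and leave the first coefficient
# (e.g. the intercept) unpenalized.
pen = RangedPenalties([2:5], [L1Penalty(1.0)])
m = rlm(X, y, pen; method=:cgd)
```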

With a penalty, the following solvers are available (instead of the standard ones):
- `:cgd`, Coordinate Gradient Descent [2].
- `:fista`, Fast Iterative Shrinkage-Thresholding Algorithm [3].
- `:ama`, Alternating Minimization Algorithm [4].
- `:admm`, Alternating Direction Method of Multipliers [5].

To use a robust loss function with a penalty, see Θ-IPOD regression.

### Θ-IPOD regression

_Θ-IPOD regression_ (Θ-thresholding based Iterative Procedure for Outlier Detection) results from
minimizing the following objective function [6]:
`L = ½ Σᵢ |yᵢ - 𝒙ᵢ 𝜷 - γᵢ|² + P(𝜷) + Q(γ)`,
where `Q(γ)` is a sparsity-inducing penalty on the outliers `γ`, so the problem is not underdetermined.
The expression of this penalty function does not need to be known explicitly, only that it leads to
thresholding with one of the loss functions used by M-estimators; Θ-IPOD is then equivalent to solving an M-estimator problem.
This problem is solved with an alternating minimization technique for the outlier detection.
Without a penalty, the coefficients are updated at every step using an _Ordinary Least Squares_ solver.

`P(𝜷)` is an optional (sparse) penalty on the coefficients.
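
A hedged sketch of fitting a Θ-IPOD model and inspecting the detected outliers with the exported `outliers` accessor (the loss/penalty/solver combination and the zero-offset convention for non-outliers are assumptions):

```julia
using RobustModels

# Θ-IPOD with a redescending loss and an L1 penalty on the coefficients,
# following the `ipod` calls shown earlier.
m = ipod(X, y, TukeyLoss(), L1Penalty(); method=:ama)

γ = outliers(m)                  # estimated outlier offsets, one per observation
flagged = findall(!iszero, γ)    # observations flagged as outliers
```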


## Credits

This package derives from the [RobustLeastSquares](https://github.com/FugroRoames/RobustLeastSquares.jl)
package for the initial implementation, especially for the Conjugate Gradient
solver and the definition of the M-Estimator functions.

Credits to the developers of the [GLM](https://github.com/JuliaStats/GLM.jl)
and [MixedModels](https://github.com/JuliaStats/MixedModels.jl) packages
for implementing the Iteratively Reweighted Least Square algorithm.

## References

- "Robust Statistics: Theory and Methods (with R)", 2nd Edition, 2019, R. Maronna, R. Martin, V. Yohai, M. Salibián-Barrera
[1] "Robust Statistics: Theory and Methods (with R)", 2nd Edition, 2019, R. Maronna, R. Martin, V. Yohai, M. Salibián-Barrera
[2] "Regularization Paths for Generalized Linear Models via Coordinate Descent", 2009, J. Friedman, T. Hastie, R. Tibshirani
[3] "A Fast Iterative Shrinkage-Thresholding Algorithm for Linear Inverse Problems", 2009, A. Beck, M. Teboulle
[4] "Applications of a splitting algorithm to decomposition in convex programming and variational inequalities", 1991, P. Tseng
[5] "Fast Alternating Direction Optimization Methods", 2014, T. Goldstein, B. O'Donoghue, S. Setzer, R. Baraniuk
[6] "Outlier Detection Using Nonconvex Penalized Regression", 2011, Y. She, A.B. Owen
1 change: 1 addition & 0 deletions _typos.toml
@@ -8,3 +8,4 @@ Artic = "Artic"

[default.extend-words]
qest = "qest"
wrk = "wrk"
28 changes: 27 additions & 1 deletion docs/src/api.md
@@ -9,6 +9,7 @@ AbstractQuantileEstimator
LossFunction
RobustLinearModel
RobustModels.RobustLinResp
RobustModels.IPODResp
GLM.LinPred
RobustModels.DensePredCG
RobustModels.SparsePredCG
@@ -17,7 +18,12 @@ GLM.SparsePredChol
GLM.DensePredQR
RobustModels.RidgePred
RobustModels.AbstractRegularizedPred
RobustModels.CGDPred
RobustModels.FISTAPred
RobustModels.AMAPred
RobustModels.ADMMPred
QuantileRegression
IPODRegression
```

## Constructors for models
@@ -30,6 +36,7 @@ fit(::Type{M}, ::Union{AbstractMatrix{T}}, ::AbstractVector{T}) where {T<:Abstra
```@docs
rlm
quantreg
ipod
fit!
refit!
```
@@ -67,10 +74,13 @@ StatsAPI.residuals
StatsModels.hasintercept
hasformula
formula
haspenalty
penalty
scale
tauscale
RobustModels.location_variance
Estimator
outliers
GLM.linpred!
RobustModels.pirls!
RobustModels.pirls_Sestimate!
@@ -111,7 +121,17 @@ HardThresholdLoss
HampelLoss
```

## Penalty functions
```@docs
NoPenalty
SquaredL2Penalty
EuclideanPenalty
L1Penalty
ElasticNetPenalty
RangedPenalties
```

## Estimator, Loss and Penalty functions methods
```@docs
RobustModels.rho
RobustModels.psi
@@ -139,4 +159,10 @@ RobustModels.set_MEstimator
RobustModels.update_weight!
RobustModels.tau_scale_estimate
RobustModels.quantile_weight
RobustModels.cost
RobustModels.proximal!
RobustModels.proximal
RobustModels.isconcrete
RobustModels.concrete!
RobustModels.concrete
```
23 changes: 23 additions & 0 deletions src/RobustModels.jl
@@ -173,6 +173,18 @@ export LossFunction,
GeneralizedQuantileEstimator,
ExpectileEstimator,
L2Estimator,
PenaltyFunction,
NoPenalty,
SquaredL2Penalty,
L1Penalty,
ElasticNetPenalty,
EuclideanPenalty,
BerhuPenalty,
CappedL1Penalty,
SCADPenalty,
MCPPenalty,
RangedPenalties,
End,
DensePredCG,
SparsePredCG,
RidgePred,
@@ -183,6 +195,11 @@
Estimator,
rlm,
quantreg,
IPODRegression,
ipod,
outliers,
penalty,
haspenalty,
loss,
tuning_constant,
refit!,
@@ -224,6 +241,9 @@ abstract type AbstractMEstimator <: AbstractEstimator end
"Generalized M-Quantile estimator"
abstract type AbstractQuantileEstimator <: AbstractMEstimator end

"Penalty function"
abstract type PenaltyFunction{T} end


"""
AbstractRobustModel
@@ -243,17 +263,20 @@ abstract type AbstractRegularizedPred{T} end

Base.broadcastable(m::AbstractEstimator) = Ref(m)
Base.broadcastable(m::LossFunction) = Ref(m)
Base.broadcastable(m::PenaltyFunction) = Ref(m)


include("tools.jl")
include("losses.jl")
include("estimators.jl")
include("penalties.jl")
include("linpred.jl")
include("regularizedpred.jl")
include("linresp.jl")
include("robustlinearmodel.jl")
include("univariate.jl")
include("quantileregression.jl")
include("ipod.jl")
include("deprecated.jl")

end # module
