Getting started with sparsegl

This package provides tools for fitting regularization paths for sparse group-lasso penalized learning problems. The model is fit over a sequence of regularization parameter values.

The strengths and improvements that this package offers relative to other sparse group-lasso packages are as follows:


You can install the released version of sparsegl from CRAN with:

install.packages("sparsegl")


You can install the development version from GitHub with:

# install.packages("remotes")

Vignettes are not included in the package by default. If you want to include vignettes, then use this modified command:

  build_vignettes = TRUE, dependencies = TRUE

For this getting-started vignette, we first randomly generate X, an input matrix of predictors of dimension \(n\times p\). To create y, a real-valued vector, we use either a linear regression model, \(y = X\beta^* + \epsilon\), or a logistic regression model, \(y_i \sim \text{Bernoulli}\left(1 / (1 + \exp(-x_i^\top \beta^*))\right)\) for \(i = 1, \ldots, n\), with \(x_i\) the \(i\)-th row of X,

where the coefficient vector \(\beta^*\) is specified as below, and the white noise \(\epsilon\) follows a standard normal distribution. The sparse group-lasso problem is then formulated as the sum of the mean squared error (linear regression) or the logistic loss (logistic regression) and a convex combination of the \(\ell_1\) lasso penalty and the \(\ell_2\) group-lasso penalty.
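As a sketch, the linear-regression version of this objective can be written as follows. The symbols are assumptions rather than notation fixed by the text above: \(\alpha \in [0, 1]\) is the relative weight on the \(\ell_1\) penalty, \(\lambda \ge 0\) is the overall penalty level, \(X^{(g)}\) and \(\beta^{(g)}\) are the columns and coefficients belonging to group \(g\), and \(\sqrt{|g|}\) is a commonly used group-size weight.

```latex
\[
\min_{\beta}\; \frac{1}{2n}\Big\lVert y - \sum_{g} X^{(g)}\beta^{(g)} \Big\rVert_2^2
  + (1-\alpha)\,\lambda \sum_{g} \sqrt{|g|}\,\lVert \beta^{(g)} \rVert_2
  + \alpha\,\lambda\,\lVert \beta \rVert_1
\]
```

Under this form, \(\alpha = 1\) reduces to the lasso and \(\alpha = 0\) to the group lasso. For logistic regression, the squared-error term would be replaced by the logistic loss.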


n <- 100
p <- 200
X <- matrix(data = rnorm(n * p, mean = 0, sd = 1), nrow = n, ncol = p)
beta_star <- c(rep(5, 5), c(5, -5, 2, 0, 0), rep(-5, 5), 
               c(2, -3, 8, 0, 0), rep(0, (p - 20)))
groups <- rep(1:(p / 5), each = 5)

# Linear regression model
eps <- rnorm(n, mean = 0, sd = 1)
y <- X %*% beta_star + eps

# Logistic regression model
pr <- 1 / (1 + exp(-X %*% beta_star))
y_binary <- rbinom(n, 1, pr)
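To connect the simulated coefficients with the penalty described earlier, the following base-R sketch evaluates a convex combination of the \(\ell_1\) norm and a weighted sum of group \(\ell_2\) norms at beta_star. The helper sgl_penalty(), its alpha argument, and the square-root group-size weights are assumptions made for illustration; they are not part of the sparsegl API.

```r
# Coefficients and groups as defined above (p = 200, groups of size 5).
p <- 200
beta_star <- c(rep(5, 5), c(5, -5, 2, 0, 0), rep(-5, 5),
               c(2, -3, 8, 0, 0), rep(0, p - 20))
groups <- rep(1:(p / 5), each = 5)

# Hypothetical helper (not part of sparsegl): evaluates
# (1 - alpha) * sum_g sqrt(|g|) * ||beta_g||_2 + alpha * ||beta||_1.
sgl_penalty <- function(beta, groups, alpha = 0.05) {
  group_norms <- tapply(beta, groups, function(b) sqrt(sum(b^2)))
  group_sizes <- tapply(beta, groups, length)
  (1 - alpha) * sum(sqrt(group_sizes) * group_norms) + alpha * sum(abs(beta))
}

# Sparsity structure of the true coefficients.
l1_norm <- sum(abs(beta_star))
active_groups <- sum(tapply(beta_star, groups, function(b) any(b != 0)))

pen_lasso <- sgl_penalty(beta_star, groups, alpha = 1)  # pure lasso penalty
pen_group <- sgl_penalty(beta_star, groups, alpha = 0)  # pure group-lasso penalty
```

Only the first four groups contain nonzero coefficients, and two of those groups are themselves sparse, which is the within-group and between-group sparsity the penalty is meant to recover.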


Given an input matrix X and a response vector y, a sparse group-lasso regularized linear model is estimated for a sequence of penalty parameter values. The penalty combines a lasso penalty and a group-lasso penalty. The other main arguments users might supply are:

fit1 <- sparsegl(X, y, group = groups)

Plotting sparsegl objects

The plot() method displays nonzero coefficient curves across the values of the penalty parameter lambda in the regularization path of a fitted sparsegl object. The arguments of this function are:

To elaborate on these arguments:

plot(fit1, y_axis = "group", x_axis = "lambda")

plot(fit1, y_axis = "coef", x_axis = "penalty", add_legend = FALSE)


The cv.sparsegl() function performs k-fold cross-validation. It takes the same arguments X, y, and group described above, plus an additional argument pred.loss that selects the error measure. The options are "default", "mse", "deviance", "mae", and "misclass". With family = "gaussian", "default" is equivalent to "mse" and "deviance". In general, "deviance" gives the negative log-likelihood. The option "misclass" is only available if family = "binomial".

fit_l1 <- cv.sparsegl(X, y, group = groups, pred.loss = "mae")
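To make the error measures concrete, here is a minimal base-R sketch of how "mse", "mae", and "misclass" are conventionally computed on held-out predictions. The vectors and the 0.5 classification threshold are made up for illustration, and the package's internal computation may differ in detail.

```r
# Toy held-out responses and predictions (illustrative values only).
y_true <- c(1.0, 2.0, 3.0, 4.0)
y_hat  <- c(1.5, 2.0, 2.0, 5.0)

mse <- mean((y_true - y_hat)^2)   # mean squared error
mae <- mean(abs(y_true - y_hat))  # mean absolute error

# For a binary response, misclassification error compares predicted classes
# (here obtained by thresholding predicted probabilities at 0.5) to labels.
labels <- c(0, 1, 1, 0)
prob   <- c(0.2, 0.7, 0.4, 0.6)   # predicted P(y = 1)
misclass <- mean(labels != as.numeric(prob > 0.5))
```

In k-fold cross-validation, a measure like these is computed on each held-out fold and then averaged over folds for every value of lambda.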


A number of S3 methods are provided for both sparsegl and cv.sparsegl objects.

coefs <- coef(fit1, s = c(0.02, 0.03))
predict(fit1, newx = X[100,], s = fit1$lambda[2:3])
#>             s1        s2
#> [1,] -4.071804 -4.091689
predict(fit_l1, newx = X[100,], s = "lambda.1se")
#>             s1
#> [1,] -15.64857
print(fit1)
#> Call:  sparsegl(x = X, y = y, group = groups) 
#> Summary of Lambda sequence:
#>          lambda index nnzero active_grps
#> Max.    0.62948     1      0           0
#> 3rd Qu. 0.19676    26     20           4
#> Median  0.06443    50     19           4
#> 1st Qu. 0.02014    75     25           5
#> Min.    0.00629   100    111          23


With extremely large data sets, cross-validation may be too slow for tuning parameter selection. The estimate_risk() function instead uses the degrees of freedom to calculate various information criteria, based on the “unknown variance” version of the likelihood. It is only implemented for Gaussian regression, and the constant is ignored (as in stats::extractAIC()).

In these criteria, df is the degrees of freedom and n is the sample size.
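For reference, commonly used forms of such criteria under the unknown-variance Gaussian likelihood, with constants dropped, are log(MSE) + 2 df / n for AIC, log(MSE) + log(n) df / n for BIC, and MSE / (1 - df / n)^2 for GCV, where MSE is the mean squared residual. The sketch below computes them in base R for made-up values. The function name and inputs are hypothetical, and estimate_risk() may scale or define the criteria differently.

```r
# Hypothetical helper (not part of sparsegl): information criteria from
# fitted values and a degrees-of-freedom estimate.
info_criteria <- function(y, y_hat, df) {
  n <- length(y)
  mse <- mean((y - y_hat)^2)
  c(AIC = log(mse) + 2 * df / n,
    BIC = log(mse) + log(n) * df / n,
    GCV = mse / (1 - df / n)^2)
}

# Toy responses and fitted values (illustrative values only).
y     <- c(1, 2, 3, 4, 5, 6, 7, 8)
y_hat <- c(1.5, 1.5, 3.5, 3.5, 5.5, 5.5, 7.5, 7.5)
ic <- info_criteria(y, y_hat, df = 2)
```

Evaluating such a criterion at every lambda in the path and taking the minimizer gives a tuning parameter choice without refitting the model on folds.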

The df component of a sparsegl object is an approximation (albeit a fairly accurate one) to the actual degrees of freedom. Computing the exact value requires inverting a portion of \(\mathbf{X}^\top \mathbf{X}\), so it may take some time; by default, estimate_risk() computes the exact df. For more details about this formula, see Vaiter, Deledalle, Peyré, et al. (2012) [1].

risk <- estimate_risk(fit1, X, approx_df = FALSE)

  1. Vaiter S, Deledalle C, Peyré G, Fadili J, Dossal C. (2012). The Degrees of Freedom of the Group Lasso for a General Design.