Quick Usage Guide to the ‘fastglm’ Package

2019-03-06

The fastglm package is intended to be a fast and stable alternative to the glm() and glm2() functions for fitting generalized lienar models. The The fastglm package is compatible with R’s family objects (see ?family). The package can be installed via

devtools::install_github("jaredhuling/fastglm")

library(fastglm)

Example

Currently, the fastglm package does not allow for formula-based data input and is restricted to matrices. We use the following example to demonstrate the usage of fastglm:

data(esoph)
x <- model.matrix(cbind(ncases, ncontrols) ~ agegp + unclass(tobgp)
+ unclass(alcgp), data = esoph)
y <- cbind(esoph$ncases, esoph$ncontrols)

gfit1 <- fastglm(x = x, y = y, family = binomial(link = "cloglog"))

summary(gfit1)
#>
#> Call:
#> fastglm.default(x = x, y = y, family = binomial(link = "cloglog"))
#>
#> Deviance Residuals:
#>     Min       1Q   Median       3Q      Max
#> -1.8536  -0.6480  -0.2112   0.3562   2.1231
#>
#> Coefficients:
#>                Estimate Std. Error z value Pr(>|z|)
#> (Intercept)    -3.75579    0.27555 -13.630  < 2e-16 ***
#> agegp.L         2.74449    0.63265   4.338 1.44e-05 ***
#> agegp.Q        -1.30504    0.57271  -2.279  0.02268 *
#> agegp.C         0.21175    0.43247   0.490  0.62439
#> agegp^4         0.07676    0.29202   0.263  0.79266
#> agegp^5        -0.19303    0.17627  -1.095  0.27348
#> unclass(tobgp)  0.20733    0.06837   3.032  0.00243 **
#> unclass(alcgp)  0.53660    0.07011   7.654 1.95e-14 ***
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#>
#> (Dispersion parameter for binomial family taken to be 1)
#>
#>     Null deviance: 23.273  on 87  degrees of freedom
#> Residual deviance: 63.110  on 80  degrees of freedom
#> AIC: 226.59
#>
#> Number of Fisher Scoring iterations: 6

Computational stability

The fastglm package does not compromise computational stability for speed. In fact, for many situations where glm() and even glm2() do not converge, fastglm() does converge.

As an example, consider the following data scenario, where the response distribution is (mildly) misspecified, but the link function is quite badly misspecified. In such scenarios, the standard IRLS algorithm tends to have convergence issues. The glm2() package was designed to handle such cases, however, it still can have convergence issues. The fastglm() package uses a similar step-halving technique as glm2(), but it starts at better initialized values and thus tends to have better convergence properties in practice.

set.seed(1)
x <- matrix(rnorm(10000 * 100), ncol = 100)
y <- (exp(0.25 * x[,1] - 0.25 * x[,3] + 0.5 * x[,4] - 0.5 * x[,5] + rnorm(10000)) ) + 0.1

system.time(gfit1 <- fastglm(cbind(1, x), y, family = Gamma(link = "sqrt")))
#>    user  system elapsed
#>   0.842   0.027   0.870

system.time(gfit2 <- glm(y~x, family = Gamma(link = "sqrt")) )
#> Warning: step size truncated: out of bounds

#> Warning: step size truncated: out of bounds

#> Warning: step size truncated: out of bounds

#> Warning: step size truncated: out of bounds

#> Warning: step size truncated: out of bounds

#> Warning: step size truncated: out of bounds

#> Warning: step size truncated: out of bounds

#> Warning: step size truncated: out of bounds

#> Warning: step size truncated: out of bounds

#> Warning: step size truncated: out of bounds

#> Warning: step size truncated: out of bounds

#> Warning: step size truncated: out of bounds

#> Warning: step size truncated: out of bounds

#> Warning: step size truncated: out of bounds

#> Warning: step size truncated: out of bounds

#> Warning: step size truncated: out of bounds
#> Warning: glm.fit: algorithm did not converge
#>    user  system elapsed
#>   2.918   0.088   3.010

system.time(gfit3 <- glm2::glm2(y~x, family = Gamma(link = "sqrt")) )
#> Warning: step size truncated: out of bounds
#> Warning: step size truncated: out of bounds
#> Warning: step size truncated due to increasing deviance
#> Warning: step size truncated: out of bounds
#> Warning: step size truncated due to increasing deviance
#> Warning: step size truncated: out of bounds
#> Warning: step size truncated due to increasing deviance

#> Warning: step size truncated due to increasing deviance

#> Warning: step size truncated due to increasing deviance

#> Warning: step size truncated due to increasing deviance

#> Warning: step size truncated due to increasing deviance

#> Warning: step size truncated due to increasing deviance

#> Warning: step size truncated due to increasing deviance

#> Warning: step size truncated due to increasing deviance

#> Warning: step size truncated due to increasing deviance

#> Warning: step size truncated due to increasing deviance

#> Warning: step size truncated due to increasing deviance
#>    user  system elapsed
#>   2.211   0.082   2.308

## Note that fastglm() returns estimates with the
## largest likelihood
logLik(gfit1)
#> 'log Lik.' -16030.81 (df=102)
logLik(gfit2)
#> 'log Lik.' -16704.05 (df=102)
logLik(gfit3)
#> 'log Lik.' -16046.66 (df=102)

coef(gfit1)[1:5]
#>  (Intercept)           X1           X2           X3           X4
#>  1.429256009  0.125873599  0.005321164 -0.129389740  0.238937255
coef(gfit2)[1:5]
#>   (Intercept)            x1            x2            x3            x4
#>  1.431168e+00  1.251936e-01 -6.896739e-05 -1.281857e-01  2.366473e-01
coef(gfit3)[1:5]
#>   (Intercept)            x1            x2            x3            x4
#>  1.426864e+00  1.242616e-01 -9.860241e-05 -1.254873e-01  2.361301e-01

## check convergence of fastglm
gfit1$converged #> [1] TRUE ## number of IRLS iterations gfit1$iter
#> [1] 17

## now check convergence for glm()
gfit2$converged #> [1] FALSE gfit2$iter
#> [1] 25

## check convergence for glm2()
gfit3$converged #> [1] TRUE gfit3$iter
#> [1] 19

## increasing number of IRLS iterations for glm() does not help that much
system.time(gfit2 <- glm(y~x, family = Gamma(link = "sqrt"), control = list(maxit = 100)) )
#> Warning: step size truncated: out of bounds
#> Warning: step size truncated: out of bounds

#> Warning: step size truncated: out of bounds

#> Warning: step size truncated: out of bounds

#> Warning: step size truncated: out of bounds

#> Warning: step size truncated: out of bounds

#> Warning: step size truncated: out of bounds

#> Warning: step size truncated: out of bounds

#> Warning: step size truncated: out of bounds

#> Warning: step size truncated: out of bounds

#> Warning: step size truncated: out of bounds

#> Warning: step size truncated: out of bounds

#> Warning: step size truncated: out of bounds

#> Warning: step size truncated: out of bounds

#> Warning: step size truncated: out of bounds

#> Warning: step size truncated: out of bounds

#> Warning: step size truncated: out of bounds

#> Warning: step size truncated: out of bounds

#> Warning: step size truncated: out of bounds

#> Warning: step size truncated: out of bounds

#> Warning: step size truncated: out of bounds

#> Warning: step size truncated: out of bounds

#> Warning: step size truncated: out of bounds

#> Warning: step size truncated: out of bounds

#> Warning: step size truncated: out of bounds

#> Warning: step size truncated: out of bounds

#> Warning: step size truncated: out of bounds

#> Warning: step size truncated: out of bounds

#> Warning: step size truncated: out of bounds

#> Warning: step size truncated: out of bounds

#> Warning: step size truncated: out of bounds

#> Warning: step size truncated: out of bounds

#> Warning: step size truncated: out of bounds

#> Warning: step size truncated: out of bounds

#> Warning: step size truncated: out of bounds

#> Warning: step size truncated: out of bounds

#> Warning: step size truncated: out of bounds

#> Warning: step size truncated: out of bounds

#> Warning: step size truncated: out of bounds

#> Warning: step size truncated: out of bounds

#> Warning: step size truncated: out of bounds

#> Warning: step size truncated: out of bounds

#> Warning: step size truncated: out of bounds

#> Warning: step size truncated: out of bounds

#> Warning: step size truncated: out of bounds

#> Warning: step size truncated: out of bounds

#> Warning: step size truncated: out of bounds

#> Warning: step size truncated: out of bounds

#> Warning: step size truncated: out of bounds

#> Warning: step size truncated: out of bounds

#> Warning: step size truncated: out of bounds

#> Warning: step size truncated: out of bounds

#> Warning: step size truncated: out of bounds

#> Warning: step size truncated: out of bounds

#> Warning: step size truncated: out of bounds

#> Warning: step size truncated: out of bounds

#> Warning: step size truncated: out of bounds

#> Warning: step size truncated: out of bounds

#> Warning: step size truncated: out of bounds

#> Warning: step size truncated: out of bounds

#> Warning: step size truncated: out of bounds

#> Warning: step size truncated: out of bounds

#> Warning: step size truncated: out of bounds

#> Warning: step size truncated: out of bounds

#> Warning: step size truncated: out of bounds
#> Warning: glm.fit: algorithm did not converge
#> Warning: glm.fit: algorithm stopped at boundary value
#>    user  system elapsed
#>  12.130   0.506  13.035

gfit2$converged #> [1] FALSE gfit2$iter
#> [1] 100

logLik(gfit1)
#> 'log Lik.' -16030.81 (df=102)
logLik(gfit2)
#> 'log Lik.' -16054.15 (df=102)