Fix bug in IV estimations when lean = TRUE
(thanks to @reifjulian #88).
Fix various bugs related to the use of summary
when lean = TRUE
in the estimation.
Fix bug preventing se = "cluster"
from being used in etable
(thanks to Caleb Kwon).
Fix bug in etable
not escaping variable names properly when sdBelow = FALSE
(thanks to Jeppe Viero).
Fix bug in IV estimation with lean = TRUE
.
Fix bug preventing the return of demeaned variables in IV estimations (thanks to @amarbler #94).
i()
now automatically converts its first argument to numeric if it was of type logical. The user can still pass logicals to the argument f2
if the expected behavior is really to treat it as a logical.
Improve fitstat
help and error messages.
Bug in etable
when the default value of fitstat
was set with setFixest_etable
.
Bug in model.matrix
when the model contained fixed-effects and the RHS was requested: the intercept was wrongfully added.
Fix rare bug when i()
was called within a very specific set of functions.
Fix bug in R old release due to anyNA.data.frame
.
Fix bug regarding panel
data sets when variables were created in a data.table
within functions (thanks to @tcovert #76).
Add extra elements to be removed when lean = TRUE
to keep the object as small as possible (reported by @zozotintin #81).
Fix bug in fixed-effects estimations with multiple LHS and a different number of observations per estimation, which prevented the default behavior for standard-errors from working.
Fix occasional bug when using split
with fixed-effects.
xpd
now appropriately returns a two sided formula when a one sided formula is fed in and the argument lhs
is provided.
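A minimal sketch of this behavior, assuming the fixest package is installed; the variables come from the built-in airquality data and are only illustrative:

```r
library(fixest)

# One-sided formula plus the lhs argument: the result is a two-sided formula
fml = xpd(~ Wind + Temp, lhs = "Ozone")
print(fml)

# The expanded formula can then be used directly in an estimation
est = feols(fml, airquality)
```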
Fix bug in coefplot
preventing the proper scaling of the x-axis for interactions when multiple models are displayed.
Fix occasional bug in the ordering of sub-selections of multiple estimations.
For staggered difference-in-differences analyses: the method of Sun and Abraham (forthcoming, Journal of Econometrics) has been implemented.
After having used i()
to interact cohort dummies with time to treatment dummies, use the function aggregate
to recover the yearly treatment effects.
So far the way to do it, although easy, is a bit arcane, but the next versions of the software will include a more user-friendly way.
For details, check out the help page of the function aggregate
or the staggered DiD section in the vignette fixest walkthrough.
Function i()
now has the new arguments f2
, drop2
and keep2
which allow the interaction of two factors (useful for staggered DiD estimations).
Argument dof
, used to compute the standard-errors, can now be used at estimation time.
In etable
, the argument digits
can now accept a character value specifying the way the decimals should be displayed. For example if digits = "r2"
this means that all numbers will be rounded to two decimals and these two decimals will always be displayed. The default behavior is to display significant digits. Follows feature request #82 by @lyifa.
etable
also gains the argument digits.stats
which controls how the decimals of the fit statistics should be displayed.
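The two display arguments can be tried as follows. This is only a sketch, assuming fixest is installed; the model and the choice of "r2" and "r3" are illustrative:

```r
library(fixest)

est = feols(Ozone ~ Wind + Temp, airquality)

# Default: significant digits
tab_default = etable(est)

# "r2": coefficients rounded to two decimals, always displayed;
# digits.stats applies the same kind of rule to the fit statistics
tab_round = etable(est, digits = "r2", digits.stats = "r3")
print(tab_round)
```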
Argument split
now accepts variable names.
More coherence regarding the use of summary
applied to models for which the SEs were computed at estimation time. Now there is a memory of how the SEs were computed, so that, for example, if only the argument dof
is passed to summary
, then the SEs will be clustered in the same way as estimation time and only dof
will change.
Now an error is raised when i()
is used in the fixed-effects part of the formula. The appropriate way is indicated (related to #77 by @rrichmond).
Improved default setting of standard-errors.
Improved error messages.
In multiple estimations, models returning full NA coefficients are not returned (instead of raising an error).
Major bug when predict was used in the presence of fixed-effects (thanks to @jurojas5, #54). Introduced in version 0.7.
When using variable names to cluster the standard-errors inside functions, summary may not fetch the data in the right frame (thanks to @chenwang, #52). Now a completely new internal mechanic is in place.
When using variables with varying slopes and the number of iterations was greater than 300, a bug occurred in the function checking that the convergence was right (thanks to @kendonB, #53).
Fix bug in the demeaning algorithm when two variables with varying slopes were identical.
Fix bug in femlm/feNmlm when factor variables are removed due to the removal of some observations.
In summary
, fix bug when the argument cluster
was equal to a formula with expressions and not a variable name (thanks to @edrubin #55).
Fix bug when integers are present in the RHS (thanks to @zozotintin #56).
Fix bug when nb_FE >= 2 and the data was large (thanks to @zozotintin #56).
Fix bug in the display of how the standard-errors were clustered in etable
.
Fix bug occurring when lags were used in combination with combined fixed-effects (i.e. fe1 ^ fe2) (thanks to @SuperMayo #59).
Fix bug in coefplot
when representing multiple estimations and coefficient names are numbers.
base = iris
names(base) = c("y", "x1", "x_endo", "x_inst", "species")
base$endo_bis = 0.5 * base$y + 0.3 * base$x_inst + rnorm(150)
base$inst_bis = 0.2 * base$x_endo + 0.3 * base$endo_bis + rnorm(150)
# The endo/instrument is defined in a formula past a pipe
res_iv1 = feols(y ~ x1 | x_endo ~ x_inst, base)
# Same with the species fixed-effect
res_iv2 = feols(y ~ x1 | species | x_endo ~ x_inst, base)
# To add multiple endogenous regressors: embed them in c()
# (the names must match the columns created above: endo_bis and inst_bis)
res_iv3 = feols(y ~ x1 | c(x_endo, endo_bis) ~ x_inst + inst_bis, base)
The fitstat
function has been significantly enhanced.
Now the following types are supported:
Likelihood ratios
F-tests
Wald tests
IV related tests (F/Wald/Sargan)
common stats like the R2s, the RMSE, Log-likelihood, etc
You can register your own fit statistics. These can then be seamlessly summoned in etable
via the argument fitstat
.
The print.fixest
function now supports the fitstat
argument. This means that you can display your own desired fit statistics when printing fixest
objects. This is especially useful in combination with the setFixest_print
function, which lets you define the default fit statistics to display once and for all. See the example in the “Instrumental variables” section of the Walkthrough vignette.
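A short sketch of how these pieces fit together, assuming fixest is installed. The model is illustrative, and the keywords r2, rmse and n are taken to be among the built-in fit statistics:

```r
library(fixest)

est = feols(Ozone ~ Wind + Temp, airquality)

# Compute selected fit statistics directly
fs = fitstat(est, ~ r2 + rmse)
print(fs)

# Request the same statistics when printing or exporting
print(est, fitstat = ~ r2 + rmse)
tab = etable(est, fitstat = ~ r2 + n)
```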
The new function wald
computes basic Wald tests.
New arguments split
and fsplit
: you can now perform split sample estimations (fsplit
adds the full sample).
Estimations for multiple left-hand-sides can be done at once by wrapping the variables in c()
.
In the right-hand-side and the fixed-effects parts of the formula, stepwise estimations can be performed with the new stepwise functions (sw
, sw0
, csw
and csw0
).
The object returned is of class fixest_multi
. You can easily navigate through the results with its subset methods.
aq = airquality[airquality$Month %in% 5:6, ]
est_split = feols(c(Ozone, Solar.R) ~ sw(poly(Wind, 2), poly(Temp, 2)),
aq, split = ~ Month)
# By default: sample is the root
etable(est_split)
# Let's reorder, by considering lhs the root
etable(est_split[lhs = TRUE])
# Selecting only one LHS and RHS
etable(est_split[lhs = "Ozone", rhs = 1])
# Taking the first root (here sample = 5)
etable(est_split[I = 1])
# The first and last estimations
etable(est_split[i = c(1, .N)])
..("regex")
:
data(longley)
# All variables containing "GNP" or "ployed" in their names are fetched
feols(Armed.Forces ~ Population + ..("GNP|ployed"), longley)
New style.tex
and style.df
arguments that define the look of either Latex tables or the output data.frames.
it can be set with the new functions style.tex
and style.df
that contain their own documentation.
some etable
arguments have been ported to the style
functions (yesNo
, tablefoot
).
New postprocess.tex
and postprocess.df
arguments which allow the automatic postprocessing of the outputs. See the dedicated vignette on exporting tables for an illustration.
new tabular
argument, which allows creating tabular*
tables (suggestion by @fostermeijer #51).
polynomials and powers are automatically renamed to facilitate comparison across models. You can set their style with the argument poly_dict
.
the labeling of models is enhanced when rep.fixest
is used with different standard-errors (the model names are now “model INDEX.SUB-INDEX”).
the argument subtitles
has been improved, and now automatically displays the samples when split sample estimations are performed.
In all estimations:
subset
: regular subset (long overdue).
split
, fsplit
: to perform split sample estimations.
se
, cluster
: to cluster the standard-errors during the call.
lean
: if TRUE
, then summary is applied and any large object is removed from the result. This saves memory, but many methods won’t work afterwards.
fixef.rm
: argument that accepts none
, perfect
, singleton
, both
. Controls the removal of fixed-effects from the observation.
auto parsing of powers. Now you don’t need to use I()
to have powers of variables in the RHS, it is automatically done for you (i.e. x^3
becomes I(x^3)
):
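As a sketch of the automatic parsing of powers, combined with two of the estimation-time arguments listed above (assuming fixest is installed; the model and the subset are illustrative):

```r
library(fixest)

# Wind^2 is parsed as I(Wind^2); subset and cluster are set in the call
est_auto = feols(Ozone ~ Wind + Wind^2 | Month, airquality,
                 subset = ~ Month >= 6, cluster = ~ Day)

# Same model with the power written explicitly
est_I = feols(Ozone ~ Wind + I(Wind^2) | Month, airquality,
              subset = ~ Month >= 6, cluster = ~ Day)

# Both calls should yield the same coefficients
same_coef = all.equal(unname(coef(est_auto)), unname(coef(est_I)))
print(same_coef)
```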
Estimation options can be set globally with setFixest_estimation()
.
The demean
function has been enhanced (with the contribution of Sebastian Krantz).
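For reference, a minimal sketch of a direct call to demean, assuming fixest is installed; the iris data is only illustrative:

```r
library(fixest)

# Center the first two numeric columns of iris within each species
X_dm = demean(X = iris[, 1:2], f = iris$Species)

# After demeaning, the within-group means should be numerically zero
group_means = tapply(as.matrix(X_dm)[, 1], iris$Species, mean)
print(group_means)
```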
Internal demeaning algorithm: some copies of the data are avoided when using feglm
.
Internal algorithm of to_integer
(used in all estimations): one copy of the input data is now avoided.
All estimations: smarter handling of the intercept, thus avoiding the reconstruction of the design matrix.
Fix integer overflow bug in estimations with only one variable.
Fix bug in tests occurring in R old release.
Fix bug in examples occurring in R old release.
Function i()
now behaves as factor()
, setting automatically a reference when appropriate.
Internal algorithm of i()
is much faster.
In etable
, the user can now provide a type of clustering for each model.
New method rep.fixest
to replicate fixest objects, mostly useful in etable
when several SEs for the same models are to be reported.
Automatic fix when the variance is not positive definite.
Major bug when fixed-effects were combined with ^
and they contained NAs (thanks to @poliquin #35).
Bug when using lead/lags in estimations. The bug was due to a bug in a dependency (dreamerr) and was fixed. Now fixest requires dreamerr version >= 1.2.1. Bug spotted by @seunghoon001 (#44).
Major bug when n_obs x n_vars > 2B or n_obs x n_fixed-effects > 2B. In such cases estimations could simply not be done, even leading R to crash when using nthreads > 1. The algorithm was fixed to allow datasets with up to 2B observations to be estimated in all circumstances. Bug reported, and much help with checking provided, by Howard Zihao Zhang.
coefplot
: Problem regarding interactions when observations, and hence coefficients, were removed from the estimation. Now the coefficients are removed from the plot. Bug reported by @phisherblack #45.
coefplot
: Corrected various bugs when asked for the plotting of several estimations.
Fix the stack imbalance warning (report by @shoonlee, #46).
Brand new internal algorithm which now uses closed form solutions when dealing with variables with varying slopes. This means that when variables with varying slopes are present, the algorithm is incomparably faster and more accurate.
Two deep copies of some data are now avoided in the demeaning function. This improves the performance in terms of memory footprint, and also makes the algorithm faster.
New default values for standard-errors (only concerns multiway clustering). They become similar to reghdfe
to increase cross-software comparability. Computing the standard-errors the old way is still possible using the argument dof
. See the dedicated vignette: On standard errors.
Name change in summary
/vcov
/etable
: To get heteroskedasticity-robust standard-errors, se = "hetero"
now replaces se = "white"
to enhance clarity. Note that se = "white"
still works.
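A minimal sketch of the new keyword, assuming fixest is installed; the model is illustrative:

```r
library(fixest)

est = feols(Ozone ~ Wind + Temp, airquality)

# Heteroskedasticity-robust standard-errors with the new keyword
sum_hetero = summary(est, se = "hetero")
print(sum_hetero)
```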
fitstat
New function fitstat
that computes various fit statistics. It is integrated with etable
and can be invoked with the argument fitstat
. So far only two fit statistics are included, but more will come.
interact()
You can now use i(var)
to treat the variable var
as a factor. You can select which values to drop/keep with the respective arguments.
Using i(var)
leads to a special treatment of these variables in the functions coefplot
and etable
.
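A sketch of the factor treatment, assuming fixest is installed. The keep argument is used here to retain only some values; the data and the selected months are illustrative:

```r
library(fixest)

# Month is treated as a factor; only the dummies for months 6 to 8 are kept,
# so the remaining months act as the baseline
est = feols(Ozone ~ Wind + i(Month, keep = 6:8), airquality)
print(coef(est))
```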
etable
New argument placement
to define the position of the float in Latex (suggestion by Caleb Kwon).
New argument drop.section
, with which you can drop a) the fixed-effects, b) the variables with varying slopes, or c) the statistics, sections (suggestion by Caleb Kwon).
Fix glitch in help pages regarding the use of the ‘%’ (percentage) character in regular expressions.
Two new arguments .vcov
and .vcov_args
to compute the standard-errors with custom functions.
The number of observations (n
) is now treated as a regular statistic and can be placed where one wants.
The statistics can now have custom aliases using the argument dict
.
The overdispersion becomes a regular fit statistic that can be included (or not) using fitstat
.
The dictionary now applies to the factors of interactions, and the values of factors.
Argument nthreads
:
The new default of argument nthreads
is 50% of all available threads.
Accepts new values: a) 0 means all available threads, b) a number strictly between 0 and 1 will represent the fraction of all threads to use.
When setting formula macros:
xpd
and setFixest_fml
now accept character vectors and numeric scalars on top of formulas.
demean
:
coefplot
:
The argument group
now accepts a special character "^^"
: when used, it cleans the beginning of the coefficient name. Very useful for, e.g., factors, although factors created with i()
do not need that.
When horiz = TRUE
, the order of the coefficients is not reversed any more.
Improved display of numbers in print
method.
Added variables names to X_demeaned
from feols
.
Lagging functions:
Now time.step = NULL
by default, which means that the choice of how to lag is automatically set. This means that the default behavior for time variables equal to Dates or character values should be appropriate.
New operator d
which is the difference operator.
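A sketch of the difference operator, assuming fixest is installed and using its bundled base_did panel (identifiers id and period):

```r
library(fixest)
data(base_did)

# d(x1) is the first difference of x1 within each panel unit;
# time.step is left at its default so the lag step is chosen automatically
est_diff = feols(y ~ d(x1), base_did, panel.id = ~ id + period)
print(coef(est_diff))
```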
In all estimations:
mem.clean
: internally, intermediary objects are removed as much as possible and gc()
is called before each memory intensive C++ section. Only useful when you’re at the edge of reaching the memory limit.
collin.min_norm
, this value informs on the possible presence of collinearity in the system of variables.
only.env
and env
:
only.env
, allows recovering only the environment used to perform the estimation (i.e. all the preprocessing done before the estimation).
env
, accepts a fixest environment created by only.env
, and performs the estimation using this environment; all other arguments are ignored.
env
, we cut all preprocessing).
In non-linear estimations:
NL.start
now accepts numeric scalars, initializing all coefficients to the same value (avoids the use of the other argument NL.start.init
).
summary.fixest
:
.vcov
now accepts functions that compute the vcov. This ensures convenient compatibility with the sandwich
package (compatibility is still not full though: bootstrapped SEs don’t work yet).
update.fixest
:
evaluate
to ensure consistency with the update
method from stats.
feols
& feglm
:
na_inf.rm
has been removed. It was present for historical reasons, and is removed to increase code clarity.
In vcov
, the degree-of-freedom in the small sample correction was fixed to “nested” and couldn’t be modified, now corrected. Further, “nested” was not properly accounted for, now corrected.
In etable
, fitstat = FALSE
or fitstat = NA
led to a bug.
r2
: bug when the estimation contained only fixed effects (thanks to Luis Fonseca #27).
Now the BIC
of feglm
is similar to the one of glm
.
Bug in the log-likelihood in the presence of weights, now corrected.
Bug in coefplot
when some interacted variables were removed because of collinearity. Now corrected.
On standard-errors: how are the SEs computed in fixest and how to replicate the SEs from other software.
Exporting estimation tables: how to make efficient use of etable
, in particular how to customize the tables.
New arguments: group
, extraline
, notes
, tablefoot
.
group
allows to eliminate variables (like drop
) and adds an extra line with TRUE/FALSE if the model contained those variables.
extraline
allows to add extra lines with any content.
notes
allows to add notes after the table (suggestion by @bgchamps #25).
tablefoot
controls whether the table footer, containing the type of standard-errors and the significance codes, should be displayed.
Renaming: yesNoFixef
=> yesNo
.
Most default values can be set globally with the new function setFixest_etable
.
dof
, used to adjust the small sample corrections, is now much more complete and allows to replicate a large set of estimation results from alternative software.
You can now provide custom VCOVs to summary by using the argument .vcov
.
A warning is now prompted when the maximum number of iterations of the algorithm is reached (suggestion by @clukewatson #24).
The types of standard-errors can now be set globally with the function setFixest_se
(suggestion by @dlindzee #28)
New feols
argument demeaned
. If TRUE
, then the centered variables are returned (y_demeaned
and X_demeaned
). (Suggestion by Linus Holtermann.)
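A minimal sketch of how the centered variables can be retrieved, assuming fixest is installed; the model is illustrative:

```r
library(fixest)

# Ask feols to keep the variables centered on the Month fixed-effects
est = feols(Ozone ~ Wind + Temp | Month, airquality, demeaned = TRUE)

# The demeaned dependent variable and regressors are stored in the object
head(est$y_demeaned)
head(est$X_demeaned)
```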
interact
gains two new arguments: drop
and keep
(suggestion by @SuperMayo #23).
hatvalues
has been implemented for feols and feglm estimations.
the estfun
from sandwich
has been implemented.
residuals.fixest
.
etable
style
which allows to set many elements of the output table.
signifCode
can be equal to "letters"
to display letters instead of stars.
setFixest_nthreads
now respects the OMP_THREAD_LIMIT
environment variable.
var::fe
syntax with confirm = TRUE
and no reference.
etable
when the standard-errors were NA
.
feglm
for non-poisson, non-binomial families, are now correct (minor differences).
fixef
did not work when the slope was an integer, now corrected (thanks to @clerousset #20).
setFixest_fml(..ctrl = ~ var1 + var2)
. Here the macro variable ..ctrl
has been set to the value "var1 + var2"
.
fixest
estimation: e.g. data(airquality) ; setFixest_fml(..ctrl = ~ Temp + Day) ; feols(Ozone ~ Wind + ..ctrl, airquality)
.
xpd
, which expands formulas. E.g. lm(xpd(Ozone ~ Wind + ..ctrl), airquality)
.
to_integer
: user-level version of the internal algorithm transforming any kind of vector (or combination of vectors) into an integer ranging from 1 to the number of unique elements of the vector. Very fast.
demean
: user-level version of the demeaning algorithm used in feols
.
New internal algorithm to estimate OLS (applies to both feols
and feglm
):
It is numerically more stable.
Incomparably faster when factors are to be estimated (and not explicitly used as fixed-effects).
Collinear variables are removed on the fly.
var::fe(ref)
now accepts multiple references (i.e. ref
can be a vector).
etable
, the variable names of non-Latex output can now be changed.
n
when applying summary to choose the number of coefficients to display.
confirm
has been removed from the function interact
.
r2
allows more flexibility in the keywords it accepts.
dof
gains a new argument adj
which allows making different types of common small sample corrections. Its other arguments have been renamed for clarity (fixef
=> fixef.K
, exact
=> fixef.exact
, cluster
=> cluster.adj
).
feols
and non-poisson, non-binomial models in feglm
. For all other models, z-statistics are used. This complies with the default R-stats behavior.
residuals
method has been substantially improved, now allowing different types.
collinearity
help pages: an example could lead to an error (due to random data generation). It has been removed.
collinearity
, corrected the problem of display of the intercept in some situations.
cex
and lwd
in coefplot
have been changed to 1 and 1 (instead of par("cex") and par("lwd")). Otherwise this led to the creation of Rplots.pdf
in the working directory (thanks to Kurt Hornik).
fixef_sizes.simplify
, which provides the sizes of the fixed-effects in parentheses when there is no ambiguity.
signifCode = NA
.
float
which decides whether to embed the table into a table environment. By default it is set to TRUE
if a title
or label
is present.
keep
to select the variables to keep in the table.
coefstat
defining what should be shown below the coefficients (standard-errors, t-stats or confidence intervals). Suggestion by @d712 #16.
horiz
. The coefficients can now be displayed horizontally instead of vertically.
lab.fit
: “simple”, the classic axis, “multi”, the labels appear across multiple lines to avoid collision, and “tilted” for tilted labels.
style
allows you to set styles with the function setFixest_coefplot
, you can then summon the style in coefplot
with this argument.
coefplot
.
group
and group.par
).
terms.fixest
giving the terms of the estimation.
donttest
sections were removed from help pages.
etable
. In the process, some of their arguments were “lost”, this is now corrected.
coefplot
now accepts lists of estimations.
You can now add lags and leads in any fixest
estimations. You only need to provide the panel identifiers with the new argument panel.id
, then you’re free to use the new functions l()
for lags and f()
for leads.
You can also set up a panel data set using the function panel
which allows you to use the lagging functions without having to provide the argument panel.id
, and which offers more options for setting the panel.
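Both routes can be sketched as follows, assuming fixest is installed and using its bundled base_did panel (identifiers id and period):

```r
library(fixest)
data(base_did)

# Route 1: give the panel identifiers at estimation time
est_lag = feols(y ~ l(x1, 0:1) + f(x1, 1), base_did,
                panel.id = ~ id + period)

# Route 2: set up the panel once, then use l() and f() without panel.id
pdat = panel(base_did, ~ id + period)
est_panel = feols(y ~ l(x1, 0:1) + f(x1, 1), pdat)

# Both routes should give the same coefficients
same_coef = all.equal(unname(coef(est_lag)), unname(coef(est_panel)))
print(same_coef)
```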
You can now add interactions in formulas with a new syntax: var::fe(ref)
The command var::fe(ref)
interacts the variable var
with each value of fe
and sets ref
as a reference. Note that if you don’t use the argument ref
, the command var::fe
is identical to var:factor(fe)
.
Using var::fe(ref)
to write interactions opens up a special treatment of such variables in the exporting function etable
and in the coefficient plotting function coefplot
.
coefplot
You can plot coefficients and their associated confidence intervals with the function coefplot
.
coefplot
offers many options, whose default values can be set with the function setFixest_coefplot
.
As for the function etable
, you can easily rename/drop/order the coefficients.
coefplot
detects when interactions have been used and offers a special display for it.
[etable] Estimations table: new function to export the results of multiple estimations. Replaces the two functions esttex
and esttable
(the two functions still exist but they will be deprecated in the future).
[Lagging] New functions related to lagging: l
, f
, panel
, unpanel
and [.fixest_panel
.
[Utilities] A set of small utility functions has been added. They allow extracting a coefficient table or parts of it (like the t-statistics or the standard-errors) from an estimation. These functions are coeftable
, ctable
(an alias to coeftable
), se
, tstat
and pvalue
.
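A brief sketch of these helpers, assuming fixest is installed; the model is illustrative:

```r
library(fixest)

est = feols(Ozone ~ Wind + Temp, airquality)

# Full coefficient table, then its individual columns
ct = coeftable(est)
print(ct)

se_vals = se(est)      # standard-errors
t_vals  = tstat(est)   # t-statistics
p_vals  = pvalue(est)  # p-values
```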
[coefplot] The functions coefplot
and setFixest_coefplot
.
[dof] New function to set the type of degree of freedom adjustment when computing the variance-covariance matrix. You can permanently set the type of DoF adjustment with the new function setFixest_dof().
dict=c(x1="Wind", x2="Rain")
, with an estimation with the following variables ‘x1’, ‘x2’, ‘x1:x2’ will lead to the following aliases in Latex ‘Wind’, ‘Rain’ and ‘Wind $\times$ Rain’.
yesNoFixef
can be of length one, defaulting the second element to the empty string.
did_estimate_yearly_effects
.
feglm
.
[did_means] New function did_means
to conveniently compare means of groups of observations (both treat/control and pre/post). Contains tools to easily export in Latex.
sym
macro in Latex is dropped.
This package is an effort to create a family of fast and user-friendly functions to perform estimations with multiple fixed-effects (F.E.).
Estimations with fixed-effects (or call them factor variables) are a staple in social science. Hence having a package gathering many methods with fast execution time is of prime importance. At the time of this version, this is the fastest existing method to perform F.E. estimations (often by orders of magnitude, compared to the most efficient alternative methods [both in R and Stata]). The underlying method to obtain the F.E. is based on Berge 2018, and the workhorse of the code is in C++ parallelized via OpenMP (btw thanks Rcpp for simplifying coders’ life!).
This package is the follow up of the (now deprecated) package FENmlm
which performed fixed-effects estimations but for only four likelihood families. Package fixest
completely supersedes FENmlm
by extending the method to regular OLS and all GLM families, and adding new utility functions. Further, the design of the functions has been completely overhauled and extended towards much more user-friendliness. Massive effort has been put into providing a set of informative error messages to the user for quick debugging of her workflow (e.g. one of the functions contains over 100 different error messages).