RNA-Seq Generation/Modification for Simulation

License: GPL v3. Lifecycle: stable.

This package takes real RNA-seq data (either single-cell or bulk) and alters it by adding signal. The signal takes the form of a generalized linear model with a log (base-2) link function under a Poisson, negative binomial, or mixture-of-negative-binomials distribution. The advantage of simulating data this way is that you can see how your method behaves when the simulated data exhibit common (and annoying) features of real data, without having to specify these features a priori. We call this way of adding signal “binomial thinning”.
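To make the idea concrete, here is a minimal base-R sketch of binomial thinning for a two-group design. It is an illustration under stated assumptions, not the package's internal code: the count matrix is simulated as a stand-in for real data, and all object names (`counts`, `group`, `beta`, `prob`, `thinned`) are made up for this example. Each read is kept independently with probability 2^eta, where eta is the log2-scale linear predictor shifted so that it never exceeds zero.

```r
# Minimal base-R sketch of binomial thinning (illustrative names only;
# this is not seqgendiff's implementation).
set.seed(1)

n_genes   <- 100
n_samples <- 10

# Stand-in for a real count matrix (genes in rows, samples in columns).
counts <- matrix(rpois(n_genes * n_samples, lambda = 50),
                 nrow = n_genes, ncol = n_samples)

# Randomly assign half of the samples to group 1.
group <- sample(rep(c(0, 1), each = n_samples / 2))

# Log2 fold changes: the first 10 genes are non-null, the rest are null.
beta <- c(rnorm(10, mean = 0, sd = 1), rep(0, n_genes - 10))

# Linear predictor on the log2 scale, shifted per gene so its maximum is 0.
# This keeps every thinning probability 2^eta in (0, 1].
eta  <- outer(beta, group)
eta  <- eta - apply(eta, 1, max)
prob <- 2^eta

# Thin each count: keep each read independently with probability prob.
thinned <- matrix(rbinom(length(counts), size = counts, prob = prob),
                  nrow = n_genes, ncol = n_samples)
```

Because thinning only removes reads, `thinned` never exceeds `counts`, and null genes (with a log2 fold change of zero) are left unchanged. Whatever structure the original counts had is carried over into the thinned matrix.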

The main functions are:

If you find a bug or want a new feature, please submit an issue.

Check out NEWS for updates.


To install from CRAN, run the following code in R:

install.packages("seqgendiff")

To install the latest version of seqgendiff, run the following code in R:

# install.packages("devtools")
devtools::install_github("dcgerard/seqgendiff")

To get started, check out the vignettes by running the following in R:

browseVignettes(package = "seqgendiff")

Or you can check out the vignettes I post online: https://dcgerard.github.io/seqgendiff/.
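As a quick taste before opening the vignettes, the sketch below adds a two-group signal to a count matrix with `thin_2group()`. The count matrix here is simulated only so the example is self-contained (in practice you would supply real RNA-seq counts), and the argument names and returned elements reflect my reading of the package documentation, so please confirm them with `?thin_2group`.

```r
library(seqgendiff)
set.seed(1)

# Stand-in for a real count matrix (genes in rows, samples in columns).
counts <- matrix(rpois(1000 * 10, lambda = 100), nrow = 1000, ncol = 10)

# Add signal: 90% of genes are null; non-null log2 effects are N(0, 0.8^2).
thout <- thin_2group(mat           = counts,
                     prop_null     = 0.9,
                     signal_fun    = stats::rnorm,
                     signal_params = list(mean = 0, sd = 0.8))

# Assumed elements of the returned object (see ?thin_2group):
new_counts <- thout$mat        # thinned count matrix
group_ind  <- thout$designmat  # group indicator for each sample
true_lfc   <- thout$coefmat    # true log2 fold change for each gene
```

You would then run your differential expression method on `new_counts` and `group_ind` and compare its estimates against `true_lfc`.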


If you use this package, please cite:

Gerard, D (2020). “Data-based RNA-seq simulations by binomial thinning.” BMC Bioinformatics. 21(1), 206. doi: 10.1186/s12859-020-3450-9.

A BibTeX entry for LaTeX users is:

    @article{gerard2020data,
      author = {Gerard, David},
      title = {Data-based {RNA}-seq simulations by binomial thinning},
      year = {2020},
      volume = {21},
      number = {1},
      pages = {206},
      doi = {10.1186/s12859-020-3450-9},
      publisher = {BioMed Central Ltd},
      journal = {BMC Bioinformatics}
    }

Code of Conduct

Please note that the ‘seqgendiff’ project is released with a Contributor Code of Conduct. By contributing to this project, you agree to abide by its terms.