jstor 0.3.6

This is another small release to fix compatibility with readr v1.3.0 and tibble v2.0.0. There are no other changes.

jstor 0.3.5

This is a small release, mainly to fix compatibility with version 1.2.0 of readr. There is, however, one breaking change:

Breaking changes

jstor 0.3.4

jstor 0.3.3

Removed functionality

New features

Bug fixes

Other changes

jstor 0.3.2

This is a hotfix for an issue where tests wrote to directories other than temporary folders, which should not have happened in the first place.

jstor 0.3.1

jstor 0.3.0

Breaking changes

jst_import and jst_import_zip now use the future package as a backend for parallel processing. This makes the internals more compact and reduces dependencies. It also reduces the number of arguments, since the argument cores has been removed. By default, the functions run sequentially. To execute them in parallel, set up a future plan:

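library(future)
# run subsequent imports in parallel; newer versions of future deprecate
# multiprocess, so use multisession there instead
plan(multiprocess)

# import articles from the archive, now using the parallel plan
jst_import_zip("zip-archive.zip",
               import_spec = jst_define_import(article = jst_get_article),
               out_file = "outfile")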

To terminate the processes, you once again need to kill them manually, at least on *nix systems.
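Because the functions run sequentially unless a plan says otherwise, you can return to the default after a parallel import by resetting the plan. A minimal sketch using only future's own plan(), multisession and sequential; the jstor call itself is left as a placeholder comment.

```r
library(future)

# switch to parallel execution for the import
plan(multisession)
# ... jst_import_zip() calls go here ...

# return to the default, sequential execution
plan(sequential)
```

Resetting the plan explicitly also shuts down the background workers started by multisession.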

Importing data directly from zip-files

A new set of functions lets you import files directly from .zip archives: jst_import_zip() and jst_define_import().

In the following example, we have a zip archive from DfR and want to import metadata on books and articles. For articles we want to apply jst_get_article() and jst_get_authors(), for books only jst_get_book(); we also want to read unigrams (ngram1).

First we specify what we want, and then we apply it to our zip-archive:

# specify definition
import_spec <- jst_define_import(article = c(jst_get_article, jst_get_authors),
                                 book = jst_get_book,
                                 ngram1 = jst_get_ngram)

# apply definition to archive
jst_import_zip("zip_archive.zip",
               import_spec = import_spec,
               out_file = "out_path")

If the archive also contains research reports, pamphlets or other ngrams, they will not be imported. We could, however, change our specification if we wanted to import all kinds of ngrams (given that we originally requested them from DfR):

# import multiple forms of ngrams
import_spec <- jst_define_import(article = c(jst_get_article, jst_get_authors),
                                 book = jst_get_book,
                                 ngram1 = jst_get_ngram,
                                 ngram2 = jst_get_ngram,
                                 ngram3 = jst_get_ngram)

Note, however, that for larger archives, importing all ngrams takes a very long time. It is thus advisable to import ngrams only for the articles you want to analyse, which is most likely a subset of the initial request. The new function jst_subset_ngrams() helps you with this (see also the section on importing bigrams in the case study).

Before importing all files from a zip-archive, you can get a quick overview with jst_preview_zip().
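A minimal sketch of such a preview, assuming an installed jstor version that provides jst_example() and ships the example archive pseudo_dfr.zip; replace that path with the path to your own DfR archive. The result is a small table summarising which kinds of files the archive contains.

```r
library(jstor)

# path to the example archive bundled with jstor (assumption: jst_example()
# and pseudo_dfr.zip are available in your installed version)
zip_path <- jst_example("pseudo_dfr.zip")

# summarise the file types in the archive before importing anything
preview <- jst_preview_zip(zip_path)
print(preview)
```

Based on the preview, you can decide which types to list in jst_define_import().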

New vignette

The new vignette("known-quirks") lists common problems with data from JSTOR/DfR. Contributions with further cases are welcome!

New functions

Minor changes

jstor 0.2.6