Working with Labelled Data

Daniel Lüdecke

2019-09-13

This vignette gives a short example of how functions for working with labelled data can be used in a typical data visualization workflow.

Labelled Data

In software like SPSS, it is common to have value and variable labels as variable attributes. Variable values, even if categorical, are mostly numeric. In R, however, you may use labels as values directly:

factor(c("low", "high", "mid", "high", "low"))
#> [1] low  high mid  high low 
#> Levels: high low mid

Reading SPSS data with haven or sjlabelled keeps the numeric values of the variables and adds the value and variable labels as attributes. See the following example from the sample dataset efc, which is part of the sjlabelled package:

library(sjlabelled)
data(efc)
str(efc$e42dep)
#>  num [1:908] 3 3 3 4 4 4 4 4 4 4 ...
#>  - attr(*, "label")= chr "elder's dependency"
#>  - attr(*, "labels")= Named num [1:4] 1 2 3 4
#>   ..- attr(*, "names")= chr [1:4] "independent" "slightly dependent" "moderately dependent" "severely dependent"
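The "label" and "labels" entries in the output above are ordinary R attributes. As a minimal sketch in base R only, the following builds a toy vector with the same structure and looks up the label for each value. The vector x, its label text, and its codes are made up for illustration and are not part of efc.

```r
# Toy vector with SPSS-style numeric codes (hypothetical values)
x <- c(1, 3, 2, 3, 1)

# Variable label and value labels, stored the same way as in efc$e42dep
attr(x, "label") <- "toy rating"
attr(x, "labels") <- c(low = 1, mid = 2, high = 3)

# The data are still numeric; the labels are only metadata
str(x)

# Look up the label of each value by matching against the "labels" attribute
value_labels <- attr(x, "labels")
looked_up <- names(value_labels)[match(x, value_labels)]
looked_up
```

This lookup is roughly what label-aware functions have to do internally, which is why functions that ignore attributes only ever see the numeric codes.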

While all plotting and table functions of the sjPlot package make use of these attributes, many other packages and functions ignore them, e.g. base R graphics:

library(sjlabelled)
data(efc)
barplot(
  table(efc$e42dep, efc$e16sex), 
  beside = TRUE, 
  legend.text = TRUE
)

The resulting plot has no descriptive axis or legend labels: only the numeric codes are shown.

Adding value labels as factor values

as_label() is an sjlabelled function that converts a numeric variable into a factor and uses the value labels from the attributes as factor levels. With factors that have labelled levels, the bar plot will be labelled.

barplot(
  table(as_label(efc$e42dep),
        as_label(efc$e16sex)), 
  beside = TRUE, 
  legend.text = TRUE
)
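To see why the plot is now labelled, it can help to inspect the converted variable directly. The following sketch only assumes the efc data and as_label() as used above; the object name dep is introduced here for illustration.

```r
library(sjlabelled)
data(efc)

# Convert the labelled numeric variable into a factor
dep <- as_label(efc$e42dep)

# The result is a factor whose levels are the former value labels
is.factor(dep)
levels(dep)

# table() uses the factor levels as names, which barplot() then displays
table(dep)
```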

Getting and setting value and variable labels

There are four functions that let you easily set or get value and variable labels of either a single vector or a complete data frame: get_label() and set_label() for variable labels, and get_labels() and set_labels() for value labels.

With get_label(), you can easily add titles to plots dynamically, i.e. depending on the variable that is plotted.

barplot(
  table(as_label(efc$e42dep),
        as_label(efc$e16sex)), 
  beside = TRUE, 
  legend.text = TRUE,
  main = get_label(efc$e42dep)
)
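The getters and setters can also be tried on a small vector. This is a minimal sketch, assuming the sjlabelled functions set_label(), set_labels(), get_label() and get_labels(); the vector x and its labels are hypothetical and only serve as an example.

```r
library(sjlabelled)

# Toy vector with three distinct numeric codes (hypothetical data)
x <- c(1, 2, 2, 3)

# Set the variable label, then one value label per unique value
x <- set_label(x, label = "Toy rating")
x <- set_labels(x, labels = c("low", "mid", "high"))

# Read both kinds of labels back
var_label <- get_label(x)
val_labels <- get_labels(x)
var_label
val_labels
```

Because the labels are stored as attributes, x itself stays numeric and can still be used in computations.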

Restore labels from subsetted data

The base subset() function drops label attributes (and vector attributes in general) when subsetting data. The sjlabelled package provides functions to deal with this: copy_labels() and remove_all_labels().

copy_labels() adds labels back to a subsetted data frame, based on the original data frame. remove_all_labels() removes all label attributes.

Losing labels during subset
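A minimal sketch of the problem, assuming the efc data from above. The subset condition and the selected columns are chosen here only for illustration.

```r
library(sjlabelled)
data(efc)

# Subset rows and columns with base R
efc.sub <- subset(efc, subset = e16sex == 1, select = c(e42dep, e16sex))

# The original variable still carries its label attributes ...
str(efc$e42dep)

# ... but the subsetted variable is a plain numeric vector
str(efc.sub$e42dep)
```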

Add back labels
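The labels can be restored from the original data frame with copy_labels(). The sketch below repeats a subsetting step so that it runs on its own; the subset condition and the selected columns are illustrative choices.

```r
library(sjlabelled)
data(efc)

# Subsetting with base R drops the label attributes
efc.sub <- subset(efc, subset = e16sex == 1, select = c(e42dep, e16sex))

# Copy variable and value labels back from the original data frame
efc.sub <- copy_labels(efc.sub, efc)

# The attributes are available again
str(efc.sub$e42dep)
```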

Conclusion

When working with labelled data, especially with data sets imported from other software packages, it is very handy to make use of the label attributes. The sjlabelled package supports this and offers useful functions for these tasks.