YAML: an Overview


Basic syntax

The basic syntax of YAML is to use key-value pairs in the format key: value. A YAML code block should be fenced in with --- before and after (you can also use ... to end the YAML block, but this is not very common in R Markdown).

In R, the equivalent structure is a list with named character vector: list(author = "Malcolm Barrett"). In fact, you can call this list in R Markdown using the metadata object; in this case, metadata$author will return "Malcolm Barrett"

In YAML, spaces are used to indicate nesting. When we want to specify the output function pdf_document(toc = TRUE), we need to nest it under the output field. We also need to nest toc under pdf_document so that it gets passed to that function correctly.

In R, the equivalent structure is a nested list, each with a name: list(output = list(pdf_document = list(toc = TRUE))). Similarly, you can call this in R Markdown using the metadata object, e.g. metadata$output$pdf_document$toc. The hierarchical structure (which you can see with draw_yml_tree()) looks like this:

└── output:
    └── pdf_document:
        └── toc: true

Without the extra indents, YAML doesn’t know toc is connected to pdf_document and thinks the value of pdf_document is NULL. YAML that looks like this:

has a hierarchy that looks like this:

├── output:
│   └── pdf_document: null
└── toc: true

If you use output functions without additional arguments, the value of output can simply be the name of the function.

However, if you’re specifying more than one output type, you must use the nesting syntax. If you don’t want to include additional arguments, use "default" as the function’s value.

Some YAML fields take unnamed vectors as their value. You can specify an element of the vector by adding a new line and - (note that the values are not indented below category here).

In R, the equivalent structure is a list with a named vector: list(categories = c("R", "Reprodicible Research")). metadata$category will return c("R", "Reprodicible Research"). Another way to specify vectors is to use [] with each object separated by a column, as in the syntax for c(). This YAML is equivalent to the YAML above:

By default, ymlthis uses the - syntax for vectors.- is also used to group elements together. For instance, in the params field for parameterized reports, we group parameter information together by using -. The first line is the name and value of the parameter, while all the lines until the next - are extra information about the parameter. While you can use metadata to call objects in params, params has it’s own object you can call directly: params$a and params$data will return the values of a and data.

In R, the equivalent structure is a nested list that contains a list of unnamed lists: list(param = list(list(a = 1, input = numeric), list(data = "data.csv", input = "file"))). The inner-most lists group items together, e.g. list(a = 1, input = numeric) groups a and input.

└── params:
    ├── a: 1.0
    └── input: numeric
    ├── data: data.csv
    └── input: text

Types in YAML

You may have noticed that strings in YAML don’t always need to be quoted. However, it can be useful to explicitly wrap strings in quotes when they contain special characters like : and @.

R code can be written as inline expressions `r expr`. yml_code() will capture R code for you and put it in a valid format. R code in params needs to be slightly different: use !r (e.g. !r expr) to call an R object.

Logical values in YAML are unusual: true/false, yes/no, and on/off are all equivalent to TRUE/FALSE in R. Any of these turn on the table of contents:

By default, ymlthis uses true/false. If you want to use any of these values literally (e.g. you want a string equal to "yes"), you need to wrap them in quotation marks:

NULL can be specified using null or ~. By default, ymlthis uses null. If you want to specify an empty vector, use [], e.g. category: []. For an empty string, just use empty quotation marks ("").

Sources of YAML

Where do the YAML fields you use in R Markdown come from? Many YAML fields that we use come from Pandoc from the rmarkdown package. These both use YAML to specify the build of the document and to pass information to be printed in a template. Pandoc templates can also be customized to add new YAML. The most common sources of YAML are:

  1. Pandoc
  2. R Markdown
  3. Output functions (such as rmarkdown::pdf_document())
  4. Custom Pandoc templates
  5. R Markdown extension packages (such as blogdown)
  6. Hugo (in the case of blogdown)

Because YAML is an extensible approach to metadata, and there is often no way to validate that your YAML is correct. YAML will often fail silently if you, for instance, make a typo in the field name or misspecify the nesting between fields. For more information on the fields available in R Markdown and friends, see the YAML Fieldguide.