R YAML package
The R YAML package implements the Syck YAML parser for R, along with some R methods for converting R objects to YAML.
You can see the development history of this package here.
What is YAML?
YAML is a human-readable data serialization format. With it, you can create easily readable documents that can be consumed by a variety of programming languages. It's used frequently with Ruby and Ruby on Rails.
Examples
Hash of baseball teams per league:
american:
- Boston Red Sox
- Detroit Tigers
- New York Yankees
national:
- New York Mets
- Chicago Cubs
- Atlanta Braves
Data dictionary specification:
- field: ID
  description: primary identifier
  type: integer
  primary key: yes
- field: DOB
  description: date of birth
  type: date
  format: yyyy-mm-dd
- field: State
  description: state of residence
  type: string
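As a rough sketch of how a document like this data dictionary might be consumed from R, the snippet below parses a shortened copy with yaml.load (covered under Usage) and collects the field names. The names dict.yaml, dict, and fields are illustrative assumptions, and the "primary key" entry is left out to keep the sketch short.

```r
library(yaml)

# A shortened copy of the data dictionary, with nested keys indented
# under each sequence item.
dict.yaml <- "
- field: ID
  description: primary identifier
  type: integer
- field: DOB
  description: date of birth
  type: date
  format: yyyy-mm-dd
- field: State
  description: state of residence
  type: string
"

# Each sequence element is a map, so the result is a list of named lists.
dict <- yaml.load(dict.yaml)

# Pull one value out of every entry.
fields <- sapply(dict, function(entry) entry$field)
print(fields)
```

Because the entries do not all share the same keys, keeping them as a list of lists is usually simpler than forcing them into a data frame.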
Installation
CRAN
You can install this package directly from CRAN by running (from within R):
install.packages('yaml')
Git
- Download the source via git:
git clone git://github.com/viking/r-yaml yaml
- Run R CMD check yaml to make sure everything is OK.
- Run R CMD INSTALL yaml (as root if necessary).
Usage
The yaml package has two main functions: yaml.load and as.yaml.
yaml.load
The yaml.load function is the YAML parsing function. It accepts a YAML document as a string. Here's a simple example that parses a YAML sequence:
x <- "
- 1
- 2
- 3
"
yaml.load(x) #=> [1] 1 2 3
Strings
A YAML string is the basic building block of YAML documents. Example of a YAML document with one element:
1.2345
In this case, the string "1.2345" is typed as a float (or numeric) by the parser. yaml.load would return a numeric vector of length 1 for this document.
yaml.load("1.2345") #=> [1] 1.2345
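To make the typing rule a little more concrete, here is a small sketch that parses several one-element documents and inspects the R type of each result. The variable names are illustrative assumptions; only yaml.load comes from the package.

```r
library(yaml)

# Each call parses a one-element document; the parser picks the R type.
num <- yaml.load("1.2345")  # float
int <- yaml.load("42")      # integer
lgl <- yaml.load("true")    # boolean
chr <- yaml.load("hello")   # plain string

# Show the class chosen for each value.
print(sapply(list(num, int, lgl, chr), class))
```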
Sequences
A YAML sequence is a list of elements. Here's an example of a simple YAML sequence:
- this
- is
- a
- simple
- sequence
- of
- strings
If you pass a YAML sequence to yaml.load, one of two things happens. If all of the elements in the sequence have the same type, yaml.load returns a vector of that type (character, integer, numeric, or logical). If the elements are not uniform, yaml.load returns a list of the elements. No coercion is done by default.
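The difference between the two cases can be sketched as follows. The documents and variable names here are made up for illustration.

```r
library(yaml)

# All elements are integers, so a plain integer vector comes back.
uniform <- yaml.load("
- 1
- 2
- 3
")

# Mixed element types are not coerced; the result is a list.
mixed <- yaml.load("
- 1
- two
- 3.5
")

print(is.integer(uniform))
print(is.list(mixed))
```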
Maps
A YAML map is a collection of key-value pairs, also known as a hash. Here's an example of a simple YAML map:
one: 1
two: 2
three: 3
four: 4
Passing a map to yaml.load will produce a named list by default. That is, keys are coerced to strings. Since it is possible for the keys of a YAML map to be almost anything (not just strings), you might not want yaml.load to return a named list. If you want to preserve the data type of keys, you can pass as.named.list = FALSE to yaml.load. If as.named.list is FALSE, yaml.load will create a keys attribute for the list it returns instead of coercing the keys into strings.
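A minimal sketch of the two modes, using a map with integer keys. The document and variable names are assumptions chosen for illustration; the exact structure of the keys attribute may vary between package versions, so the sketch only inspects it.

```r
library(yaml)

doc <- "
1: one
2: two
"

# Default: keys are coerced to strings and used as list names.
named <- yaml.load(doc)
print(names(named))

# With as.named.list = FALSE the keys keep their type and are stored
# in the "keys" attribute instead of in names().
typed <- yaml.load(doc, as.named.list = FALSE)
str(attr(typed, "keys"))
```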
Handlers
yaml.load has the capability to accept custom handler functions. With handlers, you can customize yaml.load to do almost anything you want. Example of handler usage:
integer.handler <- function(x) { as.integer(x) + 123 }
yaml.load("123", handlers = list(int = integer.handler)) #=> [1] 246
Handlers are passed to yaml.load through the handlers argument. The handlers argument must be a named list of functions, where each name is the YAML type that you want to be handled by your function. The functions you provide must accept one argument and must return an R object.
Most of the time, handler functions will be passed a string. In the example above, integer.handler was passed the string "123". However, you can also provide custom handler functions to deal with sequences and maps.
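One way to see what a scalar handler actually receives is to record the class of its argument. The sketch below does this for the int type; recording.handler and seen.class are hypothetical names introduced only for this illustration.

```r
library(yaml)

seen.class <- NULL

# Hypothetical handler: remembers the class of its input, then converts it.
recording.handler <- function(x) {
  seen.class <<- class(x)  # record what the parser handed over
  as.integer(x)
}

result <- yaml.load("123", handlers = list(int = recording.handler))

print(seen.class)  # the raw value arrives as a string
print(result)
```

Doing the conversion inside the handler means the handler, not the parser, decides the final R type of the value.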
Sequence handlers
Custom sequence handlers will be passed a list of objects. You can then convert the list into whatever you want and return it. Example:
sequence.handler <- function(x) {
  tmp <- as.numeric(x)
  tmp / 5
}
string <- "
- foo
- bar
- 123
- 4.567
"
yaml.load(string, handlers = list(seq = sequence.handler)) #=> [1] NA NA 24.6000 0.9134
Map handlers
Custom map handlers work in much the same way as custom sequence handlers. A map handler function is passed a named list, or a list with a keys attribute (depending on the value of as.named.list). Example:
string <- "
a:
- 1
- 2
b:
- 3
- 4
"
yaml.load(string, handlers = list(map = function(x) { as.data.frame(x) }))
Returns:
  b a
1 3 1
2 4 2
An interesting thing to note in this example is that the b column appears before the a column in the resulting data frame. This is because YAML maps are considered to be hashes, and therefore, order is not preserved. If you want an ordered map, you can use a combination of maps and sequences like so:
- a:
    - 1
    - 2
- b:
    - 3
    - 4
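A sketch of how such a sequence of one-key maps might be read back with its order intact. The names ordered.yaml, entries, keys, and columns are illustrative assumptions; flattening with do.call(c, ...) is just one possible approach.

```r
library(yaml)

ordered.yaml <- "
- a:
    - 1
    - 2
- b:
    - 3
    - 4
"

# The outer sequence keeps its order; each element is a one-key map.
entries <- yaml.load(ordered.yaml)

# Read the keys in document order.
keys <- unlist(lapply(entries, names))

# Flatten the one-key maps into a single named list, then a data frame.
columns <- do.call(c, entries)
print(as.data.frame(columns))
```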
as.yaml
as.yaml is an S3 method used to convert R objects into YAML strings. Example as.yaml usage:
x <- as.yaml(1:5)
cat(x, "\n")
Output from above example:
- 1
- 2
- 3
- 4
- 5
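Since as.yaml produces a string and yaml.load consumes one, the two can be combined into a round trip. The following is a minimal sketch with illustrative variable names.

```r
library(yaml)

original <- 1:5

# Serialize to a YAML string, then parse it back.
text <- as.yaml(original)
restored <- yaml.load(text)

# The integer vector should survive the round trip unchanged.
print(identical(original, restored))
```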
Arguments
Here's the list of as.yaml arguments:
| Name | Description | Default |
| x | the object to convert | |
| line.sep | line separator to use ("\n" or "\r\n") | "\n" |
| indent | number of spaces to use for indenting | 2 |
| pre.indent | number of spaces to shift document | 0 |
| column.major | determines if data.frames are output as column major | TRUE |
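The indent and line.sep arguments can be sketched on a small nested list. The object and variable names are assumptions made for this example, and the exact output layout may differ slightly between package versions.

```r
library(yaml)

nested <- list(outer = list(inner = 1L))

# Indent nested levels by four spaces instead of the default two.
wide <- as.yaml(nested, indent = 4)
cat(wide)

# Use Windows-style line endings.
crlf <- as.yaml(nested, line.sep = "\r\n")
print(grepl("\r\n", crlf, fixed = TRUE))
```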
pre.indent
The pre.indent option will shift the entire document to the right by the number of spaces you specify. For example:
x <- as.yaml(1:5, pre.indent = 4)
cat(x, "\n")
Outputs:
    - 1
    - 2
    - 3
    - 4
    - 5
column.major
The column.major option determines how a data frame is converted into YAML. By default, column.major is TRUE.
Example of as.yaml when column.major is TRUE:
x <- data.frame(a=1:5, b=6:10)
y <- as.yaml(x, column.major = TRUE)
cat(y, "\n")
Outputs:
a:
- 1
- 2
- 3
- 4
- 5
b:
- 6
- 7
- 8
- 9
- 10
Whereas:
x <- data.frame(a=1:5, b=6:10)
y <- as.yaml(x, column.major = FALSE)
cat(y, "\n")
Outputs:
- a: 1
  b: 6
- a: 2
  b: 7
- a: 3
  b: 8
- a: 4
  b: 9
- a: 5
  b: 10
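The two layouts also differ in how they read back into R. As a sketch under the assumption that the YAML is parsed again with yaml.load, the column-major form yields a named list of columns, while the row-major form yields a list of one-row maps that has to be stacked. The names cols, rows, by.col, and by.row are illustrative.

```r
library(yaml)

x <- data.frame(a = 1:5, b = 6:10)

# Column major: a map of columns, which converts straight back.
cols <- yaml.load(as.yaml(x, column.major = TRUE))
by.col <- as.data.frame(cols)

# Row major: a sequence of one-row maps, stacked with rbind.
rows <- yaml.load(as.yaml(x, column.major = FALSE))
by.row <- do.call(rbind, lapply(rows, as.data.frame))

print(by.col)
print(by.row)
```

The column-major form is more compact for wide data, while the row-major form keeps each record together, which can be easier to edit by hand.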
Additional documentation
For more information, run help(package='yaml') or example('yaml-package') for some examples.
To Do
- Include named vector support (instead of only named lists)
- Add date support