Writing Extensions in R

A small tutorial for writing extensions in R.

Integrating R with External Languages

You can write extensions for R in C or FORTRAN to speed things up. This is useful when your R code involves lots of loops or other work that a compiled language handles better. Most of this information, and much more, can be found in the Writing R Extensions manual.

Hello world!

Let's start with the simplest extension in C: the obligatory "Hello world!" extension. For simple computations, all you have to do is create a C source file with the function you want to call. In this example, we have a hello_world() function.

#include <stdio.h>

void hello_world() {
    printf("Hello world!\n");
}

That's it. Easy, eh? If you want to send arguments to any external functions (which you will probably want to do), you'll need to add some parameters with types specific to what you're expecting from the user. More on that with the next example.

To use this function in R, you first have to compile it. The easiest way to do this is by using R CMD SHLIB like so:

R CMD SHLIB hello_world.c

This creates a shared library named hello_world.so (hello_world.dll on Windows), along with an object file named hello_world.o that you can safely delete. You can load the library into R with the dyn.load() function like so:

dyn.load("hello_world.so")

Now you can call the hello_world C function through an R function called .C(), like this:

.C("hello_world")

Your output should look similar to this:

> dyn.load("hello_world.so")
> .C("hello_world")
Hello world!
list()
>

R executes the hello_world C function and then returns a list. This list contains the arguments you sent to the C function, including any changes the function made to them; since hello_world takes no arguments, the list is empty. With .C(), the only way to return information to R is through these parameters, because R ignores whatever the C function itself returns (.Call() works differently; see the third example).
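
To make the "return through parameters" point concrete, here is a minimal sketch of a .C()-style routine that hands its result back by overwriting one of its arguments. The name double_values and both of its parameters are hypothetical and not part of the original examples. The sketch assumes the usual .C() conventions: an R integer vector arrives as int *, even a single number arrives as a pointer, and the function returns void.

```c
/* Hypothetical .C()-style routine: doubles every element of x in place.
   .C() hands an R integer vector to C as int *, and even a single
   integer such as the length arrives as a pointer. */
void double_values(int *x, int *n) {
    for (int i = 0; i < *n; i++) {
        x[i] *= 2;   /* the change shows up in the list that .C() returns */
    }
}
```

Compiled and loaded like hello_world, this could be called with something like .C("double_values", x = as.integer(c(1, 2, 3)), n = 3L), and the doubled values would come back in the x element of the returned list.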

You can find the hello_world source below.

ALERT! If the above code doesn't print anything, it may be because you are running R in a GUI. In that case, use Rprintf() instead of printf(), and replace #include <stdio.h> with #include <R.h>. See the Printing section in the Writing R Extensions manual for more information.
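
For reference, a minimal sketch of the same function written with Rprintf(). It needs R's headers, so it has to be built with R CMD SHLIB rather than compiled on its own.

```c
#include <R.h>   /* declares Rprintf() */

void hello_world(void) {
    /* Rprintf() writes to the R console, so the text also appears in GUI front ends */
    Rprintf("Hello world!\n");
}
```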

Hello, you!

Now on to a slightly more complicated external function. In this function, we'll accept an argument from the user (their name) and use it to print a greeting. To do this, we'll add some headers that let us tell R how the function should be called. (ALERT! You can still write R extensions that accept arguments without these headers, but they let us declare the argument types up front so that we don't get anything we don't expect. Since that's a good idea, this example includes the extra steps.)

Put this at the top of your source file along with your other includes:

#include "Rdefines.h"
#include "R_ext/Rdynload.h"

Here's an example of a function that can take one string argument:

void hello_you(char **name) {
    printf("Hello, %s!\n", name[0]);
}

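Next you need to register this function with R and tell it how it should be called. The code needed for our example is part of the hello_you.c source listed below and has three parts: an array of argument types, an array of method definitions, and an initialization function.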
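
A minimal sketch of what this registration code might look like, pieced together from the explanation that follows rather than copied from hello_you.c. The names hello_youArgs and cMethods come from that explanation; the static and const qualifiers and the exact layout are assumptions. It needs R's headers, so build it with R CMD SHLIB.

```c
#include <R.h>
#include <Rdefines.h>
#include <R_ext/Rdynload.h>

/* hello_you() as defined above */
void hello_you(char **name);

/* One entry per parameter: the first (and only) argument is a character vector */
static R_NativePrimitiveArgType hello_youArgs[1] = {STRSXP};

/* One entry per registered function, terminated by an all-NULL entry */
static const R_CMethodDef cMethods[] = {
    {"hello_you", (DL_FUNC) &hello_you, 1, hello_youArgs},
    {NULL, NULL, 0, NULL}
};

/* R calls this automatically when it loads hello_you.so */
void R_init_hello_you(DllInfo *info) {
    /* .C() routines go in the 2nd slot; the remaining slots are unused here */
    R_registerRoutines(info, cMethods, NULL, NULL, NULL);
}
```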

This code needs a bit of explaining. The R_NativePrimitiveArgType array is really just a plain old array of unsigned int (typedef'd in the Rdynload.h file for readability and consistency I suppose) and is of length 1 since we have one parameter. If you had 5 parameters, this array would be of length 5. Each element is a type constant (these are also just unsigned integers internally). STRSXP tells R that we expect a character vector for our first argument. If you want your function to accept other types of R objects, you can find what constants you should use here.

The cMethods array is a list of R_CMethodDef structs, which are pretty simple. For each function you want to register, you need a definition in the cMethods array, and the array must end with an all-NULL entry so that R knows where the list stops. The first element of each struct is the string by which you call the function from R. The second is a pointer to the function (DL_FUNC is the generic function-pointer type R stores it as, hence the cast). The third is the number of arguments the function has, and the fourth is the array of data types (hello_youArgs).

Finally, there is the R_init_<lib> function, where lib is the name of the shared library you want to create (in this case, "hello_you"). This function does the actual registering of our function, and R calls it automatically when it loads the shared library that is created when we compile the code.

Now just follow the same steps as in the first example to compile and load the code (passing one additional argument to .C()), and your R output should look like this:

> dyn.load("hello_you.so")
> .C("hello_you", "Penpen the penguin")
Hello, Penpen the penguin!
[[1]]
[1] "Penpen the penguin"
>

This time we get the argument we passed to hello_you back in a list. The source code for this example can be found below.

Pretty matrix

There is one aspect of extensions in R that is very important to note: there are two different ways to write them. With .C(), R copies the arguments before passing them along to your external function. This is fine when the arguments are small-ish in size, but it isn't really the best idea if you're passing a huge data frame, for example. If you want to handle large R objects in your external functions, you'll want to use .Call(), which passes the R objects themselves without copying them. The next example uses .Call() with a matrix passed as an argument.

After you include the necessary headers, your C function header should look like this:

SEXP pretty_matrix(SEXP s_matrix) {
...
}

SEXP is the data type that C uses to handle R objects when using .Call(). Unlike with .C(), the return value of an external function called with .Call() is passed back to the caller as-is, so our function returns an SEXP instead of void like before.

Before you can manipulate the raw data in an SEXP object, you need to test it for its R type (character, double, integer, etc.) yourself and assign it to a type you can work with (int, for example). The pretty_matrix.c source listed below does this at the top of the function.
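
A minimal sketch of such a check, reconstructed from the description that follows rather than copied from pretty_matrix.c. The variable names dims and height, the use of Rprintf(), and the message text are assumptions. It needs R's headers, so build it with R CMD SHLIB.

```c
#include <R.h>
#include <Rdefines.h>

SEXP pretty_matrix(SEXP s_matrix) {
    int *matrix, *dims, height, width;

    if (isMatrix(s_matrix) && IS_INTEGER(s_matrix)) {
        matrix = INTEGER(s_matrix);           /* pointer to the raw int data */
        dims   = INTEGER(GET_DIM(s_matrix));  /* the dim attribute: c(nrow, ncol) */
        height = dims[0];                     /* number of rows */
        width  = dims[1];                     /* number of columns */
    } else {
        Rprintf("pretty_matrix: expected an integer matrix\n");
        return R_NilValue;                    /* NULL in R */
    }

    /* ... work with matrix, height and width here ... */
    (void) matrix; (void) height; (void) width;   /* placeholders to silence unused warnings */
    return R_NilValue;
}
```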

The check tests whether the argument is an integer matrix. If it is, it pulls out the relevant information using macros defined in the R headers (INTEGER, GET_DIM, etc.). If it's not an integer matrix, the function prints a message and returns R_NilValue, which in R is NULL. Once you get past this section of code, you can treat the matrix variable as a normal int array. Matrices in R are stored internally as single-dimensional column-major arrays, so you'll have to do some arithmetic in order to access "rows" and "columns" like R does (e.g. the value in the third row, first column can be accessed in this case by referring to matrix[2 + 0*height], where height is the number of rows. Remember, C indexing starts at 0 instead of 1!).
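
The index arithmetic can be sketched in plain C, with no R headers needed. The helper names cm_index and get_row are hypothetical. The sketch relies on R storing a matrix column by column, so matrix(1:6, nrow = 2) sits in memory as 1, 2, 3, 4, 5, 6 and its second row is 2, 4, 6.

```c
#include <stddef.h>

/* Hypothetical helper: offset of element (row, col), both 0-based, in a
   column-major array with `height` rows, which is how R lays out a matrix. */
size_t cm_index(size_t row, size_t col, size_t height) {
    return row + col * height;
}

/* Hypothetical helper: copies one row of a column-major height x width
   matrix into out[0..width-1]. */
void get_row(const int *matrix, size_t height, size_t width,
             size_t row, int *out) {
    for (size_t col = 0; col < width; col++) {
        out[col] = matrix[cm_index(row, col, height)];
    }
}
```

Walking along a row means stepping through memory in strides of height, while walking down a column visits consecutive elements.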

When you're done with your calculations and want to return something, there are a couple of things you need to do first. R does something called garbage collection. Loosely, garbage collection involves freeing memory associated with unused objects. R can't tell that your C code is still using an object it has just created, so any R object you allocate and want to return needs to be protected from garbage collection first, and unprotected again just before you return it.
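
A minimal sketch of this allocate, protect, and return pattern, using the macros from Rdefines.h. The helper name make_result and its lines parameter are hypothetical; in pretty_matrix.c the equivalent statements presumably sit at the end of pretty_matrix() itself. It needs R's headers, so build it with R CMD SHLIB.

```c
#include <R.h>
#include <Rdefines.h>

/* Hypothetical helper: wraps a line count in a length-1 numeric vector. */
SEXP make_result(int lines) {
    SEXP result;

    PROTECT(result = NEW_NUMERIC(1));             /* allocate and protect from GC */
    NUMERIC_POINTER(result)[0] = (double) lines;  /* fill in the single value */
    UNPROTECT(1);                                 /* balance the PROTECT before returning */
    return result;
}
```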

The return step creates a new numeric R object of length 1, gives it a value (in this case, the number of lines printed out; see the source below for details), and returns it to R.

Registering functions for use with .Call() is a little different from registering .C() functions, but not by much.
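
A minimal sketch of what the .Call() registration might look like, reconstructed from the explanation that follows rather than copied from pretty_matrix.c. The name callMethods comes from that explanation; the static and const qualifiers are assumptions. It needs R's headers, so build it with R CMD SHLIB.

```c
#include <R.h>
#include <Rdefines.h>
#include <R_ext/Rdynload.h>

/* pretty_matrix() as defined earlier in the same file */
SEXP pretty_matrix(SEXP s_matrix);

/* Name, function pointer, number of arguments; no type array for .Call() */
static const R_CallMethodDef callMethods[] = {
    {"pretty_matrix", (DL_FUNC) &pretty_matrix, 1},
    {NULL, NULL, 0}
};

/* R calls this automatically when it loads pretty_matrix.so */
void R_init_pretty_matrix(DllInfo *info) {
    /* .Call() routines go in the 3rd slot; the 2nd is for .C() routines */
    R_registerRoutines(info, NULL, callMethods, NULL, NULL);
}
```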

Notice that you must use a different struct type than with .C(). Instead of R_CMethodDef, you use R_CallMethodDef. Also, telling R what types to expect for your external function is not necessary, since R expects you to check those on your own. The R_registerRoutines call is also a little different. The callMethods array needs to be the 3rd argument instead of the 2nd (cMethods goes in the 2nd slot). The other two arguments are reserved for functions to be called with .Fortran() and .External() (not covered here). If you have some functions for use with .C() and others for use with .Call(), you can send both the cMethods and callMethods arrays to R_registerRoutines.

Compiling this source code is the same as before, by running R CMD SHLIB pretty_matrix.c. You load it into R the same way, too. The only difference is that you use .Call() instead of .C(). Here's the output:

> dyn.load("pretty_matrix.so")
> .Call("pretty_matrix", matrix(sample(1:99, 99), nrow=9))
+----+----+----+----+----+----+----+----+----+----+----+
|    |    |    |    |    |    |    |    |    |    |    |
| 38 | 73 | 59 | 50 | 46 | 51 | 86 | 37 |  4 | 47 | 21 |
|    |    |    |    |    |    |    |    |    |    |    |
+----+----+----+----+----+----+----+----+----+----+----+
|    |    |    |    |    |    |    |    |    |    |    |
| 94 | 17 | 60 | 35 | 72 | 18 | 39 | 93 | 53 | 43 | 91 |
|    |    |    |    |    |    |    |    |    |    |    |
+----+----+----+----+----+----+----+----+----+----+----+
|    |    |    |    |    |    |    |    |    |    |    |
| 88 | 58 | 68 | 75 | 44 | 23 | 87 | 45 |  9 | 55 | 62 |
|    |    |    |    |    |    |    |    |    |    |    |
+----+----+----+----+----+----+----+----+----+----+----+
|    |    |    |    |    |    |    |    |    |    |    |
| 25 | 64 | 61 | 33 | 31 | 74 | 71 | 76 | 13 | 92 | 99 |
|    |    |    |    |    |    |    |    |    |    |    |
+----+----+----+----+----+----+----+----+----+----+----+
|    |    |    |    |    |    |    |    |    |    |    |
| 30 | 15 | 63 | 36 | 52 | 65 | 89 | 82 | 41 | 28 |  7 |
|    |    |    |    |    |    |    |    |    |    |    |
+----+----+----+----+----+----+----+----+----+----+----+
|    |    |    |    |    |    |    |    |    |    |    |
| 42 | 57 | 32 | 77 | 96 | 69 | 81 | 78 | 40 | 12 | 22 |
|    |    |    |    |    |    |    |    |    |    |    |
+----+----+----+----+----+----+----+----+----+----+----+
|    |    |    |    |    |    |    |    |    |    |    |
| 26 | 29 | 34 | 97 | 66 | 19 | 48 | 67 |  3 | 84 | 90 |
|    |    |    |    |    |    |    |    |    |    |    |
+----+----+----+----+----+----+----+----+----+----+----+
|    |    |    |    |    |    |    |    |    |    |    |
| 49 | 83 | 20 | 79 | 95 |  1 |  6 | 24 | 80 | 98 | 10 |
|    |    |    |    |    |    |    |    |    |    |    |
+----+----+----+----+----+----+----+----+----+----+----+
|    |    |    |    |    |    |    |    |    |    |    |
| 70 | 56 |  2 | 14 | 27 |  5 | 11 | 16 | 54 | 85 |  8 |
|    |    |    |    |    |    |    |    |    |    |    |
+----+----+----+----+----+----+----+----+----+----+----+
[1] 37
>

Finité

So there you have it. There are lots of other things you can do with extensions, and if you want to learn more, check out the R documentation here.

Topic attachments

Attachment       Size   Date                 Who             Comment
hello_world.c    0.1 K  09 Jan 2006 - 13:43  JeremyStephens  hello world extension
hello_you.c      0.4 K  09 Jan 2006 - 13:43  JeremyStephens  hello you extension
pretty_matrix.c  1.8 K  09 Feb 2006 - 13:38  JeremyStephens  pretty matrix extension

Topic revision: r6 - 11 May 2010, JeremyStephens
 
