Tools for Hacking R: Subversion

The development version of R is stored in a Subversion repository at the URL http://svn.r-project.org/R/trunk/. You can also browse the source code in a web browser by following that link.

Subversion Hierarchy

Subversion is software for source code revision control. That means it keeps track of changes, who made them, when they were made, and any comments about the change. By convention, the source code is usually kept in three sub-directories of the main repository, named trunk, branches, and tags.

The trunk directory contains the main development tree (currently R 2.12.0). The branches directory contains several additional copies of the source tree, and these copies are used for experimental purposes. That is, if someone wants to try out a radical idea that might break something, he creates a new branch, a copy of trunk, and tries his ideas out there. If the ideas are successful, then the changes may be merged back into trunk. Otherwise, the branch may be deleted, or simply fall into dereliction. When the R core team decides it's time to release a new version of R, a new tree is created in the tags directory, corresponding to the release number. This directory is filled with a snapshot of the current source tree in trunk and ought not to be modified further.
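The layout described above can be tried out safely in a throwaway local repository. The following is a minimal sketch, assuming the svn and svnadmin command-line tools are installed; the branch name experiment and the tag name release-1.0 are made up for illustration and are not names used in the R repository.

```shell
# Throwaway local repository; nothing here touches the R repository.
sandbox=$(mktemp -d)
svnadmin create "$sandbox/repo"
repo_url="file://$sandbox/repo"

# Create the three conventional top-level directories in one commit.
svn mkdir -q -m "Create conventional layout" \
    "$repo_url/trunk" "$repo_url/branches" "$repo_url/tags"

# A branch is a copy of trunk that can be modified freely.
svn copy -q -m "Start an experimental branch" \
    "$repo_url/trunk" "$repo_url/branches/experiment"

# A tag is also a copy of trunk, left untouched by convention.
svn copy -q -m "Tag a release" \
    "$repo_url/trunk" "$repo_url/tags/release-1.0"

# List the whole tree, one path per line.
layout=$(svn list -R "$repo_url")
echo "$layout"
```

Subversion itself treats branches and tags as ordinary copies; the difference between them is purely a matter of convention.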

Checkout and Edit Code

R's source code may be 'checked out' from the repository using a Subversion client. On Debian or Ubuntu Linux, we can install Subversion with aptitude (run as root or via sudo):

$ aptitude install subversion

We may then use the Subversion checkout command to download a copy of the repository's trunk directory:

$ svn checkout http://svn.r-project.org/R/trunk/ R-devel

This command will download the trunk tree of the source code to the local directory R-devel.

The local copy of the R source tree may then be edited freely. Suppose we want to make a small change in a source file, say R-devel/src/main/connections.c, that increases the limit on the number of concurrent R connections. That is, at line 63, we change

#define NCONNECTIONS 128 /* snow needs one per slave node */

to read

#define NCONNECTIONS 256 /* snow needs one per slave node */

Hence, our local copy of the R source will now allow up to 256 connections, rather than 128.

Generate a Patch

If changes made to R's source are good, they ought to be shared with the R community. The traditional mechanism for sharing code changes is a patch. Subversion makes it easy to generate a patch from the changes made in a working copy. Continuing with our example, we can issue the Subversion diff command from the R-devel/src/main directory:

$ svn diff connections.c
Index: connections.c
===================================================================
--- connections.c	(revision 52769)
+++ connections.c	(working copy)
@@ -60,7 +60,7 @@
   extern UImode  CharacterMode;
 #endif

-#define NCONNECTIONS 128 /* snow needs one per slave node */
+#define NCONNECTIONS 256 /* snow needs one per slave node */
 #define NSINKS 21

 static Rconnection Connections[NCONNECTIONS];

The output from this command is in a 'diff' format. That is, lines that were removed are marked with the '-' symbol, and lines that were added are marked with the '+' symbol. The output of this command may be redirected to a file:

$ svn diff connections.c > myedits.patch

The file myedits.patch is a patch for the changes we made. When talking about patches, it's also prudent to describe the origin of the changed code. In this case, we say that myedits.patch is a patch against the development version of R, at revision 52769.
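To see the '+' and '-' convention in isolation, the sketch below builds a two-line stand-in for connections.c in a temporary directory and compares the two versions with the standard diff utility instead of svn diff, so it runs without a checkout. The file contents and the .orig file name are assumptions made for illustration; they are not the real R sources.

```shell
# Stand-in files in a scratch directory; not the real R sources.
work=$(mktemp -d)
cd "$work"
printf '%s\n' '#define NCONNECTIONS 128' '#define NSINKS 21' > connections.c.orig
sed 's/NCONNECTIONS 128/NCONNECTIONS 256/' connections.c.orig > connections.c

# diff exits with status 1 when the files differ, so tolerate that.
diff -u connections.c.orig connections.c > myedits.patch || true

# Count removed and added lines, skipping the ---/+++ file headers.
removed=$(grep -c '^-[^-]' myedits.patch)
added=$(grep -c '^+[^+]' myedits.patch)
echo "removed: $removed, added: $added"   # prints "removed: 1, added: 1"
```

A quick count like this is a cheap sanity check that a patch contains only the intended change before it is sent.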

We can share our changes via the R mailing lists by attaching the patch, or by copying and pasting the patch into the email text.
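On the receiving end, a patch is typically applied with the patch utility. For output from svn diff, the usual invocation is patch -p0 < myedits.patch, run from the directory where the diff was generated. The sketch below is a self-contained approximation: it uses a made-up two-line stand-in for connections.c, generates the patch with plain diff, and names the target file explicitly when applying it.

```shell
# Sender: build a stand-in file, an edited copy, and a patch between them.
work=$(mktemp -d)
cd "$work"
printf '%s\n' '#define NCONNECTIONS 128' '#define NSINKS 21' > connections.c
sed 's/NCONNECTIONS 128/NCONNECTIONS 256/' connections.c > connections.c.new
diff -u connections.c connections.c.new > myedits.patch || true
rm connections.c.new

# Recipient: apply the patch to an unmodified copy of the file.
patch connections.c < myedits.patch

# Confirm that the change arrived.
grep NCONNECTIONS connections.c   # prints "#define NCONNECTIONS 256"
```

Naming the file on the command line sidesteps any ambiguity about which of the two file names in the patch header should be patched.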