
Read sas7bdat files in R with GGASoftware Parso library

... using the new R package sas7bdat.parso.

The software company GGASoftware has extended my work, and that of others, on the sas7bdat R package by developing a Java library called Parso, which also reads sas7bdat files. They have worked out most of the remaining kinks. For example, the Parso library reads sas7bdat files with compressed data (i.e., files written with COMPRESS=yes or COMPRESS=binary). I hope eventually to bring the project full circle by incorporating their improvements into the sas7bdat file format documentation and the code in the sas7bdat package.

The Parso library is made available under the terms of the GPLv3, and is also available under a commercial license. So, last weekend, with the help of Tobias Verbeke's helloJavaWorld R package template, I implemented an R package that wraps the functionality of the Parso library. The new package, sas7bdat.parso (currently hosted exclusively on GitHub), depends on the R package rJava and implements the functions s7b2csv and read.sas7bdat.parso. The former is the workhorse: it reads a sas7bdat file and writes a corresponding CSV file. All of the file input/output happens in the Java implementation (for speed and simplicity). The latter, read.sas7bdat.parso, simply converts a sas7bdat file to a temporary CSV file (i.e., one created using tempfile) and then reads that CSV file using read.csv. There may still be some kinks in the Parso library or in the wrapper R package, but I hope that this additional resource will help finally eliminate the SAS data file barrier that many of us have experienced for years.
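The "convert to a temporary CSV, then call read.csv" approach behind read.sas7bdat.parso can be sketched in a few lines of base R. This is a minimal illustration, not the package's source: read_via_temp_csv and fake_convert are hypothetical names, and the assumption that the converter takes the input path and the output CSV path in that order is ours. With sas7bdat.parso installed, s7b2csv would presumably play the role of the converter, but check its actual arguments before relying on that.

```r
# Hypothetical helper illustrating the wrapper pattern:
# 1) create a temporary CSV path, 2) let an external converter write it,
# 3) read it back with read.csv. `convert` is assumed to take (input, output).
read_via_temp_csv <- function(file, convert, ...) {
  csv <- tempfile(fileext = ".csv")   # temporary CSV path in tempdir()
  on.exit(unlink(csv), add = TRUE)    # this sketch deletes the file on exit
  convert(file, csv)                  # conversion step (e.g., done in Java)
  read.csv(csv, ...)                  # parse the CSV into a data frame
}

# Hypothetical stand-in converter for demonstration only: it ignores `input`
# and writes a small CSV so the pattern can run without any SAS file.
fake_convert <- function(input, output) {
  write.csv(data.frame(id = 1:3, x = c(2.5, 3.5, 4.5)),
            output, row.names = FALSE)
}

df <- read_via_temp_csv("example.sas7bdat", fake_convert)
str(df)
```

Passing the converter as an argument keeps the sketch testable without Java; extra arguments in `...` are forwarded to read.csv (e.g., `stringsAsFactors = FALSE`).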

Installing the R package rJava is more complicated than simply calling install.packages("rJava"). For the rJava package to work, and hence the sas7bdat.parso package, a JDK (Java Development Kit) must be installed. You can download the Oracle JDK from the Oracle website. Once the JDK is installed, the easiest way to install the sas7bdat.parso package is with the install_github function in the devtools package (e.g., library("devtools"); install_github("biostatmatt/sas7bdat.parso")). For additional details on installing the rJava package, see the RForge site.
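The installation order described above can be collected into a small R function. This is a sketch under stated assumptions: install_sas7bdat_parso is a hypothetical name, a JDK is assumed to be installed already, and the GitHub repository is assumed to be reachable. The Java version check uses rJava's .jinit and .jcall to confirm that R can actually start a JVM before going further.

```r
# Hypothetical helper: install prerequisites, then sas7bdat.parso.
# Assumes a JDK is already installed and GitHub is reachable.
install_sas7bdat_parso <- function() {
  # rJava needs a JDK to be found when it is built and loaded
  if (!requireNamespace("rJava", quietly = TRUE)) install.packages("rJava")

  # Start the JVM and report its version as a quick sanity check
  rJava::.jinit()
  message("Java version: ",
          rJava::.jcall("java/lang/System", "S", "getProperty",
                        "java.version"))

  # devtools provides install_github()
  if (!requireNamespace("devtools", quietly = TRUE)) install.packages("devtools")
  devtools::install_github("biostatmatt/sas7bdat.parso")
  invisible(TRUE)
}

# Run interactively once the JDK is in place:
# install_sas7bdat_parso()
```

If `.jinit()` fails here, the problem is almost always the JDK setup rather than sas7bdat.parso itself; `R CMD javareconf` is the usual remedy on Unix-like systems.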