sas7bdat database reader update

An earlier post (1216) introduced a compatibility study (i.e., reverse engineering) of the sas7bdat database file format. The code and documentation are available at http://github.com/biostatmatt/sas7bdat. I've recently restructured the code as an R package and added some functionality; look for the sas7bdat package on CRAN. In addition, Kasper Sørensen has ported the read.sas7bdat code to Java and released it under the LGPL: http://eobjects.org/svn/SassyReader/trunk/.

The read.sas7bdat function now returns a data frame with a column.info attribute, which describes the database fields. The column.info attribute is a list of lists, one per field. Each list contains zero or more of the following elements:

  • name: The field name
  • label: The field label (usually a longer description)
  • offset: The field offset in packed binary row data (bytes)
  • length: The field length (bytes)
  • type: The field type, either 'character' or 'numeric'
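To illustrate how the offset, length, and type entries locate a field inside the packed binary row data, here is a minimal Python sketch. It is not part of the sas7bdat package; the function name unpack_row and the dictionary layout (which mirrors the column.info elements) are hypothetical. It also assumes a little-endian file, numeric fields stored as IEEE 754 doubles (possibly truncated to fewer than 8 bytes, keeping the high-order bytes), and character fields padded with trailing spaces in a Latin-1 encoding.

```python
import struct


def unpack_row(row: bytes, column_info: list, encoding: str = "latin-1") -> dict:
    """Extract one record from packed row bytes using column.info-style metadata.

    Hypothetical helper: each entry of column_info is a dict with the keys
    'name', 'offset', 'length', and 'type' ('numeric' or 'character').
    """
    record = {}
    for col in column_info:
        start = col["offset"]
        raw = row[start:start + col["length"]]
        if col["type"] == "numeric":
            # Assumption: little-endian IEEE 754 double. A field shorter than
            # 8 bytes is treated as a truncated double that kept its high-order
            # bytes, so the missing low-order bytes are restored as zeros.
            padded = bytes(8 - len(raw)) + raw
            record[col["name"]] = struct.unpack("<d", padded)[0]
        else:
            # Assumption: fixed-width text padded with spaces (or NULs).
            record[col["name"]] = raw.decode(encoding).rstrip(" \x00")
    return record


if __name__ == "__main__":
    cols = [
        {"name": "x", "offset": 0, "length": 8, "type": "numeric"},
        {"name": "id", "offset": 8, "length": 5, "type": "character"},
    ]
    row = struct.pack("<d", 3.5) + b"abc  "
    print(unpack_row(row, cols))
```

Because each field carries its own offset and length, a reader can pull any single column out of a row without parsing the fields before it.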

The document describing the sas7bdat binary format is included as a package vignette (generated from reStructuredText with rst2latex). Here is a preview of the R package: sas7bdat_0.1.tar.gz. The package also includes a list of internet sources for sas7bdat test files (see data(sas7bdat.sources)).