A Short Side-by-side Comparison of the R and NumPy Array Types

Feature                        NumPy   R
contiguous (virtual) memory    ✔       ✔
'view' memory model            ✔       ✘
subset-assignment              ✔       ✔
vectorized operations          ✔       ✔
memory-mapping                 ✔       ✘*
broadcasting rules             ✔       ✔
index arrays                   ✔       ✔

This comparison is current as of R 2.13.0 and NumPy 1.4.1, and draws on the web resources available at the time of writing. Because this post was motivated by a recent article (cited below) promoting the NumPy array, the comparison above may seem one-sided. In the interest of fairness, I welcome corrections and additions to the feature table.

The NumPy Array: A Structure for Efficient Numerical Computation
Comput. Sci. Eng. 13, 22 (2011)
http://link.aip.org/link/?CSENFA/13/22/1

contiguous (virtual) memory

Contiguous (virtual) memory means that the memory used by an array is allocated as a single block, and that the elements of the array are stored adjacently. Because the address of any element can then be computed with simple arithmetic, this type of storage enables efficient operations on the array. The 'virtual' qualification signifies that the memory need only appear contiguous to the executing process; it may be noncontiguous in physical memory.
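
On the NumPy side, the layout can be inspected directly. The following is a minimal sketch, assuming a NumPy installation imported as np; the array shape and variable names are chosen only for illustration.

```python
import numpy as np

# A 2-by-3 array of 8-byte floats, stored as one block in row-major (C) order.
a = np.zeros((2, 3), dtype=np.float64)

# flags reports whether the elements sit adjacently in a single block.
is_contiguous = a.flags['C_CONTIGUOUS']

# strides gives the byte step along each axis: 24 bytes to reach the next
# row (3 elements of 8 bytes each) and 8 bytes to reach the next column.
byte_steps = a.strides

print(is_contiguous, byte_steps, a.nbytes)
```

The strides are what make element lookup a matter of address arithmetic: element (i, j) lives at an offset of i * 24 + j * 8 bytes from the start of the block.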

'view' memory model

A 'view' memory model allows an array to be 'viewed' differently under certain operations (matrix transpose, many types of subsetting, reshaping) without copying the memory where the array's data is stored. The NumPy array has a 'view' memory model, but the R array generally does not. However, the 'view' memory model may be viable for R arrays, since the memory model is mostly invisible to the user.
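
A short NumPy sketch of the idea, assuming NumPy is imported as np. The variable names are illustrative, and the base attribute is used here only as one simple way to see which array owns the data.

```python
import numpy as np

a = np.zeros((2, 3))

t = a.T          # transpose: a view, no data is copied
s = a[:, 1:]     # basic slice: also a view

# Writing through a view changes the original array,
# because t[0, 1] and a[1, 0] are the same element in memory.
t[0, 1] = 7.0

# Both views report a as the array that owns their data.
shares_t = t.base is a
shares_s = s.base is a

print(a[1, 0], shares_t, shares_s)
```

A consequence worth keeping in mind is that a view is not a private copy: code that needs one has to ask for it explicitly, for example with a.copy().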

subset-assignment

Subset assignment refers to assignments that modify one or more elements of an array. For example:

> x <- c(1,2,3,4)
> x[1] <- 100
> x
[1] 100   2   3   4
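
For comparison, a NumPy counterpart might look like the sketch below (assuming NumPy imported as np). Note that NumPy indices start at 0, whereas R indices start at 1.

```python
import numpy as np

x = np.array([1, 2, 3, 4])

x[0] = 100          # first element; the R example writes x[1]
x[1:3] = [20, 30]   # a slice can be assigned to as well

print(x.tolist())
```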

vectorized operations

Vectorized operations refer to expressions where an element-wise operation is implicit. Consider this R code:

> x <- c(1,2,3,4)
> x * 3
[1]  3  6  9 12

where x * 3 implicitly specifies that each element of x should be multiplied by 3. Vectorized operations avoid the need for looping in many cases.
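
The same expression in NumPy, shown next to the explicit loop it replaces. This is a minimal sketch assuming NumPy imported as np; the names tripled and looped are illustrative.

```python
import numpy as np

x = np.array([1, 2, 3, 4])

# Vectorized: the element-wise multiplication is implicit.
tripled = x * 3

# The equivalent explicit loop, written out for comparison.
looped = np.array([v * 3 for v in x])

print(tripled.tolist(), looped.tolist())
```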

memory-mapping

Memory mapping refers to the ability to map a file into a program's address space, so that reads and writes to that memory are backed by the file. Hence, a large array stored on disk may be manipulated without loading the entire array into memory.

*R doesn't offer a built-in memory-mapping facility for arrays. However, some memory-mapping functionality is provided by the bigmemory and mmap extension packages. R also provides a well-developed interface to DBMSs (see the R Data Import/Export manual), enabling random access to data stored on disk. Also, see the 'Large memory and out-of-memory data' section in the High Performance Computing Task View (including the 'ff' package; thanks to Steve Lianoglou for pointing this out).
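
In NumPy, memory mapping is available through np.memmap. The sketch below assumes NumPy imported as np and uses a scratch file in a temporary directory; the file name example.dat and the array size are hypothetical, chosen only for illustration.

```python
import os
import tempfile

import numpy as np

# Hypothetical scratch file; name and location are for illustration only.
path = os.path.join(tempfile.mkdtemp(), "example.dat")

# Create a file-backed array of 1000 float64 values and write to part of it.
m = np.memmap(path, dtype=np.float64, mode="w+", shape=(1000,))
m[:10] = np.arange(10)
m.flush()   # push pending changes out to the file
del m

# Reopen read-only and read back a small slice of the file-backed array.
r = np.memmap(path, dtype=np.float64, mode="r", shape=(1000,))
first_ten = r[:10].tolist()
file_size = os.path.getsize(path)

print(first_ten, file_size)
```

The file holds the raw array data only (1000 values of 8 bytes each), so the dtype and shape must be supplied again when reopening it.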

broadcasting rules

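Broadcasting rules affect the behavior of binary operations ('+', '*', etc.) on arrays of different dimensions. Without broadcasting rules, the behavior of such operations may not be defined. Both R and NumPy arrays have broadcasting rules, but they are not the same rules: R recycles the elements of a shorter vector, whereas NumPy compares the shapes of the operands dimension by dimension and stretches dimensions of size one.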
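
A minimal sketch of the NumPy rules, assuming NumPy imported as np; the arrays and names are illustrative. The last case is one where the two systems differ: NumPy rejects the operation, while R would recycle the two-element vector over the elements of the matrix.

```python
import numpy as np

m = np.arange(6).reshape(2, 3)     # [[0, 1, 2], [3, 4, 5]]
row = np.array([10, 20, 30])       # shape (3,)
col = np.array([[100], [200]])     # shape (2, 1)

# Shapes are compared from the trailing dimension; a dimension of size 1
# (or a missing leading dimension) is stretched to match the other operand.
by_row = m + row    # row is added to every row of m
by_col = m + col    # col is added to every column of m

# Shapes (2, 3) and (2,) do not line up, so NumPy raises an error
# instead of recycling the shorter operand.
try:
    m + np.array([1, 2])
    mismatch_allowed = True
except ValueError:
    mismatch_allowed = False

print(by_row.tolist(), by_col.tolist(), mismatch_allowed)
```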

index arrays

Index arrays may be used to index another array. For example:

> x <- array(rnorm(9), c(3, 3))
> y <- array(c(1, 1, 1, 2), c(2, 2))
> x[y]
[1] -0.9345381  0.5509239

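Here each row of y gives the (row, column) position of one element of x, so x[y] returns x[1, 1] and x[1, 2]. However, the rules for index arrays differ between R and NumPy arrays.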
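
To illustrate the difference, here is a NumPy sketch (assuming NumPy imported as np, with illustrative names and a deterministic x in place of random values). Selecting individual elements takes one index array per axis, while a single two-dimensional integer array selects whole rows.

```python
import numpy as np

x = np.arange(9).reshape(3, 3)   # [[0, 1, 2], [3, 4, 5], [6, 7, 8]]

# One index array per axis: picks x[0, 0] and x[0, 1], the 0-based
# analogue of indexing an R matrix with a two-column index matrix.
rows = np.array([0, 0])
cols = np.array([0, 1])
picked = x[rows, cols]

# A single 2-D integer array indexes the first axis only, so each entry
# selects a whole row of x and the result has shape (2, 2, 3).
y = np.array([[0, 0], [0, 1]])
by_rows = x[y]

print(picked.tolist(), by_rows.shape)
```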