Comments on an R Connections API

I wrote this post months ago but never hit 'Publish'. The subject has changed little since then, so here's to cleaning out the draft folder...

R's connections are the heart of data/code/text input and output. Without connections, R would be crippled. Additional connections make R more ... connected with potential data sources and output sinks. It makes sense to build tools that safely create and manipulate connections.

These tools might be a collection of guidelines: for example, how to pass arguments to the connection generics seek, truncate, etc., and how to add read and write generics (by converting the existing write function into the default method of a new generic). Alternatively, the tools might consist of a full-blown connections API, perhaps at both the R and C/C++ levels.
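To make the write-generic idea concrete, here is a minimal sketch, assuming an S3 approach. It masks write with a generic whose default method simply defers to base::write. The celsius class and its method are hypothetical names invented for illustration; nothing like this exists in base R.

```r
# Sketch only: mask base::write with an S3 generic (an assumption about
# how such a guideline might look, not R core's design).
write <- function(x, ...) UseMethod("write")

# The existing behaviour becomes the default method.
write.default <- function(x, ...) base::write(x, ...)

# Hypothetical class "celsius" with its own way of writing itself.
write.celsius <- function(x, file = "", ...) {
  base::write(paste0(format(unclass(x)), " C"), file = file, ...)
}

# Both methods write to the same in-memory text connection.
tc <- textConnection(NULL, open = "w")
write(structure(c(20.5, 21), class = "celsius"), tc)
write(1:3, tc)
out <- textConnectionValue(tc)
close(tc)

print(out)
```

Existing callers of write would be unaffected, because any object without a dedicated method still falls through to base::write.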

If there were support and consensus, I would commit additional effort to help implement and maintain a complete connections API for R. I believe others would also contribute, since there is documented interest. BTW, my post on serial connections has been the all-time most popular.

Indeed, some work towards a connections API already exists. Consider Jeff Horner's connections API proposal. I've also briefly considered a reimplementation of the connections internals, though the idea is quite rough. It's simply not worth pursuing without further consensus.

Some reasons to work towards a connections API

  • Provide a standard mechanism to interface R with arbitrary data streams:
    • hardware interfaces (TTY, serial, USB, etc.)
    • novel software interfaces (binary files à la gzcon)
    • rendered graphics output
  • Less maintenance for R core. A connections API might be used to further modularize the core of R by pushing some connections (e.g. clipboard) into packages. The argument for modularized code (e.g. the internet module) applies here: it's wasteful to link against rarely used connection code. If this argument is valid anywhere, it is valid for the current connections internals, as connections.c is the largest code file in src/main/. This might also permit non-R-core members to help maintain the connection code.
  • Permit the R community to write new connections, taking advantage of the existing generic connections functionality:
    • R level read/write
    • binary manipulation / conversion
    • character re-encoding
    • sinks
  • Enable graphics output to connections. A recent R-devel conversation brought to light that R graphics devices do not currently interact with R connections, but rather write their data directly to OS-level streams. Graphics output to connections is particularly attractive for web applications of R (which are steadily gaining momentum).
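For readers less familiar with the generic functionality mentioned in the list above, the following sketch uses only existing base R connections to show binary conversion, a gzcon wrapper, and a sink. A connection written against a future API would presumably inherit the same facilities, but that is an assumption; the variable names are arbitrary.

```r
# Binary conversion through an in-memory raw connection.
rc <- rawConnection(raw(0), open = "w")
writeBin(1:3, rc, size = 4L, endian = "big")
bytes <- rawConnectionValue(rc)   # 3 integers x 4 bytes each
close(rc)

rc <- rawConnection(bytes, open = "r")
ints <- readBin(rc, what = "integer", n = 3L, size = 4L, endian = "big")
close(rc)

# gzcon wraps another connection and adds (de)compression on top of it.
tmp <- tempfile(fileext = ".gz")
gz <- gzcon(file(tmp, open = "wb"))
writeBin(ints, gz, size = 4L)
close(gz)

gz <- gzcon(file(tmp, open = "rb"))
unzipped <- readBin(gz, what = "integer", n = 3L, size = 4L)
close(gz)
unlink(tmp)

# sink() diverts printed output to any writable text connection.
tc <- textConnection(NULL, open = "w")
sink(tc)
print(ints)
sink()
captured <- textConnectionValue(tc)
close(tc)
```

None of the reading or writing calls care what kind of connection they are handed, which is the property a connections API would extend to user-defined connections.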

I'd be happy to add to this list, should anyone have other ideas.