R to C Compiler



Last Updated 5/15/2010

R is a publicly available implementation of the high-level S language for statistical computing. (S-PLUS is a well-known commercial statistical environment that is also based on the S language.) The S language is widely used for statistical calculations, particularly in biology and medicine.

The S language is not widely regarded as a platform for developing scalable, high-performance codes. In both the R and S-PLUS environments, S programs are interpreted. Moreover, execution of S programs typically involves dynamic allocation of large data structures, particularly arrays.
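As a rough illustration of the allocation cost mentioned above, the following C sketch contrasts two ways of evaluating a vector expression such as `a * b + c`. The operator-at-a-time version allocates a fresh array for every intermediate result, which is how an interpreter typically evaluates such expressions; the fused version allocates only the final result. All function names here are hypothetical and are not taken from R, S-PLUS, or RCC.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical helpers for illustration only; not part of R or RCC. */

/* Element-wise product. Allocates a new array that the caller must free. */
double *vec_mul(const double *a, const double *b, size_t n) {
    assert(a != NULL && b != NULL);
    double *out = malloc(n * sizeof *out);
    if (out == NULL) return NULL;
    for (size_t i = 0; i < n; i++) out[i] = a[i] * b[i];
    return out;
}

/* Element-wise sum. Allocates a new array that the caller must free. */
double *vec_add(const double *a, const double *b, size_t n) {
    assert(a != NULL && b != NULL);
    double *out = malloc(n * sizeof *out);
    if (out == NULL) return NULL;
    for (size_t i = 0; i < n; i++) out[i] = a[i] + b[i];
    return out;
}

/* Operator-at-a-time evaluation of a * b + c:
 * two allocations and two passes over the data. */
double *muladd_interpreted(const double *a, const double *b,
                           const double *c, size_t n) {
    double *tmp = vec_mul(a, b, n);        /* temporary for a * b */
    if (tmp == NULL) return NULL;
    double *out = vec_add(tmp, c, n);      /* second allocation */
    free(tmp);
    return out;
}

/* Fused evaluation of a * b + c:
 * one allocation and one pass, with no intermediate array. */
double *muladd_fused(const double *a, const double *b,
                     const double *c, size_t n) {
    assert(a != NULL && b != NULL && c != NULL);
    double *out = malloc(n * sizeof *out);
    if (out == NULL) return NULL;
    for (size_t i = 0; i < n; i++) out[i] = a[i] * b[i] + c[i];
    return out;
}
```

Both functions return a newly allocated array that the caller frees. For large arrays, the temporary in the first version roughly doubles the allocation traffic for this one expression.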

However, we believe that an execution environment based on an advanced optimizing compiler will be able to execute S programs an order of magnitude faster than a naive interpreter. In collaboration with biostatistical researchers from the M. D. Anderson Cancer Center, we have conducted a study of a number of applications written in S. Many of these applications call standard toolbox routines written by the M. D. Anderson researchers, and they also use many standard programming idioms. Our study suggests that optimized compilation will improve the performance of these S programs by a factor of 10 to 100.

We are working to create an open-source, portable, retargetable, high-quality R compiler suitable for use with production codes.

Compiler Architecture:

The compiler system we are building has three phases:

  1. Analysis of programs and libraries written in R. Currently, we are investigating static analysis techniques for call graph construction and dataflow analysis of R. R's combination of function variables, lexical scoping, and assignment makes precise dataflow analysis hard.
  2. Translation of R programs into C. Initially, this translation process was naive and simply rewrote R programs to make calls to the interpreter's run-time support libraries. We have begun to exploit results from static analysis of R to avoid dynamic lookup of function variable bindings and to reduce overheads associated with garbage-collected storage management. This enables us to generate C programs that rely less on the run-time libraries and perform operations more directly and efficiently.
  3. Analysis and source-to-source optimization of C programs. This phase analyzes and optimizes the C translations of R programs together with the run-time support libraries. Goals of this effort include replacing garbage-collected storage management with region-based storage management and applying domain-specific optimization to library-based programs.
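The benefit of avoiding dynamic lookup of function bindings, described in phase 2, can be sketched with a toy model in C. This is not RCC's generated code and does not use R's actual run-time interface; the `env`, `binding`, and `lookup_fn` names are hypothetical stand-ins for an interpreter's scope chain. The first loop resolves the function by name on every call, as a naive translation would. The second calls the function directly, which is valid only if static analysis has shown that the binding is never reassigned.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical toy model of an interpreter's environments;
 * none of these names come from R or RCC. */
typedef double (*unary_fn)(double);

typedef struct binding {
    const char *name;   /* variable name */
    unary_fn fn;        /* function value bound to that name */
} binding;

typedef struct env {
    const binding *bindings;
    size_t count;
    const struct env *parent;   /* enclosing lexical scope, NULL at top level */
} env;

double square(double x) { return x * x; }

/* Dynamic lookup: walk the scope chain from innermost to outermost,
 * comparing names, and return the first match (or NULL if unbound). */
unary_fn lookup_fn(const env *e, const char *name) {
    for (; e != NULL; e = e->parent) {
        for (size_t i = 0; i < e->count; i++) {
            if (strcmp(e->bindings[i].name, name) == 0) {
                return e->bindings[i].fn;
            }
        }
    }
    return NULL;
}

/* Naive translation: resolve "square" by name on every iteration. */
double sum_squares_dynamic(const env *e, const double *x, size_t n) {
    double total = 0.0;
    for (size_t i = 0; i < n; i++) {
        unary_fn f = lookup_fn(e, "square");
        assert(f != NULL);
        total += f(x[i]);
    }
    return total;
}

/* Translation after analysis: the binding is assumed to be proven
 * constant, so the call goes straight to the C function. */
double sum_squares_static(const double *x, size_t n) {
    double total = 0.0;
    for (size_t i = 0; i < n; i++) {
        total += square(x[i]);
    }
    return total;
}
```

Both functions compute the same sum, but the second avoids a string-comparing walk of the scope chain on every call and gives the C compiler a direct call that it can inline.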

Current Status:

A version of the RCC compiler is in place; it uses static analysis to improve performance for most R programs. RCC will be made publicly available when it is ready for public use.

Downloads:

    Check here soon for an updated version of RCC with bug fixes and improved optimization.

Publications:

Links:

Project Contacts:

External Collaborator:

Acknowledgements:

This work was supported in part by a Rice CITI Innovation Grant and by the NPACI.


© 2004 Rice University