R tips

Set fonts size in R plot

posted Apr 22, 2011 3:23 PM by Feng Li   [ updated Apr 22, 2011 3:40 PM ]

You probably have discovered that R usually gives you different size of fonts in different figures. That is because a numerical value (cex in par function) giving the amount by which plotting text and symbols is magnified relative to a default value. This starts as ‘1’ when a device is opened, and is reset when the layout is changed, e.g. by setting ‘mfrow’.

But the final fonts size for the main title (or axis, lab, sub) is determined by the product of the three variables, ps, cex, and cex.main (or cex.axis, cex.lab, cex.sub), respectively.

So if you want 12 points fonts in the title, you may set the following options

par(ps = 12, cex = 1, cex.main = 1)

Add comments on your R objects

posted Nov 18, 2010 11:56 AM by Feng Li   [ updated Jan 7, 2011 7:49 AM ]

It is always good to add comments on your R code. But it is also good to add comments on your objects like variables, functions, etc.

> Mydata <- seq(1:10)
> comment(Mydata) <- "This is a data from a sequence"
> Mydata
 [1]  1  2  3  4  5  6  7  8  9 10
> comment(Mydata)
[1] "This is a data from a sequence"
> str(Mydata)
 atomic [1:10] 1 2 3 4 5 6 7 8 9 10
 - attr(*, "comment")= chr "This is a data from a sequence"

Command line clear R console screen

posted Nov 18, 2010 11:49 AM by Feng Li   [ updated Nov 18, 2010 11:56 AM ]

Under Linux console, you can always clear the R screen by command line

cat("\033[2J\033[;H")

Note: This is not applicable to Rgui for Windows.

Source a bunch of files under given folders and/or subfolders

posted Oct 6, 2010 8:37 AM by Feng Li   [ updated May 22, 2011 3:55 PM ]

Update:  April 16, 2011 New style. Also support byte code compiling and loading.


I wrote a function sourceDir() that can source all the functions under given folders and/or subfolders. This function is very much like "Set Path" in Matlab.

##' Sourcing a bunch of files located from different folders/subfolders.
##'
##' Also provide byte-code compiler before sourcing.
##' @param ... Paths to be sourced
##' @param byte.compile "logical". If TRUE, byte compile the R code first and load. Else,
##'        only load R code.
##' @param recursive "logical". If TRUE, files will be sourced recursively for all
##'                  sub folders.
##' @param silent "logical". If TRUE, No detailed information printed out.
##' @param ignore.error "logical". If TRUE, try to continue even errors occur.
##' @return NULL
##' @author Feng Li, Department of Statistics, Stockholm University, Sweden.
##' @license  GPL(>=2)
##' @note First version: Wed Apr 15 20:00:43  CET 2009;
##'       Current:       Sat Apr 16 16:02:17 CEST 2011.
##' TODO: add environmental option, allow to only source byte code.
sourceDir <- function(..., byte.compile = TRUE, recursive = TRUE, silent = FALSE,
                      ignore.error = FALSE)
{
  ## call the path
  Paths <- c(...)
  if(length(Paths) == 0) # if path is not given,  use current working directory
    {
      Paths <- getwd()
    }
  ## Take away the extra separator "\" or "/"
  Paths <- gsub(paste("\\",.Platform$file.sep,"$",sep=""),"",Paths)
  ## Check if byte compile is requested and supported.
  if(byte.compile == TRUE)
    {
      if(version$minor >= 13)
        {
          require("compiler") # require the compiler library
        }
      else
        {
          byte.compile == FALSE
          warning("Byte code compiling not support on this version!")
        }
    }
  n <- 0
  error <- 0
  if(silent == FALSE)
    {
      cat("Loading files  >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>\n")
    }
  for(i in 1:length(Paths)) # loop over different paths
    {
      subfiles <- list.files(Paths[i], pattern = "\\.[Rr]$", recursive = recursive) # list
                                        # all subfiles for given pattern at Paths[i]
      if(length(subfiles) == 0) # unknown folder
        {
          warning(paste("\"", Paths[i], "\": no such directory, skipped.", sep = ""))
        }
      else
        {
          if(silent == FALSE)
            {
              cat("\n\"", Paths[i], "/\":...\n", sep = "")
            }
        }
      for (nm in subfiles) ##
        {
          n <- n + 1
          path.nm <- file.path(Paths[i], nm)
          if(byte.compile == TRUE)
            {
              compile.try <- try(cmpfile(path.nm), silent = TRUE)
              path2source.nm <- paste(gsub("\\.[Rr]$", "", path.nm), ".Rc", sep = "")
              if(methods::is(compile.try, "try-error") || file.exists(path2source.nm) == FALSE) # non-function file can not be compiled.
                {
                  source(path.nm) # try to use source() directly
                }
              else # load compiled file
                {
                  loadcmp(path2source.nm)
                }
            }
          else # the usual source procedure.
            {             
              if(ignore.error == FALSE) # stop at error
                {
                  if(silent == FALSE) # report results
                    {
                      source(path.nm)
                      cat(format(c(paste("...\"", nm, "\"", sep = ""), "[ok]"), width =
                                 65), "\n")
                    }
                  else # silent if no error
                    {
                      source(path.nm)
                    }
                }
              else # skip error files
                {
                  try.source <- try(source(path.nm), silent = TRUE)
                  if(silent == FALSE) # report results
                    {
                      if(methods::is(try.source, "try-error"))
                        {
                          cat(format(c(paste("...\"", nm, "\"", sep = ""), "[??]"), width
                                     = 65), "\n")
                          error <- error + 1
                        }
                      else
                        {
                          cat(format(c(paste("...\"", nm, "\"", spe = ""), "[ok]"), width
                                     = 65), "\n")      
                        }
                    }
                  else # silent
                    {
                      if(methods::is(try.source, "try-error"))
                        {
                          error <- error + 1
                        }
                    }
                }
            }
        }
 }    
  ## Summary on quit
  if(silent  == FALSE)
    {
      cat(format(paste("[Succeeded files: ",  n - error, ",", " skipped due to error: ",
                       error, ".]\n\n", sep = ""), justify = "centre", width = 65), "\n")
    }
}




To use it more efficiently, I suggest you put the function at some place, e.g. ~/workspace/R_utils/sourceDir.R, and write the following lines in .Rprofile file.

.my.env <- new.env()
sys.source("~/workspace/R_utils/sourceDir.R", envir=.my.env)
attach(.my.env)


Next time the function will be automatically loaded under a personal environment .my.env when R is launched and you can use it directly. The advantage of this is that rm(list=ls()) command will not remove the function sourceDir. But if you really want to remove sourceDir function, you can use rm(list=ls(all=TRUE)) command instead.

R syntax highlighting in Gedit

posted Oct 6, 2010 8:35 AM by Feng Li   [ updated Oct 6, 2010 8:37 AM ]

It is quite annoying that you have to manually select Highlight mode for R source code to enable syntax highlight in Gedit. You simply try this to let Gedit do it automatically for you.    

Open the file ~/.local/share/mime/packages/Overrides.xml or create one if not exist. Copy the following code

 <mime-type type="text/x-R">
   <comment>R code</comment>
   <glob pattern="*.Rhistory"/>
   <glob pattern="*.R"/>
   <glob pattern="*.r"/>
   <glob pattern="*.R~"/>
   <glob pattern="*.r~"/>
   <glob pattern="*.Rprofile"/>       
  </mime-type>

and paste in the middle of the following blocks (also create it if this block of code does not exist).

<?xml version='1.0' encoding='utf-8'?>
<mime-info xmlns="http://www.freedesktop.org/standards/shared-mime-info">



</mime-info>

Now you can open a terminal and update the mime library.

update-mime-database ~/.local/share/mime/

Note: This hacking does not require root privilege. Someone at the R mail list suggested to edit /usr/share/mime/packages/freedesktop.org.xml which I don't recommend. The R syntax highlight is controlled by the R.lang file at /usr/share/gtksourceview-2.0/language-specs/. You may add your own keywords also.

Bye the way, you can also use this file to over ride the program's default mime type associations. For example. I have Mathematica installed and it linked all the *.m files with Mathematica Notebook format by default. But actually those files are my Matlab scripts. The Nautilus will give me a big warning when I double click to open it. You can eventually emphasize that "This is my Matlab file and leave me alone, Mathematica!" by placing the code to the Overrides.xml file. Don't forget to update the mime library again.

  <mime-type type="text/x-matlab">
   <comment>MATLAB script/function</comment>
   <glob pattern="*.m"/>
  </mime-type>

For more information, check out the gnome system admin guide at http://library.gnome.org/admin/system-admin-guide/stable/mimetypes-modifying.html.en .

How to shrink eps files margins?

posted Oct 6, 2010 8:34 AM by Feng Li   [ updated Feb 11, 2011 10:08 AM ]

When you create a figure (in e.g. eps format) with R, the margins around the main context are always to wide. To save space in the final documents, e.g. LaTeX generated pdf file. I have figured out two ways of reducing the margins.
  • Method A (R users) Before you make your graph in R, use par(mar=c(bottom, left, top, right)) to specify the margin you want to keep. The default value is c(5, 4, 4, 2) + 0.1. Try this example to see the differences.
    par(mar=c(5,4,4,2)+0.1) # The defualt margins
    plot(rnorm(100))
    dev.copy2eps()          # Save as eps

    par(mar=c(4,4,0,0)+0.1) # Figure with very tight margins
    plot(rnorm(100))
    dev.copy2eps()
  • Method B (use epstool) Very handy tool that can handle the optimal bounding box
  • epstool --copy --bbox file.eps file_new.eps


  • Method C (use ps2epsi) It automatically calculates the bounding box required for all encapsulated PostScript files, so most of the time it does a pretty good job
    ps2epsi <input.eps> <output.eps>

  • Method D (DIY for any eps ) Use a text editor open your eps file and you will find a line like this
    %%BoundingBox: 0 0 503 503
    in the front lines of the file. Adjust these vales to proper integers. Save it and test if the margins are better.

How to prevent dropping dimensions in a matrix/array?

posted Oct 6, 2010 8:32 AM by Feng Li   [ updated Apr 29, 2011 2:17 AM ]

When you create a matrix in the usual way like this,
> a <- matrix(rnorm(10),2,5)
> a
           [,1]      [,2]       [,3]       [,4]       [,5]
[1,]  1.3488918 0.6225795 -0.7444514  1.3130491  1.7877849
[2,] -0.2385392 0.5656759  0.9037435 -0.2217444 -0.2656875

the dimension dropped after picking up a single row or column in this way,
> b <- a[,1]
> b
[1]  1.3488918 -0.2385392
> dim(b)
NULL

The solution is to try it with a parameter drop = FALSE,
> b <- a[,1,drop = FALSE]
> b
           [,1]
[1,]  1.3488918
[2,] -0.2385392
> dim(b)
[1] 2 1

Common R traps

posted Oct 6, 2010 8:30 AM by Feng Li   [ updated Nov 17, 2010 3:15 AM ]

 I will show some R traps here.  The purpose of this page is not going to tell you how "crappy" R is. R is great indeed and also these kind of "traps" can happen in any other languages.

if(a<-5)

Assume you want make a condition to check if "a is smaller than negative five", then do something. So you wrote
if (a<-5)
{
    sin(pi/3)
}
but R will check if you "assign positive five to a"  since "<-" in R is an assignment operator. And of course this is always TRUE. As a result, R will always do the calculations within the condition.

Solutions
: use a better coding style. i.e. always put a space between the operator (either assignment operators or relations operators) and values e.g.,
if (a < -5)
{
    sin(pi/3)
}

or use the parentheses if you want to make sure what you are ding.
if (a<(-5))
{
    sin(pi/3)
}

break a long line

When you want to break a long expression into several lines in R, you don't have to put a special notation at end of each line and R will check if your expression has finished. This makes thing convenient but also brings troubles.  Assume you have a very long expression and you want to break it into two lines, e.g.
myvalue <- sin(pi/3) + cos(pi/3) + 2*sin(pi/3)*cos(pi/3)
The result should be 2.232051.

But you wrote
myvalue <- sin(pi/3) + cos(pi/3)
           + 2*sin(pi/3)*cos(pi/3)
R will think you have finished the expression at the end of first line and started a new expression from the second line.  You will find the result is 1.366025 since the second part is not included in at all.

Solutions: You can either put a pair of parentheses in your expression like this
myvalue <- (sin(pi/3) + cos(pi/3)
            + 2*sin(pi/3)*cos(pi/3))
but too many parentheses make the code very hard to read. So you can do the trick that alway break the line after the arithmetic operators
myvalue <- sin(pi/3) + cos(pi/3) +
           2*sin(pi/3)*cos(pi/3)

diag() function with a vector

As is described in R help document, using ‘diag(x)’ can have unexpected effects if ‘x’ is a vector could be of length one, like this example

> diag(7.4)
     [,1] [,2] [,3] [,4] [,5] [,6] [,7]
[1,]    1    0    0    0    0    0    0
[2,]    0    1    0    0    0    0    0
[3,]    0    0    1    0    0    0    0
[4,]    0    0    0    1    0    0    0
[5,]    0    0    0    0    1    0    0
[6,]    0    0    0    0    0    1    0
[7,]    0    0    0    0    0    0    1

 Solutions: To avoid this, use "diag(x, nrow = length(x))" for consistent behavior when "x" is a vector
 
> x = c(1,2,3)
> diag(x,length(x))
     [,1] [,2] [,3]
[1,]    1    0    0
[2,]    0    2    0
[3,]    0    0    3

> x = 2.4
> diag(x,length(x))
     [,1]
[1,]  2.4





A fast routine of making a commutation matrix

posted Oct 6, 2010 8:28 AM by Feng Li   [ updated May 25, 2011 6:32 AM ]

Given a m--by--n matrix X, there is a unique mn--by-mn matrix K satisfies KvecX = vec(X') where K is called commutation matrix.

Create a commutation matrix

R does not have this function by default, so I wrote a faster routine K() to create a commutation matrix.

## K:
##   Make a commutation matrix

## Description:
##   A fast way to make a commutation matrix.
  
## Usage:
##   K(m, n)

## Arguments:
##   m:       "Integer". 
##   n:       "Integer".

## Details:
##   Both 'm' and 'n' shoud be positive integers
 
## Value:
##   A matrix with dimension of "mn--by--mn".

## Author:
##   Feng Li <Feng.Li@stat.su.se>, Dept. of Statistics, Stockholm University, Sweden.

## License: GPL(>=2)

## Version:
##   First:       Tue Mar  9 16:00:06 CET 2010
##   Current:     Tue Mar  9 16:00:15 CET 2010

K <- function(m,n)
{
  x <- matrix(0,m*n,m*n)
  m0 <- 1:(m*n)
  n0 <- as.vector(t(matrix(m0,m,n)))
  for (i in 1:(m*n))
    {
      x[m0[i],n0[i]] <- 1
    }
  return(x)
}

Here is a rough test of the performance. 

## Test if it is correctly designed
> X <- matrix(rnorm(30*50),30,50)

> all(K(30,50)%*%as.vector(X) == as.vector(t(X)))
[1] TRUE

## Test the speed
> system.time(K(30,50))   # 1500--by--1500 matrix
   user  system elapsed
  0.012   0.000   0.015

> system.time(K(100,100)) # 10,000--by--10,000 matrix
   user  system elapsed
  0.284   0.376   0.660

There is also a R function commutation() in the package MCMCglmm that does the same function but it is generally impossible to use due to the slow speed. 

> require("MCMCglmm")
> system.time(commutation(30,50))
   user  system elapsed
 96.642   0.040  96.729

> all(K(30,50)==commutation(30,50))
[1] TRUE

Commutation matrix multiplication

We always use the notation of commutation matrix on the paper. But when we put it into practice, we typically need the result of a dense matrix by pre- or post-multiplied by a commutation matrix. i.e., KX or YK., where K is the commutation matrix and X, Y are dense matrices. This could be a disaster when the dimensions are big in the usual way of multiplication. So a very easy and efficient way of doing this can be like this

## K.X:
##   Multiplication of a commutation matrix (K) with a dense matrix (X)

## Description:
##   A fast way to make a multiplication of a commutation matrix (K)
##   with a dense matrix (X)
   
## Usage:
##   K.X(m, n, X, t)

## Arguments:
##   m:       "Integer". Parameter of the commutation matrix "K"  
##   n:       "Integer". Parameter of the commutation matrix "K"
##   X:       "Matrix".  A matrix to be multiplied.
##   t:       "Logical". If FALSE, do K%*%X; else, do X%*%K.

## Details:
##   Both 'm' and 'n' should be positive integers. The dimensions should be comfortable
##   As usual matrices multiplication.
 
## Value:
##   A matrix with dimension depends on "K" and "X".

## Author:
##   Feng Li <Feng.Li@stat.su.se>, Dept. of Statistics, Stockholm University, Sweden.

## License: GPL(>=2)

## Version:
##   First:       Wed Mar 10 14:03:31 CET 2010
##   Current:     Wed Mar 10 14:03:39 CET 2010

K.X <- function(m,n,X,t)
{
  if(t==FALSE)
   {
     if(dim(X)[1]!=(m*n))
       {stop("The dimensions of (K %*% X) are not conformable.")}
     key <- as.vector(t(matrix(c(1:(m*n)),m,n)))
     X <- X[key, , drop = FALSE]
    }
  else
    {
      if(dim(X)[2]!=(m*n))
       {stop("The dimensions of (X %*% K) are not conformable")}
      key <- as.vector(t(matrix(c(1:(m*n)),n,m)))
      X<- X[ ,key, drop = FALSE]
    }
  return(X)
}

And here is a test of the speed 

> X<-matrix(rnorm(5e6),1000)
> all(K.X(500,10,X,T)==X%*%K(500,10))
[1] TRUE

> X<-matrix(rnorm(5e5),10000)
> all(K.X(100,100,X,F)==K(100,100)%*%X)
[1] TRUE

> system.time(K(100,100)%*%X)
   user  system elapsed
  7.593   0.284   7.876
> system.time(K.X(100,100,X,F))
   user  system elapsed
  0.008   0.000   0.008

1-9 of 9