Implementing R function in C++ Rcpp

Question

I have the following R code:

CutMatrix <- FullMatrix[, colSums( FullMatrix[-1,] != FullMatrix[-nrow( FullMatrix ), ] ) > 0]

Which takes a matrix - FullMatrix and makes a CutMatrix by finding which columns in the FullMatrix have columns with more than 1 unique value - so all columns with the same value are eliminated. I'm wondering if I can use Rcpp to speed this up for large matrices, but I'm unsure of the best way to do this - whether there is a sugarish way to easily do this (say by looping through the cols and counting the number of unique values) or if I would have to use something more complicated from the STL.

I thought maybe something like the following was a start (I haven't managed to get all the way) - trying to do the operation in between the colSums braces in the R function, but I don't think I'm sub-setting the matrix correctly since it doesen't work.

src <- '
//Convert the inputted character matrix of DNA sequences an Rcpp class.
Rcpp::CharacterMatrix mymatrix(inmatrix);

//Get the number of columns and rows in the matrix
int ncolumns = mymatrix.ncol();
int numrows = mymatrix.nrow();

//Get the dimension names
Rcpp::List dimnames = mymatrix.attr("dimnames");

Rcpp::CharacterMatrix vec1 = mymatrix(Range(1,numrows),_);
Rcpp::CharacterMatrix vec2 = mymatrix(Range(0,numrows-1),_); 
'

uniqueMatrix <- cxxfunction(signature(inmatrix="character"), src, plugin="Rcpp")

Thanks, Ben.

Have a look at the two Rcpp Gallery posts on Armadillo indexing. On the other hand, if you have character types, maybe you just need a simple loop (those are fast in C++) to compare rows to their preceding rows. You don't have to code this as a vectorised solution... — Dirk is no longer here
– Dirk is no longer here, Commented Nov 1, 2013 at 14:02

Simon O'Hanlon · Accepted Answer · 2013-11-01 14:18:03Z

This returns a LogicalVector which is FALSE for all those columns with only one unique value, which you can use to subset your R matrix.

require( Rcpp )
cppFunction('
  LogicalVector unq_mat( CharacterMatrix x ){

  int nc = x.ncol() ;
  LogicalVector out(nc);

  for( int i=0; i < nc; i++ ) {
    out[i] = unique( x(_,i) ).size() != 1 ;
    }
  return out;
}'
)

You can use it like this...

#  Generate toy data
set.seed(1)
mat <- matrix( as.character(c(rep(1,5),sample(3,15,repl=TRUE),rep(5,5))),5)
     [,1] [,2] [,3] [,4] [,5]
[1,] "1"  "1"  "3"  "1"  "5" 
[2,] "1"  "2"  "3"  "1"  "5" 
[3,] "1"  "2"  "2"  "3"  "5" 
[4,] "1"  "3"  "2"  "2"  "5" 
[5,] "1"  "1"  "1"  "3"  "5"

mat[ , unq_mat(mat) ]
     [,1] [,2] [,3]
[1,] "1"  "3"  "1" 
[2,] "2"  "3"  "1" 
[3,] "2"  "2"  "3" 
[4,] "3"  "2"  "2" 
[5,] "1"  "1"  "3"

Some basic benchmarking...

applyR <- function(y) { y[ , apply( y , 2 , function(x) length( unique(x) ) != 1L ) ] }
rcpp <- function(x) x[ , unq_mat(x) ]

require(microbenchmark)
microbenchmark( applyR(mat) , rcpp(mat) )
#Unit: microseconds
#        expr    min      lq median     uq    max neval
# applyR(mat) 131.94 134.737 136.31 139.29 268.07   100
#   rcpp(mat)   4.20   4.901   7.70   8.05  13.30   100

Collectives™ on Stack Overflow

Implementing R function in C++ Rcpp

1 Answer 1

Comments

Your Answer

Hot Network Questions

Collectives™ on Stack Overflow

1 Answer 1

Comments

Your Answer

Sign up or log in

Post as a guest

Related