4

I have the following R code:

CutMatrix <- FullMatrix[, colSums( FullMatrix[-1,] != FullMatrix[-nrow( FullMatrix ), ] ) > 0]

Which takes a matrix - FullMatrix and makes a CutMatrix by finding which columns in the FullMatrix have columns with more than 1 unique value - so all columns with the same value are eliminated. I'm wondering if I can use Rcpp to speed this up for large matrices, but I'm unsure of the best way to do this - whether there is a sugarish way to easily do this (say by looping through the cols and counting the number of unique values) or if I would have to use something more complicated from the STL.

I thought maybe something like the following was a start (I haven't managed to get all the way) - trying to do the operation in between the colSums braces in the R function, but I don't think I'm sub-setting the matrix correctly since it doesen't work.

src <- '
//Convert the inputted character matrix of DNA sequences an Rcpp class.
Rcpp::CharacterMatrix mymatrix(inmatrix);

//Get the number of columns and rows in the matrix
int ncolumns = mymatrix.ncol();
int numrows = mymatrix.nrow();

//Get the dimension names
Rcpp::List dimnames = mymatrix.attr("dimnames");

Rcpp::CharacterMatrix vec1 = mymatrix(Range(1,numrows),_);
Rcpp::CharacterMatrix vec2 = mymatrix(Range(0,numrows-1),_); 
'

uniqueMatrix <- cxxfunction(signature(inmatrix="character"), src, plugin="Rcpp")

Thanks, Ben.

1
  • Have a look at the two Rcpp Gallery posts on Armadillo indexing. On the other hand, if you have character types, maybe you just need a simple loop (those are fast in C++) to compare rows to their preceding rows. You don't have to code this as a vectorised solution... Commented Nov 1, 2013 at 14:02

1 Answer 1

2

This returns a LogicalVector which is FALSE for all those columns with only one unique value, which you can use to subset your R matrix.

require( Rcpp )
cppFunction('
  LogicalVector unq_mat( CharacterMatrix x ){

  int nc = x.ncol() ;
  LogicalVector out(nc);

  for( int i=0; i < nc; i++ ) {
    out[i] = unique( x(_,i) ).size() != 1 ;
    }
  return out;
}'
)

You can use it like this...

#  Generate toy data
set.seed(1)
mat <- matrix( as.character(c(rep(1,5),sample(3,15,repl=TRUE),rep(5,5))),5)
     [,1] [,2] [,3] [,4] [,5]
[1,] "1"  "1"  "3"  "1"  "5" 
[2,] "1"  "2"  "3"  "1"  "5" 
[3,] "1"  "2"  "2"  "3"  "5" 
[4,] "1"  "3"  "2"  "2"  "5" 
[5,] "1"  "1"  "1"  "3"  "5"

mat[ , unq_mat(mat) ]
     [,1] [,2] [,3]
[1,] "1"  "3"  "1" 
[2,] "2"  "3"  "1" 
[3,] "2"  "2"  "3" 
[4,] "3"  "2"  "2" 
[5,] "1"  "1"  "3" 

Some basic benchmarking...

applyR <- function(y) { y[ , apply( y , 2 , function(x) length( unique(x) ) != 1L ) ] }
rcpp <- function(x) x[ , unq_mat(x) ]

require(microbenchmark)
microbenchmark( applyR(mat) , rcpp(mat) )
#Unit: microseconds
#        expr    min      lq median     uq    max neval
# applyR(mat) 131.94 134.737 136.31 139.29 268.07   100
#   rcpp(mat)   4.20   4.901   7.70   8.05  13.30   100
Sign up to request clarification or add additional context in comments.

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.