I want to create the following subset of the iris dataset using the Rcpp package:

head(subset(iris, Species == "versicolor"))

  Sepal.Length Sepal.Width Petal.Length Petal.Width    Species
51          7.0         3.2          4.7         1.4 versicolor
52          6.4         3.2          4.5         1.5 versicolor
53          6.9         3.1          4.9         1.5 versicolor
54          5.5         2.3          4.0         1.3 versicolor
55          6.5         2.8          4.6         1.5 versicolor
56          5.7         2.8          4.5         1.3 versicolor

I know how to subset the columns of an Rcpp::DataFrame: there is an overloaded operator [ that works as in R: x["var"]. However, I cannot find a way to subset the rows of a DataFrame whose number of columns is not fixed.

I would like to write a function subset_rows_rcpp_iris that takes an Rcpp::DataFrame (which will always be iris) and a CharacterVector level_of_species as inputs and returns a DataFrame.

DataFrame subset_rows_rcpp_iris(DataFrame x, CharacterVector level_of_species) {
    ...
}

First, I want to find the indices of the rows that satisfy a logical query. My problem is that when I access the Species vector in the test function, save it as a CharacterVector, and compare it with level_of_species, I always get only one TRUE value for setosa and only FALSE values for the other levels.

cppFunction('
    LogicalVector test(DataFrame x, CharacterVector level_of_species) {
            CharacterVector sub = x["Species"];
            LogicalVector ind = sub == level_of_species;
            return(ind);
            }
')
head(test(iris, "setosa"))

[1]  TRUE FALSE FALSE FALSE FALSE FALSE

If this worked, I could rewrite the test function to use the TRUE/FALSE vector to subset each column of the data frame separately and then combine the columns again with Rcpp::DataFrame::create.

  • Yes, indeed. However, I don't know how to represent a character scalar in C++. There is no such class as CharacterScalar in Rcpp, and String doesn't work either. Commented Nov 19, 2016 at 11:06
  • Right, my mistake. Since there was NumericScalar, I assumed the same here. I think we are assuming R's recycling in C++ when doing sub == level_of_species. Commented Nov 19, 2016 at 11:09
  • We need a for loop. Commented Nov 19, 2016 at 11:09

1 Answer

cppFunction('LogicalVector test(DataFrame x, StringVector level_of_species) {
  StringVector sub = x["Species"];
  // Compare against a single scalar level; only the first element is used
  std::string level = Rcpp::as<std::string>(level_of_species[0]);
  Rcpp::LogicalVector ind(sub.size());
  for (int i = 0; i < sub.size(); i++) {
    ind[i] = (sub[i] == level);
  }

  return ind;
}')

xx <- test(iris, "setosa")
table(xx)
 xx
 FALSE  TRUE 
   100    50 

Subsetting done! (I learned a lot from this question myself, thanks!) Note that the function below returns only the four numeric columns and drops Species.

cppFunction('Rcpp::DataFrame test(DataFrame x, StringVector level_of_species) {
  StringVector sub = x["Species"];
  std::string level = Rcpp::as<std::string>(level_of_species[0]);
  // Build a logical mask marking the rows whose Species equals level
  Rcpp::LogicalVector ind(sub.size());
  for (int i = 0; i < sub.size(); i++) {
    ind[i] = (sub[i] == level);
  }

  // Extract each numeric column into a vector
  Rcpp::NumericVector SepalLength = x["Sepal.Length"];
  Rcpp::NumericVector SepalWidth  = x["Sepal.Width"];
  Rcpp::NumericVector PetalLength = x["Petal.Length"];
  Rcpp::NumericVector PetalWidth  = x["Petal.Width"];

  // Subset every column with the mask and reassemble the data frame
  return Rcpp::DataFrame::create(Rcpp::Named("Sepal.Length") = SepalLength[ind],
                                 Rcpp::Named("Sepal.Width")  = SepalWidth[ind],
                                 Rcpp::Named("Petal.Length") = PetalLength[ind],
                                 Rcpp::Named("Petal.Width")  = PetalWidth[ind]);
}')

yy <- test(iris, "setosa")
str(yy)
 'data.frame':  50 obs. of  4 variables:
 $ Sepal.Length: num  5.1 4.9 4.7 4.6 5 5.4 4.6 5 4.4 4.9 ...
 $ Sepal.Width : num  3.5 3 3.2 3.1 3.6 3.9 3.4 3.4 2.9 3.1 ...
 $ Petal.Length: num  1.4 1.4 1.3 1.5 1.4 1.7 1.4 1.5 1.4 1.5 ...
 $ Petal.Width : num  0.2 0.2 0.2 0.2 0.2 0.4 0.3 0.2 0.2 0.1 ...
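
The function above names every column explicitly, while the question asked about a data frame whose number of columns is not fixed. As a rough sketch of that idea in plain standard C++ (not the Rcpp API), the mask can be built once and then applied to each column in a loop. Column, Table, equals_level, filter, and subset_rows are hypothetical names introduced only for this illustration, and std::vector stands in for the Rcpp vector types:

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for one data-frame column: either numeric or string,
// with only the matching vector filled.
struct Column {
    std::string name;
    bool is_numeric;
    std::vector<double> numbers;
    std::vector<std::string> strings;
};

// Hypothetical stand-in for a data frame with any number of columns.
using Table = std::vector<Column>;

// Step 1: compare every element against one scalar level (no recycling needed).
std::vector<bool> equals_level(const std::vector<std::string>& values,
                               const std::string& level) {
    std::vector<bool> keep(values.size());
    for (std::size_t i = 0; i < values.size(); ++i) {
        keep[i] = (values[i] == level);
    }
    return keep;
}

// Keep the elements whose mask entry is true.
// Assumes keep has at least as many entries as values.
template <typename T>
std::vector<T> filter(const std::vector<T>& values,
                      const std::vector<bool>& keep) {
    std::vector<T> out;
    for (std::size_t i = 0; i < values.size(); ++i) {
        if (keep[i]) {
            out.push_back(values[i]);
        }
    }
    return out;
}

// Step 2: apply the same mask to every column, however many there are.
Table subset_rows(const Table& x, const std::vector<bool>& keep) {
    Table out;
    for (const Column& col : x) {
        Column kept;
        kept.name = col.name;
        kept.is_numeric = col.is_numeric;
        kept.numbers = filter(col.numbers, keep);  // empty for string columns
        kept.strings = filter(col.strings, keep);  // empty for numeric columns
        out.push_back(kept);
    }
    return out;
}
```

With a Table holding a numeric Sepal.Length column and a string Species column, subset_rows(t, equals_level(species, "setosa")) keeps only the setosa rows in both columns. An Rcpp version would presumably need to dispatch on each column's type in a similar loop.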