
I have a data frame that I'd like to order based on a vector of IDs and on all the columns of another data frame.

id.namestest = data.frame(test = NA, id= c("id1", "id2", "id3","id3", "id2", "id1"))

head(admix)
#             V1        V2           V3
# [1,] 0.1019623 0.8961855 1.852222e-03
# [2,] 0.6891593 0.3107807 5.999776e-05
# [3,] 0.7274040 0.2697308 2.865165e-03
# [4,] 0.3458368 0.6514100 2.753215e-03
# [5,] 0.3946996 0.6053004 1.000000e-09
# [6,] 0.6383386 0.3585409 3.120463e-03

admix=structure(c(0.101962262250848, 0.68915927427333, 0.727404046114676, 
            0.345836796905855, 0.394699646563406, 0.638338623952938, 0.896185515801946, 
            0.310780727965854, 0.26973078933548, 0.65140998802539, 0.605300352436594, 
            0.358540912890725, 0.00185222194720621, 5.99977608165462e-05, 
            0.00286516454984352, 0.00275321506875506, 1e-09, 0.00312046315633649
), dim = c(6L, 3L), dimnames = list(NULL, c("V1", "V2", "V3")))

The code below works, but I have to set the column order in admix manually:

admix.tmp = cbind(admix, id.namestest)
if (K==3) { admix.sort.tmp = admix.tmp[order(id.namestest[,2], admix[,1],admix[,2],admix[,3]),]}

I'd like to instead provide the column order as a vector, sort.order:

sort.order = c(1,2,3)

admix.sort.tmp = admix.tmp[order(id.namestest[,2], admix[,sort.order]),]

But I get this:

Error in order(id.namestest[, 2], admix[, c(1, 2, 3)]) : 
  argument lengths differ

I also tried:

admix.sort.tmp = admix.tmp[order(id.namestest[,2], asplit(admix, 2)),]

but I get the same error.

  • Sorry, it's admix.tmp = cbind(admix, id.namestest) Commented Dec 6, 2022 at 17:42

1 Answer


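As the error shows, the arguments have different lengths: id.namestest[,2] is a vector of length 6, whereas admix[, c(1, 2, 3)] is a matrix, so its length is the total number of elements in the matrix (6 x 3 = 18). order expects each sort key as a separate argument of the same length, so we can create a list of keys and then call order on it with do.call: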

admix.tmp[do.call(order, c(list(id.namestest[,2]), asplit(admix, 2))),]

-output

         V1        V2           V3 test  id
1 0.1019623 0.8961855 1.852222e-03   NA id1
6 0.6383386 0.3585409 3.120463e-03   NA id1
5 0.3946996 0.6053004 1.000000e-09   NA id2
2 0.6891593 0.3107807 5.999776e-05   NA id2
4 0.3458368 0.6514100 2.753215e-03   NA id3
3 0.7274040 0.2697308 2.865165e-03   NA id3

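Because the keys are passed as a list of vectors or as a data.frame, each column keeps its own type (the id column alongside the numeric columns) instead of being coerced to a single type: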

admix.tmp[do.call(order, cbind(id.namestest[2], admix)),]
         V1        V2           V3 test  id
1 0.1019623 0.8961855 1.852222e-03   NA id1
6 0.6383386 0.3585409 3.120463e-03   NA id1
5 0.3946996 0.6053004 1.000000e-09   NA id2
2 0.6891593 0.3107807 5.999776e-05   NA id2
4 0.3458368 0.6514100 2.753215e-03   NA id3
3 0.7274040 0.2697308 2.865165e-03   NA id3
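The two base R calls above sort by every column of admix in its original order. As a minimal sketch of how the sort.order vector from the question could be plugged in, the columns can be subset with admix[, sort.order, drop = FALSE] before splitting them into keys. The value sort.order = c(2, 1, 3) is a hypothetical choice for illustration, and the matrix is rebuilt from the rounded values printed by head(admix).

```r
# Rebuild the example data (rounded values taken from head(admix) in the question).
id.namestest <- data.frame(test = NA,
                           id = c("id1", "id2", "id3", "id3", "id2", "id1"))
admix <- matrix(c(0.1019623, 0.6891593, 0.7274040, 0.3458368, 0.3946996, 0.6383386,
                  0.8961855, 0.3107807, 0.2697308, 0.6514100, 0.6053004, 0.3585409,
                  1.852222e-03, 5.999776e-05, 2.865165e-03,
                  2.753215e-03, 1.000000e-09, 3.120463e-03),
                nrow = 6, dimnames = list(NULL, c("V1", "V2", "V3")))
admix.tmp <- cbind(admix, id.namestest)

# Hypothetical priority for illustration: sort by V2 first, then V1, then V3.
sort.order <- c(2, 1, 3)

# Why the original call fails: the id vector and the matrix differ in length.
length(id.namestest[, 2])    # 6
length(admix[, sort.order])  # 18

# One sort key per list element: the ids first, then the admix columns
# in the order given by sort.order. drop = FALSE keeps a matrix even
# when sort.order selects a single column.
keys <- c(list(id.namestest[, 2]),
          asplit(admix[, sort.order, drop = FALSE], 2))

# unname() makes every key a positional argument, so a column name can
# never be mistaken for one of order()'s own named parameters.
idx <- do.call(order, unname(keys))
idx  # 6 1 2 5 3 4

admix.sort.tmp <- admix.tmp[idx, ]
```

With this sort.order, ties within each id are broken by V2 first, so the rows come out as 6, 1, 2, 5, 3, 4 rather than the 1, 6, 5, 2, 4, 3 shown above.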

Or using dplyr

library(dplyr)
admix.tmp %>%
   arrange(id, across(all_of(colnames(admix[, sort.order, drop = FALSE]))))

4 Comments

Or just admix.tmp[order(admix.tmp$id), ].
Yes, I forgot. The example left the columns in the original order. The columns could be arranged within cbind(): cbind(admix[, c(3, 1, 2)], id.namestest)[order(id.namestest$id), ].
@dcarlson Based on the OP's working code, admix.tmp[order(id.namestest[,2], admix[,1],admix[,2],admix[,3]),] it seems to be ordering by each column
Yes, I want to sort by each column, selecting which column will be sorted in which order. Hence the sort.order.
