I am trying to create a simple data frame that records which authors worked on which papers. I have a matrix with author IDs as the rows and paper IDs as the columns. The matrix contains 1s and 0s, where a 1 indicates that the author worked on that paper. For example, if A2P[1,1] == 1, the author with ID 1 worked on the paper with ID 1.

I want to convert this matrix into a data frame that lists all of these relationships, with one row per author/paper pair. As in,

au_ID  P_ID
1      1
1      12        # Author 1 has worked on both paper 1 and 12
2      1         # Author 2 has also worked on paper 1, in addition to papers 2 and 3. 
2      2
2      3 
...

Here is what I am doing:

list1 <- list()
list2 <- list()
# Rows are Author IDs
# Columns are Paper IDs
for (row in 1:nrow(A2P)){
  for (col in 1:ncol(A2P)){
    if (A2P[row,col] == 1){
      list1 <- append(list1, row)
      list2 <- append(list2, col)
    }
  }
}
authorship["au_ID"] = list1
authorship["P_ID"] = list2

I am having difficulty getting this code to run quickly. It is taking forever to run, going on twenty minutes now. I think it has something to do with appending each row and column value to each of the lists, but I am unsure.

Any help would be greatly appreciated! Thank you so much!

1 Answer

You likely need which(A2P == 1L, arr.ind = TRUE), which returns the row and column index of every cell equal to 1. A small example:

mat <- matrix(c(1L, 0L, 0L, 0L, 1L, 1L, 1L, 0L, 1L), ncol = 3)

mat
#     [,1] [,2] [,3]
#[1,]    1    0    1
#[2,]    0    1    0
#[3,]    0    1    1

which(mat == 1L, arr.ind = TRUE)
#     row col
#[1,]   1   1
#[2,]   2   2
#[3,]   3   2
#[4,]   1   3
#[5,]   3   3

In this case, row would correspond to au_ID and col would correspond to P_ID. Then to get it in your format completely:

authorship <- which(mat == 1L, arr.ind = TRUE)
colnames(authorship) <- c('au_ID', 'P_ID')

as.data.frame(authorship)
#  au_ID P_ID
#1     1    1
#2     2    2
#3     3    2
#4     1    3
#5     3    3
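
One caveat: which() walks the matrix column by column, so the result comes out ordered by paper rather than by author. If you want the author-sorted layout shown in the question, a minimal sketch would be the following. It reuses the toy mat from above and assumes your IDs are simply the row and column positions of the matrix.

```r
# Same toy matrix as above; rows are authors, columns are papers.
mat <- matrix(c(1L, 0L, 0L, 0L, 1L, 1L, 1L, 0L, 1L), ncol = 3)

# Positions of every 1, as a two-column data frame.
authorship <- as.data.frame(which(mat == 1L, arr.ind = TRUE))
names(authorship) <- c("au_ID", "P_ID")

# Sort by author, then by paper, to match the layout in the question.
authorship <- authorship[order(authorship$au_ID, authorship$P_ID), ]
rownames(authorship) <- NULL  # reset row labels after reordering

authorship
#   au_ID P_ID
# 1     1    1
# 2     1    3
# 3     2    2
# 4     3    2
# 5     3    3
```

If A2P instead carries the real IDs in its row and column names, you would probably want to look them up with rownames(A2P)[authorship$au_ID] and colnames(A2P)[authorship$P_ID] rather than use the positions directly.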
3 Comments

Nice solution, love the simplicity
Thank you very much! My code actually did finish running, and I was able to create the data frame I wanted using the following: authorship <- do.call(rbind, Map(data.frame, au_ID=list1, P_ID=list2))
However, yours is so much more compact and does not use a loop. Thank you very much! I really appreciate it as I'm quite new to R - I was trying to use python syntax with the whole authorship["au_ID"] thing.
