2

I'm trying to count the number of occurrences of multiple character IDs that occur in multiple rows of one column.

I have a vector of independent IDs

  x2<-sample(101:255,200,replace=TRUE)

And in a separate data frame I have column which consists of multiple character strings with some corresponding IDs, please forgive inability to create a full sample data set but the corresponding column in a data frame looks like this

 sample <- c("105 234 213", "147 211 174", "223 250 243", "189 190 153", "137 200 199")

 as.data.frame(sample)

The finished result would be a new data frame with an output like this

 ID    Occurrences 
 101      3
 102      15
 103      2
 104      26
 105      11
  ... etc

I hope this makes sense and any help would be great

2
  • Do the IDs come from x2 and the Occurrences from sample? Commented Mar 21, 2022 at 20:18
  • Yes the IDs come from x2, and the Occurrences from sample Commented Mar 21, 2022 at 20:44

2 Answers 2

1

In base R you can do:

df <- as.data.frame(sample)

setNames(as.data.frame(
  table(factor(unlist(strsplit(df$sample, " ")), unique(x2)))), 
  c("ID", "Occurences"))
#>      ID Occurences
#> 1   227          0
#> 2   213          1
#> 3   220          0
#> 4   143          0
#> 5   139          0
#> 6   216          0
#> 7   183          0
#> 8   107          0
#> 9   175          0
#> 10  148          0
#> 11  198          0
#> 12  131          0
#> 13  200          1
#> 14  110          0
#> 15  118          0
#> 16  126          0
#> 17  132          0
#> 18  162          0
#> 19  125          0
#> 20  195          0
#> 21  103          0
#> 22  237          0
#> 23  185          0
#> 24  142          0
#> 25  134          0
#> 26  235          0
#> 27  146          0
#> 28  225          0
#> 29  189          1
#> 30  208          0
#> 31  182          0
#> 32  124          0
#> 33  221          0
#> 34  101          0
#> 35  145          0
#> 36  127          0
#> 37  157          0
#> 38  212          0
#> 39  186          0
#> 40  119          0
#> 41  170          0
#> 42  116          0
#> 43  214          0
#> 44  247          0
#> 45  169          0
#> 46  205          0
#> 47  128          0
#> 48  230          0
#> 49  138          0
#> 50  209          0
#> 51  181          0
#> 52  191          0
#> 53  203          0
#> 54  254          0
#> 55  105          1
#> 56  133          0
#> 57  194          0
#> 58  238          0
#> 59  193          0
#> 60  201          0
#> 61  206          0
#> 62  117          0
#> 63  102          0
#> 64  172          0
#> 65  240          0
#> 66  155          0
#> 67  123          0
#> 68  197          0
#> 69  166          0
#> 70  141          0
#> 71  173          0
#> 72  222          0
#> 73  149          0
#> 74  144          0
#> 75  154          0
#> 76  202          0
#> 77  178          0
#> 78  210          0
#> 79  196          0
#> 80  187          0
#> 81  174          1
#> 82  188          0
#> 83  190          1
#> 84  130          0
#> 85  135          0
#> 86  140          0
#> 87  171          0
#> 88  217          0
#> 89  228          0
#> 90  249          0
#> 91  252          0
#> 92  165          0
#> 93  167          0
#> 94  180          0
#> 95  152          0
#> 96  156          0
#> 97  184          0
#> 98  243          1
#> 99  244          0
#> 100 242          0
#> 101 137          1
#> 102 104          0
#> 103 215          0
#> 104 255          0
#> 105 233          0
#> 106 246          0
#> 107 204          0
#> 108 112          0
#> 109 108          0
#> 110 218          0
#> 111 106          0
#> 112 163          0
#> 113 224          0
#> 114 239          0
#> 115 122          0
#> 116 251          0
#> 117 160          0
#> 118 231          0

Or, in pipe notation to make the logic easier to follow:

df$sample                           |>
  strsplit(" ")                     |>
  unlist()                          |>
  factor(levels = sort(unique(x2))) |>
  table()                           |>
  as.data.frame()                   |>
  setNames(c("ID", "Occurences"))

Created on 2022-03-21 by the reprex package (v2.0.1)

Sign up to request clarification or add additional context in comments.

Comments

1

Something like this?

library(dplyr)
library(tidyr)

as.data.frame(sample) %>% 
  separate_rows(sample) %>% 
  bind_rows(as_tibble(x2)) %>%
  type.convert(as.is = TRUE) %>% 
  mutate(ID = coalesce(sample,value)) %>% 
  count(ID, name = "Occurences")
ID Occurences
   <int>      <int>
 1   101          1
 2   102          2
 3   103          3
 4   104          1
 5   105          2
 6   106          3
 7   107          1
 8   109          1
 9   110          1
10   111          1
# ... with 111 more rows
> 

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.