Using data.table to make summary table

Question

The working data looks like:

library(data.table)

df <- data.table(Name = c("a","a","b","b","b","c","c"),
                 SPP = c("YP","YP","YP","BY","BY","CY","YP"),
                 Con = sample(1:20,7))
df
   Name SPP Con
1:    a  YP  18
2:    a  YP   4
3:    b  YP   2
4:    b  BY  15
5:    b  BY  17
6:    c  CY   1
7:    c  YP  20

The goal is to summarize the information in SPP, grouped by Name. The ideal output should look like:

   Name SPP N V1
1:    a  YP 2  1
2:    b  YP 1  2
3:    b  BY 2  2
4:    c  CY 1  2
5:    c  YP 1  2

Here N is the number of observations for each SPP within each Name group, and V1 is the number of distinct SPP types in each Name group. For example, rows 2 and 3 of the summary table above show that Name b has 1 YP and 2 BY observations, so the number of distinct SPP types in b is 2 (V1).

I can generate the summary table by:

m1 <- df[, .(.N), by = .(Name, SPP)]
m2 <- df[,.(length(unique(SPP))), by = Name]
merge(m1,m2,by = c("Name"))

The question is whether I can generate this summary table with more concise data.table command(s), without merging two tables. I tried something like:

m1 <- df[, .(.N, length(unique(SPP))), by = .(Name, SPP)]

It does not produce the result I want, and I don't know why. Could someone explain what is going on? Thank you!

eddi · Accepted Answer · 2016-04-18 22:05:06Z

4

This works, but is too convoluted in my opinion, with nested aggregation:

df[, c(.SD[, .N, by=SPP], n_SPP = uniqueN(SPP)), by=Name]
# or 
df[, {z = .SD[, .N, by=SPP]; c(z, n_SPP = nrow(z))}, by=Name]
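To make the nested aggregation easier to follow, here is a minimal sketch that first looks at a single group by hand and then runs the full call. The fixed `Con` values and the helper names `sd_b`, `inner_b`, and `n_b` are assumptions for illustration only (the question used `sample()`); `sd_b` merely stands in for what `.SD` would hold for `Name == "b"`.

```r
library(data.table)

# Same shape as the question's data; Con is fixed here (an assumption)
# so the example is reproducible.
df <- data.table(Name = c("a","a","b","b","b","c","c"),
                 SPP  = c("YP","YP","YP","BY","BY","CY","YP"),
                 Con  = c(18, 4, 2, 15, 17, 1, 20))

# Roughly what the inner call works with for one group (Name == "b"):
sd_b    <- df[Name == "b", .(SPP, Con)]  # stand-in for .SD in that group
inner_b <- sd_b[, .N, by = SPP]          # per-SPP counts within the group
n_b     <- uniqueN(sd_b$SPP)             # number of distinct SPP values

# The full nested call: c() appends the length-1 n_SPP to the inner
# result, and data.table recycles it across the group's rows.
nested <- df[, c(.SD[, .N, by = SPP], n_SPP = uniqueN(SPP)), by = Name]
print(nested)
```

The inner aggregation runs once per `Name` group, which is part of why this form feels convoluted and is likely to be slower on data with many groups.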

Another option would be sequential aggregation:

df[, .N, by=.(Name, SPP)][, n_SPP := .N, by=Name][]
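As a sketch of why the single-step attempt in the question returns 1 everywhere while the chained version does not: when SPP is one of the `by` columns, each group contains exactly one SPP value, so `length(unique(SPP))` cannot exceed 1. The fixed `Con` values and the names `attempt` and `res` are assumptions for illustration.

```r
library(data.table)

# Same shape as the question's data; Con is fixed here (an assumption)
# so the example is reproducible.
df <- data.table(Name = c("a","a","b","b","b","c","c"),
                 SPP  = c("YP","YP","YP","BY","BY","CY","YP"),
                 Con  = c(18, 4, 2, 15, 17, 1, 20))

# Single-step attempt: SPP is a grouping column, so inside each
# (Name, SPP) group it holds a single value and the count is always 1.
attempt <- df[, .(N = .N, n_SPP = length(unique(SPP))), by = .(Name, SPP)]

# Sequential aggregation: first count rows per (Name, SPP); the result
# has one row per distinct SPP within each Name, so counting those rows
# per Name gives the number of distinct SPP values.
res <- df[, .N, by = .(Name, SPP)][, n_SPP := .N, by = Name][]

print(attempt)
print(res)
```

The trailing `[]` only forces the result to print after the `:=` assignment; it does not change the data.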

edited Apr 18, 2016 at 22:05 by eddi

answered Apr 18, 2016 at 21:01 by Frank

3 Comments

Chuan Over a year ago

It definitely works well with chaining! I'm just curious whether there is a single-aggregation approach I could use in the future. I will wait a little to see whether anyone offers other ideas; if not, your answer will get my vote! Thanks!

Frank Over a year ago

Yeah, feel free to leave it open as long as you want. I'm also curious to see better approaches.

jangorecki Over a year ago

I asked Matt a similar question in 2013. I just found the answer in old emails, and it is much like your answer here :)
