The working data looks like:
library(data.table)
df <- data.table(Name = c("a","a","b","b","b","c","c"),
                 SPP = c("YP","YP","YP","BY","BY","CY","YP"),
                 Con = sample(1:20, 7))
df
   Name SPP Con
1:    a  YP  18
2:    a  YP   4
3:    b  YP   2
4:    b  BY  15
5:    b  BY  17
6:    c  CY   1
7:    c  YP  20
The goal is to summarize the information in SPP, grouped by Name. The ideal output should look like:
   Name SPP N V1
1:    a  YP 2  1
2:    b  YP 1  2
3:    b  BY 2  2
4:    c  CY 1  2
5:    c  YP 1  2
Here N is the number of observations for each SPP within each Name group, and V1 is the number of distinct SPP types within each Name group. For example, rows 2 and 3 of the summary table show that Name b has 1 YP and 2 BY observations, so the number of distinct SPP types in b is 2 (V1).
I can generate the summary table by:
m1 <- df[, .(.N), by = .(Name, SPP)]
m2 <- df[,.(length(unique(SPP))), by = Name]
merge(m1,m2,by = c("Name"))
The question is whether I can generate this summary table with a more concise data.table command, without merging two tables. I tried something like:
m1 <- df[, .(.N, length(unique(SPP))), by = .(Name, SPP)]
This does not give the result I want, and I don't understand why. Could someone explain what is going wrong? Thank you!
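A likely reason the attempt above falls short: when SPP is itself one of the `by` columns, every group contains a single SPP value, so `length(unique(SPP))` evaluates to 1 in each group. One possible way around this, offered only as a sketch and not as the definitive answer, is to chain two steps: count rows per (Name, SPP) first, then count the rows of that summary per Name. The name `res` is introduced here for illustration, and the Con column is left out because it does not affect the counts.

```r
library(data.table)

# Same Name/SPP data as in the question (Con omitted; it plays no role here)
df <- data.table(Name = c("a","a","b","b","b","c","c"),
                 SPP  = c("YP","YP","YP","BY","BY","CY","YP"))

# Step 1: N = number of observations per (Name, SPP) pair.
# Step 2: the summary has one row per distinct SPP within each Name,
#         so .N grouped by Name is the number of distinct SPP types.
res <- df[, .N, by = .(Name, SPP)][, V1 := .N, by = Name]

print(res)
```

The second `.N` counts rows of the intermediate summary, not of `df`, which is why it gives the number of distinct SPP values per Name.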