I am learning data.table by working through examples, and I am stuck on a scenario of my own.
I am using the cars dataset, converted to a data.table, to try out my commands.
> library(data.table)
> cars.dt=data.table(cars)
> cars.dt[1:5]
speed dist
1: 4 2
2: 4 10
3: 7 4
4: 7 22
5: 8 16
...
I want to calculate the summary statistics of dist for each speed group and store each statistic in its own column, but the values end up stacked in multiple rows instead.
For example:
> cars.dt[, summary(dist), by="speed"]
speed V1
1: 4 2
2: 4 4
3: 4 6
4: 4 6
5: 4 8
---
110: 25 85
111: 25 85
112: 25 85
113: 25 85
114: 25 85
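The long result above comes from summary() returning a plain length-6 vector: data.table stacks an atomic vector returned by j into a single column (V1), one row per element. When j returns a list, each list element becomes its own column instead. A minimal sketch, assuming the goal is one column per statistic, is to wrap the call in as.list():

```r
library(data.table)

cars.dt <- data.table(cars)

# summary(dist) is a named length-6 vector; as.list() turns it into a
# named list, so data.table spreads the six values across six columns
# (named Min., 1st Qu., Median, Mean, 3rd Qu., Max.) for each speed group.
res <- cars.dt[, as.list(summary(dist)), by = "speed"]
print(res)
```
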
This is the output I expected, but I have not been able to produce it:
speed Min. 1st Qu. Median Mean 3rd Qu. Max.
1: 4 2 4 6 6 8 10
2: 7 4.0 8.5 13.0 13.0 17.5 22.0
3: 8 16 16 16 16 16 16
4: 9 10 10 10 10 10 10
5: 10 18 22 26 26 30 34
6: 11 17.00 19.75 22.50 22.50 25.25 28.00
7: 12 14.0 18.5 22.0 21.5 25.0 28.0
8: 13 26 32 34 35 37 46
9: 14 26.0 33.5 48.0 50.5 65.0 80.0
10: 15 20.00 23.00 26.00 33.33 40.00 54.00
11: 16 32 34 36 36 38 40
12: 17 32.00 36.00 40.00 40.67 45.00 50.00
13: 18 42.0 52.5 66.0 64.5 78.0 84.0
14: 19 36 41 46 50 57 68
15: 20 32.0 48.0 52.0 50.4 56.0 64.0
16: 22 66 66 66 66 66 66
17: 23 54 54 54 54 54 54
18: 24 70.00 86.50 92.50 93.75 99.75 120.00
19: 25 85 85 85 85 85 85
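If you want control over which statistics appear and what the columns are called, the list can also be built explicitly in j. This sketch assumes the same six statistics that summary() reports; quantile()'s default method (type 7) is the one summary() uses for the quartiles, so the values should match the expected table above (which shows rounded means).

```r
library(data.table)

cars.dt <- data.table(cars)

# .() is data.table's alias for list(); every named element becomes a column.
# Backticks allow column names containing spaces, such as `1st Qu.`.
res <- cars.dt[, .(
  Min.      = min(dist),
  `1st Qu.` = quantile(dist, 0.25, names = FALSE),
  Median    = median(dist),
  Mean      = mean(dist),
  `3rd Qu.` = quantile(dist, 0.75, names = FALSE),
  Max.      = max(dist)
), by = "speed"]
print(res)
```
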
I tried the following command, but it only printed each summary and did not return the results as a data.table:
> cars.dt[, print(summary(dist)), by="speed"]
Min. 1st Qu. Median Mean 3rd Qu. Max.
2 4 6 6 8 10
Min. 1st Qu. Median Mean 3rd Qu. Max.
4.0 8.5 13.0 13.0 17.5 22.0
...
Min. 1st Qu. Median Mean 3rd Qu. Max.
70.00 86.50 92.50 93.75 99.75 120.00
Min. 1st Qu. Median Mean 3rd Qu. Max.
85 85 85 85 85 85
Empty data.table (0 rows) of 1 col: speed
I cannot figure out how to use a function that returns multiple values together with the by clause.
Any ideas on how to write this would be much appreciated.
Also, please let me know whether this is possible in data.table at all.
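The same rule applies to any function that returns multiple values: if it returns a named list, data.table turns each element into a column for every group. In this sketch, dist_stats() is a hypothetical helper written only for illustration.

```r
library(data.table)

cars.dt <- data.table(cars)

# Hypothetical helper: returns a named list, so each element becomes a column.
# sd() of a single-row group (e.g. speed 25) is NA.
dist_stats <- function(x) {
  list(n_obs = length(x), mean_dist = mean(x), sd_dist = sd(x))
}

res <- cars.dt[, dist_stats(dist), by = "speed"]
print(res)
```
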