I have a data frame and would like a summary of every column, output as a presentable table. I tried
summary_table <- as.data.frame(summary(mydata))
but it did not work. Any help?
I don't know that I would consider this to be a "presentable" format, but you could always unclass the result of summary if you really insist on a data.frame in a "wide" form.
data.frame(unclass(summary(airquality)))
## X....Ozone X...Solar.R X.....Wind X.....Temp X....Month X.....Day
## 1 Min. : 1.00 Min. : 7.0 Min. : 1.700 Min. :56.00 Min. :5.000 Min. : 1.0
## 2 1st Qu.: 18.00 1st Qu.:115.8 1st Qu.: 7.400 1st Qu.:72.00 1st Qu.:6.000 1st Qu.: 8.0
## 3 Median : 31.50 Median :205.0 Median : 9.700 Median :79.00 Median :7.000 Median :16.0
## 4 Mean : 42.13 Mean :185.9 Mean : 9.958 Mean :77.88 Mean :6.993 Mean :15.8
## 5 3rd Qu.: 63.25 3rd Qu.:258.8 3rd Qu.:11.500 3rd Qu.:85.00 3rd Qu.:8.000 3rd Qu.:23.0
## 6 Max. :168.00 Max. :334.0 Max. :20.700 Max. :97.00 Max. :9.000 Max. :31.0
## 7 NA's :37 NA's :7 <NA> <NA> <NA> <NA>
I find that output to include a lot of redundant information, however.
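For context on why the unclass step is needed, here is a small sketch of what summary() hands back for a data frame. It uses only base R and the built-in airquality data; the variable names s, long and wide are just for illustration.

```r
# summary() on a data frame returns a character matrix of class "table"
s <- summary(airquality)
class(s)
# → "table"

# as.data.frame() dispatches to the table method, which produces a
# "long" data frame with one row per cell (columns Var1, Var2, Freq)
long <- as.data.frame(s)
names(long)

# Dropping the "table" class first leaves a plain character matrix,
# which data.frame() turns into the "wide" form
wide <- data.frame(unclass(s))
dim(wide)
```

So the original call does produce a data frame, just in a long layout that is rarely what one wants to present.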
I suppose you could also consider a function like the following:
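library(data.table)       # data.table(), dcast()
library(splitstackshape)  # cSplit()

summaryDF <- function(indf) {
  # Long form of the summary table: V2 = column name, N = "stat:value" string
  temp <- data.table(summary(indf))[, c("V2", "N"), with = FALSE]
  # Split "stat:value" into N_1/N_2, drop empty cells, reshape to wide
  dcast(cSplit(temp, "N", ":")[!is.na(N_1)],
        N_1 ~ V2, value.var = "N_2")
}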
summaryDF(airquality)
## N_1 Day Temp Wind Month Ozone Solar.R
## 1: 1st Qu. 8.0 72.00 7.400 6.000 18.00 115.8
## 2: 3rd Qu. 23.0 85.00 11.500 8.000 63.25 258.8
## 3: Max. 31.0 97.00 20.700 9.000 168.00 334.0
## 4: Mean 15.8 77.88 9.958 6.993 42.13 185.9
## 5: Median 16.0 79.00 9.700 7.000 31.50 205.0
## 6: Min. 1.0 56.00 1.700 5.000 1.00 7.0
## 7: NA's NA NA NA NA 37.00 7.0
Don't expect miracles on datasets with different types of columns though. For example, summaryDF(iris) wouldn't be meaningful.
Also, if you don't have any NA values in your dataset, you may be able to just get away with sapply:
sapply(mtcars, summary)
## mpg cyl disp hp drat wt qsec vs am gear carb
## Min. 10.40 4.000 71.1 52.0 2.760 1.513 14.50 0.0000 0.0000 3.000 1.000
## 1st Qu. 15.42 4.000 120.8 96.5 3.080 2.581 16.89 0.0000 0.0000 3.000 2.000
## Median 19.20 6.000 196.3 123.0 3.695 3.325 17.71 0.0000 0.0000 4.000 2.000
## Mean 20.09 6.188 230.7 146.7 3.597 3.217 17.85 0.4375 0.4062 3.688 2.812
## 3rd Qu. 22.80 8.000 326.0 180.0 3.920 3.610 18.90 1.0000 1.0000 4.000 4.000
## Max. 33.90 8.000 472.0 335.0 4.930 5.424 22.90 1.0000 1.0000 5.000 8.000
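The reason NA values get in the way is that summary() returns a seven-element vector for a column with missing values and a six-element vector otherwise, so sapply cannot simplify the results to a matrix and falls back to a list. One possible workaround, sketched below in base R, is to compute the statistics yourself so that every column yields the same set of values. The helper name numeric_summary and its row labels are my own invention, not something from a package, and it simply skips non-numeric columns.

```r
# Hypothetical helper: one row per statistic, one column per numeric variable.
numeric_summary <- function(df) {
  # Keep numeric columns only; factors and characters are dropped
  num <- df[vapply(df, is.numeric, logical(1))]
  vapply(num, function(x) {
    c(Min    = min(x, na.rm = TRUE),
      Q1     = unname(quantile(x, 0.25, na.rm = TRUE)),
      Median = median(x, na.rm = TRUE),
      Mean   = mean(x, na.rm = TRUE),
      Q3     = unname(quantile(x, 0.75, na.rm = TRUE)),
      Max    = max(x, na.rm = TRUE),
      NAs    = sum(is.na(x)))  # always present, so every column has 7 values
  }, numeric(7))
}

res <- numeric_summary(airquality)  # numeric matrix, 7 rows x 6 columns
res_df <- as.data.frame(res)        # convert if a data frame is required
```

Unlike the string-splitting approach, the values stay numeric here, so the result can be rounded or passed to a table-formatting function as is.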
Is mydata a data frame? DF <- data.frame(a=rnorm(10),b=runif(10),d=sample(letters[1:3],10,replace=TRUE)); summary(DF) looks like a nice table format to me. You can make tweaks if you're so inclined.
> data(iris) ; is.data.frame(summary_table <- as.data.frame(summary(iris))) returns TRUE.