
The data looks like this:

> head(r)
  area    peri     shape perm
1 4990 2791.90 0.0903296  6.3
2 7002 3892.60 0.1486220  6.3
3 7558 3930.66 0.1833120  6.3
4 7352 3869.32 0.1170630  6.3
5 7943 3948.54 0.1224170 17.1
6 7979 4010.15 0.1670450 17.1

I want to apply several functions to each column. What I currently have is this function:

analysis = function(df){
  measurements = data.frame(attribute = character(),
                            mean = double(),
                            median = double(),
                            variance = double(),
                            IQR = double())
  for (i in 1:ncol(df)){
    names = colnames(df)[i]
    temp = data.frame(attribute = names,
                                   mean = mean(df[,i]),
                                   median = median(df[,i]),
                                   variance = var(df[,i]),
                                   IQR = IQR(df[,i]))
    measurements = rbind(measurements, temp)
  }
  return (measurements)
}

It works and gives the output I want:

  attribute         mean      median     variance          IQR
1      area 7187.7291667 7487.000000 7.203045e+06 3564.2500000
2      peri 2682.2119375 2536.195000 2.049654e+06 2574.6150000
3     shape    0.2181104    0.198862 6.971657e-03    0.1004083
4      perm  415.4500000  130.500000 1.916848e+05  701.0500000

However, my supervisor said it is not efficient and is not written in an R way. I also tried summarise_each() and summarise_all(r, funs(mean, median, var, IQR)), but they don't produce what I want and the output doesn't look nice.

What are some other ways to achieve that output using only base R or dplyr?

2 Answers


I suspect your supervisor's comment about 'R'-style thinking was about that for loop. Almost any for loop you write can be replaced by the apply family of functions (e.g. apply, sapply, lapply).

They make it easier to run functions on vectors/data.frames/lists/etc.

Everything you can do with the apply functions can also be done with for loops (usually with similar performance), so using for loops isn't a cardinal sin. Why use the apply functions? Once you learn them, you get more succinct code that returns the results of running your functions on your data. Before long, you'll find this sort of code intuitive, and even more readable than for loops.
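To make the loop/apply equivalence concrete, here is a small sketch that computes column means both ways. The `toy` data frame (a few rows borrowed from the question) and the names `loop_means` and `apply_means` are my own, and I picked `vapply` as one member of the apply family; `sapply` would do the same job here.

```r
# Toy data: the first three rows of two columns from the question.
toy <- data.frame(
  area = c(4990, 7002, 7558),
  peri = c(2791.90, 3892.60, 3930.66)
)

# Loop version: preallocate a named result, then fill it column by column.
loop_means <- numeric(ncol(toy))
names(loop_means) <- colnames(toy)
for (i in seq_along(toy)) {
  loop_means[i] <- mean(toy[[i]])
}

# Apply version: one call, with the expected result type declared up front.
apply_means <- vapply(toy, mean, numeric(1))

# Compare the two named numeric vectors.
all.equal(loop_means, apply_means)
```

The apply version has no index bookkeeping and no preallocation to get wrong, which is most of the readability gain.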

Base R

df <- data.frame(
  area = c(4990, 7002, 7558, 7352, 7943),
  peri = c(2791.9, 3892.6, 3930.66, 3869.32, 3948.54),
  shape = c(.0903296, .148622, .183312, .117063, .122417),
  perm = c(6.3, 6.3, 6.3, 6.3, 17.1)
)

sapply(df, function(x) c(mean=mean(x), median=median(x), var=var(x), IQR=IQR(x)))
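The sapply call returns a matrix with one column per variable and one row per statistic, which is the transpose of the layout in the question. A minimal sketch of one way to get the requested shape, assuming the six rows shown by head(r) in the question (so the numbers will differ from the full-data output there); the names `col_stats` and `result` are my own:

```r
# Six rows copied from the head(r) output in the question.
df <- data.frame(
  area  = c(4990, 7002, 7558, 7352, 7943, 7979),
  peri  = c(2791.90, 3892.60, 3930.66, 3869.32, 3948.54, 4010.15),
  shape = c(0.0903296, 0.148622, 0.183312, 0.117063, 0.122417, 0.167045),
  perm  = c(6.3, 6.3, 6.3, 6.3, 17.1, 17.1)
)

# One column per variable, one row per statistic.
col_stats <- sapply(df, function(x) {
  c(mean = mean(x), median = median(x), variance = var(x), IQR = IQR(x))
})

# Transpose so each variable becomes a row, and move the variable
# names out of the row names into an explicit `attribute` column.
result <- data.frame(attribute = colnames(col_stats), t(col_stats))
rownames(result) <- NULL
result
```

This keeps the whole thing in base R, with no loop and no repeated rbind.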

2 Comments

Credit to @WilliamGram for writing out the data. Technically, var and IQR are functions that live in the stats package, but you don't need to manually load any package to get them. If you want an answer that more strictly answers the question, see William's response.
To be fair, I didn't know the stats package was loaded by default until your comment. That's why I was going so strictly base R. So thank you for teaching me something new.

Your results can be achieved using base::Map:

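f <- function(x) {
  # summary() supplies the mean, median and quartiles in one call
  desc <- base::summary(x)
  c(
    Mean = unname(desc['Mean']),
    Median = unname(desc['Median']),
    # sample variance computed by hand to stay within base
    Variance = base::sum((x - desc['Mean'])^2) / (length(x) - 1),
    IQR = unname(desc['3rd Qu.'] - desc['1st Qu.'])
  )
}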

t(as.data.frame(base::Map(f, df)))
#               Mean       Median     Variance          IQR
# area  7137.3333333 7455.0000000 1.241980e+06 757.25000000
# peri  3740.5283333 3911.6300000 2.183447e+05  68.93000000
# shape    0.1381314    0.1355195 1.192633e-03   0.04403775
# perm     9.9000000    6.3000000 3.110400e+01   8.10000000

Apologies

Data:

df <- data.frame(
  area = c(4990, 7002, 7558, 7352, 7943, 7979),
  peri = c(2791.9, 3892.6, 3930.66, 3869.32, 3948.54, 4010.15),
  shape = c(.0903296, .148622, .183312, .117063, .122417, .167045),
  perm = c(6.3, 6.3, 6.3, 6.3, 17.1, 17.1)
)

Hope that's useful.

1 Comment

Doesn't quite look like what he's asking for.
