0

I have a dataframe, similar to the example below, but larger (15000 rows):

df.example <-structure(list(Date = structure(c(3287, 3386, 4286, 5286, 6286), class = "Date"),v1 = c(1L, 1L, 1L, 1L, 1L), v2 = c(0.60378, 12.82581, 3.55357, 4.96079, 0.0422),perc = c(0.598, 0.598, 0.609, 1, 0.609), v3 = c(-99, -99, 5.83509031198686, 4.96079,0.0692939244663383)), .Names = c("Date", "v1", "v2", "perc", "v3"), row.names = c(1L, 100L, 1000L, 2000L, 3000L), class = "data.frame")

df.example:

           Date v1       v2  perc           v3
1    1979-01-01  1  0.60378 0.598 -99.00000000
100  1979-04-10  1 12.82581 0.598 -99.00000000
1000 1981-09-26  1  3.55357 0.609   5.83509031
2000 1984-06-22  1  4.96079 1.000   4.96079000
3000 1987-03-19  1  0.04220 0.609   0.06929392

What I would like to do is calculate the percentage of rows that are below a "certain threshold value" for column "perc". I would like to do this multiple times for multiple "certain threshold values", given below:

### "certain threshold values":
seq(from =0, to = 1, by = 0.1)


### formula to be repeated/iterated/looped: (the i stands for "certain value")
100*sum(df.example$perc<=i)/nrow(df.example)

I would like the outcome to be a vector called "vector1", like the example below:

vector1 <- c(0,0,0,0,0,0,0.2,0.6,0.6,0.6,1.0)    

This is what I have so far, but it is not working:

### create vector to store calculated values in
vector1=c()
vector1[1]=3

### loop calculation of percentage of rows that are below "certain threshold value" in column df.example$perc
for(i in seq(0,1, by=0.1)){
vector1[i]=sum(df.example$perc<=i)/nrow(df.example)
}

I only get one value, which I would expect to be the last one of my vector1.

I have already looked at similar topics on SO, such as "R create a vector with loop structure" and "How to make a vector using a for loop".

Any suggestions?

By the way: please comment if the dput() output I posted doesn't recreate the data; it's the first time I've used dput().

2
  • You may need s1 <- seq(0, 1, 0.1); for(i in seq_along(s1)){vector1[i]=sum(df.example$perc<=s1[i])/nrow(df.example) }. Also, initialize vector1 <- numeric(length(s1)). Commented Nov 7, 2016 at 14:58
  • The difference between for(i in seq_along(seq(0,1, by=0.1))){print(i)} and for(i in seq(0,1, by=0.1)){print(i)} should explain the solution. Commented Nov 7, 2016 at 14:59
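
The difference the comments point at can be seen in a small sketch. It assumes only base R; the names thresholds and demo are made up for illustration. When the loop variable holds the threshold values themselves, R truncates each fractional value to an integer before using it as an index, so 0, 0.1, ..., 0.9 all become index 0 (an assignment that does nothing) and only the final value 1 lands in a real slot.

```r
thresholds <- seq(0, 1, by = 0.1)

# Looping over the VALUES: i is 0, 0.1, ..., 1 and is used directly as an index.
demo <- c()
for (i in thresholds) {
  demo[i] <- i   # fractional indices are truncated; index 0 is a no-op
}
length(demo)     # only one element was ever stored

# Looping over the POSITIONS: i is 1, 2, ..., 11, a valid index each time.
fixed <- numeric(length(thresholds))
for (i in seq_along(thresholds)) {
  fixed[i] <- thresholds[i]
}
length(fixed)    # one element per threshold
```

This is why the loop in the question appears to return a single value.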

3 Answers

1

The number of rows does not need to be recomputed for every threshold; assign it to a variable once. Then you can use sapply:

nrow_df <- nrow(df.example)
sapply(seq(from =0, to = 1, by = 0.1), function(x) sum(df.example$perc<=x)/nrow_df)
# [1] 0.0 0.0 0.0 0.0 0.0 0.0 0.4 0.8 0.8 0.8 1.0

Or (vectorized)

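indx <- seq(0, 1, by=0.1)
# repeat each perc value down a full column so it is compared with every threshold
rowSums(rep(df.example$perc, each = length(indx)) <= matrix(indx, length(indx), nrow(df.example))) / nrow(df.example)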
## [1] 0.0 0.0 0.0 0.0 0.0 0.0 0.4 0.8 0.8 0.8 1.0

0

Here is a fourth method using outer and colSums:

colSums(outer(df.example$perc, seq(from=0, to=1, by=0.1), "<=")) / nrow(df.example)
# [1] 0.0 0.0 0.0 0.0 0.0 0.0 0.4 0.8 0.8 0.8 1.0

outer creates a logical matrix that holds the result of the threshold test for each element-threshold pair: one row per element of perc and one column per threshold. The "successes" are summed down each column with colSums, and this count is divided by the number of elements tested.
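
To make the shape of that matrix concrete, here is a minimal sketch on made-up inputs (the vectors x and cuts are hypothetical, chosen to keep the matrix small). Rows correspond to the values being tested and columns to the thresholds.

```r
x    <- c(0.598, 1)      # hypothetical values to test
cuts <- c(0.5, 0.6, 1)   # hypothetical thresholds

# m[r, k] is TRUE when x[r] <= cuts[k]
m <- outer(x, cuts, "<=")
dim(m)                   # 2 rows (values) by 3 columns (thresholds)

# Column sums count the values at or below each threshold;
# dividing by the number of values gives the proportion.
colSums(m) / length(x)
```

colMeans(m) returns the same proportions in one step, since the mean of a logical column is the share of TRUE entries.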


0

We need to initialize vector1 and loop over the positions of the sequence (with seq_along) rather than over its values, so that each i is a valid integer index.

s1 <- seq(0, 1, 0.1)
vector1 <- numeric(length(s1))  # one slot per threshold, not one per row
for(i in seq_along(s1)){
   vector1[i]=sum(df.example$perc<=s1[i])/nrow(df.example)
 }
vector1
#[1] 0.0 0.0 0.0 0.0 0.0 0.0 0.4 0.8 0.8 0.8 1.0

Or a vectorized approach would be

rowSums(outer(s1, df.example$perc, FUN = `>=`))/nrow(df.example)
#[1] 0.0 0.0 0.0 0.0 0.0 0.0 0.4 0.8 0.8 0.8 1.0

1 Comment

Your second vectorized approach also worked on the larger dataset. The first approach did not. Thanks for the help!
