My data looks like this:

ROW ID DV IDV
1   1   0  0.25
2   1  34  0.5  
3   1  33  1
4   1  20  2
5   1  19  3
6   1  18  4
7   1  15  5
8   1  10  6
9   2   0  0.25
10  2  40  0.5  
11  2  39  1
12  2  35  2
13  2  28  3
14  2  20  4
15  2  13  5
16  2   9  6
17  3   0  0.25
18  3  30  0.5  
19  3  20  1
20  3  19  2
21  3  18  3
22  3  17  4
23  3  12  5
24  3   7  6

I want it to look like this:

ROW ID DV IDV    NEWDV
1   1   0  0.25     0
2   1  34  0.5     34
3   1  33  1       33  
4   1  20  2       20
5   1  19  3        9.5
6   1  18  4        4.5
7   1  15  5        1.875
8   1  10  6        0.625
9   2   0  0.25     0
10  2  40  0.5     40
11  2  39  1       39
12  2  35  2       35
13  2  28  3       28
14  2  20  4       20
15  2  13  5        6.5
16  2   9  6        2.25
17  3   0  0.25     0
18  3  30  0.5     30
19  3  20  1       20
20  3  19  2       19
21  3  18  3        9
22  3  17  4        4.25
23  3  12  5        1.5
24  3   7  6        0.4375

I have many datasets like this and want to do the same thing for each one. I want to create a column NEWDV by dividing the DV values by 2, 4, 8, 16, 32, 64, 128 and so on (that is, 2 raised to the power 1, 2, 3, 4, 5, 6, 7 and so on). The division should apply only to rows where IDV > 2 and DV < 20. For example, rows 21 to 24 all meet the condition DV < 20 and IDV > 2, so the NEWDV column reads 18/2 = 9, 17/4 = 4.25, 12/8 = 1.5, 7/16 = 0.4375. The computation has to restart for each ID.

I tried using the following code with no success:

fc is the object having the data

x <- c(2,4,8,16)
for(i in 1:4){
    for(j in 1:4){
        for(g in 1:length(fc$DV<20 & fc$ID==i & fc$IDV>2)) {
            fc$NEWDV[g] <-ifelse(fc$DV[fc$ID==i][g]<20 & fc$IDV[fc$ID==i][g]>2,fc$DV[fc$ID==i][g]/x[j],fc$DV[fc$ID==j][g])
        }
    }
}

What am I doing wrong? Help is greatly appreciated! I would prefer to use a for loop for this problem, since that is what I am familiar with, but any other solutions are also welcome. Thank you.

  • @AlecBrooks : Thank you for editing. I have posted questions here before, but I haven't figured out the proper way to put the data. How does one do it? Commented Feb 5, 2015 at 22:41
  • @Henrik : IDV equals 2 for row 20 , the condition is that IDV should be greater than 2 for the computation to take effect that is why on row 21 NEWDV= 18/2 Commented Feb 5, 2015 at 22:59
  • Vineet, you format data the same way you format code samples. Commented Feb 5, 2015 at 23:44

2 Answers

This is a great time to use the cumsum function to count the number of rows up to and including the current row where your condition (IDV > 2 and DV < 20) is true; you can normalize DV by two raised to the power of this cumulative sum. Then you can apply this function to each part of your data frame broken up by ID.

# Split by ID
spl <- split(dat, dat$ID)

# Grab the normalized DV value for each grouping
new.dv <- lapply(spl, function(x) x$DV / 2^cumsum(x$IDV > 2 & x$DV < 20))

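# Add the new values back to your data frame; unsplit() puts them back in
# the original row order even if the rows are not sorted by ID
dat$NEWDV <- unsplit(new.dv, dat$ID)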
dat
#    ROW ID DV  IDV   NEWDV
# 1    1  1  0 0.25  0.0000
# 2    2  1 34 0.50 34.0000
# 3    3  1 33 1.00 33.0000
# 4    4  1 20 2.00 20.0000
# 5    5  1 19 3.00  9.5000
# 6    6  1 18 4.00  4.5000
# 7    7  1 15 5.00  1.8750
# 8    8  1 10 6.00  0.6250
# 9    9  2  0 0.25  0.0000
# 10  10  2 40 0.50 40.0000
# 11  11  2 39 1.00 39.0000
# 12  12  2 35 2.00 35.0000
# 13  13  2 28 3.00 28.0000
# 14  14  2 20 4.00 20.0000
# 15  15  2 13 5.00  6.5000
# 16  16  2  9 6.00  2.2500
# 17  17  3  0 0.25  0.0000
# 18  18  3 30 0.50 30.0000
# 19  19  3 20 1.00 20.0000
# 20  20  3 19 2.00 19.0000
# 21  21  3 18 3.00  9.0000
# 22  22  3 17 4.00  4.2500
# 23  23  3 12 5.00  1.5000
# 24  24  3  7 6.00  0.4375

This approach of breaking up your data frame, applying a function to each piece, and then combining the results is called split-apply-combine and is a common data wrangling methodology.
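For what it's worth, base R's ave() does the split, apply and combine steps in one call. Below is a minimal sketch, assuming the example data from the question is rebuilt by hand as dat; the helper names hit and k are only illustrative. Like the code above, it divides every row by 2 raised to the running count for its ID.

```r
# Rebuild the example data from the question (the name `dat` follows this answer)
dat <- data.frame(
  ROW = 1:24,
  ID  = rep(1:3, each = 8),
  DV  = c(0, 34, 33, 20, 19, 18, 15, 10,
          0, 40, 39, 35, 28, 20, 13,  9,
          0, 30, 20, 19, 18, 17, 12,  7),
  IDV = rep(c(0.25, 0.5, 1, 2, 3, 4, 5, 6), times = 3)
)

# Flag the rows that meet the condition (illustrative helper name)
hit <- dat$IDV > 2 & dat$DV < 20

# ave() splits `hit` by ID, applies cumsum() within each group, and
# returns the running counts in the original row order
k <- ave(as.numeric(hit), dat$ID, FUN = cumsum)

# Divide DV by 2 raised to the running count (2^0 = 1 leaves DV unchanged)
dat$NEWDV <- dat$DV / 2^k
```

Because ave() returns its result in the original row order, this does not depend on the rows being sorted by ID.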

1 Comment

Thank you for the code. Do you know what I am doing wrong in the code I shared above?

Here, we use data.table. setDT(df) converts the "data.frame" to a "data.table" by reference. We then create two new columns: "NEWDV", which is "DV" converted to "numeric", and a logical "indx" column that flags the rows where IDV > 2 and DV < 20. For the flagged rows only ((indx)), grouped by "ID", we assign (:=) "NEWDV" the changed values (NEWDV/2^cumsum(indx)). Finally, we remove the "indx" column by assigning it to "NULL".

library(data.table)
setDT(df)[,c('NEWDV', 'indx'):= list(as.numeric(DV),
    IDV>2 & DV <20)][(indx), NEWDV:=NEWDV/2^cumsum(indx), ID][,indx:=NULL][]
#     ROW ID DV  IDV   NEWDV
#  1:   1  1  0 0.25  0.0000
#  2:   2  1 34 0.50 34.0000
#  3:   3  1 33 1.00 33.0000
#  4:   4  1 20 2.00 20.0000
#  5:   5  1 19 3.00  9.5000
#  6:   6  1 18 4.00  4.5000
#  7:   7  1 15 5.00  1.8750
#  8:   8  1 10 6.00  0.6250
#  9:   9  2  0 0.25  0.0000
# 10:  10  2 40 0.50 40.0000
# 11:  11  2 39 1.00 39.0000
# 12:  12  2 35 2.00 35.0000
# 13:  13  2 28 3.00 28.0000
# 14:  14  2 20 4.00 20.0000
# 15:  15  2 13 5.00  6.5000
# 16:  16  2  9 6.00  2.2500
# 17:  17  3  0 0.25  0.0000
# 18:  18  3 30 0.50 30.0000
# 19:  19  3 20 1.00 20.0000
# 20:  20  3 19 2.00 19.0000
# 21:  21  3 18 3.00  9.0000
# 22:  22  3 17 4.00  4.2500
# 23:  23  3 12 5.00  1.5000
# 24:  24  3  7 6.00  0.4375
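Since the question asks for a for loop, the same logic can also be sketched with an explicit loop. This is a minimal sketch, not the only way to write it: it assumes the data is rebuilt as a plain data.frame named df, and the counter k and the loop variables id and r are illustrative names. As far as I can tell, the loop in the question goes wrong in three places: length(fc$DV<20 & fc$ID==i & fc$IDV>2) is the length of a logical vector, so it equals the number of rows in fc rather than the number of rows meeting the condition; fc$NEWDV[g] writes to row g of the whole data frame while fc$DV[fc$ID==i][g] reads row g within ID i; and because the j loop sits outside the g loop, each pass over j overwrites what the previous pass wrote instead of stepping the divisor from row to row.

```r
# Rebuild the example data as a plain data.frame (assumed name `df`)
df <- data.frame(
  ROW = 1:24,
  ID  = rep(1:3, each = 8),
  DV  = c(0, 34, 33, 20, 19, 18, 15, 10,
          0, 40, 39, 35, 28, 20, 13,  9,
          0, 30, 20, 19, 18, 17, 12,  7),
  IDV = rep(c(0.25, 0.5, 1, 2, 3, 4, 5, 6), times = 3)
)

# Start from the unchanged DV values
df$NEWDV <- as.numeric(df$DV)

for (id in unique(df$ID)) {
  k <- 0                              # count of qualifying rows, reset for each ID
  for (r in which(df$ID == id)) {     # r is a row number in the full data frame
    if (df$IDV[r] > 2 && df$DV[r] < 20) {
      k <- k + 1                      # next power of 2 for this ID
      df$NEWDV[r] <- df$DV[r] / 2^k
    }
  }
}
```

Here only the rows that meet the condition are changed, and the exponent is a counter that advances once per qualifying row, so no fixed vector of divisors is needed.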

data

df <- structure(list(ROW = 1:24, ID = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 
3L), DV = c(0L, 34L, 33L, 20L, 19L, 18L, 15L, 10L, 0L, 40L, 39L, 
35L, 28L, 20L, 13L, 9L, 0L, 30L, 20L, 19L, 18L, 17L, 12L, 7L), 
IDV = c(0.25, 0.5, 1, 2, 3, 4, 5, 6, 0.25, 0.5, 1, 2, 3, 
4, 5, 6, 0.25, 0.5, 1, 2, 3, 4, 5, 6)), .Names = c("ROW", 
"ID", "DV", "IDV"), class = "data.frame", row.names = c(NA, -24L))
