Loop through data frame and match/populate rows with column values

Question

df1

...other columns...MON TUE WED THU FRI SAT SUN Total
                   8.5 8.5 8.5 8.5 8.5 6.0 0.0 0.0
                   9.0 9.0 9.0 9.0 9.0 6.0 6.0 0.0
                   6.0 7.0 7.0 7.0 7.0 5.0 0.0 0.0
                   ...about 1400 records/rows of data in df1

df2

Day Hours
FRI   0
SAT   0
SUN   0
MON   0
TUE   0
WED   0
THU   0
FRI   0
SAT   0
SUN   0
MON   0
TUE   0
WED   0
...Will keep going until end (28-31 days worth will be listed - all days in a month)

Step 1

I need to loop through all lines in df2, matching all records in column 'Day' with df1 columns MON-SUN...adding the corresponding hours in df1 to the 'Hours' column in df2...using only the first row in df1 until all of df2 has been looped through and filled...Output example below

df2 output

Day   Hours
FRI   8.5
SAT   6.0
SUN   0
MON   8.5
TUE   8.5
WED   8.5
THU   8.5
FRI   8.5
SAT   6.0
SUN   0
MON   8.5
TUE   8.5
WED   8.5
...will continue all the way until last row until all data is filled from 1st row in df1 (repeating itself, just matching the right values)

2nd step

After df2 has been looped and filled - then sum of 'Hours' column in df2 and place in df1 'total' column

df1 output

...other columns...MON TUE WED THU FRI SAT SUN Total
                   8.5 8.5 8.5 8.5 8.5 6.0 0.0 88.5
                   9.0 9.0 9.0 9.0 9.0 6.0 6.0
                   6.0 7.0 7.0 7.0 7.0 5.0 0.0

This then repeats until all rows in df1 has been looped and has gone through the same process...so probably a double loop of some sort is needed with a match function. I am struggling to find any solutions to this

Code used

row_df1 <- 1
row_df2 <- 1

for (row_df2 in seq(1,nrow(Calendar$Jan))) {
 for (day in week) {
  if (Calendar$Jan[row_df2, 'Day'] == day) {
   Calendar$Jan[row_df2,'Hours'] <- Calctable[row_df1,day]
   row_df2 <- row_df2 + 1
    }
  }
 }

Error Message

 Error in for (day in week) { : invalid for() loop sequence

Many thanks

randomchars42 · Accepted Answer · 2021-01-17 04:14:35Z

2

library(dplyr)

df1 <- data.frame(
  MON = c(1,2,3),
  TUE = c(5,6,7),
  WED = c(8,9,10),
  THU = c(11,12,13),
  FRI = c(14,15,16),
  SAT = c(17,18,19),
  SUN = c(20,21,22))

df2 <- data.frame(
  Day = c('THU', 'FRI', 'SAT', 'SUN', 'MON', 'TUE', 'WED', 'THU', 'FRI', 'SAT', 'SUN', 'MON', 'TUE', 'WED', 'THU', 'FRI', 'SAT', 'SUN'),
  Hours = 0
)

Example df1: (sorry, I didn't take the time to recreate you exact data, please follow through)

    MON   TUE   WED   THU   FRI   SAT   SUN
  <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1     1     5     8    11    14    17    20
2     2     6     9    12    15    18    21
3     3     7    10    13    16    19    22

Example df2:

   Day   Hours
   <chr> <dbl>
 1 THU       0
 2 FRI       0
 3 SAT       0
 4 SUN       0
 5 MON       0
 6 TUE       0
 7 WED       0
 8 THU       0
 9 FRI       0
10 SAT       0
11 SUN       0
12 MON       0
13 TUE       0
14 WED       0
15 THU       0
16 FRI       0
17 SAT       0
18 SUN       0

Step 1: This should be the algorithm you`re looking for to sort df2 into df1 in the way you described it.

row_df2 <- 1

for (row_df1 in seq(1,nrow(df1))) {
  for (day in c('MON', 'TUE', 'WED', 'THU', 'FRI', 'SAT', 'SUN'))
    if (df2[row_df2, 'Day'] == day) {
      df2[row_df2,'Hours'] <- df1[row_df1,day]
      row_df2 <- row_df2 + 1
    }
}

Step 2: now you could sum up the values in df1, e.g. using dplyr:

df1 <- df1 %>%
  mutate(
    Sum = MON + TUE + WED + THU + FRI + SAT + SUN
  )

df1:

# A tibble: 3 x 8
    MON   TUE   WED   THU   FRI   SAT   SUN   Sum
  <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1     1     5     8    11    14    17    20    76
2     2     6     9    12    15    18    21    83
3     3     7    10    13    16    19    22    90

df2:

# A tibble: 18 x 2
   Day   Hours
   <chr> <dbl>
 1 THU      11 <- row 1: THU
 2 FRI      14 <- row 1: FRI
 3 SAT      17 <- ...
 4 SUN      20
 5 MON       2 <- row 2: MON
 6 TUE       6 <- ....
 7 WED       9
 8 THU      12
 9 FRI      15
10 SAT      18
11 SUN      21 <- row 2: SUN
12 MON       3 <- row 3: MON
13 TUE       7
14 WED      10
15 THU      13
16 FRI      16
17 SAT      19
18 SUN      22

Is there no identifier like Date in both tables? This would make it much more robust. You could then match by date without relying on the right day to start with.

Edit 1: Updated after testing and removal of some errors.

Edit 2: Highlighted which value from df1 will land in df2. I just used different example data than you (I didn't want to type it all in).

Edit 3: Used data.frame instead of tibble in example data to demonstrate it should work as well.

Edit 4: Is this what you want?

row_df1 <- 1
row_df2 <- 1

for (row_df2 in seq(1,nrow(df2))) {
  for (day in week) {
    if (df2[row_df2, 'Day'] == day) {
      df2[row_df2,'Hours'] <- df1[row_df1,day]
      row_df2 <- row_df2 + 1
    }
}

df2

will lead to:

   Day Hours
1  THU    11 <- row 1: THU
2  FRI    14
3  SAT    17
4  SUN    20
5  MON     1
6  TUE     5
7  WED     8
8  THU    11 <- row 1: THU
9  FRI    14
10 SAT    17
11 SUN    20
12 MON     1
13 TUE     5
14 WED     8
15 THU    11 <- row 1: THU
16 FRI    14
17 SAT    17
18 SUN    20

Edit 5: Seems there is a { missing:

for (row_df2 in seq(1,nrow(Calendar$Jan))) {
 for (day in week) {                                       # <- HERE
  if (Calendar$Jan[row_df2, 'Day'] == day) {
   Calendar$Jan[row_df2,'Hours'] <- Calctable[row_df1,day]
   row_df2 <- row_df2 + 1
    }
  }

Edit 6:

In Edit 5 I assigned week <- c('MON', 'TUE', 'WED', 'THU', 'FRI', 'SAT', 'SUN') but forgot to mention it. It should have looked like (no special built-in variables here):

week <- c('MON', 'TUE', 'WED', 'THU', 'FRI', 'SAT', 'SUN')

for (row_df2 in seq(1,nrow(Calendar$Jan))) {
 for (day in week) {
  if (Calendar$Jan[row_df2, 'Day'] == day) {
   Calendar$Jan[row_df2,'Hours'] <- Calctable[row_df1,day]
   row_df2 <- row_df2 + 1
  }
 }
}

in case you re-use week at some other point in your code. I used it for testing the loop and mixed it up in the previous version of this answer.

edited Jan 17, 2021 at 4:14

answered Jan 16, 2021 at 16:24

randomchars42

3351 silver badge8 bronze badges

Sign up to request clarification or add additional context in comments.

11 Comments

randomchars42 Over a year ago

Sorry, I didn't want to confuse you @Daisy . A tibble is basically a dataframe . Citing the package: "tibble() is a nice way to create data frames." more. The code from step 2 makes use of the package dplyr which is part of the tidyverse, a set of packages aiming to make data manipulation and visualisation easier and more consistent. I used tibbles only as example, you can easily keep your data.frame.

Daisy Over a year ago

Hi, Thank for for that explanation, very much appreciated! I have used your code above but getting the following error (please refer to question)...Also i noticed your outputs for df1 and df2 are very different to the desired output...so df2 should hold only values from df1 in the 'Hours' column...using the first row in df1 first then totaling the 'Hours' column and placing that total in df1 next to the first row and then this continues until all rows in df1 have gone through the same step...end will be when all have a total number...I hope this makes sense

Daisy Over a year ago

Just wanted to add the total in df1 is the latter process...meaning once df2 is filled then the sum of the hours column in df2 will be placed in the 'total' column in df1 (row by row)

Daisy Over a year ago

Thank you again @randomchars42 - really appreciate your help here

randomchars42 Over a year ago

@Daisy: as to the error message: I can't figure it out at the moment, how do you construct your data, is it a data.frame or something else?

|

Collectives™ on Stack Overflow

Loop through data frame and match/populate rows with column values

1 Answer 1

11 Comments

Your Answer

Hot Network Questions

Collectives™ on Stack Overflow

1 Answer 1

11 Comments

Your Answer

Sign up or log in

Post as a guest

Related