Consider this toy data frame:

df <- data.frame(id = c(1, 2),
             meandoy = c(3,2),
             temp199701 = c(4,2),
             temp199702 = c(15,10),
             temp199703 = c(-3,7),
             temp199704 = c(-1,6),
             temp199801 = c(1,5),
             temp199802 = c(9,10),
             temp199803 = c(-2,2),
             temp199804 = c(-5,11))

I want to add a new column with the result of a function for each year and each row. In other words, each new GDD<year> column gets the value calculated from temp<year>01 to temp<year>04.

I can achieve it with this:

library(dplyr)

# Growing degree days: sum of the degrees above a base of 5
sum.GDD <- function(x) sum(x[x > 5] - 5, na.rm = TRUE)

yearlist <- c(1997, 1998)

for (year in yearlist) {
  text <- paste0("GDD", year)
  df[[text]] <- df %>%                           # store result in this column
    dplyr::select(contains(toString(year))) %>%  # take variables that have year
    apply(1, sum.GDD)                            # calculate GDD5 across those variables
}

But there is a twist. I want to apply the function only to the number of columns specified in meandoy each year.

For example, GDD1997 in the first row will be the result of calculating the first 3 columns starting from temp199701, because meandoy = 3. GDD1998 will get the result from temp199801, temp199802 and temp199803.

In the second row the meandoy = 2 so the result of GDD1997 will be calculated from temp199701 and temp199702. GDD1998 from temp199801 and temp199802.
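To make the expected result concrete, the values for the toy data can be worked out by hand. This is only a minimal sketch that reuses sum.GDD from above; the variable names (gdd1997_row1 and so on) and the expected data frame are introduced here purely for illustration.

```r
# Growing-degree-day sum as defined in the question: degrees above 5, summed.
sum.GDD <- function(x) sum(x[x > 5] - 5, na.rm = TRUE)

# Row 1 (meandoy = 3): first three columns of each year.
gdd1997_row1 <- sum.GDD(c(4, 15, -3))  # temp199701 to temp199703
gdd1998_row1 <- sum.GDD(c(1, 9, -2))   # temp199801 to temp199803

# Row 2 (meandoy = 2): first two columns of each year.
gdd1997_row2 <- sum.GDD(c(2, 10))      # temp199701 to temp199702
gdd1998_row2 <- sum.GDD(c(5, 10))      # temp199801 to temp199802

# Illustrative target: the columns the solution should add to df.
expected <- data.frame(id = c(1, 2),
                       GDD1997 = c(gdd1997_row1, gdd1997_row2),
                       GDD1998 = c(gdd1998_row1, gdd1998_row2))
print(expected)
```

Note that a temperature of exactly 5 contributes nothing, because sum.GDD uses a strict x > 5 comparison.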

1 Answer

If in doubt, a problem like this is usually made simpler by turning the data into long format.

Since you're already using dplyr, we can reshape with tidyr's pivot_longer() and pivot_wider():

library(dplyr)
library(tidyr)

totals <- df %>%
  # Turn the data frame into the format id, meandoy, year, doy, value by
  # parsing the column names while unpivoting.
  pivot_longer(
    c(everything(), -id, -meandoy),
    names_to = c("year", "doy"), names_pattern = "temp(\\d{4})(\\d{2})",
    names_transform = list(year = as.integer, doy = as.integer)
  ) %>%
  # Keep only the rows that correspond to the columns year01 to
  # year<meandoy> in the original df.
  filter(doy <= meandoy) %>%
  # Calculate the GDD per id and year.
  group_by(id, year) %>%
  summarize(total = sum.GDD(value), .groups = "drop") %>%
  # Back to the original wide format.
  pivot_wider(names_from = year, values_from = total, names_prefix = "GDD")

left_join(df, totals, by = "id")

This should be faster than an approach doing row-wise operations and/or loops.
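For comparison, here is a minimal sketch of what a row-wise alternative might look like in base R, which could be used to benchmark against the long-format version. The helper gdd_rowwise is hypothetical (it is not part of the question or of any package), and it assumes the temp columns of each year appear in day order.

```r
df <- data.frame(id = c(1, 2), meandoy = c(3, 2),
                 temp199701 = c(4, 2),  temp199702 = c(15, 10),
                 temp199703 = c(-3, 7), temp199704 = c(-1, 6),
                 temp199801 = c(1, 5),  temp199802 = c(9, 10),
                 temp199803 = c(-2, 2), temp199804 = c(-5, 11))

sum.GDD <- function(x) sum(x[x > 5] - 5, na.rm = TRUE)

# Hypothetical helper: GDD for one year, using only the first `meandoy`
# columns of that year in each row. Assumes columns are in day order.
gdd_rowwise <- function(data, year) {
  cols <- grep(paste0("^temp", year), names(data), value = TRUE)
  vapply(seq_len(nrow(data)), function(i) {
    keep <- head(cols, data$meandoy[i])           # first meandoy columns
    sum.GDD(unlist(data[i, keep, drop = FALSE]))  # GDD over that row slice
  }, numeric(1))
}

rowwise <- df
for (year in c(1997, 1998)) {
  rowwise[[paste0("GDD", year)]] <- gdd_rowwise(df, year)
}
rowwise[, c("id", "GDD1997", "GDD1998")]
```

On the toy data this gives GDD1997 = 10 and 5, and GDD1998 = 4 and 5, for ids 1 and 2. Whether it is actually slower on the real data is something to measure rather than assume.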

5 Comments

Just a comment, the mutate(...) step can be done by the pivot_longer() in the previous step if you change it to pivot_longer(c(everything(), -id, -meandoy), names_to = c("year", "doy"), names_pattern = "temp(\\d{4})(\\d{2})", names_transform = list(year = as.integer, doy = as.integer)). It's a little different because it doesn't retain the name column, but the final product is the same.
Ah, yes, I was only looking at sep. This also makes the extraction of the year safer, thanks!
Yes thanks, the only thing is that doy is now the last part of a temp199701. In the real data, it should be a consecutive number from 1 to meandoy. That's because the real dataset is much longer with many days and months. Therefore, meandoy can contain a value of, say, 110.
As long as it clearly follows the same structure you can just adapt the regex, e.g. "temp(\\d{4})(\\d+)"
That didn't work but I fixed it by adding a new line between pivot_longer and the filter: group_by(id, meandoy, year) %>% mutate(pos = 1:n() ) %>% . This way, I get the consecutive number I wanted from 1 to 365 (end of each year in the original data). Anyway, thanks for your approach. I got stuck trying to edit the original loop and the apply function which probably was not a good idea.
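Putting the last comment together with the answer, the position-based variant might look roughly like this. It is a sketch under assumptions, not the commenter's exact code: it uses row_number() in place of 1:n(), a names_pattern that captures only the year, and it relies on the temp columns of each year being in day order, since pivot_longer() keeps the column order within each row.

```r
library(dplyr)
library(tidyr)

df <- data.frame(id = c(1, 2), meandoy = c(3, 2),
                 temp199701 = c(4, 2),  temp199702 = c(15, 10),
                 temp199703 = c(-3, 7), temp199704 = c(-1, 6),
                 temp199801 = c(1, 5),  temp199802 = c(9, 10),
                 temp199803 = c(-2, 2), temp199804 = c(-5, 11))

sum.GDD <- function(x) sum(x[x > 5] - 5, na.rm = TRUE)

totals <- df %>%
  # Capture only the year; the rest of the column name is ignored.
  pivot_longer(
    c(everything(), -id, -meandoy),
    names_to = "year", names_pattern = "temp(\\d{4})\\d+",
    names_transform = list(year = as.integer)
  ) %>%
  # Consecutive position of each column within its id and year.
  group_by(id, meandoy, year) %>%
  mutate(pos = row_number()) %>%
  # Keep the first meandoy positions of each year.
  filter(pos <= meandoy) %>%
  group_by(id, year) %>%
  summarize(total = sum.GDD(value), .groups = "drop") %>%
  pivot_wider(names_from = year, values_from = total, names_prefix = "GDD")

result <- left_join(df, totals, by = "id")
```

Because pos is derived from the column order rather than parsed from the name, this variant does not depend on how the day or month is encoded in the column names.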
