2

I want to restructure my land use classification dataframe and add new rows and columns based on conditions of the dataframe. I've been using dplyr to try this, however the examples I have found tend to decrease the columns or rows and not increase the number of rows based off conditions. I have tried to loop through the dataset to add the rows, but want to know if there is a better way to do it in dplry? I'm open to using a different library as well but its a very large dataset of classifications and dplyr seems to work well with the dataframe?

Here is the code example of what my current dataframe looks like (df_old) and what I would like it to be in the end (df_new).

What I want to do, is everytime the Year1990-2015 changes it creates a new row. Example: ID 424, in 1990 it is 51, but in the Year 2000 it changes to 21 and remains 21 until present day. This means that the new dataframe for ID 424 should ahve two rows. One labeled Start_Year that represents the begining of the land use in 1990 as Forest (Landuse = 51) and remains Forest until it changes in 2000. Since in 2000 it is Pavement, we assume it was Forest still in 1999 and End_Year will be 1999 for the first row of ID 424. Than a new row appears for ID 424 where Start_Year is 2000 as it changed to Pavement (Landuse = 21) and remains 21 until End_year (Present day).

To add context, the dataset represents how an area changes in a city, where the numbers in Year1990-2015 are used to identify different landuse classifications( 21 = Pavement, 24 = Park, 25 = Residential, 51 = Forest, 41 = Agriculture).

df_old <- data.frame(ID = c(424,426,427,428),
             Parameter= c(0.01,0.03,0.03,0.01),
             City = c("Abbotsford","Abbotsford","Abbotsford","Abbotsford"),
             Area = c(3.12,7.98,2.01,0.48),
             Year1990 = c(51,51,51,41),
             Year2000 = c(21,51,51,41),
             Year2005 = c(21,51,51,25),
             Year2010 = c(21,51,51,24),
             Year2015 = c(21,51,51,25))

df_new <- data.frame(ID = c(424,424,426,427,428,428,428,428),
             Parameter= c(0.01,0.01,0.03,0.03,0.01,0.01,0.01,0.01),
             City = c("Abbotsford","Abbotsford","Abbotsford","Abbotsford","Abbotsford","Abbotsford","Abbotsford","Abbotsford"),
             Area = c(3.12,3.12,7.98,2.01,0.48,0.48,0.48,0.48),
             Start_Year = c(1990,2000,1990,1990,1990,2005,2010,2015),
             End_Year = c(1999,"present","present","present",2004,2009,2014,"present"),
             Landuse = c("51-51","51-21","51-51","51-51","41-41","41-25","25-24","24-25"))

OG Data

This is what I would like the final product to be:

New dataframe structure

2
  • 1
    I am confused about the "end" column - where in the original dates (df__old) do the values 2014 or current appear? Also, how do you form the LU-LU column? I would suggest editing your post to be more explicit about what needs to be done. Commented Dec 2, 2021 at 21:06
  • Thank you, I've added some clairification Commented Dec 2, 2021 at 23:02

1 Answer 1

3

This solution works for your example data, but it's difficult to determine the 'rules' governing your required manipulations (and therefore difficult to know whether it will work with your real data). If this fails on your real data, please edit your post with more info.

library(tidyverse)

df_old <- data.frame(ID = c(424,426,427,428),
                     Parameter= c(0.01,0.03,0.03,0.01),
                     City = c("Abbotsford","Abbotsford","Abbotsford","Abbotsford"),
                     Area = c(3.12,7.98,2.01,0.48),
                     Year1990 = c(51,51,51,41),
                     Year2000 = c(21,51,51,41),
                     Year2005 = c(21,51,51,25),
                     Year2010 = c(21,51,51,24),
                     Year2015 = c(21,51,51,25))

df_new <- data.frame(ID = c(424,424,426,427,428,428,428,428),
                     Parameter= c(0.01,0.01,0.03,0.03,0.01,0.01,0.01,0.01),
                     City = c("Abbotsford","Abbotsford","Abbotsford","Abbotsford","Abbotsford","Abbotsford","Abbotsford","Abbotsford"),
                     Area = c(3.12,3.12,7.98,2.01,0.48,0.48,0.48,0.48),
                     Start = c(1990,2000,1990,1990,1990,2005,2010,2015),
                     End = c(1999,"present","present","present",2004,2009,2014,"present"),
                     LU = c("51-51","51-21","51-51","51-51","41-41","41-25","25-24","24-25"))


df_old %>%
  pivot_longer(cols = -c(1:4)) %>%
  group_by(ID) %>%
  mutate(Start = as.numeric(str_extract(name, "\\d+"))) %>%
  mutate(`LU-LU` = paste(lag(value, default = max(value)), "-", value, sep = "")) %>%
  distinct(`LU-LU`, .keep_all = TRUE) %>%
  group_by(ID) %>%
  filter(value != lag(value, default = 0)) %>%
  group_by(ID) %>%
  mutate(End = lead(Start, default = NA) - 1,
         End = replace_na(End, "present")) %>%
  select(c(ID, Parameter, City, Area, Start, End, `LU-LU`))
#> # A tibble: 8 × 7
#> # Groups:   ID [4]
#>      ID Parameter City        Area Start End     `LU-LU`
#>   <dbl>     <dbl> <chr>      <dbl> <dbl> <chr>   <chr>  
#> 1   424      0.01 Abbotsford  3.12  1990 1999    51-51  
#> 2   424      0.01 Abbotsford  3.12  2000 present 51-21  
#> 3   426      0.03 Abbotsford  7.98  1990 present 51-51  
#> 4   427      0.03 Abbotsford  2.01  1990 present 51-51  
#> 5   428      0.01 Abbotsford  0.48  1990 2004    41-41  
#> 6   428      0.01 Abbotsford  0.48  2005 2009    41-25  
#> 7   428      0.01 Abbotsford  0.48  2010 2014    25-24  
#> 8   428      0.01 Abbotsford  0.48  2015 present 24-25

Created on 2021-12-03 by the reprex package (v2.0.1)

Sign up to request clarification or add additional context in comments.

2 Comments

Wow, this looks fantastic! I see it uses the tidyr and stringr. I've tested it with the smaller dataset, and it seems to have done the trick! I will run it overnight on the large dataset overnight to see if there any funny errors. But it looks awesome! Thank you!
You're welcome; glad it solves your issue. Thanks for editing your question to include additional details

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.