
I'm quite new to programming and I am starting with R in my research on predicting dengue disease cases with climatic data. I'm still cleaning my data, and this particular dataset has around 172,855 obs. and 17 variables in each of the 23 files. I want to keep only the observations and variables I need (the date, the municipality and the quantity of cases registered), but I'd like to do it automatically so I don't have to repeat it by hand for every file. I didn't quite understand how to do it using a loop, purrr or lapply. Could anyone help me with this?

Example of the dataset with cases of infections (image not available).

What I have written so far is below; these lines basically do what I need to keep only what I want. (The file names are: dengue00, dengue01, dengue02, ..., dengue23.)

#Load package (rename() and the %>% pipe both come from dplyr; tidyr does not export rename())
library(dplyr)

#Reading the file
dengue00 <- read.csv("DENGBR00.csv")

#Keep only the three columns I need out of the 17 the files have
dengue00 <- subset(dengue00, select = c(1, 3, 8))

#Keep only the observations (rows) of the municipality I'll use
dengue00 <- dengue00[dengue00$ID_MUNICIP %in% c(3550308), ]

#Simplify the column names
dengue00 <- dengue00 %>%
  rename(mn = ID_MUNICIP,
         dt = DT_NOTIFIC,
         uf = SG_UF_NOT)
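Since the question asks specifically about a loop, here is a minimal base-R sketch of the same three steps inside a `for` loop. It assumes every file is named like `DENGBR00.csv` and contains the columns `ID_MUNICIP`, `DT_NOTIFIC` and `SG_UF_NOT`; the toy files written to a temporary folder (and the names `data_dir` and `keep_cols`) are made up purely so the example runs on its own. Instead of creating `dengue00`, `dengue01`, ... as separate variables, the results go into one named list.

```r
# Hypothetical toy data so the sketch runs on its own: three small CSV files
# named like the real ones, with one extra column that should be dropped.
data_dir <- file.path(tempdir(), "dengue_demo")
dir.create(data_dir, showWarnings = FALSE)
for (i in 0:2) {
  toy <- data.frame(
    ID_MUNICIP = c(3550308, 3304557, 3550308),
    EXTRA      = c("a", "b", "c"),
    DT_NOTIFIC = c("2000-01-03", "2000-01-04", "2000-01-05"),
    SG_UF_NOT  = c(35, 33, 35)
  )
  write.csv(toy, file.path(data_dir, sprintf("DENGBR%02d.csv", i)), row.names = FALSE)
}

# All files matching the assumed naming pattern (list.files() returns them sorted)
filenames <- list.files(data_dir, pattern = "^DENGBR\\d{2}\\.csv$", full.names = TRUE)

# One list element per file instead of separate dengue00, dengue01, ... objects
dengue <- vector("list", length(filenames))
names(dengue) <- sub("^DENGBR(\\d{2})\\.csv$", "dengue\\1", basename(filenames))

keep_cols <- c("ID_MUNICIP", "DT_NOTIFIC", "SG_UF_NOT")

for (i in seq_along(filenames)) {
  d <- read.csv(filenames[i])
  # rows of the chosen municipality, and only the three needed columns
  d <- d[d$ID_MUNICIP %in% 3550308, keep_cols]
  names(d) <- c("mn", "dt", "uf")
  dengue[[i]] <- d
}

str(dengue$dengue00)
```

To use it on the real data, drop the toy-data part and point `data_dir` at the folder holding the files. Keeping the data frames in a list is what makes `lapply()` or `purrr::map()` a drop-in replacement for the explicit loop.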

The idea is to end up like this (image not available).

Thank you so much for any help.

1 Answer

Assuming the files are in the working directory, something like the following should read them in.

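# auxiliary function with default columns and ID_MUNICIP to keep
# (the native pipe |> requires R >= 4.1.0)
fun <- function(filename,
                cols = c("ID_MUNICIP", "DT_NOTIFIC", "SG_UF_NOT"),
                id_municip = 3550308) {
  filename |>
    read.csv() |>
    # keep the rows of the chosen municipality and the columns in `cols`;
    # selecting by name makes the renaming below independent of column order
    subset(ID_MUNICIP %in% id_municip, select = cols) |>
    setNames(c("mn", "dt", "uf"))
}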

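# vector of file names; assumes all files follow the DENGBR00.csv pattern
filenames <- list.files(pattern = "^DENGBR\\d{2}\\.csv$")

# apply the function above to each file name and name the elements of the
# returned list after the two-digit suffix, e.g. "DENGBR00.csv" -> "dengue00"
dengue <- lapply(filenames, fun) |>
  setNames(sub("^DENGBR(\\d{2})\\.csv$", "dengue\\1", filenames))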

# either of these two expressions returns the same data.frame
dengue$dengue00
dengue[["dengue00"]]
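If the models later need all years in one table, the list can be stacked into a single data frame. This is a minimal base-R sketch, assuming every element has the same `mn`, `dt`, `uf` columns; the small `dengue` list below is hypothetical stand-in data, and `source`, `tagged` and `dengue_all` are names introduced only for illustration (`source` records which file each row came from).

```r
# Hypothetical stand-in for the named list built by lapply() above
dengue <- list(
  dengue00 = data.frame(mn = c(3550308, 3550308),
                        dt = c("2000-01-03", "2000-01-05"),
                        uf = c(35, 35)),
  dengue01 = data.frame(mn = 3550308, dt = "2001-02-10", uf = 35)
)

# Tag every data frame with the name of the list element it came from,
# then stack them row-wise into one data frame
tagged <- Map(function(d, nm) data.frame(d, source = nm), dengue, names(dengue))
dengue_all <- do.call(rbind, tagged)
rownames(dengue_all) <- NULL

dengue_all
```

With dplyr loaded, `bind_rows(dengue, .id = "source")` does the tagging and stacking in a single call.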

1 Comment

Thank you very much, Rui Barradas! This seemed to work pretty fine! Think I understood things better now! :)
