I want to run a loop that reads the values in a data frame (data_rais), but I realised it might take days, and I think that is due to the fact that I'm running a loop and not a function. I tried several times to write a function that does the same as this loop, but I couldn't find a way to do so. My question is: is it possible to turn this loop into a function? How?

for (i in 1:nrow(data_rais)) {
  if (is.na(data_rais$postal_code[i])) {
    next
  } else {
    data_rais$munic_name[i] = munics_code[row(munics_code)[which(munics_code$cods == data_rais$munic[i])], 1]
  }
}

munics_code looks like this:

  munics_code = tibble::tribble(
    ~municipio,~cods,
    'BELFORD ROXO', 261,
    'DUQUE DE CAXIAS', 250,
    'DUQUE DE CAXIAS', 251,
    'DUQUE DE CAXIAS', 252,
    'DUQUE DE CAXIAS', 253,
    'DUQUE DE CAXIAS', 254,
    'ITABORAÍ', 248,
    'ITAGUAÍ', 2380,
    'ITAGUAÍ', 2381,
    'ITAGUAÍ', 2382,
    'ITAGUAÍ', 2383,
    'ITAGUAÍ', 2384,
    'MAGÉ', 259,
    'MANGARATIBA',2386,
    'MANGARATIBA',2387,
    'MANGARATIBA',2388,
    'MARICÁ',249,
    'MESQUITA',2655)

data_rais$postal_code is a column of a data frame with numbers that may or may not start with the numbers in the cods column of munics_code. Something like:

data_rais = data.frame(postal_code = c(2049253, 2033069, 2293513, 2411920, 2284937, 2341811, 2008638, 
                                       2279827, NA, 2386135, 2441900, 2392889, 2332114, 2254610, 
                                       2114414, 2089509, 2351781, 2451466, 2111632, 2070417, 2079485, 
                                       2328146, 2200329, 2116103, NA, 2449114, 2231708, NA, 
                                       NA, 2194253),
                       munic_name = NA)

Note: I cannot delete the NAs, I don't want to lose them.

  • match might be your friend :-) Commented Dec 24, 2019 at 14:27
  • @Base_R_Best_R I wish I could, but it isn't possible since I'm dealing with sensitive data. Commented Dec 24, 2019 at 14:28
  • If the data is sensitive, create a mock-up data set with the same columns of interest. Commented Dec 24, 2019 at 14:33
  • Ok, i'm going to do that. Commented Dec 24, 2019 at 14:43
  • "may or may not start with the numbers in the cods column in munics_code" does this mean you need partial matching of the numbers? Commented Dec 24, 2019 at 14:56

2 Answers

I would suggest you use match:

data_rais$munic_name = munics_code[[1]][match(data_rais$munic, munics_code$cods)]

To update only the rows where postal_code is not NA, leaving any existing entries in the other rows untouched, use the following:

data_rais$munic_name[!is.na(data_rais$postal_code)] = munics_code[[1]][match(data_rais$munic[!is.na(data_rais$postal_code)], munics_code$cods)]

Not sure if you need the second approach, but be careful with overwriting original variables. If you're unsure, add another variable and inspect the matching manually for a few entries.
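
As a minimal sketch of the match approach on mock data: this assumes data_rais has a munic column holding the same codes as munics_code$cods, as the question's loop implies. The munic values and the helper names has_code and idx below are invented for illustration.

```r
# Mock lookup table (a subset of the question's munics_code)
munics_code <- data.frame(
  municipio = c("BELFORD ROXO", "DUQUE DE CAXIAS", "MESQUITA", "MANGARATIBA"),
  cods = c(261, 250, 2655, 2386),
  stringsAsFactors = FALSE
)

# Mock data; the `munic` column and its values are assumptions
data_rais <- data.frame(
  postal_code = c(2612345, NA, 2655001, 2049253),
  munic = c(261, 250, 2655, 999),
  munic_name = NA_character_,
  stringsAsFactors = FALSE
)

# Rows the original loop would not skip
has_code <- !is.na(data_rais$postal_code)

# Position of each munic code in the lookup table (NA when there is no match)
idx <- match(data_rais$munic[has_code], munics_code$cods)

# One vectorised assignment replaces the whole loop
data_rais$munic_name[has_code] <- munics_code$municipio[idx]
```

Rows with a missing postal_code keep their NA, and so do rows whose code is not in the lookup table, so no rows are dropped.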

2 Comments

that's true :-) I rarely use this notation, and would normally suggest to use $ notation, also to be more consistent. I just got confused since the OP has different spellings of data frame and variable names... I updated my answer accordingly
Sorry, I just realised that my original answer might override already placed entries, where you don't have a match in the data frame to match to but already have an entry in your original data_rais

If I interpreted your code correctly, you are trying to set the data_rais$munic_name column to the corresponding municipio. This could be done with a merge:

df = merge(x = data_rais, y = munics_code, by.x = "postal_code", by.y = "cods", all.x = TRUE)

By doing a left merge (all.x = TRUE) you'll preserve the NAs in data_rais. Assign the result of the merge back to data_rais if you want to add this column to it.
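
Note that both merge and match compare keys for exact equality. If the codes in cods are meant to be prefixes of postal_code, as the question's description suggests, one possible approach is to loop over the small lookup table instead of the large data frame. The following is only a sketch under that assumption, on mock data; the helper names pc, codes and hit are invented for illustration.

```r
# Mock lookup table (a subset of the question's munics_code)
munics_code <- data.frame(
  municipio = c("BELFORD ROXO", "DUQUE DE CAXIAS", "MESQUITA", "MANGARATIBA"),
  cods = c(261, 250, 2655, 2386),
  stringsAsFactors = FALSE
)

# Mock data with missing postal codes that must be kept
data_rais <- data.frame(
  postal_code = c(2612345, NA, 2655001, 2386135, 2049253),
  munic_name = NA_character_,
  stringsAsFactors = FALSE
)

# Convert numbers to plain digit strings (avoids scientific notation)
pc <- sprintf("%.0f", data_rais$postal_code)
codes <- sprintf("%.0f", munics_code$cods)

# One pass per lookup row; each pass is vectorised over all of data_rais
for (j in seq_along(codes)) {
  hit <- !is.na(data_rais$postal_code) & startsWith(pc, codes[j])
  data_rais$munic_name[hit] <- munics_code$municipio[j]
}
```

If one code were a prefix of another (for example 26 and 261), the later row of the lookup table would win, so the table would need to be ordered accordingly.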
