I'm trying to figure out how to correct a few entry errors in a dataset I'm working with. I already fixed the problem, but I think my approach was inefficient: I replaced the values one at a time using a condition, instead of iterating through the column and replacing the values using a condition.

In my dataset, three observations in the corruption_score column were off by a factor of 10. I wanted to loop through this column and replace any value greater than 10 with itself divided by 10. An example printout of my dataset is below.

# A tibble: 6 x 9
  country  year value deaths_per_100k region corruption_score  rank electricity_acc…
  <chr>   <dbl> <dbl>           <dbl> <chr>             <dbl> <dbl>            <dbl>
1 Iceland  2005 0.159            13.1 WE/EU               97     1              100
2 Finland  2005 0.232            13.7 WE/EU               96     2              100
3 New Ze…  2005 0.228            13.8 AP                  96     2              100
4 Finland  2006 0.271            13.1 WE/EU               9.6    1              100
5 Iceland  2006 0.156            12.8 WE/EU               9.6    1              100
6 New Ze…  2006 0.217            13.5 AP                  9.6    1              100

To solve this, I tried a few different versions of the loop below, including one in which the replacement operation is obs <- obs / 10, but I couldn't get any change to persist outside of the loop. Any advice? Thanks in advance.

for (obs in wdi_gdp_long$corruption_score){
  
  if(obs > 10 & !is.na(obs)){
       
    wdi_gdp_long$corruption_score[obs] <- obs / 10
    
  }
  
}
  • Do idx <- x > 10 & !is.na(x); x[idx] <- x[idx] / 10, where x is wdi_gdp_long$corruption_score (Commented Jun 28, 2020 at 17:34)
  • Alternative: x / ifelse(!is.na(x) & x > 10, 10, 1) (Commented Jun 28, 2020 at 17:36)
  • Or x * c(1, 1/10)[idx + 1] (Commented Jun 28, 2020 at 17:41)
  • (@markus I've always found the boolean indexing coercion to feel weird ... I use it, but I don't always feel happy with them ...) (Commented Jun 28, 2020 at 18:32)
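
The logical-index approach from the first comment can be sketched on a small vector. The values here are made up for illustration and stand in for wdi_gdp_long$corruption_score; they are not the real data.

```r
# Toy stand-in for wdi_gdp_long$corruption_score (values are made up)
x <- c(97, 96, 9.6, NA, 8.8)

# TRUE only where a value is present and greater than 10
idx <- !is.na(x) & x > 10

# Rescale just those entries; everything else is left untouched
x[idx] <- x[idx] / 10

print(x)
```

With the real data, the same two lines would presumably be applied to the column itself and the result assigned back, e.g. wdi_gdp_long$corruption_score <- x.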

1 Answer

The corruption_score column can be updated with mutate() and if_else() from dplyr (loaded as part of the tidyverse), as in the snippet below:

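library(tidyverse)
library(magrittr)  # provides the %<>% assignment pipe

# Divide scores above 10 by 10; if_else() leaves NA scores as NA
wdi_gdp_long %<>%
    mutate(corruption_score = if_else(corruption_score > 10, corruption_score / 10, corruption_score))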
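
As for why the loop in the question changed nothing useful: for (obs in x) hands the loop each value rather than its position, so x[obs] targets position 97 instead of the row that holds the value 97, and obs <- obs / 10 only changes the loop's own copy. If a loop is still wanted, a minimal sketch that iterates over positions could look like this. The data frame df and its values are hypothetical stand-ins for wdi_gdp_long.

```r
# Hypothetical stand-in for wdi_gdp_long (values are made up)
df <- data.frame(
  country = c("A", "B", "C", "D"),
  corruption_score = c(97, 9.6, NA, 96)
)

# Iterate over positions so the assignment can target the right element
for (i in seq_along(df$corruption_score)) {
  obs <- df$corruption_score[i]
  # Check for NA first so the comparison never sees a missing value
  if (!is.na(obs) && obs > 10) {
    df$corruption_score[i] <- obs / 10
  }
}

print(df$corruption_score)
```

The vectorized versions are shorter and usually faster, so this is mainly useful for seeing what the loop needed to index by.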