I'm trying to figure out how to correct a few data-entry errors in a dataset I'm working with. I already fixed the problem, but I think my approach was inefficient: I replaced each value individually with a condition instead of iterating through the column and replacing the values that meet the condition.
In my dataset, three observations in the corruption_score column were off by a factor of 10. I wanted to loop through this column and replace any value greater than 10 with itself divided by 10. An example printout of my dataset is below.
# A tibble: 6 x 9
  country  year value deaths_per_100k region corruption_score  rank electricity_acc…
  <chr>   <dbl> <dbl>           <dbl> <chr>             <dbl> <dbl>            <dbl>
1 Iceland  2005 0.159            13.1 WE/EU              97       1              100
2 Finland  2005 0.232            13.7 WE/EU              96       2              100
3 New Ze…  2005 0.228            13.8 AP                 96       2              100
4 Finland  2006 0.271            13.1 WE/EU               9.6     1              100
5 Iceland  2006 0.156            12.8 WE/EU               9.6     1              100
6 New Ze…  2006 0.217            13.5 AP                  9.6     1              100
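A minimal reproducible sketch of the fix the question is after. It assumes only the year and corruption_score columns matter; the data frame name wdi_sample is hypothetical, and its values are copied from the six rows printed above. The idea is to build a logical vector that flags non-missing values above 10, then divide just those entries in place, with no loop at all.

```r
# Hypothetical stand-in for wdi_gdp_long: year and corruption_score
# from the six printed rows (all other columns omitted)
wdi_sample <- data.frame(
  year = c(2005, 2005, 2005, 2006, 2006, 2006),
  corruption_score = c(97, 96, 96, 9.6, 9.6, 9.6)
)

# TRUE where the score is present and above 10; the is.na() check
# keeps missing values from producing NA in the index
too_big <- !is.na(wdi_sample$corruption_score) & wdi_sample$corruption_score > 10

# Rescale only the flagged entries, assigning straight back into the column
wdi_sample$corruption_score[too_big] <- wdi_sample$corruption_score[too_big] / 10

print(wdi_sample$corruption_score)
```

Because the assignment targets wdi_sample$corruption_score[too_big] directly, the change is stored in the data frame itself rather than in a temporary variable.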
To solve this, I tried a few versions of the loop below, including one where the replacement step is obs <- obs / 10, but none of the changes persisted outside the loop. Any advice? Thanks in advance.
for (obs in wdi_gdp_long$corruption_score) {
  if (obs > 10 & !is.na(obs)) {
    wdi_gdp_long$corruption_score[obs] <- obs / 10
  }
}
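The loop above has two separate problems. for (obs in ...) hands the loop each value, not its position, so [obs] treats a score such as 97 as a row number. And obs <- obs / 10 only changes the loop's local copy, never the column. A sketch of a loop that does work iterates over positions with seq_along(); it assumes a plain vector named scores (hypothetical) standing in for wdi_gdp_long$corruption_score, with an NA added to exercise the missing-value check.

```r
# Hypothetical stand-in for wdi_gdp_long$corruption_score
scores <- c(97, 96, 96, 9.6, 9.6, 9.6, NA)

# Loop over positions (1, 2, ..., length), not over the values themselves,
# so each write lands on the element that was tested
for (i in seq_along(scores)) {
  # && short-circuits: scores[i] > 10 is only evaluated when scores[i] is not NA
  if (!is.na(scores[i]) && scores[i] > 10) {
    scores[i] <- scores[i] / 10
  }
}

print(scores)
```

On the real data the body would read wdi_gdp_long$corruption_score[i] <- wdi_gdp_long$corruption_score[i] / 10. This works, but R's vectorized indexing does the same thing in one line and is usually both shorter and faster.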
A vectorized approach, where x is wdi_gdp_long$corruption_score:

idx <- x > 10 & !is.na(x)
x[idx] <- x[idx] / 10

Alternatively, as one-liners (the second reuses idx):

x / ifelse(!is.na(x) & x > 10, 10, 1)
x * c(1, 1/10)[idx + 1]
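As a quick check that the three vectorized forms agree, here is a sketch run on a hypothetical vector x that includes an NA. The comparison uses all.equal() rather than == or identical(), because multiplying by 1/10 can differ from dividing by 10 in the last bit of a floating-point number.

```r
# Hypothetical sample vector with an NA to exercise the missing-value guard
x <- c(97, 96, 96, 9.6, 9.6, 9.6, NA)

# Shared condition: NA > 10 is NA, but NA & FALSE is FALSE, so idx has no NAs
idx <- x > 10 & !is.na(x)

# 1. Subset assignment on a copy of x
a <- x
a[idx] <- a[idx] / 10

# 2. Divide every element by either 10 or 1, chosen element-wise by ifelse()
b <- x / ifelse(!is.na(x) & x > 10, 10, 1)

# 3. Lookup: idx + 1 is 1 (keep, factor 1) or 2 (rescale, factor 1/10)
d <- x * c(1, 1/10)[idx + 1]

# all.equal() tolerates tiny rounding differences and matches NAs by position
stopifnot(isTRUE(all.equal(a, b)), isTRUE(all.equal(a, d)))
print(a)
```

Whichever form you pick, assign the result back to the column (for example wdi_gdp_long$corruption_score <- a) so the correction is kept.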