
I am trying to use a nested ifelse statement within a for loop to create a new variable whose values are based on the frequency of occurrence of a factor variable (a list of postcodes).

The new variable should return a predefined series of numbers based on the frequency of a postcode (frequencies range from 1 to 4). Each series must end at 800 and increase in steps of 200, and its starting value depends on the frequency of the postcode: the higher the frequency, the lower the starting value (a postcode that occurs four times starts at 200, one that occurs once gets only 800).
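The rule above (one value per occurrence, ending at 800, stepping by 200) can be written as a small helper. This is only an illustration: `series_for()` is a hypothetical name, and it assumes frequencies stay between 1 and 4 as stated.

```r
# Hypothetical helper: the series for a postcode that occurs n times (n in 1..4)
series_for <- function(n) seq(to = 800, by = 200, length.out = n)

series_for(1)  # 800
series_for(2)  # 600 800
series_for(3)  # 400 600 800
series_for(4)  # 200 400 600 800
```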

For this I have defined a for loop, in which I first measure the frequency of each postcode, followed by a nested ifelse statement, specifying each series of numbers to be allocated to the NewVar based on the frequency.
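The frequency step itself does not need a loop. A minimal sketch, assuming `Postcode` is stored as a character column: `table()` counts each postcode once, and indexing that table by the column looks the count up for every row.

```r
DF <- data.frame(Postcode = c("AA", "AA", "BB", "BB", "BB",
                              "CC", "DD", "DD", "DD", "DD"))

# Count each postcode once, then look the count up for every row by name
freq <- as.vector(table(DF$Postcode)[DF$Postcode])
freq  # 2 2 3 3 3 1 4 4 4 4
```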

Below is a small example of what I want to achieve; I want to apply this to a data frame containing millions of postcodes.

DESIRED RESULT:

Postcode  NewVar
AA        600
AA        800
BB        400
BB        600
BB        800
CC        800
DD        200
DD        400
DD        600
DD        800

CODE:

DF$NewVar <- 0

DF$NewVar <- for (i in levels(DF$Postcode[i]))
ifelse((table(DF$Postcode[i]) == 4), DF$NewVar[i] <- c(200,400,600,800),
  (ifelse ((table(DF$Postcode[i]) == 3), DF$NewVar[i] <- c(400,600,800),
    (ifelse ((table(DF$Postcode[i]) == 2), DF$NewVar[i] <- c(600,800), 
      DF$NewVar[i] <- c(800))))))

PROBLEM 1:

Firstly, when running the entire code, I receive an error stating that the number of rows in the replacement does not match the data, although manual checks show no such mismatch (the difference is always exactly 1 row).

Error in `$<-.data.frame`(`*tmp*`, NewVar, value = c("0", "0", "0",  : 
replacement has 11 rows, data has 10.
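One mechanism that produces exactly this error can be reproduced in isolation. In the loop, `i` takes level names such as "AA", not row numbers, so `DF$NewVar[i] <- ...` indexes by name; because the vector has no element called "AA", R appends one, and writing the 11-element result back into a 10-row data frame fails. A sketch, assuming a 10-row `DF` like the example above (`tmp` and `loop_value` are illustrative names):

```r
# Example data in the shape shown in the question (10 rows)
DF <- data.frame(Postcode = factor(c("AA", "AA", "BB", "BB", "BB",
                                     "CC", "DD", "DD", "DD", "DD")))
DF$NewVar <- 0

i <- levels(DF$Postcode)[1]   # "AA": a level name, not a row number

# Indexing a vector by a name it does not have appends a new element
tmp <- DF$NewVar
tmp[i] <- 800
length(tmp)                   # 11

# DF$NewVar[i] <- 800 writes that 11-element vector back into DF, which fails
msg <- tryCatch({
  DF$NewVar[i] <- 800
  "no error"
}, error = conditionMessage)
msg

# A for loop itself evaluates to NULL, so DF$NewVar <- for (...) cannot
# collect the values computed inside the loop
loop_value <- for (k in 1:3) k
is.null(loop_value)           # TRUE
```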

PROBLEM 2:

TESTING IF AN IFELSE WORKS ON ITS OWN (OUT OF THE LOOP):

When checking whether the ifelse clause works on its own (outside the loop), I see that only the starting value of 200 is copied to every row of NewVar, so it never increases to 800. This is not what I want to achieve either:

CODE TESTING ONE IFELSE:

DF$NewVar[1:2] <- ifelse((sum(table(DF$Postcode)) == 2),                       
  DF$NewVar[1:2] <- c(600,800), "NA")

RESULT (not desired):

Postcode  NewVar
AA        200
AA        200

DESIRED RESULT:

Postcode  NewVar
AA        200
AA        400
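This behaviour follows from how `ifelse()` works: it returns a result with the same length as its test, so with a single `TRUE`/`FALSE` test only the first element of the `yes` vector is kept, and that single value is then recycled over the rows being assigned. A minimal sketch, with `test` standing in for the scalar condition in the code above (values follow that code's `c(600, 800)`); a plain `if`/`else` keeps the whole vector:

```r
# 'test' stands in for a single TRUE/FALSE condition such as sum(table(DF$Postcode)) == 2
test <- TRUE

# ifelse() returns a result as long as 'test' (length 1), so only 600 survives
a <- ifelse(test, c(600, 800), NA)
a                  # 600

# Assigning that single value to two rows recycles it
x <- c(0, 0)
x[1:2] <- a
x                  # 600 600

# A scalar if/else returns the whole vector unchanged
x[1:2] <- if (test) c(600, 800) else NA
x                  # 600 800
```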

Note: I created the NewVar column before trying to assign values to it, and I have already checked for NAs.

Thank you in advance for your time.


One way if you're willing to use dplyr:

library(dplyr)
DF <- structure(list(Postcode = c("AA", "AA", "BB", "BB", "BB", "CC", 
"DD", "DD", "DD", "DD")), class = "data.frame", row.names = c(NA, 
-10L))

vals <- c(200,400,600,800)
DF %>% group_by(Postcode) %>% mutate(NewVar = tail(vals,n()))
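The key piece is `tail(vals, n())`: inside each group, `n()` is the number of rows, and `tail()` keeps that many values from the end of `vals`. The sketch below shows `tail()` on its own, without dplyr, with fixed sizes in place of `n()`:

```r
vals <- c(200, 400, 600, 800)

# Keep the last n values, as tail(vals, n()) does for a group of n rows
tail(vals, 1)           # 800
tail(vals, 3)           # 400 600 800

# tail() cannot return more values than vals contains
length(tail(vals, 5))   # 4
```

So a postcode occurring more than four times would make `mutate()` stop with a size error; the question states that frequencies range from 1 to 4, so this limit is not reached there.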

Comment:

Thank you @joran for this solution: it proved to be much more user-friendly than the triple-nested ifelse I initially tried. Also (correct me if I am wrong), using dplyr should save me a lot of time on millions of rows compared with the ifelse iterations.

For the sake of completeness, here is a base R solution which uses the ave() function.

Let's assume Postcode is a vector of postcodes in random order:

Postcode
 [1] "BB" "CC" "CC" "BB" "BB" "AA" "CC" "BB" "AA" "DD"

The code below creates a data.frame containing Postcode and NewVar:

data.frame(
  Postcode, 
  NewVar = ave(Postcode, Postcode, 
               FUN = function(x) seq(to = 800, by = 200, length.out = length(x)))
)
   Postcode NewVar
1        BB    200
2        CC    400
3        CC    600
4        BB    400
5        BB    600
6        AA    600
7        CC    800
8        BB    800
9        AA    800
10       DD    800
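One detail of the `ave()` approach: `ave()` writes the results back into a copy of its first argument, so when that argument is the character vector `Postcode`, the numbers are coerced and NewVar becomes a character column ("200", "400", ...). Passing `seq_along(Postcode)` as the first argument instead keeps NewVar numeric. A sketch using the Postcode vector printed above:

```r
Postcode <- c("BB", "CC", "CC", "BB", "BB", "AA", "CC", "BB", "AA", "DD")
series <- function(x) seq(to = 800, by = 200, length.out = length(x))

# Results written into a copy of the character vector become character
NewVar_chr <- ave(Postcode, Postcode, FUN = series)
class(NewVar_chr)   # "character"

# Results written into a copy of the row indices stay numeric
NewVar <- ave(seq_along(Postcode), Postcode, FUN = series)
class(NewVar)       # "numeric"
NewVar              # 200 400 600 400 600 600 800 800 800 800
```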

Data

# create data
library(magrittr)   # only used to improve readability
n_codes <- 4L
set.seed(1L)
Postcode <- 
  stringr::str_dup(LETTERS[1:n_codes], 2L) %>% # create codes
  rep(times = sample(n_codes)) %>%             # replicate randomly
  sample()                                     # re-order randomly
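If stringr and magrittr are not installed, the same kind of data can be generated in base R, with `strrep()` taking the place of `stringr::str_dup()`. This sketch makes the same random draws in the same order as the piped version:

```r
# Base R version of the data generation
n_codes <- 4L
set.seed(1L)
codes <- strrep(LETTERS[1:n_codes], 2L)                   # "AA" "BB" "CC" "DD"
Postcode <- sample(rep(codes, times = sample(n_codes)))   # replicate randomly, then re-order
```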
