
In R, I have a column in a data frame that contains city names, as shown in the image below.

[Image: the city column]

This column contains some erroneous data. For example, the values N, Z and X need to be replaced with "Others", and some city codes need to be replaced by their full names, for example:

OC, Okl City --> Oklahoma City
LA --> Los Angeles
NW --> New York

I tried doing this with if and else if statements inside a for loop, but I was unsuccessful.

It would be a great help if someone could help me with this.

Thanks in advance.
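For comparison, the if / else if loop approach described in the question can work; a minimal sketch is below. It assumes a data frame named `df` with a character (not factor) column `city`; both names and the sample values are illustrative. One common pitfall: if `city` is a factor, assigning a value that is not one of its existing levels produces NA with a warning.

```r
# Hypothetical sample: values taken from the question plus one already-correct name
df <- data.frame(city = c("OC", "Okl City", "LA", "NW", "N", "Z", "X", "New York"),
                 stringsAsFactors = FALSE)

for (i in seq_len(nrow(df))) {
  x <- df$city[i]
  if (is.na(x)) next                      # skip missing values; if (NA == "LA") would throw an error
  if (x %in% c("OC", "Okl City")) {
    df$city[i] <- "Oklahoma City"
  } else if (x == "LA") {
    df$city[i] <- "Los Angeles"
  } else if (x == "NW") {
    df$city[i] <- "New York"
  } else if (x %in% c("N", "Z", "X")) {
    df$city[i] <- "Others"
  }
  # any other value (e.g. "New York") is left unchanged
}

df$city
```

The vectorised approaches in the answers do the same job without an explicit loop, which is usually simpler and faster in R.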

  • Please provide data as plain text using e.g. dput() or head(), not images, which we cannot copy/paste. Commented Mar 19, 2019 at 4:12
  • @neilfws Thanks for the tip :) I'll make sure to follow it in the future. Commented Mar 19, 2019 at 22:46

3 Answers


Here's a reproducible example using dplyr::case_when() that you can generalize to any number of conditions:

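library(tidyverse)
# Toy data: full names, codes and junk values
d <- tibble(city = c("Oklahoma City","Los Angeles","OC","NY","Z","Z","X","N"))
# case_when() checks the conditions in order; TRUE ~ city keeps every other value unchanged
d <- mutate(d, city = case_when(city %in% c("Z","X","N") ~ "Other",
                                city == "Oklahoma City"  ~ "OKL",
                                city == "Los Angeles"    ~ "LA",
                                TRUE ~ city))
d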


# A tibble: 8 x 1
  city 
  <chr>
1 OKL  
2 LA   
3 OC   
4 NY   
5 Other
6 Other
7 Other
8 Other

5 Comments

Hi Pauloo, thanks for the response. This has given me a new perspective on how to handle the problem. However, I also want the city codes to be printed as city names. For example, "NW" should be printed as New York, and "OKL" or "Okl City" should be printed as Oklahoma City.
You're looking for a generalized solution then. See my edited answer.
Thanks Pauloo, I think this will solve the problem :) I have one more question. The inputs "Z", "X" and "N" are random, fake inputs, so I think hard-coding them will be a problem, because in the future there might be different fake inputs. If the input is a standard city name or code, the output should be the same as in the result above; if the input is a fake one (which is unpredictable), it should be categorized as "Other".
I'm not exactly sure what you mean @DavidChris, but for this code to work you need to know what you want to classify into one group or another.
The data is entered by people taking a particular survey, so in some cases people might enter wrong or incorrect data. For instance, NY can be understood as New York, but for Z, X, C we don't know what they were trying to enter, so these need to be categorized as "Others". In future, when people enter data, there might be more incorrect data of different types. Hard-coding it as city %in% c("Z","X","N") means that whenever there is new incorrect data, we need to change the code. Is there any other way to overcome this problem?

Use revalue() from the plyr package:

library(plyr)

# Named vector of "old value" = "new value" pairs
df$city <- revalue(df$city, c("OC"       = "Oklahoma City",
                              "Okl City" = "Oklahoma City",
                              "LA"       = "Los Angeles",
                              "NW"       = "New York",
                              "Z"        = "Others",
                              "X"        = "Others",
                              "N"        = "Others"))
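revalue() takes a named vector of old = new pairs. If you would rather not load plyr (it is retired, and loading it after dplyr can mask dplyr functions such as summarise()), the same lookup can be sketched in base R as below; `recode_map` and `city` are illustrative names. Like revalue(), it leaves values that are not in the map unchanged, which is why Z, X and N have to be listed explicitly.

```r
# Same explicit recoding as the revalue() call, in base R (no plyr needed)
recode_map <- c("OC"       = "Oklahoma City",
                "Okl City" = "Oklahoma City",
                "LA"       = "Los Angeles",
                "NW"       = "New York",
                "Z"        = "Others",
                "X"        = "Others",
                "N"        = "Others")

# Hypothetical example column
city <- c("OC", "Okl City", "LA", "NW", "Z", "X", "N", "New York")

# Recode only the values that appear in the map; everything else keeps its original text
hit <- city %in% names(recode_map)
city[hit] <- unname(recode_map[city[hit]])
city
```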

1 Comment

Hi Sathish, thanks for the response. I learnt how to use the revalue function from your answer today :) But for my expected output, I can't know in advance which entries (inputs) will need to become "Others". The values Z, X, N were just examples of incorrect inputs. In the future I might get more random values that I need to categorize as "Others".

You can use case_when() similarly to @Rich's answer above, but with a "not in" condition:

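library(tidyverse)
d <- tibble(city = c("Oklahoma City","Los Angeles","OC","NY","Z","Z","X","N"))
# !city %in% x is parsed as !(city %in% x), so anything outside the list becomes "Other";
# note that codes such as "OC" and "NY" are outside the list too
d <- mutate(d, city = case_when(!city %in% c("Oklahoma City", "Los Angeles") ~ "Other",
                                city == "Oklahoma City" ~ "OKL",
                                city == "Los Angeles"   ~ "LA",
                                TRUE ~ city))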
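The "not in" idea can be combined with a lookup to handle the unpredictable fake inputs raised in the comments: list every accepted spelling (full names and known codes) once, map each to its canonical name, and send anything not in the list to "Others". A minimal base-R sketch follows; the `aliases` table and the `clean_city()` helper are hypothetical and would need to cover every valid city in the real survey data.

```r
# Accepted spellings (names) mapped to canonical city names (values).
# This table is illustrative; extend it with every valid city and code.
aliases <- c("Oklahoma City" = "Oklahoma City",
             "OC"            = "Oklahoma City",
             "OKL"           = "Oklahoma City",
             "Okl City"      = "Oklahoma City",
             "Los Angeles"   = "Los Angeles",
             "LA"            = "Los Angeles",
             "New York"      = "New York",
             "NY"            = "New York",
             "NW"            = "New York")

clean_city <- function(x, lookup) {
  key <- trimws(x)                  # ignore stray leading/trailing spaces
  out <- unname(lookup[key])        # NA wherever key is not a known spelling (or is NA)
  out[is.na(out)] <- "Others"       # unknown or missing input -> "Others"
  out
}

# Hypothetical input, including an unseen junk value ("Q7") and a missing value
city <- c("Oklahoma City", "Los Angeles", "OC", "NY", "Z", "Z", "X", "N", "Q7", NA)
clean_city(city, aliases)
```

With this design, new fake values need no code change; only new legitimate cities or codes have to be added to the table. Matching is exact and case-sensitive, so variants such as "ny" would also need their own entry (or the keys and input could both be normalised with toupper()).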
