4

I have the data frame below:

Country        Population
123491         9.9
2348           4.3
USA            10.1
Australia      9.1

I want to remove the rows where Country is invalid, for example 123491 and 2348. The class of Country is "factor".

> sapply(df, class)

Country        Population
factor          numeric

I want to get the following as a result:

Country       Population
  USA          10.1
  Australia    9.1

3 Answers

2

You could look for digits in the Country column and exclude the rows that contain them.

library(tidyverse)

Country <- factor(c("123491", "2348", "USA", "Australia"))
Population <- c(9.9, 4.3, 10.1, 9.1)

df <- data.frame(Country, Population)

df %>%
  filter(!(str_detect(Country, "\\d")))
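
Since Country is a factor, the filtered result still carries the removed values as unused levels. The following is a minimal base R sketch of that effect and of one way to clean it up with droplevels(). It uses grepl instead of str_detect only so that it runs without extra packages, and the name kept is just an illustrative choice.

```r
# Base R sketch; df mirrors the example data from the question.
df <- data.frame(
  Country = factor(c("123491", "2348", "USA", "Australia")),
  Population = c(9.9, 4.3, 10.1, 9.1)
)

# Keep rows whose Country contains no digit.
kept <- df[!grepl("\\d", df$Country), ]

# The removed values are still registered as factor levels here.
levels(kept$Country)

# droplevels() discards levels that no longer occur in the data.
kept <- droplevels(kept)
levels(kept$Country)
```

Whether the leftover levels matter depends on what happens next; they mostly show up in summaries, tables, and plots.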

2

You could subset your data frame using grepl, dropping the rows where Country consists entirely of digits:

df[!grepl("^\\d+$", df$Country), ]

    Country Population
3       USA       10.1
4 Australia        9.1

Data:

df <- data.frame(Country=c("123491", "2348", "USA", "Australia"),
                 Population=c(9.9, 4.3, 10.1, 9.1))

Note: If you want to reject any country that contains a digit anywhere in its name, use grepl with the pattern \d instead:

df[!grepl("\\d", df$Country), ]
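
To make the difference between the two patterns concrete, here is a small sketch. The vector x and the value "Area51" are made up for illustration and are not part of the original data.

```r
# Hypothetical values chosen to show where the two patterns disagree.
x <- c("123491", "USA", "Area51")

# "^\\d+$" matches only strings made up entirely of digits.
all_digits <- grepl("^\\d+$", x)

# "\\d" matches any string that contains at least one digit.
any_digit <- grepl("\\d", x)

# Side-by-side view: "Area51" is flagged only by the second pattern.
data.frame(x, all_digits, any_digit)
```

So the anchored pattern would keep a mixed value such as "Area51", while the unanchored one would drop it.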


0

It can be done by checking whether the value can be converted to numeric, as follows:

suppressWarnings(df[is.na(as.numeric(as.character(df$Country))),])

#    Country Population
#3       USA       10.1
#4 Australia        9.1

If the conversion to numeric produces NA, the value is not a number, so the row is kept. I used suppressWarnings so you don't get the "NAs introduced by coercion" warning when non-numeric strings are converted.
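
The as.character() step matters because Country is a factor. Here is a short sketch of what goes wrong without it, using the same four values as the question; the names codes and values are only illustrative.

```r
Country <- factor(c("123491", "2348", "USA", "Australia"))

# Converting a factor directly returns its internal integer codes,
# so no NA appears and no row would be recognised as non-numeric.
codes <- as.numeric(Country)
anyNA(codes)

# Converting the labels first yields NA for the non-numeric entries.
values <- suppressWarnings(as.numeric(as.character(Country)))
is.na(values)
```

If Country is already a character column (the default for data.frame() since R 4.0.0), the extra as.character() call is harmless.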

Hope it helps.
