I have a data frame with more than 400,000 observations, and I'm trying to add a column whose values depend on another column, and sometimes on several.

Here is a simpler example of what I'm trying to do:

# Creating a data frame 

M <- data.frame(c("A","B","C"),c(5,100,60))

names(M) <- c("Letter","Number")

#adding a column 

M$Size <- NA

# if Number <= 50 Size is small, 
# if Number is between 50 and 70, Size is Medium
# if Number is Bigger than 70, Size is Big

ifelse (M$Number <=50, M$Size <-"Small",
        ifelse(M$Number <= 70,
        M$Size <- "Medium",
        M$Size <- "Big"
        ))

When I run the code, the output I get is:

[1] "Small"  "Big"    "Medium"

But the "Size" column in M always ends up holding the value from the last condition in the ifelse call:

> print (M)
  Letter Number Size
1      A      5  Big
2      B    100  Big
3      C     60  Big

The result that I want:

> print (M)
  Letter Number   Size
1      A      5  Small
2      B    100    Big
3      C     60 Medium

I can solve the problem by subsetting the data frame for each condition and using rbind to combine the pieces, but the code would be very long, and since the original data frame is big, it would take longer to run. So I'm wondering: how can I fix this issue?

cut(M[, 2], c(-Inf, 50, 70, Inf), c("Small", "Medium", "Big")) (commented May 19, 2016 at 10:56)

4 Answers

13

This should help:

# Creating a data frame 

M <- data.frame(c("A","B","C"),c(5,100,60))

names(M) <- c("Letter","Number")

# Adding a column

# if Number <= 50, Size is Small
# if Number is between 50 and 70, Size is Medium
# if Number is bigger than 70, Size is Big

# M$Size[M$Number <= 50] <- "Small"
# Edit: no need to subset "Small"; fill the whole column, then overwrite
M$Size <- "Small"
M$Size[M$Number > 50 & M$Number <= 70] <- "Medium"  # <= 70 so that exactly 70 is Medium
M$Size[M$Number > 70] <- "Big"

#      Letter Number   Size
# 1      A      5      Small
# 2      B    100      Big
# 3      C     60      Medium

See this on R-Fiddle

2 Comments

This is better than using ifelse. Have an upvote. However, you don't need the subsetting in the first step if there are no NA values. Just do M$Size <- "Small".
@Roland Yes, you are right, I overlooked that. Will edit.
10

Use cut:

M$Size <- cut(M$Number, breaks = c(-Inf, 50, 70, Inf), 
                        labels = c("small", "medium", "large"))
#  Letter Number   Size
#1      A      5  small
#2      B    100  large
#3      C     60 medium
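
A quick boundary check may help here. By default `cut` uses right-closed intervals (`right = TRUE`), so with these breaks 50 falls in "small" and 70 in "medium", which matches the `<=` rules in the question. The result is a factor; the sketch below wraps it in `as.character` in case a plain character column is preferred. The vector `x` is made up for illustration and is not part of the question's data.

```r
# Hypothetical test values, including the two boundaries 50 and 70
x <- c(5, 50, 60, 70, 100)

# right = TRUE is the default, so the intervals are
# (-Inf, 50], (50, 70] and (70, Inf)
size <- cut(x, breaks = c(-Inf, 50, 70, Inf),
            labels = c("small", "medium", "large"))

# cut() returns a factor; convert it if a character column is wanted
size_chr <- as.character(size)
print(size_chr)
```

Keeping the factor can be useful too, since its levels stay ordered as small, medium, large in tables and plots.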

Comments

6

Same idea but assign it like this instead. No package needed.

M$Size <- ifelse(M$Number <= 50, 'Small', ifelse(M$Number <= 70, 'Medium', 'Big'))

Result:

  Letter Number   Size
1      A      5  Small
2      B    100    Big
3      C     60 Medium
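
To spell out why this works where the attempt in the question did not: `ifelse` returns a vector, and it evaluates its `yes` and `no` arguments as whole expressions, not once per row. An assignment such as `M$Size <- "Big"` placed inside it therefore overwrites the entire column, and whichever assignment is evaluated last wins. A minimal sketch with the question's data, storing the returned vector instead:

```r
# Same data as in the question
M <- data.frame(Letter = c("A", "B", "C"), Number = c(5, 100, 60))

# ifelse() is vectorised and returns one value per row;
# assign that returned vector to the new column
M$Size <- ifelse(M$Number <= 50, "Small",
                 ifelse(M$Number <= 70, "Medium", "Big"))
print(M)
```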

Comments

0

You can also try:

df <- data.frame(c("A","B","C"),c(5,100,60))
names(df) <- c("Letter","Number")

# with() lets the conditions refer to Number directly
df$Size <- with(df,
                ifelse(Number <= 50, 'Small',
                       ifelse(Number > 50 & Number <= 70, 'Medium',   # <= 70 so that exactly 70 is Medium
                              ifelse(Number > 70, 'Big', NA_character_))))

Comments
