While learning R, and following up on my previous question and its answer, I am trying to figure out how to do sequential string replacement in a data.frame.

Using the mtcars dataset, I'd like to label the values of mtcars$hp as "low", "medium", and "high" for hp < 100, hp >= 100 & hp < 200, and hp >= 200, respectively.

In theory, running the following statements in sequence should do the job:

  1- mtcars$hp[mtcars$hp < 100] = "low"
  2- mtcars$hp[mtcars$hp >= 100 & mtcars$hp < 200] = "medium"
  3- mtcars$hp[mtcars$hp >= 200] = "high"

After running statements 1 and 2, everything looks fine:

> head(mtcars, 20)
                     mpg cyl  disp     hp drat    wt  qsec vs am gear carb newcol
Mazda RX4           21.0   6 160.0 medium 3.90 2.620 16.46  0  1    4    4    low
Mazda RX4 Wag       21.0   6 160.0 medium 3.90 2.875 17.02  0  1    4    4    low
Datsun 710          22.8   4 108.0    low 3.85 2.320 18.61  1  1    4    1   high
Hornet 4 Drive      21.4   6 258.0 medium 3.08 3.215 19.44  1  0    3    1    low
Hornet Sportabout   18.7   8 360.0 medium 3.15 3.440 17.02  0  0    3    2 medium
Valiant             18.1   6 225.0 medium 2.76 3.460 20.22  1  0    3    1    low
Duster 360          14.3   8 360.0    245 3.21 3.570 15.84  0  0    3    4 medium
Merc 240D           24.4   4 146.7    low 3.69 3.190 20.00  1  0    4    2   high
Merc 230            22.8   4 140.8    low 3.92 3.150 22.90  1  0    4    2   high
Merc 280            19.2   6 167.6 medium 3.92 3.440 18.30  1  0    4    4    low
Merc 280C           17.8   6 167.6 medium 3.92 3.440 18.90  1  0    4    4    low
Merc 450SE          16.4   8 275.8 medium 3.07 4.070 17.40  0  0    3    3 medium
Merc 450SL          17.3   8 275.8 medium 3.07 3.730 17.60  0  0    3    3 medium
Merc 450SLC         15.2   8 275.8 medium 3.07 3.780 18.00  0  0    3    3 medium
Cadillac Fleetwood  10.4   8 472.0    205 2.93 5.250 17.98  0  0    3    4 medium
Lincoln Continental 10.4   8 460.0    215 3.00 5.424 17.82  0  0    3    4 medium
Chrysler Imperial   14.7   8 440.0    230 3.23 5.345 17.42  0  0    3    4 medium
Fiat 128            32.4   4  78.7    low 4.08 2.200 19.47  1  1    4    1   high
Honda Civic         30.4   4  75.7    low 4.93 1.615 18.52  1  1    4    2   high
Toyota Corolla      33.9   4  71.1    low 4.22 1.835 19.90  1  1    4    1   high

However, as soon as I run statement 3, mtcars$hp[mtcars$hp >= 200] = "high", after the first two, every hp value turns into "high"!

> mtcars$hp[mtcars$hp >= 200] = "high"
> head(mtcars, 20)
                     mpg cyl  disp   hp drat    wt  qsec vs am gear carb newcol
Mazda RX4           21.0   6 160.0 high 3.90 2.620 16.46  0  1    4    4    low
Mazda RX4 Wag       21.0   6 160.0 high 3.90 2.875 17.02  0  1    4    4    low
Datsun 710          22.8   4 108.0 high 3.85 2.320 18.61  1  1    4    1   high
Hornet 4 Drive      21.4   6 258.0 high 3.08 3.215 19.44  1  0    3    1    low
Hornet Sportabout   18.7   8 360.0 high 3.15 3.440 17.02  0  0    3    2 medium
Valiant             18.1   6 225.0 high 2.76 3.460 20.22  1  0    3    1    low
Duster 360          14.3   8 360.0 high 3.21 3.570 15.84  0  0    3    4 medium
Merc 240D           24.4   4 146.7 high 3.69 3.190 20.00  1  0    4    2   high
Merc 230            22.8   4 140.8 high 3.92 3.150 22.90  1  0    4    2   high
Merc 280            19.2   6 167.6 high 3.92 3.440 18.30  1  0    4    4    low
Merc 280C           17.8   6 167.6 high 3.92 3.440 18.90  1  0    4    4    low
Merc 450SE          16.4   8 275.8 high 3.07 4.070 17.40  0  0    3    3 medium
Merc 450SL          17.3   8 275.8 high 3.07 3.730 17.60  0  0    3    3 medium
Merc 450SLC         15.2   8 275.8 high 3.07 3.780 18.00  0  0    3    3 medium
Cadillac Fleetwood  10.4   8 472.0 high 2.93 5.250 17.98  0  0    3    4 medium
Lincoln Continental 10.4   8 460.0 high 3.00 5.424 17.82  0  0    3    4 medium
Chrysler Imperial   14.7   8 440.0 high 3.23 5.345 17.42  0  0    3    4 medium
Fiat 128            32.4   4  78.7 high 4.08 2.200 19.47  1  1    4    1   high
Honda Civic         30.4   4  75.7 high 4.93 1.615 18.52  1  1    4    2   high
Toyota Corolla      33.9   4  71.1 high 4.22 1.835 19.90  1  1    4    1   high 

Any idea why and what I am doing wrong?
Thanks!

  • mtcars$hp[mtcars$hp >=200 & mtcars$hp <"low"] = "high" Commented Feb 17, 2017 at 21:10
  • @HubertL Thank you very much! Would you please also explain the logic behind your expression? Commented Feb 17, 2017 at 21:15
  • When you assign text ("low") to hp, the column becomes type character. After you then assign "medium", the comparison with 200 is done on strings: the remaining numeric strings are over "200", but "medium" and "low" compare as over "200" too, so every value gets assigned "high". My solution just ensures that you don't assign "high" to values that are already "low" or "medium". Try "medium" > "low". Commented Feb 17, 2017 at 21:23

1 Answer

You should use a temporary vector:

res <- character(length(mtcars$hp))
res[mtcars$hp <100] <- "low"
res[mtcars$hp >=100 & mtcars$hp <200] <- "medium"
res[mtcars$hp >= 200] <- "high"
mtcars$hp <- res

Otherwise, the first assignment alters the basis of comparison, because it changes the type of the column:

df <- mtcars
class(df$hp)
#> [1] "numeric"
df$hp[df$hp <100] <- "low"
class(df$hp)
#> [1] "character"

and all subsequent comparisons will be string-based, not numeric!
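To see the coercion in isolation, here is a minimal sketch on a small made-up vector (the values 52, 110, and 245 are chosen for illustration and are not taken from the question). It assumes R's usual behaviour of coercing the number to a string when one side of a comparison is character; the exact ordering of letters versus digits can depend on the locale.

```r
# Illustrative values only: one below 100, one in the middle, one above 200.
hp <- c(52, 110, 245)

# The first assignment coerces the whole vector to character.
hp[hp < 100] <- "low"
hp
#> [1] "low" "110" "245"

# 200 is now coerced to "200", so this is a text comparison.
# "110" >= "200" is FALSE because "1" sorts before "2";
# "low" >= "200" is typically TRUE because letters sort after digits.
cmp <- hp >= 200
cmp
```

This is why the third statement in the question matches the rows already labelled "low" and "medium" as well as the remaining numeric strings.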

Alternatively, you can do this all at once with the right tool, cut:

cut(mtcars$hp, breaks = c(-Inf, 100, 200, Inf), labels = c("low", "medium", "high"), right = FALSE)
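
As a quick check of how the boundaries fall, the sketch below applies the same cut call to a few made-up values (52, 100, 199, 200, 335, chosen to sit on and around the breaks). With right = FALSE the intervals are closed on the left, so 100 lands in "medium" and 200 lands in "high". Note that cut returns a factor; as.character is used here only to show plain labels.

```r
# Illustrative values placed on and around the break points.
hp <- c(52, 100, 199, 200, 335)

# Intervals are [-Inf, 100), [100, 200), [200, Inf) because right = FALSE.
hp_label <- cut(hp,
                breaks = c(-Inf, 100, 200, Inf),
                labels = c("low", "medium", "high"),
                right = FALSE)

class(hp_label)
#> [1] "factor"
as.character(hp_label)
#> [1] "low"    "medium" "medium" "high"   "high"
```

Because cut reads the numeric values in a single pass, nothing is overwritten before it is compared, which avoids the coercion problem entirely.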