
So " xx yy 11 22 33 " will become "xxyy112233". How can I achieve this?

9 Answers

320

In general, we want a solution that is vectorised, so here's a better test example:

whitespace <- " \t\n\r\v\f" # space, tab, newline, 
                            # carriage return, vertical tab, form feed
x <- c(
  " x y ",           # spaces before, after and in between
  " \u2190 \u2192 ", # contains unicode chars
  paste0(            # varied whitespace     
    whitespace, 
    "x", 
    whitespace, 
    "y", 
    whitespace, 
    collapse = ""
  ),   
  NA                 # missing
)
## [1] " x y "                           
## [2] " ← → "                           
## [3] " \t\n\r\v\fx \t\n\r\v\fy \t\n\r\v\f"
## [4] NA

The base R approach: gsub

gsub replaces all instances of a string (fixed = TRUE) or regular expression (fixed = FALSE, the default) with another string. To remove all spaces, use:

gsub(" ", "", x, fixed = TRUE)
## [1] "xy"                            "←→"             
## [3] "\t\n\r\v\fx\t\n\r\v\fy\t\n\r\v\f" NA 

As DWin noted, in this case fixed = TRUE isn't necessary but provides slightly better performance since matching a fixed string is faster than matching a regular expression.
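Beyond speed, fixed = TRUE also changes the result when the pattern contains a regex metacharacter. A minimal sketch, using a made-up input string (not from the question):

```r
# Hypothetical input containing a regex metacharacter (the dot)
v <- "1.5 kg"

# As a regular expression, "." matches every character
as_regex <- gsub(".", "", v)

# As a fixed string, "." matches only a literal dot
as_fixed <- gsub(".", "", v, fixed = TRUE)

print(as_regex)
print(as_fixed)
```

For a plain space the two modes agree, which is why fixed = TRUE is optional in the example above.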

If you want to remove all types of whitespace, use:

gsub("[[:space:]]", "", x) # note the double square brackets
## [1] "xy" "←→" "xy" NA 

gsub("\\s", "", x)         # same; note the double backslash

library(rebus)             # successor to the 'regex' package; provides space()
gsub(space(), "", x)       # same

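"[:space:]" is a POSIX character class matching all whitespace characters; it is only valid inside a bracket expression, hence "[[:space:]]". \s is a shorthand supported by most regular-expression flavours that does the same thing; in an R string the backslash must be escaped, hence "\\s".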
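As a quick check that the two patterns agree, here is a small sketch in base R. The vector v is a shortened stand-in for the test vector x above, not the original object:

```r
# Shortened stand-in for the test vector: plain spaces, mixed whitespace, and NA
v <- c(" x y ", " \t\n\r\v\fx \t\n\r\v\fy \t\n\r\v\f", NA)

posix_class <- gsub("[[:space:]]", "", v)  # POSIX class inside a bracket expression
shorthand   <- gsub("\\s", "", v)          # "\\s" in R source is the regex \s

print(posix_class)
print(shorthand)
print(identical(posix_class, shorthand))
```

Both calls are vectorised and leave NA as NA.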


The stringr approach: str_replace_all and str_trim

stringr provides more human-readable wrappers for string manipulation (since version 1.0.0 it is built on top of stringi, mentioned below). The equivalents of the above commands, using str_replace_all, are:

library(stringr)
str_replace_all(x, fixed(" "), "")
str_replace_all(x, space(), "")

stringr also has a str_trim function which removes only leading and trailing whitespace.

str_trim(x) 
## [1] "x y"          "← →"          "x \t\n\r\v\fy" NA    
str_trim(x, "left")    
## [1] "x y "                   "← → "    
## [3] "x \t\n\r\v\fy \t\n\r\v\f" NA     
str_trim(x, "right")    
## [1] " x y"                   " ← →"    
## [3] " \t\n\r\v\fx \t\n\r\v\fy" NA      

The stringi approach: stri_replace_all_charclass and stri_trim

stringi is built upon the platform-independent ICU library, and has an extensive set of string manipulation functions. The equivalents of the above are:

library(stringi)
stri_replace_all_fixed(x, " ", "")
stri_replace_all_charclass(x, "\\p{WHITE_SPACE}", "")

Here "\\p{WHITE_SPACE}" is an alternate syntax for the set of Unicode code points considered to be whitespace, equivalent to "[[:space:]]", "\\s" and space(). For more complex regular expression replacements, there is also stri_replace_all_regex.

stringi also has trim functions.

stri_trim(x)
stri_trim_both(x)    # same
stri_trim(x, "left")
stri_trim_left(x)    # same
stri_trim(x, "right")  
stri_trim_right(x)   # same

8 Comments

@Aniko. Is there a reason you used fixed=TRUE?
@DWin Supposedly it is faster if R knows that it does not have to invoke the regular expression stuff. In this case it does not really make any difference, I am just in the habit of doing so.
Is there a difference between "[[:space:]]" and "\\s"?
if you check on flyordie.sin.khk.be/2011/05/04/day-35-replacing-characters or just type in ?regex then you see that [:space:] is used for "Space characters: tab, newline, vertical tab, form feed, carriage return, and space." That's a lot more than space alone
@Aniko Hope you don't mind about the big edit. Since this question is highly popular, it looked like the answer needed to be more thorough.
33

I just learned that the "stringr" package can remove whitespace from the beginning and end of a string with str_trim( , side="both"), but it also has a replacement function, so that:

library(stringr)
a <- " xx yy 11 22 33 " 
str_replace_all(string=a, pattern=" ", replacement="")

[1] "xxyy112233"

1 Comment

The stringr package doesn't work well with every encoding. The stringi package is a better solution; for more info, check github.com/Rexamine/stringi
23
x = "xx yy 11 22 33"

gsub(" ", "", x)

> [1] "xxyy112233"

22

Use [[:blank:]] to match horizontal whitespace characters (space and tab).

gsub("[[:blank:]]", "", " xx yy 11 22  33 ")
# [1] "xxyy112233"
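To see how [[:blank:]] differs from [[:space:]], here is a small sketch with a made-up string that also contains a newline:

```r
# Hypothetical input mixing spaces, a tab, and a newline
s <- " a\tb \n c "

only_blank <- gsub("[[:blank:]]", "", s)  # removes spaces and tabs, keeps the newline
all_space  <- gsub("[[:space:]]", "", s)  # removes the newline as well

print(only_blank)
print(all_space)
```

So [[:blank:]] is the right choice only when line breaks should be preserved.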

10

Please note that the solutions written above remove only spaces. If you also want to remove tabs or newlines, use stri_replace_all_charclass from the stringi package.

library(stringi)
stri_replace_all_charclass("   ala \t  ma \n kota  ", "\\p{WHITE_SPACE}", "")
## [1] "alamakota"

3 Comments

stringi package is on CRAN now, enjoy! :)
This command above is incorrect. The right way is stri_replace_all_charclass(" ala \t ma \n kota ", "\\p{WHITE_SPACE}", "")
After using stringi for a few months now and seen/learned how powerful and efficient it is, it has become my go-to package for string operations. You guys did an awesome job with it.
10

The function str_squish() from package stringr of tidyverse does the magic!

library(dplyr)
library(stringr)

df <- data.frame(a = c("  aZe  aze s", "wxc  s     aze   "), 
                 b = c("  12    12 ", "34e e4  "), 
                 stringsAsFactors = FALSE)
df <- df %>%
  as_tibble() %>%                            # so the result prints as a tibble
  mutate(across(everything(), str_squish))   # across() needs dplyr >= 1.0.0; funs() is deprecated
df

# A tibble: 2 x 2
  a         b     
  <chr>     <chr> 
1 aZe aze s 12 12 
2 wxc s aze 34e e4
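Note that squishing is not the same as removing every space. The following is a rough base R approximation of the squish behaviour (an assumption for illustration, not stringr's implementation), contrasted with full removal on the first column's values:

```r
# Values taken from column a of the data frame above
v <- c("  aZe  aze s", "wxc  s     aze   ")

# Approximate squish: trim the ends, then collapse inner runs to a single space
squished <- gsub("\\s+", " ", trimws(v))

# Full removal: delete every whitespace run
stripped <- gsub("\\s+", "", v)

print(squished)
print(stripped)
```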

3 Comments

Please do not link to code. Add it in the text body of your answer and explain it here, to give your answer more longterm value.
Thanks @RBalasubramanian for reminding me of this guideline. I will follow it in the future.
I don't see how this answers the question. str_squish doesn't remove all spaces. It just trims and substitutes multiple spaces for one.
6

Another approach to consider:

library(stringr)
str_replace_all(" xx yy 11 22  33 ", regex("\\s*"), "")

#[1] "xxyy112233"

\\s: Matches Space, tab, vertical tab, newline, form feed, carriage return

*: Matches at least 0 times
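A possible variant, sketched here in base R rather than stringr, is the + quantifier, which matches one or more whitespace characters and so never produces an empty match. The snippet also lists the runs that get matched:

```r
s <- " xx yy 11 22  33 "

# Each match is a run of one or more whitespace characters
runs <- regmatches(s, gregexpr("\\s+", s))[[1]]

# Removing those runs gives the same result as the answer above
result <- gsub("\\s+", "", s)

print(length(runs))
print(result)
```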

1 Comment

Should be noted that the regex() command also comes from the stringr package.
5
income<-c("$98,000.00 ", "$90,000.00 ", "$18,000.00 ", "")

To remove the trailing space after .00, use the trimws() function, which strips leading and trailing whitespace only.

income<-trimws(income)
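trimws() also takes a which argument to pick the side. A short sketch with made-up values, which also shows that interior spaces are left alone:

```r
# Hypothetical value padded on both sides
v <- "  $98,000.00  "

both  <- trimws(v)                   # default: which = "both"
left  <- trimws(v, which = "left")   # strip leading whitespace only
right <- trimws(v, which = "right")  # strip trailing whitespace only

# Interior spaces are not touched
inner <- trimws(" xx yy ")

print(c(both, left, right, inner))
```

To drop interior spaces as well, one of the gsub() approaches above is needed.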

0

From the stringr library you could try this:

  1. Trim leading and trailing whitespace with str_trim
  2. Remove the remaining spaces with str_replace_all

    library(stringr)

    # 2.            1.
    # |             |
    # V             V
    str_replace_all(str_trim(" xx yy 11 22  33 "), " ", "")

1 Comment

only removes spaces and maybe tabs, not new lines
