36

I'm not sure how to do this, as I'm pretty new to regular expressions and can't seem to find the proper method to accomplish it. Say I have the following as a string (all tabs and newlines included):

1/2 cup  




            onion           
             (chopped)

How can I remove all the whitespace and replace each instance with just a single space?

6 Answers

68

This is a case where regular expressions work well, because you want to treat the whole class of whitespace characters the same and replace runs of any combination of whitespace with a single space character. So if that string is stored in s, then you would do:

fixed_string = s.gsub(/\s+/, ' ')
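As a minimal sketch, here is that line applied to a string like the one in the question. The sample literal and the trailing strip are additions for illustration, assuming leading and trailing spaces are not wanted:

```ruby
# A sample like the one in the question, with tabs and newlines as escapes.
s = "1/2 cup  \n\n\n\n\t\tonion\t\t\n\t\t (chopped)\n"

# Collapse every run of whitespace (spaces, tabs, newlines) into one space.
fixed_string = s.gsub(/\s+/, ' ')

# Optional: drop the single space that can remain at either end.
trimmed = fixed_string.strip

puts fixed_string.inspect
puts trimmed.inspect
```

Note that gsub returns a new string and leaves s untouched.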

27

Within Rails you can use String#squish, which is an ActiveSupport extension.

require 'active_support'
require 'active_support/core_ext/string/filters' # defines String#squish

s = <<-EOS
1/2 cup  

            onion
EOS

s.squish
# => "1/2 cup onion"
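Outside Rails, a rough equivalent can be sketched in plain Ruby. The helper name squish_like is hypothetical, and it only approximates String#squish (collapse whitespace runs, then trim both ends):

```ruby
# Hypothetical helper; an approximation of ActiveSupport's String#squish.
def squish_like(str)
  # [[:space:]] also matches Unicode whitespace in UTF-8 strings.
  str.gsub(/[[:space:]]+/, ' ').strip
end

s = "1/2 cup  \n\n            onion\n"
puts squish_like(s).inspect
```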


8

You want the squeeze method:

str.squeeze([other_str]*) → new_str
Builds a set of characters from the other_str parameter(s) using the procedure described for String#count. Returns a new string where runs of the same character that occur in this set are replaced by a single character. If no arguments are given, all runs of identical characters are replaced by a single character.

   "yellow moon".squeeze                  #=> "yelow mon"
   "  now   is  the".squeeze(" ")         #=> " now is the"
   "putters shoot balls".squeeze("m-z")   #=> "puters shot balls"
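To see how this behaves on mixed whitespace, here is a small sketch using a made-up sample string. squeeze collapses runs of the same character only, so a run of spaces, a run of newlines and a run of tabs each survive as one character:

```ruby
s = "1/2 cup  \n\n\t\tonion"

# Each run of an identical character from the set is reduced to one character,
# but different whitespace characters are not merged with each other.
squeezed = s.squeeze(" \t\n")
puts squeezed.inspect

# With no arguments, doubled letters are collapsed as well.
letters = "(chopped)".squeeze
puts letters.inspect
```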

4 Comments

This is not quite right, I think. It will just compress runs of tabs into a single tab and runs of newlines into a single newline. As I read it, the question is seeking a way to replace runs of any combination of whitespace characters with a single space character.
Too bad String#squeeze doesn't accept a regex as an argument. Upvote if you think that would be a good idea; I could submit a PR.
Also, this removes duplicates, e.g. "class" becomes "clas", which could be a deal breaker if running this on, say, HTML. The more proper method (if in Rails) would be String#squish.
As @engineerDave said. This produces: "1/2 cup \n onion \n (choped)\n"
6

The problem with the simplest solution, gsub(/\s+/, ' '), is that it is slow: it replaces every whitespace match, even a lone single space. Usually there is just one space between words, so we only need to fix runs of two or more whitespace characters.

A better solution is tr("\r\n\t", ' ').gsub(/ {2,}/, ' '): first replace the special whitespace characters with ordinary spaces (tr is faster than gsub for replacing single characters), then collapse spaces only where there are two or more in a row.

require 'benchmark'

def method1(s) s.gsub!(/\s+/, ' '); s end
def method2(s) s.tr!("\r\n\t", ' '); s.gsub!(/ {2,}/, ' '); s end

Benchmark.bm do |x|
  n = 100_000
  x.report('method1') { n.times { method1("Lorem   ipsum\n\n dolor \t\t\tsit amet, consectetur\n \n\t\n adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum.") } }
  x.report('method2') { n.times { method2("Lorem   ipsum\n\n dolor \t\t\tsit amet, consectetur\n \n\t\n adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum.") } }
end

#          user     system      total        real
# method1  2.907425   0.024254   2.931679 (  3.406144)
# method2  0.644329   0.011254   0.655583 (  0.658699)    
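The two approaches are not strictly equivalent, which the following sketch tries to show. The helper names are hypothetical, non-mutating variants of the methods above; \s also matches form feed and vertical tab, which the tr list does not cover:

```ruby
# Hypothetical non-mutating variants of method1 and method2.
def collapse_with_gsub(s)
  s.gsub(/\s+/, ' ')
end

def collapse_with_tr(s)
  s.tr("\r\n\t", ' ').gsub(/ {2,}/, ' ')
end

sample = "Lorem   ipsum\n\n dolor \t\t\tsit amet"
puts collapse_with_gsub(sample) == collapse_with_tr(sample)

# Form feed (\f) and vertical tab (\v) are matched by \s but not listed in tr.
odd = "a\f\fb\vc"
puts collapse_with_gsub(odd).inspect
puts collapse_with_tr(odd).inspect
```

If the input can contain those characters, they would need to be added to the tr argument.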

1 Comment

To any newer programmers out there: be advised that for the vast vast majority of use cases, the performance difference between this approach and the selected answer will not be noticeable or matter.
5

The selected answer will not remove non-breaking space characters.

This should work in Ruby 1.9:

fixed_string = s.gsub(/(\s|\u00A0)+/, ' ')
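An alternative sketch, assuming a UTF-8 string, is the POSIX bracket expression [[:space:]], which in Ruby also matches Unicode whitespace such as the non-breaking space. The sample string is made up for illustration:

```ruby
# Two non-breaking spaces (U+00A0) followed by ordinary whitespace.
s = "1/2\u00A0\u00A0cup \t onion"

# \s only covers ASCII whitespace, so the NBSP run is left alone.
ascii_only = s.gsub(/\s+/, ' ')

# [[:space:]] covers the NBSP run as well.
unicode_aware = s.gsub(/[[:space:]]+/, ' ')

puts ascii_only.inspect
puts unicode_aware.inspect
```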

1 Comment

I'm assuming OP wanted to remove meaningless whitespace (i.e. runs of more than one whitespace character) to tidy up and reduce the size of strings intended for HTML, since multiple consecutive whitespace chars get collapsed into one space by web browsers anyway... So it's worth noting that non-breaking spaces are not meaningless in HTML, they don't get collapsed, i.e. you probably don't want to remove them
0

If speed is a concern then your best bet is this.

.tr("\r\n\t", ' ').gsub(/ {2,}/, ' ')

This replaces carriage returns, newlines, and tabs with a space, then replaces multiple spaces with a single space.

I saw the benchmark that Lev posted and compared variations of .gsub, .squeeze, .tr, and .squish. I expanded his benchmark to try them out, and while .squeeze is the fastest, it does not answer the question, since it would only compress multiple tabs/newlines to a single tab/newline.

require 'benchmark'
require 'active_support/core_ext/string/filters' # defines String#squish

# Replace multiple whitespace characters with a single space.
def method1(s) s.gsub!(/\s+/, ' '); s end # (in place)
def method2(s) s = s.gsub(/\s+/, ' '); s end

# Replace characters with a space then replace multiple spaces with a single space.
def method3(s) s.gsub!(/[\r\n\t]/, ' '); s.gsub!(/ {2,}/, ' '); s end # (in place)
def method4(s) s = s.gsub(/[\r\n\t]/, ' ').gsub(/ {2,}/, ' '); s end

# Replace characters with a space using tr, then replace multiple spaces with a single space.
def method5(s) s.tr!("\r\n\t", ' '); s.gsub!(/ {2,}/, ' '); s end # (in place)
def method6(s) s = s.tr("\r\n\t", ' ').gsub(/ {2,}/, ' '); s end

# Replace multiple whitespace characters with a single space.
def method7(s) s.squish!; s end # (in place)
def method8(s) s = s.squish; s end

# Combines multiple spaces into a single space
def method9(s) s.squeeze!(" "); s end # (in place)
def method10(s) s = s.squeeze(" "); s end

Benchmark.bm do |x|
  n = 100_000
  x.report('.gsub!      ') { n.times { method1("Lorem   ipsum\n\n dolor \t\t\tsit amet, consectetur\n \n\t\n adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum.") } }
  x.report('.gsub       ') { n.times { method2("Lorem   ipsum\n\n dolor \t\t\tsit amet, consectetur\n \n\t\n adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum.") } }
  x.report('.gsub!.gsub!') { n.times { method3("Lorem   ipsum\n\n dolor \t\t\tsit amet, consectetur\n \n\t\n adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum.") } }
  x.report('.gsub .gsub ') { n.times { method4("Lorem   ipsum\n\n dolor \t\t\tsit amet, consectetur\n \n\t\n adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum.") } }
  x.report('.tr!.gsub!  ') { n.times { method5("Lorem   ipsum\n\n dolor \t\t\tsit amet, consectetur\n \n\t\n adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum.") } }
  x.report('.tr .gsub   ') { n.times { method6("Lorem   ipsum\n\n dolor \t\t\tsit amet, consectetur\n \n\t\n adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum.") } }
  x.report('.squish!    ') { n.times { method7("Lorem   ipsum\n\n dolor \t\t\tsit amet, consectetur\n \n\t\n adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum.") } }
  x.report('.squish     ') { n.times { method8("Lorem   ipsum\n\n dolor \t\t\tsit amet, consectetur\n \n\t\n adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum.") } }
  x.report('.squeeze!   ') { n.times { method9("Lorem   ipsum\n\n dolor \t\t\tsit amet, consectetur\n \n\t\n adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum.") } }
  x.report('.squeeze    ') { n.times { method10("Lorem   ipsum\n\n dolor \t\t\tsit amet, consectetur\n \n\t\n adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum.") } }
end

Which gets these results:

#               user       system     total       real
# .gsub!        2.019544   0.030325   2.049869 (  2.059379)
# .gsub         1.968179   0.011204   1.979383 (  1.988050)
# .gsub!.gsub!  0.770042   0.014097   0.784139 (  0.787055)
# .gsub .gsub   0.728955   0.011577   0.740532 (  0.742887)
# .tr!.gsub!    0.487014   0.008260   0.495274 (  0.496820)
# .tr .gsub     0.487231   0.007769   0.495000 (  0.497164)
# .squish!      2.005224   0.011673   2.016897 (  2.025851)
# .squish       2.043497   0.013331   2.056828 (  2.066794)
# .squeeze!     0.117615   0.002004   0.119619 (  0.120140)
# .squeeze      0.196301   0.012094   0.208395 (  0.209267)
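One further variant, sketched here without any timing claim, combines tr with squeeze(" ") so that tabs and newlines become spaces before the runs are collapsed. The helper name is hypothetical, and it was not part of the benchmark above:

```ruby
# Hypothetical helper: turn \r, \n and \t into spaces, then collapse space runs.
def collapse_with_tr_squeeze(s)
  s.tr("\r\n\t", ' ').squeeze(' ')
end

result = collapse_with_tr_squeeze("Lorem   ipsum\n\n dolor \t\t\tsit amet")
puts result.inspect
```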
