How do I remove repeated spaces in a string?

Question

I have a string:

"foo (2 spaces) bar (3 spaces) baaar (6 spaces) fooo"

How do I remove repetitious spaces in it so there should be no more than one space between any two words?

You know, this kind of question is easily answered by reviewing all the String methods. I highly recommend getting familiar with the documentation for the String, Array, and Enumerable methods.
– Mark Thomas, Commented Feb 5, 2011 at 15:41
In case you don't know where to start, visit http://ruby-doc.org/ and then click on the Core API link and then click on the String class in the top middle column.
– Phrogz, Commented Feb 5, 2011 at 15:51
To the OP's defense, removing the spaces can be accomplished several ways, not all of which are the most intuitive, especially when you look at the benchmark results.
– the Tin Man, Commented Dec 30, 2011 at 19:19

Nakilon · Accepted Answer · 2020-12-19 23:49:28Z

107

String#squeeze has an optional parameter to specify characters to squeeze.

irb> "asd  asd asd   asd".squeeze(" ")
=> "asd asd asd asd"

Warning: calling it without a parameter will squeeze ALL repeated characters, not only spaces:

irb> 'aaa     bbbb     cccc 0000123'.squeeze
=> "a b c 0123"
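The argument to squeeze is a character set, so it can name more than one character. The following is a small sketch of that behavior on a made-up sample string (the constant names are illustrative only). It also shows that the in-place variant squeeze! returns nil when nothing changed, which matters if you chain on its result.

```ruby
# Made-up sample: doubled spaces, a doubled tab, and a triple space.
SQUEEZE_SAMPLE = "a  b\t\tc   d"

# Only runs of spaces are collapsed; the doubled tab is left alone.
SPACES_ONLY = SQUEEZE_SAMPLE.squeeze(" ")       # "a b\t\tc d"

# Runs of spaces and runs of tabs are each collapsed to one character.
SPACES_AND_TABS = SQUEEZE_SAMPLE.squeeze(" \t") # "a b\tc d"

# squeeze! modifies the receiver and returns nil if there was nothing to squeeze.
already_clean = "a b c".dup
NOTHING_TO_DO = already_clean.squeeze!(" ")     # nil
```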

edited Dec 19, 2020 at 23:49

answered Feb 5, 2011 at 13:46

Nakilon


kurumi · Accepted Answer · 2011-02-05 14:10:11Z

51

>> str = "foo  bar   bar      baaar"
=> "foo  bar   bar      baaar"
>> str.split.join(" ")
=> "foo bar bar baaar"

answered Feb 5, 2011 at 14:10

kurumi


5 Comments

Phrogz Over a year ago

+1 For an amusing way to do it, but -1 for an inefficient suggestion compared to other, more appropriate alternatives.

zetetic Over a year ago

see stackoverflow.com/questions/4907068/… :)

kurumi Over a year ago

@zetetic, thanks. This further proves that split/join is neither an amusing nor an inefficient way to do it (as I have always known) compared to regex substitution.

nurettin Over a year ago

@kurumi it is amusing and inefficient unless you have small strings to work on. For the articles I'm working on, squeeze ' ' is an order of magnitude faster.

Bernhard Over a year ago

This method also removes leading and trailing spaces which might not be intended
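To illustrate the point about leading and trailing spaces, here is a short sketch comparing the two approaches on a padded sample string (the string and constant names are made up for this example). split with no arguments splits on any whitespace, so newlines disappear as well.

```ruby
# Made-up sample with padding at both ends and an embedded newline.
PADDED = "  foo   bar \n  baz  "

# split with no arguments splits on runs of any whitespace and ignores
# leading whitespace, so the joined result is trimmed and the newline is gone.
VIA_SPLIT_JOIN = PADDED.split.join(" ")  # "foo bar baz"

# squeeze(" ") only collapses runs of spaces; the edges and the newline survive.
VIA_SQUEEZE = PADDED.squeeze(" ")        # " foo bar \n baz "
```

If the trimming is wanted but the newlines should stay, PADDED.squeeze(" ").strip is one option.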

the Tin Man · Accepted Answer · 2014-05-29 20:17:19Z

29

Updated benchmark from @zetetic's answer:

require 'benchmark'
include Benchmark

string = "foo  bar   bar      baaar"
n = 1_000_000
bm(12) do |x|
  x.report("gsub      ")   { n.times { string.gsub(/\s+/, " ") } }
  x.report("squeeze(' ')") { n.times { string.squeeze(' ') } }
  x.report("split/join")   { n.times { string.split.join(" ") } }
end

Which results in these values when run on my desktop after running it twice:

ruby test.rb; ruby test.rb
                  user     system      total        real
gsub          6.060000   0.000000   6.060000 (  6.061435)
squeeze(' ')  4.200000   0.010000   4.210000 (  4.201619)
split/join    3.620000   0.000000   3.620000 (  3.614499)
                  user     system      total        real
gsub          6.020000   0.000000   6.020000 (  6.023391)
squeeze(' ')  4.150000   0.010000   4.160000 (  4.153204)
split/join    3.590000   0.000000   3.590000 (  3.587590)

The issue is that squeeze with no argument removes any repeated character, which produces a different output string and doesn't meet the OP's need. squeeze(' ') does meet the need, but it runs slower than the bare call.

string.squeeze
 => "fo bar bar bar"

I was thinking about how the split.join could be faster and it didn't seem like that would hold up in large strings, so I adjusted the benchmark to see what effect long strings would have:

require 'benchmark'
include Benchmark

string = (["foo  bar   bar      baaar"] * 10_000).join
puts "String length: #{ string.length } characters"
n = 100
bm(12) do |x|
  x.report("gsub      ")   { n.times { string.gsub(/\s+/, " ") } }
  x.report("squeeze(' ')") { n.times { string.squeeze(' ') } }
  x.report("split/join")   { n.times { string.split.join(" ") } }
end

ruby test.rb ; ruby test.rb

String length: 250000 characters
                  user     system      total        real
gsub          2.570000   0.010000   2.580000 (  2.576149)
squeeze(' ')  0.140000   0.000000   0.140000 (  0.150298)
split/join    1.400000   0.010000   1.410000 (  1.396078)

String length: 250000 characters
                  user     system      total        real
gsub          2.570000   0.010000   2.580000 (  2.573802)
squeeze(' ')  0.140000   0.000000   0.140000 (  0.150384)
split/join    1.400000   0.010000   1.410000 (  1.397748)

So, long lines do make a big difference.
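Before comparing timings it is worth confirming that the candidates return the same string for the input being measured, since otherwise the numbers compare different work. The following is a rough sketch of such a check, assuming hypothetical names (CANDIDATES, BENCH_SAMPLE, timings) that are not part of the original benchmark; it uses Benchmark.realtime from the standard library and a shorter string so it runs quickly.

```ruby
require 'benchmark'

# Hypothetical table of the three approaches under test.
CANDIDATES = {
  "gsub"         => ->(s) { s.gsub(/\s+/, " ") },
  "squeeze(' ')" => ->(s) { s.squeeze(" ") },
  "split/join"   => ->(s) { s.split.join(" ") },
}.freeze

# Same building block as the benchmark above, repeated fewer times.
BENCH_SAMPLE = (["foo  bar   bar      baaar"] * 1_000).join

# Returns a hash of label => elapsed seconds for `runs` calls of each candidate.
def timings(string, runs = 10)
  CANDIDATES.transform_values do |fn|
    Benchmark.realtime { runs.times { fn.call(string) } }
  end
end

# All candidates must agree on this input before the timings mean anything.
outputs = CANDIDATES.values.map { |fn| fn.call(BENCH_SAMPLE) }
raise "candidates disagree" unless outputs.uniq.size == 1

timings(BENCH_SAMPLE).each do |label, seconds|
  puts format("%-14s %.4f s", label, seconds)
end
```

The candidates agree here only because the sample contains nothing but spaces and has no leading or trailing whitespace; on other inputs they can differ.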

If you do use gsub then gsub(/\s{2,}/, ' ') is slightly faster.

Not really. Here's a version of the benchmark to test just that assertion:

require 'benchmark'
include Benchmark

string = "foo  bar   bar      baaar"
puts string.gsub(/\s+/, " ")
puts string.gsub(/\s{2,}/, ' ')
puts string.gsub(/\s\s+/, " ")

string = (["foo  bar   bar      baaar"] * 10_000).join
puts "String length: #{ string.length } characters"
n = 100
bm(18) do |x|
  x.report("gsub")               { n.times { string.gsub(/\s+/, " ") } }
  x.report('gsub/\s{2,}/, "")')  { n.times { string.gsub(/\s{2,}/, ' ') } }
  x.report("gsub2")              { n.times { string.gsub(/\s\s+/, " ") } }
end
# >> foo bar bar baaar
# >> foo bar bar baaar
# >> foo bar bar baaar
# >> String length: 250000 characters
# >>                          user     system      total        real
# >> gsub                 1.380000   0.010000   1.390000 (  1.381276)
# >> gsub/\s{2,}/, "")    1.590000   0.000000   1.590000 (  1.609292)
# >> gsub2                1.050000   0.010000   1.060000 (  1.051005)

If you want the fastest gsub, use the gsub2 variant (/\s\s+/). squeeze(' ') will still run circles around any gsub implementation though.
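Note that the patterns are only interchangeable on input like the benchmark string, which contains nothing but runs of spaces. A "two or more" pattern never touches a lone whitespace character, so a single tab or newline is left as-is. A small sketch on a made-up string (the constant names are illustrative):

```ruby
# Made-up sample: a single tab, a double space, and a single newline.
MIXED = "a\tb  c\nd"

# One or more: every whitespace run, even a single tab or newline, becomes a space.
ONE_OR_MORE = MIXED.gsub(/\s+/, " ")    # "a b c d"

# Two or more: only the double space matches; the tab and newline are kept.
TWO_OR_MORE = MIXED.gsub(/\s\s+/, " ")  # "a\tb c\nd"
```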

edited May 29, 2014 at 20:17

answered Dec 30, 2011 at 18:14

the Tin Man


6 Comments

the Tin Man Over a year ago

@zetetic, I think Benchmark is an essential tool. I can't count how many times I've assumed something would be the fastest way to do a particular task, and had benchmark prove me wrong. I'd never have considered split/join to be fastest, though I've used it in apps for this purpose.

the Tin Man Over a year ago

@zetetic, check out the added test results.

TJChambers Over a year ago

My inference is that if you avoided the interpolation in join(" ") by using join(' ') it should be even (immeasurably?) faster.

the Tin Man Over a year ago

Nope. We've tested that before, and it makes no difference. Strings, whether defined using double-quotes or single-quotes, are defined as the code is initially parsed by the interpreter at startup, not on the fly. The only time it could make a difference is if there are values being interpolated into the string at run-time.

the Tin Man Over a year ago

While it might seem so, benchmarks don't bear that out. Looking for "2 or more" takes longer than "1 or more". See the added benchmark.


tokland · Accepted Answer · 2022-04-08 22:44:21Z

28

Important note: this answer relies on extra libraries, not plain Ruby. ActiveSupport ships as part of Rails; Facets is a separate gem.

To complement the other answers, note that both [ActiveSupport][1] and [Facets][2] provide String#squish ([update] caveat: it also removes newlines within the string):

>> "foo  bar   bar      baaar".squish
=> "foo bar bar baaar"

[1]: http://www.rubydoc.info/docs/rails/2.3.8/ActiveSupport/CoreExtensions/String/Filters#squish-instance_method
[2]: http://www.rubydoc.info/github/rubyworks/facets/String%3Asquish
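If neither library is available, similar behavior can be approximated in plain Ruby. The helper below is a hypothetical sketch (squish_like is not part of ActiveSupport or Facets): it collapses every run of whitespace, newlines and tabs included, into one space and then trims both ends.

```ruby
# Hypothetical plain-Ruby stand-in for squish; returns a new string.
def squish_like(str)
  # [[:space:]] is the POSIX whitespace class: spaces, tabs, newlines, etc.
  str.gsub(/[[:space:]]+/, " ").strip
end

squish_like("  foo  bar \n\t baaar ")  # => "foo bar baaar"
```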

edited Apr 8, 2022 at 22:44

answered Feb 5, 2011 at 13:40

tokland


1 Comment

Chiperific Over a year ago

Holy cow. I literally threw up my arms in amazement. I just replaced this: str.tr("\r","").tr("\n", "").tr("\t", "").squeeze(" ") with this: str.squish

Reiner Gerecke · Accepted Answer · 2011-02-05 13:30:11Z

9

Use a regular expression to match repeating whitespace (\s+) and replace it with a single space.

"foo    bar  foobar".gsub(/\s+/, ' ')
=> "foo bar foobar"

This matches every whitespace character. If you only want to replace spaces, use / +/ instead of /\s+/.

"foo    bar  \nfoobar".gsub(/ +/, ' ')
=> "foo bar \nfoobar"
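A middle ground between the two patterns is to collapse spaces and tabs while leaving line breaks alone. The helper name below is made up for this sketch:

```ruby
# Hypothetical helper: collapse runs of spaces and tabs, keep newlines.
def collapse_blanks(str)
  str.gsub(/[ \t]+/, " ")
end

collapse_blanks("foo \t  bar  \nfoobar")  # => "foo bar \nfoobar"
```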

answered Feb 5, 2011 at 13:30

Reiner Gerecke


zetetic · Accepted Answer · 2011-02-06 05:07:14Z

5

Which method performs better?

$ ruby -v
ruby 1.9.2p0 (2010-08-18 revision 29036) [i686-linux]

$ cat squeeze.rb 
require 'benchmark'
include Benchmark

string = "foo  bar   bar      baaar"
n = 1_000_000
bm(6) do |x|
  x.report("gsub      ") { n.times { string.gsub(/\s+/, " ") } }
  x.report("squeeze   ") { n.times { string.squeeze } }
  x.report("split/join") { n.times { string.split.join(" ") } }
end

$ ruby squeeze.rb 
            user     system      total        real
gsub        4.970000   0.020000   4.990000 (  5.624229)
squeeze     0.600000   0.000000   0.600000 (  0.677733)
split/join  2.950000   0.020000   2.970000 (  3.243022)

answered Feb 6, 2011 at 5:07

zetetic


1 Comment

the Tin Man Over a year ago

This benchmark is not quite correct: string.squeeze => "fo bar bar bar", which strips any repeated character. Changing to string.squeeze(' ') results in times that put it solidly between gsub and split.join(' '), with the last being the fastest. See my answer for the updated benchmark code.

the Tin Man · Accepted Answer · 2014-05-01 20:45:40Z

3

Just use gsub with a regular expression. For example:

str = "foo  bar   bar      baaar"
str.gsub(/\s+/, " ")

This returns a new string; to modify str in place, use gsub!.
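One thing to watch with the in-place form: gsub! returns nil when no substitution was made, so its return value should not be chained on or assigned blindly. A minimal sketch, using a hypothetical wrapper name, that always hands back the string:

```ruby
# Hypothetical helper: collapse whitespace in place and always return the string.
def collapse_whitespace!(str)
  str.gsub!(/\s+/, " ")  # returns nil when nothing matched; ignore the result
  str
end

text = "foo  bar   baaar".dup      # dup so the literal is not frozen
collapse_whitespace!(text)         # text is now "foo bar baaar"

no_whitespace = "foobar".dup
no_whitespace.gsub!(/\s+/, " ")    # => nil, and the string is unchanged
```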

BTW, regular expressions are very useful. There are plenty of resources on the internet; to test your own regexps, try rubular.com, for example.

edited May 1, 2014 at 20:45

the Tin Man


answered Feb 5, 2011 at 13:35

jmatraszek
