Consider the following two code blocks. The first is Java (http://ideone.com/3nNdVs):

String[] matches = new String[] {"Foo", "Bar"};
long start = System.nanoTime();
for(int i=0; i< 1000000; i++) {
    String name = "This String is Foo Bar";
    for (String s : matches){
        name = name.replace(s, "");
    }
}
System.out.println((System.nanoTime() - start)/1000000);

and the second is Python (http://ideone.com/v8wg6m):

import time

matches = {"Foo", "Bar"}
start = time.time()
for x in xrange(1000000):
    name = "This String is Foo Bar"
    for s in matches:
        name = name.replace(s, "")
print time.time() - start

While benchmarking these two, I found that the Java version takes about 50% longer than the Python one. This came as quite a shock to me, as I was expecting the Python version to be slower.

So the first question is, are there better or faster ways to perform these two functions?

Second, if not, why is the Java version slower than the Python one?

  • "Are these String operations equivalent in Java and Python?" Well... no. The Python version does replacements (matches has "Foo" and "Bar"); the Java version doesn't (matches does not have "Foo" or "Bar"). Commented Mar 21, 2014 at 7:29
  • By the way, iterating over a set is slower than iterating over a list or tuple, and your two snippets are not equivalent. Commented Mar 21, 2014 at 7:30
  • matches = {"Foo", "Bar"} is a set; you almost certainly want a list: matches = ["Foo", "Bar"]. Commented Mar 21, 2014 at 7:32
  • Languages don't have speed, they have only semantics. If you want to compare speed, you must choose specific implementations to compare. If you want to do some performance testing and see a bit more under the hood on *nix-based computers, you can use perf stat -B (sudo apt-get install linux-tools-common linux-base). Commented Mar 21, 2014 at 7:32
  • I'm going to link to this, because it seems relevant: ericlippert.com/2012/12/17/performance-rant Commented Mar 21, 2014 at 7:42

2 Answers

I found out why Python was quicker: in the Java versions available at the time of writing (Java 8 and earlier), the .replace method is implemented with a regex, which is compiled every time you call .replace.

There are many quicker alternatives, but the one I found most convenient is .replaceEach from the Apache Commons Lang library (org.apache.commons.lang3.StringUtils). It uses indexOf to find and replace substrings, which I understand is still faster than a regex that is compiled only once.

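// requires: import org.apache.commons.lang3.StringUtils;
String[] matches = new String[] {"Foo", "Bar"};
String[] replaces = new String[] {"", ""};  // one replacement per entry in matches
long start = System.nanoTime();
for(int i=0; i< 1000000; i++) {
    String name = "This String is Foo Bar";
    name = StringUtils.replaceEach(name, matches, replaces);
}
System.out.println((System.nanoTime() - start)/1000000);  // elapsed milliseconds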

Unfortunately I can't provide a link on ideone, as they don't have Apache Commons.

On my system this version was about 1/4 faster than the .replace method and about 1/2 faster than the Python version.

If anyone has a faster option for Python, let me know.

Thanks.

2 Comments

That's definitely why the Java one is slow. If you knew at compile time which regexes you wanted to replace, you could use Pattern.compile("Foo") and Pattern.compile("Bar") to save some time.
@theSilentOne Yep, you're right, but I think the indexOf solution would still be faster than the compiled regex. If the strings were longer or more complex, then regex would definitely be the way to go.
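
The compile-once idea from these comments can also be tried on the Python side. The sketch below targets Python 3 (the thread's own snippets are Python 2). The names MATCHES, PATTERN, strip_with_replace and strip_with_regex are illustrative and do not come from the thread, and whether the regex is actually faster has to be measured rather than assumed.

```python
import re

# Illustrative names, not from the thread.
MATCHES = ["Foo", "Bar"]

# Compile once, outside any loop. re.escape makes each entry a literal,
# so the pattern behaves like a plain substring search for "Foo" or "Bar".
PATTERN = re.compile("|".join(re.escape(m) for m in MATCHES))


def strip_with_replace(name, matches=MATCHES):
    """Remove each entry in turn with str.replace (one pass per entry)."""
    for s in matches:
        name = name.replace(s, "")
    return name


def strip_with_regex(name):
    """Remove all entries in a single pass with the precompiled pattern."""
    return PATTERN.sub("", name)


if __name__ == "__main__":
    text = "This String is Foo Bar"
    print(repr(strip_with_replace(text)))  # 'This String is  '
    print(repr(strip_with_regex(text)))    # 'This String is  '
```

The two functions agree on the string from the question, but they are not interchangeable in general: sequential replace can match text that only appears after an earlier removal ("BFooar" becomes "" with replace but "Bar" with the single-pass regex).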
-1

For Python use the timeit module:

import timeit

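# The code to be timed goes in stmt; anything passed as setup is run but not timed.
stmt = """
matches = {'Foo', 'Bar'}
for x in xrange(1000000):
  name = 'This String is Foo Bar'
  for s in matches:
    name = name.replace(s, '')
"""

# number=1 because the statement already loops a million times itself.
print min(timeit.Timer(stmt=stmt).repeat(repeat=10, number=1))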
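
A Python 3 sketch of the same kind of measurement (the snippet above is Python 2). It keeps the construction of matches in setup so that only the replace loop is timed, and it runs the loop once with a list and once with a set, since the comments on the question raise that difference. STMT and time_replace are illustrative names, and the iteration counts are arbitrary choices.

```python
import timeit

# The statement being timed: one pass over `matches` for one input string.
# `matches` itself is created by the setup string, so it is not timed.
STMT = """
name = 'This String is Foo Bar'
for s in matches:
    name = name.replace(s, '')
"""


def time_replace(setup, number=100000, repeat=5):
    """Return the best (minimum) total time in seconds over `repeat` runs,
    each run executing STMT `number` times."""
    return min(timeit.repeat(stmt=STMT, setup=setup, number=number, repeat=repeat))


if __name__ == "__main__":
    setups = [
        ("list", "matches = ['Foo', 'Bar']"),
        ("set", "matches = {'Foo', 'Bar'}"),
    ]
    for label, setup in setups:
        print(label, time_replace(setup))
```

Taking the minimum of several repeats follows the pattern already used above; it reduces the influence of other activity on the machine but does not make the figure comparable to a Java measurement.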

4 Comments

Sorry Jeff, but I don't know that that really helps me get any closer to an answer.
@user779420: Jeff is perfectly right in one respect: you're measuring a timing that has no relevance whatsoever. In Java you need something like Caliper or JMH when you want figures related to real performance.
Strings are immutable. Every time you go through the loop you are creating copies of the string, and s.replace creates a copy too. The following consistently performs better: ''.join([x for x in 'This String is Foo Bar'.split() if x not in ('Foo', 'Bar')])
@Jeff Bond Actually, it seems to me that replace is faster than the join: ideone.com/MCrDGs
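
Before comparing the speed of the two approaches from these comments, it is worth checking that they produce the same string. A minimal Python 3 sketch follows; with_replace, with_join and the sep parameter are illustrative names, not from the thread.

```python
import timeit

NAME = "This String is Foo Bar"
WORDS = ("Foo", "Bar")


def with_replace(name=NAME):
    """The question's approach: remove each word with str.replace."""
    for s in WORDS:
        name = name.replace(s, "")
    return name


def with_join(name=NAME, sep=""):
    """The comment's approach: split on whitespace, drop the words, join.
    sep="" reproduces the comment exactly; sep=" " keeps the spaces."""
    return sep.join([x for x in name.split() if x not in WORDS])


if __name__ == "__main__":
    print(repr(with_replace()))       # 'This String is  '
    print(repr(with_join()))          # 'ThisStringis'
    print(repr(with_join(sep=" ")))   # 'This String is'
    # Timings are machine-dependent, so no expected values are given.
    for fn in (with_replace, with_join):
        print(fn.__name__, timeit.timeit(fn, number=100000))
```

The three results differ in whitespace, and the join variant only removes whole whitespace-separated words, so a timing comparison between the two is not quite like for like.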
