25

I am trying to benchmark some code. I am sending a String msg over sockets. I want to send 100KB, 2MB, and 10MB String variables. Is there an easy way to create a variable of these sizes?

Currently I am doing this.

private static String createDataSize(int msgSize) {
    String data = "a";
    while(data.length() < (msgSize*1024)-6) {
        data += "a";
    }
    return data;
}

But this takes a very long time. Is there a better way?

UPDATE: Thanks, I am doing this now.

/**
 * Creates a message of msgSize KB, counting 2 bytes per char.
 */
private static String createDataSize(int msgSize) {
    // Java chars are 2 bytes, so the char count is half the byte count.
    // Multiply before dividing so that odd sizes are not truncated.
    int numChars = msgSize * 1024 / 2;
    StringBuilder sb = new StringBuilder(numChars);
    for (int i = 0; i < numChars; i++) {
        sb.append('a');
    }
    return sb.toString();
}
2
  • 1
    A detail: String.length() returns the number of characters in the string. How many bytes that will take up on the network also depends on which encoding you use. Commented Mar 19, 2010 at 1:53
  • 1
    Why char doesn't have to be 2 bytes, but only 1: stackoverflow.com/questions/5078314/… Commented Feb 12, 2015 at 13:03

5 Answers

39

You can simply create a large character array.

char[] data = new char[1000000];

If you need to make a real String object, you can:

String str = new String(data);

Don't use += to build strings in a loop. That has O(n²) memory and time usage, as String objects are immutable (so that each time you call +=, a new String object has to be made, copying the entire contents of the old string in the process).
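To put a rough number on the quadratic cost described above, here is a minimal sketch. The class and method names are hypothetical, and the count assumes every += really allocates and fills a fresh String, ignoring any compiler or JIT optimizations.

```java
// Hypothetical helper, not part of the original answer.
final class ConcatCost {
    // Total characters written when a String grows from length 1 to n,
    // one character at a time, via +=. Each step allocates a new String
    // of length k (for k = 2..n) and fills all k characters.
    static long charsWritten(long n) {
        return n * (n + 1) / 2 - 1;
    }

    public static void main(String[] args) {
        long n = 100L * 1024; // characters in a 100 KB message of 'a'
        System.out.println("chars written: " + charsWritten(n));
    }
}
```

For the 100 KB case this comes to roughly 5.2 billion character writes, compared with about 100 thousand when the array or builder is filled once.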


31

Use a char[] either directly, or to build the String.

char[] chars = new char[size];
Arrays.fill(chars, 'a');

String str = new String(chars);

Also note that one char uses up two bytes internally. How long the String will be over the wire depends on the encoding (the letter a should be just one byte in ASCII-compatible encodings such as UTF-8, though).
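The effect of the encoding can be checked directly by encoding the same String with different charsets. This is only a sketch: the class name EncodedSize and the helper filled are made up for illustration, and the charsets shown are examples, not necessarily what your socket code uses.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical helper, not part of the original answer.
final class EncodedSize {
    // Builds a String of the given length filled with one character.
    static String filled(int length, char c) {
        char[] chars = new char[length];
        Arrays.fill(chars, c);
        return new String(chars);
    }

    public static void main(String[] args) {
        String ascii = filled(1000, 'a');
        String accented = filled(1000, '\u00e9'); // e with acute accent

        // Same number of chars, different number of bytes on the wire.
        System.out.println(ascii.getBytes(StandardCharsets.UTF_8).length);
        System.out.println(ascii.getBytes(StandardCharsets.UTF_16BE).length);
        System.out.println(accented.getBytes(StandardCharsets.UTF_8).length);
    }
}
```

Passing the charset explicitly to getBytes keeps the byte count independent of the platform default.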

1 Comment

Or you can use Arrays.fill(chars, 'a'). :-)
27

Java chars are 2 bytes (16 bits unsigned) in size. So if you want 2MB you need one million characters. There are two obvious issues with your code:

  1. Repeatedly calling length() is unnecessary. Add any character to a Java String and its length goes up by 1, regardless of what the character is. Perhaps you're confusing this with the size in bytes, which length() does not report; and
  2. You have huge memory fragmentation issues with that code.

To further explain (2), the String concatenation operator (+) in Java causes a new String to be created because Java Strings are immutable. So:

String a = "a";
a += "b";

actually means:

String a = "a";
a = a + "b";

This sometimes confuses former C++ programmers as strings work differently in C++.

So your code is actually allocating a million strings for a message size of one million. Only the last one is kept. The others are garbage that will be cleaned up but there is no need for it.

A better version is:

private static String createDataSize(int msgSize) {
  StringBuilder sb = new StringBuilder(msgSize);
  for (int i=0; i<msgSize; i++) {
    sb.append('a');
  }
  return sb.toString();
}

The key difference is that:

  1. A StringBuilder is mutable so doesn't need to be reallocated with each change; and
  2. The StringBuilder is preallocated to the right size in this code sample.

Note: the astute may have noticed I've done:

sb.append('a');

rather than:

sb.append("a");

'a' of course is a single character, "a" is a String. You could use either in this case.

However, it's not that simple because it depends on how the characters are encoded into bytes. Unless you specify otherwise, Java uses the platform default charset, which is often UTF-8, a variable-width encoding. So one million characters might be anywhere from 1MB to 3MB in size depending on how you end up encoding it, and your question doesn't contain details of that.

If you need data of a specific size and that data doesn't matter, my advice would be to simply use a byte array of the right size.
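A minimal sketch of the byte-array approach follows. The class name BytePayload and its methods are hypothetical, and the ByteArrayOutputStream is only a stand-in for whatever stream your socket provides.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.util.Arrays;

// Hypothetical helper, not part of the original answer.
final class BytePayload {
    // Creates a payload of exactly sizeInBytes bytes, each set to 'a'.
    static byte[] create(int sizeInBytes) {
        byte[] payload = new byte[sizeInBytes];
        Arrays.fill(payload, (byte) 'a');
        return payload;
    }

    // Writes the payload to any OutputStream, e.g. socket.getOutputStream().
    static void send(OutputStream out, byte[] payload) throws IOException {
        out.write(payload);
        out.flush();
    }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream sink = new ByteArrayOutputStream(); // stand-in for a socket stream
        send(sink, create(2 * 1024 * 1024)); // 2 MB
        System.out.println(sink.size());
    }
}
```

Because no String is involved, the number of bytes sent does not depend on any character encoding.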

2 Comments

Java strings are immutable, so each += actually creates a new string by copying the entire contents of the previous one. (I presume your "huge memory fragmentation issues" is an oblique reference to this.)
In the for loop I think you mean sb.append('a'); Code works great! Thank you.
4

If you are using Java 11 or later, you can use String.repeat:

"a".repeat(20000);

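Applied to the question's helper, this could look like the sketch below. It assumes Java 11 or later and interprets msgSizeKb as a count of 1024-character blocks; the class name RepeatPayload is made up for illustration.

```java
// Hypothetical helper, not part of the original answer. Requires Java 11+.
final class RepeatPayload {
    // Returns a String of msgSizeKb * 1024 'a' characters. In an
    // ASCII-compatible encoding such as UTF-8 that is also its size in bytes.
    static String createDataSize(int msgSizeKb) {
        return "a".repeat(msgSizeKb * 1024);
    }

    public static void main(String[] args) {
        System.out.println(createDataSize(100).length());       // 100 KB
        System.out.println(createDataSize(2 * 1024).length());  // 2 MB
        System.out.println(createDataSize(10 * 1024).length()); // 10 MB
    }
}
```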

0

Yes, there is: use a StringBuilder as a buffer:

StringBuilder stringB = new StringBuilder(2000000); // for the 2 MB one
String paddingString = "abcdefghijklmnopqrs";

// Append whole chunks while they still fit; the result is slightly under 2000000 chars
while (stringB.length() + paddingString.length() < 2000000)
    stringB.append(paddingString);

// use it
String result = stringB.toString();
