How Much Memory Does a String Use in Java 8

How much memory does a string use in Java 8?

Java 7 or lower

Minimum String memory usage:

(bytes) = 8 * (int) ((((no chars) * 2) + 45) / 8)

So, for a 19-character string:

80 = 8 * (int) ((((19) * 2) + 45) / 8)

Understanding String memory usage (SOURCE)

To understand the above calculation, we need to start by looking at the fields on a String object. A String contains the following:

  • a char array— thus a separate object— containing the actual characters;
  • an integer offset into the array at which the string starts;
  • the length of the string;
  • another int for the cached calculation of the hash code.

This means even if the string contains no characters, it will require 4 bytes for the char array reference, plus 3*4=12 bytes for the three int fields, plus 8 bytes of object header. This gives 24 bytes (which is a multiple of 8 so no "padding" bytes are needed so far).

Then, the (empty) char array will require a further 12 bytes (arrays have an extra 4 bytes to store their length), plus in this case 4 bytes of padding to bring the memory used by the char array object up to a multiple of 8 (16 bytes). So in total, an empty string uses 24+16 = 40 bytes.

If the String contains, say, 19 characters, then the String object itself still requires 24 bytes. But now the char array requires 12 bytes of header plus 19*2=38 bytes for the nineteen chars. Since 12+38=50 isn't a multiple of 8, we also need to round up to the next multiple of 8 (56). So overall, our 19-character String will use up 56+24 = 80 bytes.
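
The breakdown above can be checked against the formula with a short sketch. The class and method names are made up for illustration, and the constants (8-byte object header, 4-byte references, 12-byte array header, 8-byte alignment) are the assumptions stated in the text, not values read from a running JVM.

```java
public class Java7StringSize {
    // Assumptions taken from the text above, not measured on a real JVM.
    static final int OBJECT_HEADER = 8;
    static final int REFERENCE = 4;
    static final int INT_FIELD = 4;
    static final int ARRAY_HEADER = 12; // 8-byte header + 4-byte length

    // Round up to the next multiple of 8.
    static int align8(int bytes) {
        return (bytes + 7) / 8 * 8;
    }

    // String object (header, char[] reference, offset, length, hash) plus its char[].
    static int byParts(int chars) {
        int stringObject = align8(OBJECT_HEADER + REFERENCE + 3 * INT_FIELD);
        int charArray = align8(ARRAY_HEADER + 2 * chars);
        return stringObject + charArray;
    }

    // The closed-form formula quoted at the top of this answer.
    static int byFormula(int chars) {
        return 8 * ((chars * 2 + 45) / 8);
    }

    public static void main(String[] args) {
        for (int n : new int[] {0, 2, 19}) {
            System.out.println(n + " chars: " + byParts(n)
                    + " bytes by parts, " + byFormula(n) + " bytes by formula");
        }
    }
}
```

Both methods give 40 bytes for the empty string and 80 bytes for 19 characters. For some lengths (2 characters, for example) the closed form comes out 8 bytes higher than the part-by-part sum, so it is best read as an approximation.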


Java 8

Java 8 no longer has the offset and length fields; only the hash and the char array remain (credit: @Thomas Jungblut).

  • a char array (thus a separate object) containing the actual characters;
  • an int for the cached calculation of the hash code.

So, in Java 8 the way to calculate memory for strings remains the same, but you must subtract 8 bytes for the missing offset and length fields.
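
As a rough sketch of that adjustment, the Java 7 formula can be reused with 8 bytes taken off. The class and method names are hypothetical, and the result inherits the same assumptions as the Java 7 figures (8-byte object headers, 4-byte references, 8-byte alignment).

```java
public class Java8StringSize {
    // Java 7 formula from above, minus 8 bytes for the two dropped int fields.
    static int estimate(int chars) {
        int java7Bytes = 8 * ((chars * 2 + 45) / 8);
        return java7Bytes - 8;
    }

    public static void main(String[] args) {
        System.out.println(estimate(0));  // 32
        System.out.println(estimate(19)); // 72
    }
}
```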

Java object memory size of a set of strings

Because I have no life, I present the results of boredom. Note that this is pretty much guaranteed to be inaccurate, due to stupid mistakes and such. Used this for help, but I'm not too sure on accuracy. I could read the JVM specifications, but I don't have that much free time on my hands.

This calculation gets pretty complicated due to the multitude of fields that exist inside the objects of concern, plus some uncertainty on my part about how much overhead there is for objects and where padding goes. If memory serves, objects have 8 bytes reserved for the header. This is all for a 64-bit VM, by the way. Only difference between that and a 32-bit VM is the size of references, I think.

Summary of how to do this: Obtain source code, and recursively add up space needed for all fields. Need knowledge of how VM works and how implementations work.

Starting from a String. String defines:

  1. Object header - 8 bytes
  2. long serialVersionUID - 8 bytes
  3. int hash - 4 bytes + 4 bytes padding
  4. char[] value (set to a char[10] in your case) - 8 bytes for reference
  5. ObjectStreamField[] serialPersistentFields = new ObjectStreamField[0] - 8 bytes for reference

char[10] defines:

  1. Object header - 8 bytes
  2. int length - 4 bytes
  3. char x10 - 2 bytes * 10 = 20 bytes

ObjectStreamField[0] defines:

  1. Object header - 8 bytes
  2. int length - 4 bytes + 4 bytes padding

Total for a single String with length 10: 88 bytes

Total for 1000 Strings with length 10: 88000 bytes.


HashSet defines:

  1. Object header - 8 bytes
  2. long serialVersionUID - 8 bytes
  3. Object PRESENT - 8 bytes
  4. HashMap<E, Object> map - 8 bytes

HashMap defines (in Java 8) (ignoring things that are created on demand, like EntrySet):

  1. Object header - 8 bytes
  2. long serialVersionUID - 8 bytes
  3. int DEFAULT_INITIAL_CAPACITY - 4 bytes
  4. int MAXIMUM_CAPACITY - 4 bytes
  5. int TREEIFY_THRESHOLD - 4 bytes
  6. int UNTREEIFY_THRESHOLD - 4 bytes
  7. int MIN_TREEIFY_CAPACITY - 4 bytes
  8. int size - 4 bytes
  9. int modCount - 4 bytes
  10. int threshold - 4 bytes
  11. float DEFAULT_LOAD_FACTOR - 4 bytes
  12. float loadFactor - 4 bytes
  13. Node<K, V>[] table - 8 bytes

Node defines:

  1. Object header - 8 bytes
  2. int hash - 4 bytes + 4 bytes padding
  3. K key - 8 bytes
  4. V value - 8 bytes
  5. Node<K, V> next - 8 bytes

Node<K, V>[] should have a size of 2048, if I remember how HashMap works. So it defines:

  1. Object header - 8 bytes
  2. int length - 4 bytes + 4 bytes padding
  3. Node<K, V> reference * 2048 - 8 bytes * 2048 = 16384 bytes.
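
The 2048 figure can be sanity-checked with a small sketch. It assumes a HashMap created with the default initial capacity (16) and load factor (0.75) that doubles its table whenever the entry count exceeds capacity * load factor; the class and method names are made up for illustration.

```java
public class TableCapacity {
    // Table size after adding `entries` items one by one,
    // assuming default initial capacity 16 and load factor 0.75.
    static int tableCapacity(int entries) {
        int capacity = 16;
        while (entries > capacity * 0.75) {
            capacity *= 2;
        }
        return capacity;
    }

    // Array size as counted in the list above: 8-byte header, 4-byte length,
    // 4 bytes of padding, and one 8-byte reference per slot.
    static long arrayBytes(int capacity) {
        return 8 + 4 + 4 + 8L * capacity;
    }

    public static void main(String[] args) {
        int capacity = tableCapacity(1000);
        System.out.println(capacity);             // 2048
        System.out.println(arrayBytes(capacity)); // 16400
    }
}
```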

So the HashSet should be:

  1. 32 bytes for just HashSet
  2. 64 bytes for just HashMap
  3. 40 bytes per Node<K, V> inside Node<K, V>[] * 1000 nodes = 40000 bytes
  4. 16400 bytes for Node<K, V>[] inside the HashMap

Total: 56496 bytes for the HashSet, without taking into account the String contents


So at least by my calculations, the total space taken should be somewhere around 144496 bytes -- about 141 kilobytes (kibibytes for the pedantic). To be honest, this seems like it's more than a bit on the small side, but it's a start.
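
The arithmetic behind that total can be reproduced with a short sketch. It mirrors the accounting in the lists above exactly, including fields such as serialVersionUID, PRESENT and the HashMap constants, which are static and so are stored once per class rather than in every instance. Treat the output as this answer's estimate rather than a measurement; the class and method names are hypothetical.

```java
public class HashSetEstimate {
    // Per-object figures copied from the lists above.
    static final int STRING = 40;              // header, serialVersionUID, hash + padding, two references
    static final int CHAR_ARRAY_10 = 32;       // header, length, 10 chars
    static final int STREAM_FIELD_ARRAY = 16;  // header, length + padding
    static final int HASH_SET = 32;
    static final int HASH_MAP = 64;
    static final int NODE = 40;

    static long stringBytes(int strings) {
        return (long) strings * (STRING + CHAR_ARRAY_10 + STREAM_FIELD_ARRAY);
    }

    static long setBytes(int strings, int tableSlots) {
        long nodeArray = 8 + 8 + 8L * tableSlots; // header, length + padding, references
        return HASH_SET + HASH_MAP + (long) strings * NODE + nodeArray;
    }

    public static void main(String[] args) {
        long strings = stringBytes(1000);
        long set = setBytes(1000, 2048);
        System.out.println(strings);       // 88000
        System.out.println(set);           // 56496
        System.out.println(strings + set); // 144496
    }
}
```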

I can't get the Instrumentation interface working at the moment, so I can't double-check. But if someone knows what he/she is doing, a comment pointing out my mistakes would be welcome.

How many bytes will a string take up?

From my article on strings:

In the current implementation at least, strings take up 20+(n/2)*4 bytes (rounding the value of n/2 down), where n is the number of characters in the string. The string type is unusual in that the size of the object itself varies. The only other classes which do this (as far as I know) are arrays. Essentially, a string is a character array in memory, plus the length of the array and the length of the string (in characters). The length of the array isn't always the same as the length in characters, as strings can be "over-allocated" within mscorlib.dll, to make building them up easier. (StringBuilder does this, for instance.) While strings are immutable to the outside world, code within mscorlib can change the contents, so StringBuilder creates a string with a larger internal character array than the current contents requires, then appends to that string until the character array is no longer big enough to cope, at which point it creates a new string with a larger array. The string length member also contains a flag in its top bit to say whether or not the string contains any non-ASCII characters. This allows for extra optimisation in some cases.

I suspect that was written before I had a chance to work with a 64-bit CLR; I suspect in 64-bit land each string takes up either 4 or 8 more bytes.

EDIT: I wrote up a blog post more recently which includes 64-bit information (and contradicts the above slightly for x86...)
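
The quoted formula is simple enough to tabulate. The sketch below is written in Java only to match the other examples on this page; the formula itself describes .NET strings as quoted above, which the author says predates his work with the 64-bit CLR. The class and method names are made up.

```java
public class DotNetStringSize {
    // 20 + (n/2)*4, with n/2 rounded down, as quoted above.
    static int bytes(int chars) {
        return 20 + (chars / 2) * 4; // integer division rounds down for non-negative n
    }

    public static void main(String[] args) {
        for (int n : new int[] {0, 1, 2, 10}) {
            System.out.println(n + " chars: " + bytes(n) + " bytes");
        }
    }
}
```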

Why is Java's String memory usage said to be high?

String storage in Java depends on how the string was obtained. The backing char array can be shared between multiple instances. If that isn't the case, you have the usual object overhead plus storage for one pointer and three ints which usually comes out to 16 bytes overhead. Then the backing array requires 2 bytes per char since chars are UTF-16 code units.

For "Apple Computers" where the backing array is not shared, the minimum cost is going to be

  1. backing array for 16 chars -- 32B which aligns nicely on a word boundary.
  2. pointer to array - 4 or 8B depending on the platform
  3. three ints for the offset, length, and memoized hashcode - 12B
  4. 2 x object overhead - depends on the VM, but 8B is a good rule of thumb.
  5. one int for the array length.

So roughly 72B of which the actual payload constitutes 44.4%. The payload constitutes more for longer strings.


In Java 7, some JDK implementations are doing away with backing array sharing to avoid pinning large char[]s in memory. That allows them to do away with two of the three ints.

That changes the calculation to 64B for a string of length 16 of which the actual payload constitutes 50%.
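
Both figures follow from adding up the listed components. The sketch below does that sum under the answer's rule-of-thumb numbers (8-byte pointer, 8 bytes of overhead per object, 4-byte ints); the class and method names are illustrative only.

```java
import java.util.Locale;

public class PayloadRatio {
    // Total cost of an unshared string: chars, pointer to the array,
    // the String's int fields, two object overheads, and the array length.
    static int totalBytes(int chars, int pointerBytes, int intFields) {
        int payload = 2 * chars;
        return payload + pointerBytes + 4 * intFields + 2 * 8 + 4;
    }

    static double payloadPercent(int chars, int pointerBytes, int intFields) {
        return 100.0 * (2 * chars) / totalBytes(chars, pointerBytes, intFields);
    }

    public static void main(String[] args) {
        // Three ints (offset, length, hash): 72 bytes, 44.4% payload.
        System.out.println(totalBytes(16, 8, 3) + " bytes, "
                + String.format(Locale.ROOT, "%.1f%%", payloadPercent(16, 8, 3)));
        // One int (hash only): 64 bytes, 50.0% payload.
        System.out.println(totalBytes(16, 8, 1) + " bytes, "
                + String.format(Locale.ROOT, "%.1f%%", payloadPercent(16, 8, 1)));
    }
}
```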

ArrayList<Integer> vs ArrayList<String> - both storing values from 0 to 9 .. which takes more memory?

Here is a test result using this tool; it shows that ArrayList<Integer> occupies less memory:

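import java.util.ArrayList;
// RamUsageEstimator comes from the tool mentioned above; its import depends on the library you use.

public class ListMemoryTest {
    public static void main(String[] args) {
        ArrayList<Integer> integerArrayList = new ArrayList<>();
        ArrayList<String> stringArrayList = new ArrayList<>();
        // Fill both lists with the values 0 to 9.
        for (int i = 0; i < 10; i++) {
            integerArrayList.add(i);
            stringArrayList.add(String.valueOf(i));
        }
        System.out.println(RamUsageEstimator.sizeOf(integerArrayList)); // 240
        System.out.println(RamUsageEstimator.sizeOf(stringArrayList)); // 560
    }
}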

