Memory Size of a Hash or Other Object

Memory size of a hash or other object?

ObjectSpace.memsize_of does work in 1.9.3, documented or not:

puts RUBY_VERSION #=>1.9.3

require 'objspace'

p ObjectSpace.memsize_of("a"*23) #=> 23
p ObjectSpace.memsize_of("a"*24) #=> 24
p ObjectSpace.memsize_of("a".*1000) #=> 1000
h = {"a"=>1, "b"=>2}
p ObjectSpace.memsize_of(h) #=> 116

Java object memory size of a set of strings

Because I have no life, I present the results of boredom. Note that this is pretty much guaranteed to be inaccurate, due stupid mistakes and such. Used this for help, but I'm not too sure on accuracy. I could read the JVM specifications, but I don't have that much free time on my hands.

This calculation gets pretty complicated due to the multitude of fields that exist inside the objects of concern, plus some uncertainty on my part about how much overhead there is for objects and where padding goes. If memory serves, objects have 8 bytes reserved for the header. This is all for a 64-bit VM, by the way. Only difference between that and a 32-bit VM is the size of references, I think.

Summary of how to do this: Obtain source code, and recursively add up space needed for all fields. Need knowledge of how VM works and how implementations work.

Starting from a String. String defines:

  1. Object header - 8 bytes
  2. long serialVersionUID - 8 bytes
  3. int hash - 4 bytes + 4 bytes padding
  4. char[] value (set to a char[10] in your case) - 8 bytes for reference
  5. ObjectStreamField[] serialPersistentFields = new ObjectStreamField[0] - 8 bytes for reference

char[10] defines:

  1. Object header - 8 bytes
  2. int length - 4 bytes
  3. char x10 - 2 bytes * 10 = 20 bytes

ObjectStreamField[0] defines:

  1. Object header - 8 bytes
  2. int length - 4 bytes + 4 bytes padding

Total for a single String with length 10: 88 bytes

Total for 1000 Strings with length 10: 88000 bytes.


HashSet defines:

  1. Object header - 8 bytes
  2. long serialVersionUID - 8 bytes
  3. Object PRESENT - 8 bytes
  4. HashMap<E, Object> map - 8 bytes

HashMap defines (in Java 8) (ignoring things that are created on demand, like EntrySet):

  1. Object header - 8 bytes
  2. long serialVersionUID - 8 bytes
  3. int DEFAULT_INITIAL_CAPACITY - 4 bytes
  4. int MAXIMUM_CAPACITY - 4 bytes
  5. int TREEIFY_THRESHOLD - 4 bytes
  6. int UNTREEIFY_THRESHOLD - 4 bytes
  7. int MIN_TREEIFY_CAPACITY - 4 bytes
  8. int size - 4 bytes
  9. int modcount - 4 bytes
  10. int threshold - 4 bytes
  11. float DEFAULT_LOAD_FACTOR - 4 bytes
  12. float loadFactor - 4 bytes
  13. Node<K, V>[] table - 8 bytes

Node defines:

  1. Object header - 8 bytes
  2. int hash - 4 bytes + 4 bytes padding
  3. K key - 8 bytes
  4. V value - 8 bytes
  5. Node<K, V> next - 8 bytes

Node<K, V>[] should have a size of 2048, if I remember how HashMap works. So it defines:

  1. Object header - 8 bytes
  2. int length - 4 bytes + 4 bytes padding
  3. Node<K, V> reference * 2048 - 8 bytes * 2048 = 16384 bytes.

So the HashSet should be:

  1. 32 bytes for just HashSet
  2. 64 bytes for just HashMap
  3. 40 bytes per Node<K, V> inside Node<K, V>[] * 1000 nodes = 40000 bytes
  4. 16400 bytes for Node<K, V>[] inside the HashMap

Total: 56496 bytes for the HashSet, without taking into account the String contents


So at least by my calculations, the total space taken should be somewhere around 144496 bytes -- about 141 kilobytes (kibibytes for the pedantic). To be honest, this seems like it's more than a bit on the small side, but it's a start.

I can't get the Instrumentation interface working at the moment, so I can't double-check. But if someone knows what he/she is doing a comment pointing out my mistakes would be welcome.

How much memory does a Hashtable use?

Edit; Oh geez, I'm an idiot, I gave info for HashMap, not HashTable. However, after checking, the implementations are identical for memory purposes.

This is dependent on your VM's internal memory setup (packing of items, 32 bit or 64 bit pointers, and word alignment/size) and is not specified by java.

Basic info on estimating memory use can be found here.

You can estimate it like so:

  • On 32-bit VMs, a pointer is 4 bytes, on 64-bit VMs, it is 8 bytes.
  • Object overhead is 8 bytes of memory (for an empty object, containing nothing)
  • Objects are padded to a size that is a multiple of 8 bytes (ugh).
  • There is a small, constant overhead for each hashmap: one float, 3 ints, plus object overhead.
  • There is an array of slots, some of which will have entries, some of which will be reserved for new ones. The ratio of filled slots to total slots is NO MORE THAN the specified load factor in the constructor.
  • The slot array requires one object overhead, plus one int for size, plus one pointer for every slot, to indicate the object stored.
  • The number of slots is generally 1.3 to 2 times more than the number of stored mappings, at default load factor of 0.75, but may be less than this, depending on hash collisions.
  • Every stored mapping requires an entry object. This requires one object overhead, 3 pointers, plus the stored key and value objects, plus an integer.

So, putting it together (for 32/64 bit Sun HotSpot JVM):
HashMap needs 24 bytes (itself, primtive fields) + 12 bytes (slot array constant) + 4 or 8 bytes per slot + 24/40 bytes per entry + key object size + value object size + padding each object to multiple of 8 bytes

OR, roughly (at most default settings, not guaranteed to be precise):

  • On 32-bit JVM: 36 bytes + 32 bytes/mapping + keys & values
  • On 64-bit JVM: 36 bytes + 56 bytes/mapping + keys & values

Note: this needs more checking, it might need 12 bytes for object overhead on 64-bit VM.
I'm not sure about nulls -- pointers for nulls may be compressed somehow.

Find the memory size of a set of strings vs. set of bytestrings

sys.getsizeof does not measure the size of the full target data structure. It only measure the memory taken by the set object which contains references to strings/bytes objects. The references are not included in the returned memory consumption (ie. it does not walk recursively in each object of the target data structure). A reference takes typically 8 bytes on a 64-bit platform and a CPython set is not as compact as a list: it is implemented like a hash-table with many buckets and some buckets are unused. In fact, this is mandatory for this data structure to be fast (in general, the occupancy should be 50%-90%). Moreover, each bucket contains a hash which usually takes 8 bytes.

The string themselves take much more space than a bucket (at least on my machine):

sys.getsizeof(randomstring(50))           # 99
sys.getsizeof(randomstring(50).encode()) # 83

On my machine, it turns out that CPython strings are 16 bytes bigger than bytes.

How big is an object reference?

A object or array reference occupies one 32 bit word (4 bytes) on a 32 bit JVM or Davlik VM. A null takes the same space as a reference. (It has to, because a null has to fit in a reference-typed slot; i.e. instance field, local variable, etc.)

On the other hand, an object occupies a minimum of 2 32 bit words (8 bytes), and an array occupies a minimum of 3 32 bit words (12 bytes). The actual size depends on the number and kinds of fields for an object, and on the number and kind of elements for an array.


For a 64 bit JVM, the size of a reference is 64 bits, unless you have configured the JVM to use compressed pointers:

-XX:+UseCompressedOops Enables the use of compressed pointers (object references represented as 32 bit offsets instead of 64-bit pointers) for optimized 64-bit performance with Java heap sizes less than 32gb.


This is the nub of your question, I think.

Before determining the size of the hash table, I wanted to know how much memory would it consume in order not to exagerate.

If you allocate a HashMap or Hashtable with a large initial size, the majority of the space will be occupied by the hash array. This is an array of references, so the size will be 3 + initialSize 32 bit words. It is unlikely that this will be significant ... unless you get your size estimate drastically wrong.

However, I think you are probably worrying unnecessarily about performance. If you are storing objects in a default allocated HashMap or Hashtable, the class will automatically resize the hash table as it gets larger. So, provided that your objects have a decent hash function (not too slow, not hashing everything to a small number of values) the hash table should not be a direct CPU performance concern.

How much memory Java HashSetLong should take

The size of objects is an implementation detail. There is no guarantee that if it's x bytes on one platform, on another it's also x bytes.

Long is boxed as you know, but 16 bytes is wrong. The primitive long takes 8 bytes but the size of the box around the long is implementation dependent. According to this Hotspot related answer overhead words and padding means a boxed 4-byte int can come to 24 bytes!

The byte alignment and padding mentioned in that (Hotspot specific) answer also would apply to the Entry objects which would also push the consumption up.

In-memory size of a Python structure

The recommendation from an earlier question on this was to use sys.getsizeof(), quoting:

>>> import sys
>>> x = 2
>>> sys.getsizeof(x)
14
>>> sys.getsizeof(sys.getsizeof)
32
>>> sys.getsizeof('this')
38
>>> sys.getsizeof('this also')
48

You could take this approach:

>>> import sys
>>> import decimal
>>>
>>> d = {
... "int": 0,
... "float": 0.0,
... "dict": dict(),
... "set": set(),
... "tuple": tuple(),
... "list": list(),
... "str": "a",
... "unicode": u"a",
... "decimal": decimal.Decimal(0),
... "object": object(),
... }
>>> for k, v in sorted(d.iteritems()):
... print k, sys.getsizeof(v)
...
decimal 40
dict 140
float 16
int 12
list 36
object 8
set 116
str 25
tuple 28
unicode 28

2012-09-30

python 2.7 (linux, 32-bit):

decimal 36
dict 136
float 16
int 12
list 32
object 8
set 112
str 22
tuple 24
unicode 32

python 3.3 (linux, 32-bit)

decimal 52
dict 144
float 16
int 14
list 32
object 8
set 112
str 26
tuple 24
unicode 26

2016-08-01

OSX, Python 2.7.10 (default, Oct 23 2015, 19:19:21) [GCC 4.2.1 Compatible Apple LLVM 7.0.0 (clang-700.0.59.5)] on darwin

decimal 80
dict 280
float 24
int 24
list 72
object 16
set 232
str 38
tuple 56
unicode 52


Related Topics



Leave a reply



Submit