Memory Errors and List Limits

Memory errors and list limits?

First off, see "How Big can a Python Array Get?" and "Numpy, problem with long arrays".

Second, the only real limit comes from the amount of memory you have and how your system stores memory references. There is no per-list limit, so Python will go until it runs out of memory. Two possibilities:

  1. If you are running on an older OS or one that forces processes to use a limited amount of memory, you may need to increase the amount of memory the Python process has access to.
  2. Break the list apart using chunking. For example, do the first 1000 elements of the list, pickle and save them to disk, and then do the next 1000. To work with them, unpickle one chunk at a time so that you don't run out of memory. This is essentially the same technique that databases use to work with more data than will fit in RAM.
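
A minimal sketch of that chunk-and-pickle approach (the chunk size, file name, and helper names here are just placeholders):

import pickle

CHUNK_SIZE = 1000  # placeholder; pick whatever fits comfortably in memory

def save_in_chunks(items, path="chunks.pkl"):
    # Write each slice of the data to one file as a separate pickle record.
    with open(path, "wb") as f:
        for start in range(0, len(items), CHUNK_SIZE):
            pickle.dump(items[start:start + CHUNK_SIZE], f)

def iter_chunks(path="chunks.pkl"):
    # Load the records back one at a time, so only one chunk is in memory.
    with open(path, "rb") as f:
        while True:
            try:
                yield pickle.load(f)
            except EOFError:
                return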

Python list memory error

Yes, there is a limit to how many elements a Python list can hold; see sys.maxsize. You did not hit it, however; very few machines have enough memory to hold that many items.

You are trying to create a list with 1073741824 references, and each reference takes memory too. How much depends on your OS, but typically it is 4 bytes on a 32-bit system and 8 bytes on a 64-bit system, so those 2^30 elements would take 4 GB or 8 GB of memory just for the list references.
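
As a quick sanity check of that arithmetic in the interpreter:

>>> 2**30 * 4 / 1024**3  # 32-bit references
4.0
>>> 2**30 * 8 / 1024**3  # 64-bit references
8.0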

4 GB, plus the memory for the elements themselves, is already easily more than what most current operating systems will permit a single process to use.

On my Mac OS X machine (a 64-bit OS), sys.maxsize is 2^63 - 1, and Python object references in a list take 8 bytes:

>>> import sys
>>> sys.maxsize
9223372036854775807
>>> sys.maxsize.bit_length()
63
>>> sys.getsizeof([]) # empty list overhead
72
>>> sys.getsizeof([None]) - sys.getsizeof([]) # size of one reference
8

So to create a list with sys.maxsize elements you'd need 64 exbibytes of memory just for the references. That is more than a 64-bit computer could even address (the theoretical maximum of a 64-bit address space is 16 exbibytes).

All this is ignoring the memory footprint that the objects you are referencing in the list will take. None is a singleton, so it'll only ever take a fixed amount of memory. But presumably you were going to store something else in that list, in which case you need to take that into account too.

And generally speaking, you should never need to create such a large list. Use a different technique: create a sparse structure using a dictionary, for example. You were presumably not planning to address all 2^30 indexes directly in your algorithm anyway.
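
For example, a plain dictionary works as a sparse structure because it only stores the indexes you actually assign (the indexes and values below are purely illustrative):

sparse = {}                    # index -> value, only for indexes actually used
sparse[5] = "foo"
sparse[900_000_000] = "bar"
print(sparse.get(123))         # missing indexes simply come back as None
print(len(sparse))             # 2 entries stored, instead of 2**30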

Python Numpy - Matrix memory error and limits

Based on the specific error message:

File "C:/Users/KP/Desktop/FSC_Treetag/out/f3_test_from_files_to_matrix_fonctions.py", line 6
    matrix = np.zeros(shape=(5037,15999))
MemoryError

You don't have enough memory to allocate the array. Depending on your system, each value in your matrix will use something like 8 bytes, so this array should only occupy about 600 MB of memory... which really isn't much. Most likely other things (processes, open files, etc.) are eating up the rest of your system memory.

At the same time, since you are just recording whether or not each word exists in each file, you only need a single bit per file-word entry in the matrix. In that case you could simply use a bitarray, i.e. a single bit for each entry.
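
For instance, simply giving the NumPy array a boolean dtype already cuts it to one byte per entry (roughly 77 MiB for this shape), and NumPy's packbits can pack that down to one bit per entry; the third-party bitarray package is another way to get the one-bit-per-entry layout suggested above:

import numpy as np

# One byte per entry instead of eight: ~77 MiB for a 5037 x 15999 matrix.
presence = np.zeros(shape=(5037, 15999), dtype=bool)
presence[0, 42] = True  # e.g. word 42 occurs in file 0

# Optionally pack each row to one bit per entry (~10 MiB total) once filled.
packed = np.packbits(presence, axis=1)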

Axios GET in for loop causing Sort exceeded memory limit error

The syntax itself is fine; I am not quite sure why it's not working for you.

I have two theories about what could be happening:

  1. You are hosting your db on Atlas and are using an instance tier that does not qualify for this operation, as specified in their docs:

Atlas M0 free clusters and M2/M5 shared clusters don't support the allowDiskUse option.


  2. It seems you're using Mongoose; maybe you are on an older version that has an issue with this cursor flag or uses a different syntax.

Regardless of the source of the issue, I recommend you solve it in one of two ways (both sketched after this list):

  1. Create an index on createdAt. Sorting on an indexed field does not scan documents into memory, which both makes the query more efficient and solves this issue.

  2. Just use the _id index and sort with { _id: -1 }. From your comment about removing the sort operation and getting documents in reverse order, it seems that your createdAt corresponds to the document creation date (which makes sense); the ObjectId _id is monotonically increasing, so you can simply sort on that field instead.
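
For illustration, here are the two fixes expressed with pymongo, in Python to match the rest of this page (the connection string, database, and collection names are placeholders; in Mongoose the equivalents are Schema.index() and Query.sort()):

from pymongo import MongoClient, DESCENDING

client = MongoClient("mongodb://localhost:27017")  # placeholder connection string
coll = client["mydb"]["posts"]                     # placeholder db/collection names

# Fix 1: index createdAt so the sort walks the index instead of loading
# documents into memory.
coll.create_index([("createdAt", DESCENDING)])
newest_first = coll.find().sort("createdAt", DESCENDING)

# Fix 2: sort on _id, which is already indexed and (for ObjectIds) roughly
# follows insertion order.
newest_first = coll.find().sort("_id", DESCENDING)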


