Promise.all Consumes All My RAM

Promise.all consumes all my RAM

You will use less memory if you never have 58k promises, their associated async operations, and their result data active at once.

Instead, you want to run X operations at once: when one finishes, you start the next one, so that there are never more than X operations in flight and never more than X promises in use at the same time.

You can experiment with an appropriate value of X. A value of 1 means sequential operations, but you can often improve overall end-to-end time with some higher value of X. If all requests are hitting the same host, then X is probably no more than 5-10 (a given host can't really do a lot of things at once, and asking it to do more than it can handle just slows it down).

If every request is to a different host, then you may be able to make X higher. Experimentation will give you an optimal value for both peak memory usage and overall throughput; it depends somewhat on your specific circumstances.

Bluebird's Promise.map() has a concurrency option that will do this for you, but there are also numerous ways to code for only X in flight at the same time.
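
For reference, a minimal sketch of the Bluebird approach, assuming Bluebird is installed and reusing clanTags and backgroundScheduler.getClanProfile() from the question's own code:

// sketch only: requires the bluebird package; clanTags and
// backgroundScheduler.getClanProfile() come from the question's code
const Promise = require("bluebird");

Promise.map(clanTags, tag => {
    return backgroundScheduler.getClanProfile(tag, true);
}, { concurrency: 10 }).then(profiles => {
    // never more than 10 requests in flight; results stay in clanTags order
    console.log(`fetched ${profiles.length} profiles`);
});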

Here are some other coding examples of managing how many are in flight at a time:

Make several requests to an API that can only handle 20 request a minute

How to execute promises in series?

unable to complete promises due to out of memory

Fire off 1,000,000 requests 100 at a time

How to make it so that I can execute say 10 promises at a time in javascript to prevent rate limits on api calls?


If you don't need the resolved data, you can allow it to be GCed sooner by replacing it like this:

const p = backgroundScheduler.getClanProfile(clanTags[i], true).then(data => {
    return 0;   // make resolved value just be a simple number
                // so other data is now eligible for GC
});
promiseArray.push(p);

And, here's a simple implementation that iterates an array with no more than X requests in flight at the same time:

// takes an array of items and a function that returns a promise
// runs no more than maxConcurrent requests at once
function mapConcurrent(items, maxConcurrent, fn) {
    let index = 0;
    let inFlightCntr = 0;
    let doneCntr = 0;
    let results = new Array(items.length);
    let stop = false;

    return new Promise(function(resolve, reject) {

        function runNext() {
            let i = index;
            ++inFlightCntr;
            fn(items[index], index++).then(function(val) {
                ++doneCntr;
                --inFlightCntr;
                results[i] = val;
                run();
            }, function(err) {
                // set flag so we don't launch any more requests
                stop = true;
                reject(err);
            });
        }

        function run() {
            // launch as many as we're allowed to
            while (!stop && inFlightCntr < maxConcurrent && index < items.length) {
                runNext();
            }
            // if all are done, then resolve parent promise with results
            if (doneCntr === items.length) {
                resolve(results);
            }
        }

        run();
    });
}
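
For example, the 58k-item loop from the question could then be written like this (a sketch; clanTags and backgroundScheduler.getClanProfile() come from the question's code):

// fetch all profiles with at most 10 requests in flight at once
mapConcurrent(clanTags, 10, tag => backgroundScheduler.getClanProfile(tag, true))
    .then(results => {
        console.log(`fetched ${results.length} profiles`);
    })
    .catch(err => {
        console.error(err);
    });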

Are promises and closures consuming all of my memory?

Your code uses the "promise" library which, to be fair, is quite memory-hungry and was not really built for raw performance. If you switch to Bluebird promises, you can keep considerably more items in RAM because it will drastically reduce your memory usage.

Here are benchmark results for doxbee-sequential:

results for 10000 parallel executions, 1 ms per I/O op

file                        time(ms)  memory(MB)
promises-bluebird.js             280       26.64
promises-then-promise.js        1775      134.73

And under bench parallel (--p 25):

file                        time(ms)  memory(MB)
promises-bluebird.js             483       63.32
promises-then-promise.js        2553      338.36

You can see the full benchmark here.

unable to complete promises due to out of memory

You are trying to run 1000 web scrapes in parallel. You will need to pick some number significantly less than 1000 and run only N at a time so you consume less memory while doing so. You can still use a promise to keep track of when they are all done.

Bluebird's Promise.map() can do that for you by just passing a concurrency value as an option. Or, you could write it yourself.

I have an idea to fix but I don't like it. That is, change control
flow to not use Promise.all, but chain my promises:

What you want is N operations in flight at the same time. Sequencing is a special case where N = 1 which would often be much slower than doing some of them in parallel (perhaps with N = 10).

This is not as good as Promise.all... Not performant as it's chained,
and returned values have to be stored for later processing.

If the stored values are part of your memory problem, you may have to store them somewhere other than in memory anyway. You will have to analyze how much memory the stored results are using.

Any suggestions? Should I improve the control flow or should I improve
mem usage in scrap(), or is there a way to let node throttle mem
allocation?

Use Bluebird's Promise.map() or write something similar yourself. Writing something that runs up to N operations in parallel and keeps all the results in order is not rocket science, but it is a bit of work to get it right. I've presented it before in another answer, but can't seem to find it right now. I will keep looking.

Found my prior related answer here: Make several requests to an API that can only handle 20 request a minute
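
If you'd rather not pull in Bluebird, a small async/await take on the same idea (a sketch, not the code from that linked answer) looks like this:

// run fn(item, index) over items with at most maxConcurrent in flight;
// results come back in the original order (async/await sketch)
async function mapConcurrent(items, maxConcurrent, fn) {
    const results = new Array(items.length);
    let index = 0;

    async function worker() {
        while (index < items.length) {
            const i = index++;          // claim the next slot synchronously
            results[i] = await fn(items[i], i);
        }
    }

    // start up to maxConcurrent workers that pull items until none are left
    const workers = [];
    for (let w = 0; w < Math.min(maxConcurrent, items.length); w++) {
        workers.push(worker());
    }
    await Promise.all(workers);
    return results;
}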

Is there a way to optimize this Promise loop so I stop getting FATAL JavaScript heap out of memory errors?

You can either do every async operation one at a time, or do courses and classes one at a time.

Everything one at a time.

for (let i = 0; i < flatCourses.length; i += 100) {
  try {
    await Backendless.Data.of(Course).bulkCreate(flatCourses.slice(i, i + 100))
    process.stdout.write(".");
  } catch (e) {
    console.info(e)
  }
}
// Do the same for flatClasses

Courses and classes one at a time.

const promises = []

for (let i = 0; i < flatCourses.length; i += 100) {
  const promise = Backendless.Data.of(Course).bulkCreate(flatCourses.slice(i, i + 100))
    .then(() => {
      process.stdout.write(".");
    })
    .catch(e => console.info(e));

  promises.push(promise)
}

await Promise.all(promises)
// Do the same for flatClasses

More complex approaches involve doing N operations at a time, but I wouldn't go that far if these simple approaches solve your issue.
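
If you do end up needing that middle ground, here is a sketch of sending the 100-item batches three at a time (same Backendless calls and flatCourses as above; the chunking itself is an assumption on my part):

// sketch: at most 3 bulkCreate calls in flight at once
const slices = [];
for (let i = 0; i < flatCourses.length; i += 100) {
  slices.push(flatCourses.slice(i, i + 100));
}

for (let i = 0; i < slices.length; i += 3) {
  // each group of 3 batches runs in parallel; groups run one after another
  await Promise.all(slices.slice(i, i + 3).map(slice =>
    Backendless.Data.of(Course).bulkCreate(slice)
      .then(() => process.stdout.write("."))
      .catch(e => console.info(e))
  ));
}
// Do the same for flatClasses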

Running a promise.all of promise.alls

You're making a mistake a lot of people make at first: Promise.all doesn't run anything. It just waits for things that are already running. By the time you've broken your influencerAccounts array into chunks, you've probably already overloaded the server, because you're still sending it 400+ requests at the same time.

Instead, chunk the payout array, and then process it in chunks, something along these lines:

const results = [];
const promise = _.chunk(payout, 50).reduce(
    (p, chunk) =>
        p.then(chunkResults => {
            // collect the previous chunk's results, then start the next chunk
            results.push(...chunkResults);
            return Promise.all(chunk.map(startRequest));
        }),
    Promise.resolve([])
)
.then(chunkResults => {
    // don't drop the final chunk's results
    results.push(...chunkResults);
    return results;
});

I've used startRequest above instead of createInfluencerWrapper and addInfluencerAccounts because it wasn't clear to me if you'd introduced one or the other in an attempt to make your chunking work. But if not, startRequest is simply addInfluencerAccounts(createInfluencerWrapper(entry)).

That starts a chunk of 50 requests, uses Promise.all to wait for all of them to complete, then starts the next chunk of 50 requests. The "do this, then when it's done, do that" part comes from the promise reduce idiom, which in its simple form looks like this:

someArray.reduce((p, entry) => p.then(() => doSomethingWith(entry)), Promise.resolve());

It starts with a resolved promise, and hooks a then handler on it to do the next thing, which hooks a then handler on that to do the next thing, etc.
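
Unrolled for three hypothetical entries a, b, and c, that reduce builds the equivalent of:

Promise.resolve()
    .then(() => doSomethingWith(a))
    .then(() => doSomethingWith(b))
    .then(() => doSomethingWith(c));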


If you don't like closing over results, we can pass it along the reduce chain; here's the first version above doing that:

const {p, results} = _.chunk(payout, 50).reduce(
    ({p, results}, chunk) => ({
        p: p.then(chunkResults => {
            results.push(...chunkResults);
            return Promise.all(chunk.map(startRequest));
        }),
        results
    }),
    {p: Promise.resolve([]), results: []}
);
const promise = p.then(chunkResults => {
    // pick up the final chunk's results as well
    results.push(...chunkResults);
    return results;
});
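
Either way, the consumer just waits on promise (payout and startRequest come from the question's code):

// usage sketch
promise.then(allResults => {
    // allResults holds one result per entry in payout, in the original order
    console.log(`completed ${allResults.length} requests`);
});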

Awaited but never resolved/rejected promise memory usage

Preface (you probably know this!):

await is syntactic sugar for using promise callbacks. (Really, really, really good sugar.) An async function is a function where the JavaScript engine builds the promise chains and such for you.
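
For instance, these two functions behave essentially the same way (fetchData and use are hypothetical):

// with await: the engine wires up the promise callbacks for you
async function withAwait() {
    const data = await fetchData();
    return use(data);
}

// roughly what it desugars to
function withThen() {
    return fetchData().then(data => use(data));
}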

Answer:

The relevant thing isn't so much whether the promise is settled, but whether the promise callbacks (and the things they refer to / close over) are retained in memory. While the promise is in memory and unsettled, it has a reference to its callback functions, keeping them in memory. Two things make those references go away:

  1. Settling the promise, or
  2. Releasing all references to the promise, which makes it eligible for GC (probably, more below)

In the normal case, the consumer of a promise hooks up handlers to the promise and then either doesn't keep a reference to it at all, or only keeps a reference to it in a context that the handler functions close over and not elsewhere. (Rather than, for instance, keeping the promise reference in a long-lived object property.)

Assuming the debounce implementation releases its reference to the promise that it's never going to settle, and the consumer of the promise hasn't stored a reference somewhere outside this mutual-reference cycle, then the promise and the handlers registered to it (and anything that they hold the only reference for) can all be garbage collected once the reference to the promise is released.

That requires a fair bit of care on the part of the implementation. For instance (thanks Keith for flagging this up), if the promise uses a callback for some other API (for instance, addEventListener) and the callback closes over a reference to the promise, since the other API has a reference to the callback, that could prevent all references to the promise from being released, and thus keep anything the promise refers to (such as its callbacks) in memory.

So it'll depend on the implementation being careful, and a bit on the consumer. It would be possible to write code that would keep references to the promises, and thus cause a memory leak, but in the normal case I wouldn't expect the consumer to do that.
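
As an illustration only (a hypothetical debounce, not necessarily the implementation you're using): here a superseded call's promise is simply abandoned rather than settled, and once the cleared timer and the caller let go of their references, the promise and its reactions become eligible for GC:

// hypothetical debounce that never settles superseded calls (sketch)
function debounceAsync(fn, ms) {
    let timer = null;
    return (...args) => new Promise(resolve => {
        // clearing the previous timer drops the only reference to the previous
        // call's resolve(); that promise will never settle, but nothing keeps
        // it (or code awaiting it) alive once callers release their references
        clearTimeout(timer);
        timer = setTimeout(() => resolve(fn(...args)), ms);
    });
}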

Is there a limit to how many promises can or should run concurrently?

Promises themselves have no particular coded limits. They are just a notification system, and you could have millions of them just fine (as long as you have enough memory to hold those JavaScript objects).

Now, if a promise represents an underlying asynchronous operation (which they usually do), there could very well be some limits to how many of that specific type of asynchronous operation can be in flight at the same time. For example, at some point you might run into limits on how many requests a single host will accept from you at the same time. Or, you might run into local resource issues with zillions of connections somewhere.

For things like node.js disk I/O operations, the underlying disk I/O sub-system already has a queuing system so that only a small number of operations are actually running at once and the rest are queued.

So, the question of how many concurrent operations you can have can only be analyzed and answered in the context of a specific type of asynchronous request, and sometimes even of a specific receiving host.

If you know you're processing a large or potentially large array of requests and you'll be sending a network request for every item in the array, then it is common to code a limit yourself to avoid overwhelming either local resources or the target host resources. This is usually not done with a queue, but rather code that just launches N requests and then as one finishes, it launches the next one and so on. Both the Bluebird and Async libraries have methods for managing this for you. In Bluebird, it's the concurrency option for Promise.map(). I've also hand-coded loops that manage the number of concurrent connections several times myself and here are links to some of that code:

Promise.all consumes all my RAM

Javascript - how to control how many promises access network in parallel

Make several requests to an API that can only handle 20 request a minute

Loop through an api get request with variable URL

Choose proper async method for batch processing for max requests/sec

Nodejs: Async request with a list of URL

I have groups of promises, how do I resolve each group sequentially?

For this to work, you need to return a function that returns a promise when called. The function (a thunk) delays the execution of the actual action.

After chunking the array, call the functions in the current chunk, and use Promise.all() to wait for all the promises to resolve:

// requires lodash (the original snippet loaded it from a CDN)
(async () => {
  const pendingPosts = _.range(0, 100).map((item, index) => {
    return () => { // the thunk
      console.log(`${index}: starting`);

      // a simulation of the action - an api call for example
      return new Promise(resolve => {
        setTimeout(() => resolve(), index * 300);
      });
    }
  });

  let i = 0;
  for (const pendingChunk of _.chunk(pendingPosts, 10)) {
    console.log(`***********************
GROUP ${i}
SIZE ${pendingChunk.length}
***********************`);

    await Promise.all(pendingChunk.map(p => p())); // invoke the thunks to start the actions

    i++;
  }
})().catch((e) => {
  throw e;
});

