How Can Adding Code to a Loop Make It Faster

What can I do to make this loop run faster?

Unroll the loop 2-8 times. Measure which one is best. The .NET JIT optimizes poorly, so you have to do some of its work.

You'll probably have to add unsafe as well because the JIT will now be unable to optimize out the array bounds checks.

You can also try to aggregate into multiple sum variables:

    // Assumes array.Length is even; an odd length would read past the end.
    int sum1 = 0, sum2 = 0;
    for (int i = 0; i < array.Length; i+=2) {
        sum1 += array[i+0];
        sum2 += array[i+1];
    }

That might increase instruction-level parallelism because the two add instructions in each iteration no longer depend on each other.

The i+0 is optimized to i automatically.
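A two-way unrolled loop like the one above reads one element past the end when the length is odd, so it needs a small tail step. The following is a minimal sketch of that structure, written in Python only for brevity; `sum_unrolled` is a hypothetical name, and in Python this will not reproduce the speedup, which depends on the JIT and the CPU.

```python
def sum_unrolled(values):
    """Sum a sequence two elements per iteration using two accumulators."""
    sum1 = 0
    sum2 = 0
    n = len(values)
    even_end = n - (n % 2)  # number of elements covered by full pairs

    # Main loop: each iteration handles one pair, one element per accumulator.
    for i in range(0, even_end, 2):
        sum1 += values[i]
        sum2 += values[i + 1]

    # Tail: at most one leftover element when the length is odd.
    if even_end < n:
        sum1 += values[even_end]

    return sum1 + sum2


print(sum_unrolled([1, 2, 3, 4, 5]))
```

The same shape carries over to unrolling 4 or 8 times: the main loop covers the largest multiple of the unroll factor, and a short tail loop picks up the rest.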


I tested it and it shaved off about 30%.

The timings are stable when repeated. Code:

    // Needs: using System; using System.Diagnostics;
    // Must run inside an unsafe method and be compiled with /unsafe.
    Process.GetCurrentProcess().PriorityClass = ProcessPriorityClass.High;

    var watch = new Stopwatch();

    int[] array = new int[500000000];
    for (int i = 0; i < array.Length; i++)
    {
        array[i] = 1;
    }

    // Warm-up pass (result is not reported)
    {
        watch.Restart();
        int sum = 0;
        for (int i = 0; i < array.Length; i++)
            sum += array[i];
    }

    for (int i2 = 0; i2 < 5; i2++)
    {
        // Variant 1: plain for loop over the array
        {
            watch.Restart();
            int sum = 0;
            for (int i = 0; i < array.Length; i++)
                sum += array[i];
            Console.WriteLine("for loop:" + watch.ElapsedMilliseconds + "ms, result:" + sum);
        }

        // Variant 2: same loop through a pinned pointer (no bounds checks)
        {
            watch.Restart();
            fixed (int* ptr = array)
            {
                int sum = 0;
                var length = array.Length;
                for (int i = 0; i < length; i++)
                    sum += ptr[i];
                Console.WriteLine("unsafe for loop:" + watch.ElapsedMilliseconds + "ms, result:" + sum);
            }
        }

        // Variant 3: pointer loop unrolled 4 times with 4 accumulators
        // (assumes the length is a multiple of 4, which holds here)
        {
            watch.Restart();
            fixed (int* ptr = array)
            {
                int sum1 = 0;
                int sum2 = 0;
                int sum3 = 0;
                int sum4 = 0;
                var length = array.Length;
                for (int i = 0; i < length; i += 4)
                {
                    sum1 += ptr[i + 0];
                    sum2 += ptr[i + 1];
                    sum3 += ptr[i + 2];
                    sum4 += ptr[i + 3];
                }
                Console.WriteLine("unsafe unrolled x4:" + watch.ElapsedMilliseconds + "ms, result:" + (sum1 + sum2 + sum3 + sum4));
            }
        }

        Console.WriteLine("===");
    }

After playing around further, it turns out that multiple aggregation variables make no difference. Unrolling the loop is what gives the major improvement. Unsafe made no difference either (except in the unrolled case, where it is pretty much required). Unrolling 2 times is as good as unrolling 4 times.

Running this on a Core i7.
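The measurement approach above (one untimed warm-up pass, then several timed repetitions of each variant) can be wrapped in a small helper. This is a rough sketch in Python, not a port of the C# benchmark: `best_of` and `sum_loop` are hypothetical names, and reporting the fastest of the repeated runs is a choice made here, not something the original code does.

```python
import time


def best_of(fn, arg, repeats=5):
    """Return (fastest elapsed seconds, last result) for fn(arg)."""
    fn(arg)  # warm-up run, deliberately not timed
    best = float("inf")
    result = None
    for _ in range(repeats):
        start = time.perf_counter()
        result = fn(arg)
        best = min(best, time.perf_counter() - start)
    return best, result


def sum_loop(values):
    """Plain element-by-element sum, the baseline variant."""
    total = 0
    for v in values:
        total += v
    return total


data = [1] * 200_000
for name, fn in (("for loop", sum_loop), ("built-in sum", sum)):
    seconds, result = best_of(fn, data)
    print(f"{name}: {seconds * 1000:.2f} ms, result: {result}")
```

Printing the result next to the timing, as the original does, doubles as a check that every variant computes the same sum.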

How can I speed-up this loop?

First, the most likely reason why your code is running slower than expected is that a becomes a denormalised number. Denormalised numbers are a special case that may run a lot, lot slower. It is also possible that by adding 10^251 and subtracting it again you change a to 0, and dividing zero by anything is faster (since the result doesn't need to be calculated).

But the real speed up comes from not stupidly adding tiny, tiny numbers that have no effect whatsoever. When x = a few hundred, a will be so small that subtracting a/i from b will not make any difference. So instead of b -= a/i; you write

    double old_b = b;
    b -= a / i;
    if (b == old_b) break;

and your time will change from seconds to much less than a millisecond.
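To see the early exit in action, here is a small Python sketch. The original loop is not shown, so the way `a` shrinks (divided by 3 on every iteration) and the name `subtract_until_stable` are assumptions made purely for illustration.

```python
def subtract_until_stable(max_iter=1_000_000):
    """Subtract a / i from b until the subtraction no longer changes b."""
    a = 1.0
    b = 1.0
    for i in range(1, max_iter + 1):
        a /= 3.0  # assumption: the term shrinks geometrically
        old_b = b
        b -= a / i
        if b == old_b:
            # a / i is now below the rounding granularity of b, and every
            # later term is smaller still, so further iterations are no-ops.
            return b, i
    return b, max_iter


b, iterations = subtract_until_stable()
print(f"b = {b!r} after {iterations} iterations")
```

Because the terms only get smaller, stopping at the first iteration that leaves `b` unchanged gives the same final value as running the loop to the end.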

Adding a print statement to a loop working in a thread making it run faster

I fixed this by changing where time.sleep is called. The issue was that message["substep"] == "skip" most of the time, so the continue was hit before time.sleep could execute and the thread never got the chance to sleep. It would keep running flat out, without giving other threads room to breathe.

    class PubSub(QRunnable):
        def __init__(self):
            QRunnable.__init__(self)

        def run(self):
            while True:
                message["substep"] = "xyz"
                message["timestamp"] = str(datetime.now().replace(microsecond=0))
                # Sleep before the skip check so every iteration yields.
                time.sleep(1)
                if message["substep"] == "skip":
                    continue
                messageJson = json.dumps(message)
                myAWSIoTMQTTClient.publish(self.topic, messageJson, 1)

The reason the print statement previously helped other threads run faster is that I/O operations are slow: the print would block the current thread for a while, which allowed the other threads to continue their work.
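The effect of sleeping before the skip check can be reproduced without Qt or MQTT. The sketch below uses plain `threading`; `poll`, `run_demo`, `should_skip`, and `handle` are hypothetical names, and the skip condition is hard-wired to always skip so that only the pacing is visible.

```python
import threading
import time


def poll(stop, should_skip, handle, interval=0.01):
    """Worker loop that sleeps on every iteration, even skipped ones."""
    while not stop.is_set():
        time.sleep(interval)  # runs before any `continue`, so the thread always yields
        if should_skip():
            continue
        handle()


def run_demo(duration=0.2, interval=0.01):
    """Run the worker briefly and report how often it looped."""
    stop = threading.Event()
    counts = {"iterations": 0, "handled": 0}

    def should_skip():
        counts["iterations"] += 1
        return True  # always skip, like the "skip" substep case

    def handle():
        counts["handled"] += 1

    worker = threading.Thread(target=poll, args=(stop, should_skip, handle, interval))
    start = time.perf_counter()
    worker.start()
    time.sleep(duration)
    stop.set()
    worker.join()
    return counts, time.perf_counter() - start


counts, elapsed = run_demo()
print(f"{counts['iterations']} iterations in {elapsed:.2f} s")
```

With the sleep placed after the skip check instead, the loop would spin as fast as it can for the whole duration and compete with every other thread for the interpreter.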

faster processing of a for loop with put requests python

concurrent.futures turned out to be exactly what was needed to process this. I had to do a little more testing, but a process that used to take 60 to 80 seconds, depending on the server I was hitting, now takes 10 seconds.

    def testing2():
        def post_req(payload):
            result = session.put(url, verify=verifySSL, headers=header, data=payload)
            response = result.json()
            logger.debug(response)
            return result

        start = time.time()
        futures = []
        with concurrent.futures.ThreadPoolExecutor() as executor:
            for k, v in permissions_roleprivs.items():
                perm_code = v["permissionCode"]
                perm_access = v["access"]
                payload = json.dumps(
                    {"permissionCode": perm_code, "access": perm_access}
                )
                futures.append(executor.submit(post_req, payload))
            # end of: for k, v in permissions_roleprivs.items()
            for future in concurrent.futures.as_completed(futures):
                future.result()
        end = time.time()
        logger.debug('intesting 2')
        print(f"Time to complete: {round(end - start, 2)}")

One of the key mistakes in my previous attempts at this involved these lines:

    for future in concurrent.futures.as_completed(futures):
        future.result()

In my initial tests this code either didn't exist or wasn't set up properly. When I finally got it working, I was still seeing 60 seconds.

The remaining problem was that it sat inside the for loop over roleprivs.items(). Once I pulled it out of that loop, the requests were processed much faster.
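The difference between collecting results inside and outside the submit loop can be shown without a server. This is a minimal sketch in which `fake_put` stands in for the real `session.put` call by sleeping; `fake_put`, `send_all`, and the worker count of 8 are assumptions for illustration only.

```python
import concurrent.futures
import time


def fake_put(payload, delay=0.05):
    """Stand-in for a network request: wait, then echo the payload."""
    time.sleep(delay)
    return {"echo": payload}


def send_all(payloads, max_workers=8):
    """Submit every request first, then collect results as they finish."""
    results = []
    with concurrent.futures.ThreadPoolExecutor(max_workers=max_workers) as executor:
        # Step 1: queue everything. Nothing blocks here.
        futures = [executor.submit(fake_put, p) for p in payloads]
        # Step 2: wait, outside the submit loop. Calling future.result()
        # right after each submit would wait for that request before
        # queuing the next one, which makes the whole run serial again.
        for future in concurrent.futures.as_completed(futures):
            results.append(future.result())
    return results


start = time.perf_counter()
results = send_all(list(range(8)))
print(f"{len(results)} results in {time.perf_counter() - start:.2f} s")
```

Note that `as_completed` yields futures in completion order, so the results are not guaranteed to match the order of the payloads.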

What's the fastest way to loop through an array in JavaScript?

I ran this test in most modern browsers:
https://jsben.ch/wY5fo

Currently, the fastest form of loop (and in my opinion the most syntactically obvious) is a standard loop with length caching, written here as a while loop:

    var i = 0, len = myArray.length;
    while (i < len) {
        // your code
        i++
    }

I would say, this is definitely a case where I applaud JavaScript engine developers. A runtime should be optimized for clarity, not cleverness.


