What Are the Benefits of a Binary Heap with Root at Arr[0]

What are the benefits of a binary heap with root at arr[0]

I am the person who answered the question you linked.

Creating a binary heap that has the root at arr[1] in a language that has 0-based arrays is idiotic. Not because it wastes a trivial amount of space, but because it creates unnecessarily confusing code for no benefit.

Why is the code confusing? Because it breaks a fundamental rule that we as programmers have been working under for years: arrays start at 0. If you want an array that holds 100 items, you declare it that way:

int a[100];

Except for a binary heap. Because some idiot who converted the original binary heap code from Algol (whose arrays are 1-based) to C (0-based arrays) back in 1973 didn't have the brains to change the child and parent calculations, we've ended up with this one special case where to hold 100 items you have to allocate 101:

int a[101];

And when somebody called that person on the inconsistency, he came up with a specious performance argument.

Yes, there is an extra increment instruction in the code for computing the left child index, and an extra decrement instruction when computing a child's parent index. In the wider context of what a binary heap does, those two instructions will make no practical difference to the running time of any program that uses the heap. None. If the difference is even measurable, it will definitely be noisy. There are many other things happening on your computer that will have much larger effects on your program's performance.

If you're writing a program that requires a high performance priority queue, what the heck are you doing with a binary heap in the first place? If you're really going to store huge numbers of things in your priority queue, you probably should be using something like a Pairing heap, which will outperform binary heap, although at a higher memory cost.

The C++ STL priority_queue, the Java PriorityQueue, and python's heapq all use 0-based binary heaps. The people who wrote those packages understand performance considerations. If there was a significant performance gain to going with a 1-based binary heap, they would have done so. That they went with 0-based heaps should tell you that any performance gain from a 1-based heap is illusory.

See http://blog.mischel.com/2016/09/19/but-thats-the-way-weve-always-done-it/ for a more complete discussion.

Why in a heap implemented by array the index 0 is left unused?

There's no reason why a heap implemented in an array has to leave the item at index 0 unused. If you put the root at 0, then the item at array[index] has its children at array[index*2+1] and array[index*2+2]. The node at array[child] has its parent at array[(child-1)/2].

Let's see.

                  root at 0       root at 1
Left child index*2 + 1 index*2
Right child index*2 + 2 index*2 + 1
Parent (index-1)/2 index/2

So having the root at 0 rather than at 1 costs you an extra add to find the left child, and an extra subtraction to find the parent.

For a more general case where it may not be a binary heap, but a 3-heap, 4-heap, etc where there are NUM_CHILDREN children for each node instead of 2 the formulas are:

                  root at 0                  root at 1
Left child index*NUM_CHILDREN + 1 index*NUM_CHILDREN
Right child index* NUM_CHILDREN + 2 index*NUM_CHILDREN + 1
Parent (index-1)/NUM_CHILDREN index/NUM_CHILDREN

I can't see those few extra instructions making much of a difference in the run time.

For reasons why I think it's wrong to start at 1 in a language that has 0-based arrays, see https://stackoverflow.com/a/49806133/56778 and my blog post But that's the way we've always done it!

Implementing a 4-heap using an array

To find the parent of any child (other than 1, which has no parent):

parent = int((child + 2) / 4)

To find the first and last child of a parent:

child_first = parent * 4 - 2
child_last = parent * 4 + 1

You can see this in operation since, at each level, you add four times as many elements as you did in the previous level:

  1           (   1)
2 thru 5 ( 4)
6 thru 21 ( 16)
22 thru 85 ( 64)
86 thru 341 ( 256)
342 thru 1365 (1024)

Level 1:
1 -> 2 3 4 5

Level 2:
2 -> 6 7 8 9
3 -> 10 11 12 13
4 -> 14 15 16 17
5 -> 18 19 20 21

Level 3:
6 -> 22 23 24 25
7 -> 26 27 28 29
8 -> 30 31 32 33
9 -> 34 35 36 37
10 -> 38 39 40 41
11 -> 42 43 44 45
12 -> 46 47 48 49
13 -> 50 51 52 53
14 -> 54 55 56 57
15 -> 58 59 60 61
16 -> 62 63 64 65
17 -> 66 67 68 69
18 -> 70 71 72 73
19 -> 74 75 76 77
20 -> 78 79 80 81
21 -> 82 83 84 85

 

Level 4:
22 -> 86 87 88 89
23 -> 90 91 92 93
24 -> 94 95 96 97
25 -> 98 99 100 101
: : : :
82 -> 326 327 328 329
83 -> 330 331 332 333
84 -> 334 335 336 337
85 -> 338 339 340 341

Examples are:

parent of 342 = int(344/4) = 86 (same for 343,344,345).
parent of 346 = int(348/4) = 87 (same for 347,348,349).
first child of 21 = 21 * 4 - 2 = 82
last child of 21 = 21 * 4 + 1 = 85

What are the boundaries on increasing keys in minimum heap

The increase of a node's value may not lead to a situation where that value becomes greater than one of its children's values. This is never an issue for the left child, since its value gets increased by the same amount, but it will become an issue for the right child, as that value does not change.

Among the nodes that are on the left most path (except the last one, which does not have a right child), you'll have to find the minimum difference between their right child's value, and their own value. That will be the maximum value that c can get.

In the example heap, these differences are (starting from the top):

7-2 = 5
10-8 = 2
12-9 = 3

The minimum of these differences is 2. So for the example the answer is 2.

Count swaps in Binary Heap

Update after comment

I don't know how you got the output "0 1 2 2 3". I would have expected "0 1 1 2 2". But that's irrelevant now.

I see that I misunderstood your original question. You don't want the number of swaps as each item is inserted. You want to know how many times each item changes position throughout the life of the heap.

You'll need to build a map with the key being the item's value, and the value part being the number of times the item was involved in a swap operation. In your swap method, you check the map to see if the item exists. If the item doesn't exist in the map, add it with a count of 1. If it does exist, increment its count.

When you're done adding items to the heap, you can print the contents of the map.

Original answer

You could have a class variable called numSwaps:

private int numSwaps;

And then in your swap method, you increment it whenever there's a swap:

private void swap(int fpos, int spos) {
int tmp;
tmp = heap[fpos];
heap[fpos] = heap[spos];
heap[spos] = tmp;
numSwaps = numSwaps + 1;
}

And provide methods clearSwaps (to reset to 0) and getSwaps (to return the current number of swaps).



Related Topics



Leave a reply



Submit