Efficient Way to Insert a Number into a Sorted Array of Numbers

Efficient way to insert a number into a sorted array of numbers?

Just as a single data point, for kicks I tested this out by inserting 1,000 random elements into an array of 100,000 pre-sorted numbers with each of the two methods, using Chrome on Windows 7:

First Method:
~54 milliseconds
Second Method:
~57 seconds

So, at least on this setup, the convenience of the native sort doesn't make up for its cost. This holds even for small data sets, inserting 100 elements into an array of 1,000:

First Method:
~1 millisecond
Second Method:
34 milliseconds

How to immutably insert into a sorted array?

Try this one.

let sortedArr = [1, 2, 5, 9, 12];
const newItem = 7;

// Find the first element >= newItem; -1 means newItem is the largest.
const i = sortedArr.findIndex(x => newItem <= x);
const insertAt = i === -1 ? sortedArr.length : i;

// Build a new array instead of mutating the original.
sortedArr = [...sortedArr.slice(0, insertAt), newItem, ...sortedArr.slice(insertAt)];
console.log(sortedArr); // [1, 2, 5, 7, 9, 12]

Is it possible to insert a number into a sorted array, time: log(n)

"Is it possible to do so in log(n)?" - in short, no. Inserting into an arbitrary position of an array is O(n), almost by definition: every element after the insertion point has to be shifted. If you want faster inserts, use a balanced tree structure, such as Java's TreeMap class.

Most efficient way to insert an element into sorted array and find its index

You can use Boost.MultiIndex's ranked indices:

Live Coliru Demo

#include <boost/multi_index_container.hpp>
#include <boost/multi_index/ranked_index.hpp>
#include <boost/multi_index/identity.hpp>

using namespace boost::multi_index;
using ranked_int_set=multi_index_container<
  int,
  indexed_by<
    ranked_unique<identity<int>>
  >
>;

#include <iostream>

int main()
{
  ranked_int_set s={0,2,4,6,8,10,12,14,16};

  auto it=s.insert(9).first;
  std::cout<<"9 was inserted at position #"<<s.rank(it)<<"\n";
  std::cout<<"14 is located at position #"<<s.find_rank(14)<<"\n";
}

Output


9 was inserted at position #5
14 is located at position #8

Inserting an integer into a sorted array

Change

if(k < s[i]) {

to

if(k < a[i]) {

in line 4.

Sorting a small array into a large sorted array

You can implement a chunk-based algorithm to solve this problem efficiently (whatever the input sizes, as long as one array is much smaller than the other).

First of all, you need to sort the small array (possibly using a radix sort or a bitonic sort if you do not need a custom comparator).
Then the idea is to cut the big array into chunks that fully fit in the CPU cache (e.g. 256 KiB).
For each chunk, find the index of the last item of the small array that is <= the last item of the chunk, using a binary search.
This is relatively fast because the small array likely fits in the cache, and consecutive chunks fetch the same items during their binary searches when the array is big.
This index tells you how many items need to be merged into the chunk before it is written out.
For each value to be merged into the chunk, find its insertion index using a binary search in the chunk.
This is fast because the chunk fits in the cache.
Once you know the indices of the values to be inserted into the chunk, you can efficiently move the items block by block within each chunk (possibly in-place, from the end to the beginning).
This implementation is much faster than the traditional merge algorithm, since far fewer comparisons are needed thanks to the binary searches, and only a small number of items is inserted per chunk.

For relatively big inputs, you can use a parallel implementation. The idea is to work on a group of multiple chunks at the same time (i.e. super-chunks).
Super-chunks are much bigger than classical ones (e.g. >= 2 MiB).
Each thread works on one super-chunk at a time. A binary search is performed on the small array to know how many values are inserted into each super-chunk.
This number is shared between threads so that each thread knows where it can safely write its output independently of the other threads (one could use a parallel-scan algorithm to do that on a massively parallel architecture). Each super-chunk is then split into classical chunks, and the previous algorithm is used to solve the problem in each thread independently.
This method should be more efficient even sequentially, when the small input array does not fit in the cache, since the number of binary-search operations over the whole small array is significantly reduced.

The (amortized) time complexity of the algorithm is O(n (1 + log(m) / c) + m (1 + log(c))), with m the length of the big array, n the length of the small array and c the chunk size (super-chunks are ignored here for the sake of clarity, but they only change the complexity by a constant factor, like the constant c does).

Alternative method / optimization: if your comparison operator is cheap and can be vectorized using SIMD instructions, then you can optimize the traditional merge algorithm. The traditional method is quite slow because of branches (which can hardly be predicted in the general case) and also because it cannot be easily or efficiently vectorized. However, because the big array is much bigger than the small one, the traditional algorithm picks long runs of consecutive values from the big array in between the values of the small array. This means that you can load SIMD chunks of the big array and compare their values against one value of the small array. If all the SIMD items are smaller than the one picked from the small array, then you can write the whole SIMD chunk at once, very efficiently. Otherwise, you need to write part of the SIMD chunk, then write the item of the small array, and switch to the next one. This last operation is clearly less efficient, but it should happen rarely since the small array is much smaller than the big one. Note that the small array still needs to be sorted first.

Most time-efficient sorting algorithm for array of struct with conditions

The first thing to do is to find the items with age in the range 30..40 and copy them into an optimized growing data structure. This data structure is an array whose capacity grows exponentially whenever there is not enough remaining space (i.e. std::vector in C++, or an equivalent hand-rolled growable array in C).

Each item in this new data structure is defined as:

typedef struct {
    uint32_t name_prefix;
    uint32_t id; /* Assume the number of persons is not bigger than 4 billion */
} person_ref;

where name_prefix contains the first characters of person::name and where id contains the position of the item in the initial unfiltered array. Assuming person::name contains ASCII/ANSI characters, the prefix can be for example (p.name[0] << 24) | (p.name[1] << 16) | (p.name[2] << 8) | p.name[3]. While this may sound expensive, modern mainstream processors can do it in only one fast instruction. If there are additional restrictions on the name (e.g. upper-case printable ASCII characters), you can pack more characters into this prefix.

Then you can sort the new data structure using a simple quick-sort (or better: an Introsort). Note that qsort can be used in C and std::sort in C++. The comparison operator can be:

/* For C code, adapt this to the const void* signature qsort expects;
   shown here with C++ references */
bool compare(const person_ref& p1, const person_ref& p2) {
    const uint32_t prefix1 = p1.name_prefix;
    const uint32_t prefix2 = p2.name_prefix;
    return prefix1 < prefix2 ||
           (prefix1 == prefix2 && strcmp(array[p1.id].name, array[p2.id].name) < 0);
}

where array is the original array with all the unfiltered items.

Finally, you can copy the items back thanks to the person_ref::id field.


This approach works well as long as name prefixes are usually distinct. Indeed, comparing integers is much faster than comparing two strings with strcmp. Moreover, working on small items makes the sorting algorithm faster, because copies are less expensive and because the full array can better fit in fast CPU caches. If many prefixes are equal and the input data structure is big, then it is better to use a 64-bit integer prefix, or, in the worst case, to copy the filtered items into another data structure (so as to make better use of CPU caches).

If the number of filtered items is huge and you want an even faster sort, you can use a linear-time radix sort combined with an introsort to sort the persons sharing the same prefix.


