Ruby- delete a value from sorted (unique) array at O(log n) runtime
The standard Array implementation has no constraint on sorting or duplicate. Therefore, the default implementation has to trade performance with flexibility.
Array#delete
deletes an element in O(n)
. Here's the C implementation. Notice the loop
for (i1 = i2 = 0; i1 < RARRAY_LEN(ary); i1++) {
...
}
The cost is justified by the fact Ruby has to scan all the items matching given value (note delete
deletes all the entries matching a value, not just the first), then shift the next items to compact the array.
delete_at
has the same cost. In fact, it deletes the element by given index, but then it uses memmove
to shift the remaining entries one index less on the array.
Using a binary search will not change the cost. The search will cost you O(log n)
, but you will need to delete the element at given key. In the worst case, when the element is in position [0]
, the cost to shift all the other items in memory by 1 position will be O(n)
.
In all cases, the cost is O(n)
. This is not unexpected. The default array implementation in Ruby uses arrays. And that's because, as said before, there are no specific constraints that could be used to optimize operations. Easy iteration and manipulation of the collection is the priority.
Array, sorted array, list and sorted list: all these data structures are flexible, but you pay the cost in some specific operations.
Back to your question, if you care about performance and your array is sorted and unique, you can definitely take advantage of it. If your primary goal is finding and deleting items from your array, there are better data structures. For instance, you can create a custom class that stores your array internally using a d-heap
where the delete()
costs O(log[d,n])
, same applies if you use a binomial heap.
time complexity of arr.delete() in ruby
Array#delete
runs in O(n). You can click on the "click to toggle source" button next to a function in the Ruby documentation to see its implementation:
VALUE
rb_ary_delete(VALUE ary, VALUE item)
{
VALUE v = item;
long i1, i2;
for (i1 = i2 = 0; i1 < RARRAY_LEN(ary); i1++) {
VALUE e = RARRAY_AREF(ary, i1);
if (rb_equal(e, item)) {
v = e;
continue;
}
if (i1 != i2) {
rb_ary_store(ary, i2, e);
}
i2++;
}
if (RARRAY_LEN(ary) == i2) {
if (rb_block_given_p()) {
return rb_yield(item);
}
return Qnil;
}
ary_resize_smaller(ary, i2);
return v;
}
As you can see, the function iterates over the array only once.
How to remove range from an array
You can do some cool tricks with the Enumerable module:
a = [0, 2, 8, 2, 4, 5]
r = 1..2
a.reject.with_index { |v, i| r.include?(i) } # => [0, 2, 4, 5]
Note that this does not modify the original array, but returns a new one. You can use reject!
if you want to modify the array.
Delete duplicated elements in an array that's a value in a hash and its corresponding ids
The first idea I had is to zip the values and call uniq, then think a way to return back to the initial form:
h['options'].zip(h['option_ids']).uniq(&:first).transpose
#=> [["Language", "Question", "Answer"], ["12345", "23456", "45678"]]
Then, via parallel assignment:
h['options'], h['option_ids'] = h['options'].zip(h['option_ids']).uniq(&:first).transpose
h #=> {"id"=>"sjfdkjfd", "name"=>"Field Name", "type"=>"field", "options"=>["Language", "Question", "Answer"], "option_ids"=>["12345", "23456", "45678"]}
These are the steps:
h['options'].zip(h['option_ids'])
#=> [["Language", "12345"], ["Question", "23456"], ["Question", "34567"], ["Answer", "45678"], ["Answer", "56789"]]
h['options'].zip(h['option_ids']).uniq(&:first)
#=> [["Language", "12345"], ["Question", "23456"], ["Answer", "45678"]]
Which algorithm does Ruby's sort method use?
Look here: http://www.igvita.com/2009/03/26/ruby-algorithms-sorting-trie-heaps/
It does natively use quicksort however, which is n log n complexity on average.
Get number of elements in a sorted array that fall within a certain range in log(n) time
When you're asked to do something in log(n)
time, this usually means O(log(n))
.
If that's the case here, it is worth noting that O(2 log(n)) == O(log(n))
, i.e. the two are the same thing.
For further background on the big-oh notation, see the Wikipedia page.
Get index of array element faster than O(n)
Convert the array into a hash. Then look for the key.
array = ['a', 'b', 'c']
hash = Hash[array.map.with_index.to_a] # => {"a"=>0, "b"=>1, "c"=>2}
hash['b'] # => 1
Array remove duplicate elements
Check every element against every other element
The naive solution is to check every element against every other element. This is wasteful and yields an O(n2) solution, even if you only go "forward".
Sort then remove duplicates
A better solution is sort the array and then check each element to the one next to it to find duplicates. Choose an efficient sort and this is O(n log n).
The disadvantage with the sort-based solution is order is not maintained. An extra step can take care of this however. Put all entries (in the unique sorted array) into a hashtable, which has O(1) access. Then iterate over the original array. For each element, check if it is in the hash table. If it is, add it to the result and delete it from the hash table. You will end up with a resultant array that has the order of the original with each element being in the same position as its first occurrence.
Linear sorts of integers
If you're dealing with integers of some fixed range you can do even better by using a radix sort. If you assume the numbers are all in the range of 0 to 1,000,000 for example, you can allocate a bit vector of some 1,000,001. For each element in the original array, you set the corresponding bit based on its value (eg a value of 13 results in setting the 14th bit). Then traverse the original array, check if it is in the bit vector. If it is, add it to the result array and clear that bit from the bit vector. This is O(n) and trades space for time.
Hash table solution
Which leads us to the best solution of all: the sort is actually a distraction, though useful. Create a hashtable with O(1) access. Traverse the original list. If it is not in the hashtable already, add it to the result array and add it to the hash table. If it is in the hash table, ignore it.
This is by far the best solution. So why the rest? Because problems like this are about adapting knowledge you have (or should have) to problems and refining them based on the assumptions you make into a solution. Evolving a solution and understanding the thinking behind it is far more useful than regurgitating a solution.
Also, hash tables are not always available. Take an embedded system or something where space is VERY limited. You can implement an quick sort in a handful of opcodes, far fewer than any hash table could be.
What are the Big-Oh complexities of these tasks?
(v) Finding the median (The value of a numerical set that equally divides
the number of values that are larger and smaller) of an array of sorted
items(v) Again i am not sure on this, maybe O(log n) since the array is sorted so you only need to search half the values, essentially a binary chop.
This one is O(1). You are interested in the item that is in the middle of the sorted array (for N odd), or the average of the two "closest" to the middle (for N even).
Since the data is ordered, you can simply examine the one or two elements needed for the result.
Ruby#index Method VS Binary Search
The built-in #index
is not a binary search, it's just a simple iterative search. However, it is implemented in C rather than Ruby, so naturally it can be several orders of magnitude faster.
Related Topics
Rubocop, How to Disable/Enable Cops on Blocks of Code
Open-Uri Is Not Redirecing Http to Https
P Method in Ruby Hard to Search For
Sort by Date in Descending Order in Ruby in Rails
How to Link to a Page with Page.Url Without the HTML Extension in Jekyll
Where Are Catch and Throw Useful in Ruby
Regex for Matching All Words Between a Set of Curly Braces
Kernel_Require.Rb:55:In 'Require': Cannot Load Such File Error
How to Spec Methods That Exit or Abort
Change Emacs Ruby-Mode Indent to 4 Spaces
Handling Has_One Nested Resource in Rails 3
Can Someone Explain the Following Code to Me
What Is the Directory Structure for Adding Sorbet Rbis to a Gem
How to Display a Regexp Output in an Alphabetical List
While Executing Gem ... (Argumenterror) Unknown Encoding Name - Cp720