Sort an Array in Ruby Ignoring Articles ("The", "A", "An")

My favorite approach to this kind of problem is to store an extra sort_order column in the database.

That way, when you have 10,000 songs that you would like to page through, you can do the sorting in SQL and avoid pulling them all back.

It's simple to add a before_save filter to keep this column in sync.

A cleanish solution without schema changes is:

class Artist
  def sortable_name
    name.sub(/^(the|a|an)\s+/i, '')
  end
end

class Song
  def sortable_name
    # note the - is there so [Radio] [head on] and [Radiohead] [karma police]
    # are not mixed in the ordering
    "#{artist.sortable_name} - #{name}"
  end
end

# breaks ties as well
Song.all.sort_by { |song| song.sortable_name }
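To try the idea outside a Rails app, here is a minimal sketch that swaps the models for plain Struct objects. The Struct definitions and the sample songs are assumptions made for illustration; only sortable_name comes from the answer above.

```ruby
# Plain-Ruby stand-ins for the models; the sample data is illustrative only.
Artist = Struct.new(:name) do
  def sortable_name
    # Strip one leading article, case-insensitively.
    name.sub(/^(the|a|an)\s+/i, '')
  end
end

Song = Struct.new(:artist, :name) do
  def sortable_name
    "#{artist.sortable_name} - #{name}"
  end
end

songs = [
  Song.new(Artist.new("The Who"), "My Generation"),
  Song.new(Artist.new("A Tribe Called Quest"), "Can I Kick It?"),
  Song.new(Artist.new("Radiohead"), "Karma Police"),
  Song.new(Artist.new("The Beatles"), "Help!")
]

sorted_artists = songs.sort_by(&:sortable_name).map { |song| song.artist.name }
# => ["The Beatles", "Radiohead", "A Tribe Called Quest", "The Who"]
```

The articles are still displayed; they are only ignored when building the sort key.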

Modifying an array item in Ruby if it includes a specific word

I like to break these types of problems up into smaller chunks of logic to help me understand before I write an algorithm. In this case you need to modify each word of the string based on some rules.

  1. If it's the first word, capitalize it.
  2. If it's not a special word, capitalize it.
  3. If it's a special word AND it's not the first word, downcase it.

With these rules you can write your logic to follow.

special_words = ['a', 'an', 'and', 'of', 'the']
fixed_words = []
@string.downcase.split.each_with_index do |word, index|
  # If this isn't the first word, and it's special, use downcase
  if index > 0 && special_words.include?(word)
    fixed_words << word
  # It's either the first word, or not special, so capitalize
  else
    fixed_words << word.capitalize
  end
end
fixed_words.join(" ")

You'll notice I'm using downcase on the string before calling split and each_with_index. This is so that all the words are normalized to lowercase and can be easily checked against the special_words array.

I'm also storing the transformed words in an array and joining them back together at the end. The reason is that split returns new strings, so calling downcase! or capitalize! on them would not modify the original title string.
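For reference, the same three rules can be wrapped in a standalone method. This is only a sketch: title_case and SPECIAL_WORDS are hypothetical names, and the title is passed in as an argument instead of being read from @string.

```ruby
SPECIAL_WORDS = %w[a an and of the].freeze

# title_case is a hypothetical helper name, not part of the original answer.
def title_case(title)
  words = title.downcase.split.each_with_index.map do |word, index|
    # Special words stay lowercase unless they open the title.
    index > 0 && SPECIAL_WORDS.include?(word) ? word : word.capitalize
  end
  words.join(" ")
end

title_case("the lord OF the rings")  # => "The Lord of the Rings"
title_case("a tale of two cities")   # => "A Tale of Two Cities"
```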

Note: This problem is part of the Bloc Full Stack coursework, which is why I'm using a simplified solution rather than one-liners, modules, file I/O, etc.

How to sort an alphanumeric array in ruby

You can pass a block to the sort method to define a custom ordering. A plain string sort gives the wrong order here because your numbers aren't zero-padded, so this block zero-pads the numeric parts before comparing them, which produces your desired sort order.

a.sort { |x, y|
  xp = x.split('_')
  x_key = xp[0] + "%05d" % xp[1] + "%05d" % xp[2]
  yp = y.split('_')
  y_key = yp[0] + "%05d" % yp[1] + "%05d" % yp[2]
  # y before x: descending order
  y_key <=> x_key
}
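An alternative sketch that avoids padding altogether is to sort by an array key with the numeric parts converted to integers. The sample names below are hypothetical; they only assume the "prefix_number_number" shape implied by the split('_') calls above.

```ruby
# Hypothetical sample data in the "<prefix>_<number>_<number>" shape.
names = ["item_10_2", "item_2_10", "item_2_9", "item_1_100"]

# Build a comparable key per element: [prefix, first number, second number].
# Arrays compare element by element, so no zero padding is needed.
natural_key = lambda do |name|
  prefix, first, second = name.split('_')
  [prefix, first.to_i, second.to_i]
end

ascending  = names.sort_by(&natural_key)
# => ["item_1_100", "item_2_9", "item_2_10", "item_10_2"]

# The block above compares y to x, i.e. descending order.
descending = ascending.reverse
# => ["item_10_2", "item_2_10", "item_2_9", "item_1_100"]
```

Unlike the fixed "%05d" padding, this does not break when a number has more than five digits.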

How to get rid of phantom row in array?

The first row you're seeing is most likely the header row. Header rows use <th> instead of <td>. This means cells = tr.search('td') will be an empty collection for the header row.

In most cases header rows are placed in the <thead> and data rows are placed in <tbody>. So instead of doing tables[0].search('tr') you could be doing tables[0].search('tbody tr'), which only selects rows in the <tbody> tag.

Remove excess junk words from string or array of strings

Dealing with stopwords is easy, but I'd suggest you do it BEFORE you split the string into the component words.

Building a fairly simple regular expression can make short work of the words:

STOPWORDS = /\b(?:#{ %w[to and or the a].join('|') })\b/i
# => /\b(?:to|and|or|the|a)\b/i

clean_string = 'to into and sandbar or forest the thesis a algebra'.gsub(STOPWORDS, '')
# => " into  sandbar  forest  thesis  algebra"

clean_string.split
# => ["into", "sandbar", "forest", "thesis", "algebra"]

How do you handle them if you get them already split? I'd join(' ') the array to turn it back into a string, then run the above code, which returns the array again.

incoming_array = [
  "14000",
  "Things",
  "to",
  "Be",
  "Happy",
  "About",
]

STOPWORDS = /\b(?:#{ %w[to and or the a].join('|') })\b/i
# => /\b(?:to|and|or|the|a)\b/i

incoming_array = incoming_array.join(' ').gsub(STOPWORDS, '').split
# => ["14000", "Things", "Be", "Happy", "About"]

You could try to use Array's set operations, but you'll run afoul of the case sensitivity of the words, forcing you to iterate over the stopwords and the arrays, which will run a LOT slower.
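A short sketch of the case-sensitivity problem, using a capitalized variant of the sample array as assumed input:

```ruby
words     = %w[14000 Things To Be Happy About]
stopwords = %w[to and or the a]

# Array difference compares strings exactly, so the capitalized "To" survives.
difference = words - stopwords
# => ["14000", "Things", "To", "Be", "Happy", "About"]

# Matching case-insensitively this way means normalizing every word and
# scanning the stopword list for each one.
filtered = words.reject { |word| stopwords.include?(word.downcase) }
# => ["14000", "Things", "Be", "Happy", "About"]
```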

Take a look at these two answers for some added tips on how you can build very powerful patterns making it easy to match thousands of strings:

  • "How do I ignore file types in a web crawler?"
  • "Is there an efficient way to perform hundreds of text substitutions in Ruby?"

Custom Sort array of strings by another array of strings - Ruby

I assume that:

  • every element of list is in sort_order;
  • sort_order may contain elements that are not in list;
  • list may contain duplicates; and
  • sort_order contains no duplicates.

If sort_order initially contains duplicates the temporary array sort_order.uniq can be used in the calculations.

Observe that if, as in the example, list contains no duplicates and sort_order contains no elements other than those in list, sorting list by the order of its elements in sort_order is trivial, as it merely returns sort_order.

The following is more efficient than methods that use sort or sort_by (O(n) versus O(n log n) time complexity).

list = ["gold", "copper", "silver", "copper", "steel", "gold"]
sort_order = ["bronze", "silver", "tin", "gold", "copper", "steel"]

count = list.each_with_object(Hash.new(0)) { |e,h| h[e] += 1 }
#=> {"gold"=>2, "copper"=>2, "silver"=>1, "steel"=>1}
sort_order.flat_map { |e| [e]*count[e] }
#=> ["silver", "gold", "gold", "copper", "copper", "steel"]
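As a sketch, the counting approach can be wrapped in a method and checked against the more familiar sort_by version. sort_by_order is a hypothetical name, and the uniq call covers the case where sort_order contains duplicates.

```ruby
# sort_by_order is a hypothetical helper name, not part of the original answer.
def sort_by_order(list, sort_order)
  # Count how often each element occurs in list.
  counts = list.each_with_object(Hash.new(0)) { |e, h| h[e] += 1 }
  # Emit each element of sort_order as many times as it occurs in list.
  sort_order.uniq.flat_map { |e| [e] * counts[e] }
end

list = ["gold", "copper", "silver", "copper", "steel", "gold"]
sort_order = ["bronze", "silver", "tin", "gold", "copper", "steel"]

counted = sort_by_order(list, sort_order)
# => ["silver", "gold", "gold", "copper", "copper", "steel"]

# The comparison-based equivalent, for reference.
compared = list.sort_by { |e| sort_order.index(e) }
```

Both give the same result here, provided every element of list appears in sort_order.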

Sort in ruby a JSON array of hashes

This should work:

array.sort_by { |hash| hash['id'].to_i }

In this case, sort_by is preferred over sort because it is more efficient. While sort calls to_i on every comparison, sort_by does it once for each element in array and remembers the result.
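A small sketch of why the to_i matters when the ids arrive as strings. The JSON payload is made up for illustration.

```ruby
require 'json'

# Hypothetical payload; the ids are strings here.
json  = '[{"id":"10","name":"c"},{"id":"9","name":"b"},{"id":"100","name":"a"}]'
array = JSON.parse(json)

# Numeric ordering: each id is converted once and the result is reused.
by_number = array.sort_by { |hash| hash['id'].to_i }.map { |hash| hash['id'] }
# => ["9", "10", "100"]

# Without to_i the ids are compared as strings, character by character.
by_string = array.sort_by { |hash| hash['id'] }.map { |hash| hash['id'] }
# => ["10", "100", "9"]
```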

How to find the kth largest element in an unsorted array of length n in O(n)?

This is called finding the k-th order statistic. There's a very simple randomized algorithm (called quickselect) taking O(n) average time and O(n^2) worst-case time, and a pretty complicated non-randomized algorithm (called median of medians) taking O(n) worst-case time. There's some info on Wikipedia, but it's not very good.

Everything you need is in these PowerPoint slides. Just to extract the basic algorithm of the O(n) worst-case algorithm (median of medians):

Select(A,n,i):
    Divide input into ⌈n/5⌉ groups of size 5.

    /* Partition on median-of-medians */
    medians = array of each group’s median.
    pivot = Select(medians, ⌈n/5⌉, ⌈n/10⌉)
    Left Array L and Right Array G = partition(A, pivot)

    /* Find ith element in L, pivot, or G */
    k = |L| + 1
    If i = k, return pivot
    If i < k, return Select(L, k-1, i)
    If i > k, return Select(G, n-k, i-k)

It's also very nicely detailed in the Introduction to Algorithms book by Cormen et al.
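For comparison, here is a minimal Ruby sketch of the simpler randomized quickselect, not the median-of-medians algorithm above. kth_largest is a hypothetical name; it allocates new arrays on every pass for clarity instead of partitioning in place.

```ruby
# Returns the k-th largest element (k = 1 is the maximum).
# Average O(n) time; O(n^2) in the worst case, as with any quickselect.
def kth_largest(array, k)
  raise ArgumentError, "k out of range" unless k.between?(1, array.size)

  items = array
  loop do
    pivot  = items.sample                     # random pivot
    larger = items.select { |x| x > pivot }
    equal  = items.count { |x| x == pivot }

    if k <= larger.size
      items = larger                          # answer is among the larger elements
    elsif k <= larger.size + equal
      return pivot                            # answer is the pivot itself
    else
      k -= larger.size + equal                # skip past larger elements and pivots
      items = items.select { |x| x < pivot }
    end
  end
end

kth_largest([7, 2, 9, 4, 9, 1], 1)  # => 9
kth_largest([7, 2, 9, 4, 9, 1], 3)  # => 7
```

Replacing the random pivot with the median-of-medians pivot from the pseudocode is what turns the average-case bound into a worst-case one.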


