Need Help Maximizing 3 Factors in Multiple, Similar Objects and Ordering Appropriately

What about having variable weights, and letting the user adjust them through some input such as sliders, so that the sort order is updated dynamically?

Calculate 'Ranking' based on 'weights' - what's the formula, given different ranges of values?

A more general term for your problem would be 'Multi-Criteria Decision Analysis' (MCDA), which is a well-studied subject; you will be able to find different models for different use cases.

Let's take a simple model for your case, where we will create a score based on weights and calculate it for each car:

import pandas as pd

data = pd.DataFrame({
    'CAR': ['A', 'B', 'C', 'D'],
    'SPEED': [135, 150, 170, 120],
    'MPG': [20, 15, 18, 30],
    'COST': [50000, 60000, 80000, 40000]
})

def Score(df):
    return 0.5*df['SPEED'] + 0.3*df['MPG'] + 0.2*df['COST']

data['SCORE'] = data.apply(Score, axis=1)

data = data.sort_values(by=['SCORE'], ascending=False)

print(data)

This would give us:

  CAR  SPEED  MPG   COST    SCORE
2   C    170   18  80000  16090.4
1   B    150   15  60000  12079.5
0   A    135   20  50000  10073.5
3   D    120   30  40000   8069.0

As you can see, the Score function simply multiplies each value by its weight and sums the results to get a new value, by which we then order the items.

The important consideration here is whether you are happy with the formula used in Score. You can change it however you want, to suit whatever purpose you are building your model for.
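Since the question asks about different ranges of values, one common refinement (an assumption on my part, not part of the model above) is to min-max normalize each criterion to [0, 1] before weighting, so that a large-magnitude column such as COST cannot dominate the score. A sketch in plain Python, with COST negated so that cheaper cars score higher:

```python
def min_max_normalize(values):
    """Scale a list of numbers to the [0, 1] range."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

def weighted_scores(columns, weights):
    """columns: dict of criterion -> list of raw values (one per item).
    weights:  dict of criterion -> weight. Higher score = better."""
    n = len(next(iter(columns.values())))
    normalized = {name: min_max_normalize(vals) for name, vals in columns.items()}
    return [sum(weights[name] * normalized[name][i] for name in columns)
            for i in range(n)]

# Cars A-D from the answer above; COST is negated so lower cost scores higher.
cols = {
    'SPEED': [135, 150, 170, 120],
    'MPG':   [20, 15, 18, 30],
    'COST':  [-50000, -60000, -80000, -40000],
}
scores = weighted_scores(cols, {'SPEED': 0.5, 'MPG': 0.3, 'COST': 0.2})
```

With raw values, COST contributes thousands of points to the score while MPG contributes at most a few dozen; after normalization, each criterion contributes at most its weight.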

Efficient implementation of multi-factor weighted sorting in Java

You can maintain two TreeSets storing the age and income information separately, so that while sorting you can easily query the rank of an age or income from those two trees.

We can call the tailSet(fromElement) method on a TreeSet to get the set of values greater than or equal to a specific value; its size is the rank of that age/income.

final TreeSet<Integer> ageRank = new TreeSet<>();
final TreeSet<Integer> incomeRank = new TreeSet<>();

for (Person p : persons) {
    ageRank.add(p.getAge());
    incomeRank.add(p.getIncome());
}

Collections.sort(persons, new Comparator<Person>() {
    @Override
    public int compare(Person p1, Person p2) {
        int ageRank1 = ageRank.tailSet(p1.getAge()).size();
        int ageRank2 = ageRank.tailSet(p2.getAge()).size();
        int incomeRank1 = incomeRank.tailSet(p1.getIncome()).size();
        int incomeRank2 = incomeRank.tailSet(p2.getIncome()).size();
        // Calculate the combined rank and return the result here. Code omitted
        return 0; // placeholder
    }
});

With this approach, one sort and one for loop are enough for all the calculation.

This approach comes in handy if you need to update the list of persons regularly: you don't need to re-sort age and income and recalculate all the ranks on every update; you just need to update these two trees.

Note: in order to use ageRank and incomeRank inside the anonymous Comparator class used for sorting, they need to be declared final or as instance variables.
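The combined-rank step that the comparator leaves open can be sketched outside Java as well. Here is a small Python version of the same scheme, assuming equal weights for the two ranks (the weighting is left open in the answer above); the `make_ranker` helper mirrors `tailSet(...).size()`:

```python
import bisect

def make_ranker(values):
    """Return a function mapping a value to its rank: the number of
    values greater than or equal to it (mirrors tailSet(v).size())."""
    ordered = sorted(values)
    def rank(v):
        return len(ordered) - bisect.bisect_left(ordered, v)
    return rank

# (name, age, income) tuples standing in for Person objects
people = [("alice", 30, 50000), ("bob", 40, 40000), ("carol", 35, 70000)]
age_rank = make_ranker([p[1] for p in people])
income_rank = make_ranker([p[2] for p in people])

# Combined rank as an unweighted sum; rank 1 is best, so lower sums sort first.
ranked = sorted(people, key=lambda p: age_rank(p[1]) + income_rank(p[2]))
```

Any weighted combination of the two ranks can be substituted for the plain sum in the sort key.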

ActiveRecord: load a corresponding array of records from an array of primary keys (preserve order, duplicates, maximize performance)

Let's start with the most obvious approach first:

type_a_task_ids = [1,2,3,1,2,3]
type_b_task_ids = [1,2,2,3,3]
type_a_tasks = type_a_task_ids.map { |task_id| Task.includes(:project).find(task_id) }
type_b_tasks = type_b_task_ids.map { |task_id| Task.includes(:project).find(task_id) }

The above is simple, readable but potentially slow: it will perform one database round-trip for each distinct task_id as well as one database round-trip for each distinct project_id in the given tasks. All the latency adds up, so you want to load the tasks (and corresponding projects) in bulk.

It would be great if you could have Rails bulk-load (prefetch) and cache those same records upfront in, say, two round-trips (one for all distinct tasks and one for all distinct associated projects), and then just have the exact same code as above -- except find would always hit the cache instead of the database.

Unfortunately things don't quite work that way (by default) in Rails, as ActiveRecord uses a query cache. Running Task.find(1) (SELECT * FROM tasks WHERE id=1) after Task.find([1,2,3]) (SELECT * FROM tasks WHERE id IN (1,2,3)) will not leverage the query cache since the first query is different from the second. (Running Task.find(1) a second, third etc. time will leverage the query cache, though, as Rails will see the exact same SELECT query fly by multiple times and return the cached result sets.)

Enter IdentityMap caching. Identity Map Caching is different in the sense that it caches records, not queries, on a per-table-and-primary-key basis. Thus, running Task.find([1,2,3]) would fill out three records in the Identity Map Cache for table tasks (the entries with IDs 1, 2 and 3 respectively), and a subsequent Task.find(1) would promptly return the cached record for table tasks and ID 1.

# with IdentityMap turned on (see IdentityMap documentation)
type_a_task_ids = [1,2,3,1,2,3]
type_b_task_ids = [1,2,2,3,3]
# prefetch all distinct tasks and their associated projects
# throw away the result, we only want to prep the cache
Task.includes(:project).find(type_a_task_ids | type_b_task_ids)
# proceed with regular logic
type_a_tasks = type_a_task_ids.map { |task_id| Task.includes(:project).find(task_id) }
type_b_tasks = type_b_task_ids.map { |task_id| Task.includes(:project).find(task_id) }

However, IdentityMap has never been active by default (for good reason), and was ultimately removed from Rails.

How do you achieve the same result without IdentityMap? Simple:

type_a_task_ids = [1,2,3,1,2,3]
type_b_task_ids = [1,2,2,3,3]
# prefetch all distinct tasks and their associated projects
# store the result in our own identity cache
my_tasks_identity_map = \
  Hash[Task.includes(:project).find(type_a_task_ids | type_b_task_ids).map { |task|
    [ task.id, task ]
  }]
# proceed with cache-centric logic
type_a_tasks = type_a_task_ids.map { |task_id| my_tasks_identity_map[task_id] }
type_b_tasks = type_b_task_ids.map { |task_id| my_tasks_identity_map[task_id] }
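For illustration, the same prefetch-then-map pattern in Python, with an in-memory dictionary standing in for the database (`fetch_many` is a hypothetical stand-in for the single bulk query):

```python
DB = {1: "task-1", 2: "task-2", 3: "task-3"}  # stand-in for the tasks table

def fetch_many(ids):
    """One simulated round-trip loading several records at once."""
    return {i: DB[i] for i in set(ids)}

type_a_task_ids = [1, 2, 3, 1, 2, 3]
type_b_task_ids = [1, 2, 2, 3, 3]

# one bulk fetch for the union of distinct ids...
identity_map = fetch_many(type_a_task_ids + type_b_task_ids)

# ...then pure in-memory lookups preserve both order and duplicates
type_a_tasks = [identity_map[i] for i in type_a_task_ids]
type_b_tasks = [identity_map[i] for i in type_b_task_ids]
```

The lookups never touch the "database" again, which is exactly the property the Ruby identity-map version above provides.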

Not getting output for Merge Sort

You forgot to increment the indices in the tail treatment of the merging step.

while (i1 <= j1) {
    b[k++] = arr[i1++];
}
while (i2 <= j2) {
    b[k++] = arr[i2++];
}
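For reference, a complete merge step sketched in Python shows where the two tail loops fit and why each must advance its index:

```python
def merge(left, right):
    """Merge two sorted lists into one sorted list."""
    result = []
    i = j = 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            result.append(left[i]); i += 1
        else:
            result.append(right[j]); j += 1
    # Tail treatment: one of the halves may still have elements left;
    # copy them over, advancing the index on every iteration.
    while i < len(left):
        result.append(left[i]); i += 1
    while j < len(right):
        result.append(right[j]); j += 1
    return result
```

Without the `i += 1` / `j += 1` in the tail loops, the loop condition never changes and the merge hangs, which matches the symptom of getting no output.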

How do I get my data to render correctly?

The app is working as intended. The reason there's nothing in the plots is that the data has some NAs in the sunshine column, and the choices for the inputs are based on the levels of the variables rather than the unique values actually observed in them. Combined, this yields choices in the selectInputs that are not present in the plot. Instead, try:

weather_data <- unique(weather_files$Town)
Weather_years <- unique(weather_files$year)
Weather_month <- unique(weather_files$month_year)

This will narrow the choices of the inputs to data that is observed in the data set.

Optionally we can create the reactive select_weatherdf like this:

select_weatherdf <- eventReactive(input$go, {
  weather_files %>%
    filter(Town %in% input$Town, year %in% input$year) %>%
    select("Town", "year", "month_year", "Sunshine")
  # no need for a return statement here; R returns the last value evaluated in the function
})

What is a better way to sort by a 5 star rating?

Prior to 2015, the Internet Movie Database (IMDb) publicly listed the formula used to rank their Top 250 movies list. To quote:

The formula for calculating the Top Rated 250 Titles gives a true Bayesian estimate:

weighted rating (WR) = (v ÷ (v+m)) × R + (m ÷ (v+m)) × C

where:

  • R = average for the movie (mean)
  • v = number of votes for the movie
  • m = minimum votes required to be listed in the Top 250 (currently 25000)
  • C = the mean vote across the whole report (currently 7.0)

For the Top 250, only votes from regular voters are considered.

It's not so hard to understand. The formula is:

rating = (v / (v + m)) * R + (m / (v + m)) * C;

Which can be mathematically simplified to:

rating = (R * v + C * m) / (v + m);

The variables are:

  • R – The item's own rating. R is the average of the item's votes. (For example, if an item has no votes, its R is 0. If someone gives it 5 stars, R becomes 5. If someone else gives it 1 star, R becomes 3, the average of [1, 5]. And so on.)
  • C – The average item's rating. Find the R of every single item in the database, including the current one, and take the average of them; that is C. (Suppose there are 4 items in the database, and their ratings are [2, 3, 5, 5]. C is 3.75, the average of those numbers.)
  • v – The number of votes for an item. (To give another example, if 5 people have cast votes on an item, v is 5.)
  • m – The tuneable parameter. The amount of "smoothing" applied to the rating is based on the number of votes (v) in relation to m. Adjust m until the results satisfy you. And don't misinterpret IMDb's description of m as "minimum votes required to be listed" – this system is perfectly capable of ranking items with fewer votes than m.

All the formula does is add m imaginary votes, each with a value of C, before calculating the average. In the beginning, when there isn't enough data (i.e. the number of votes is dramatically less than m), this causes the blanks to be filled in with average data. However, as votes accumulate, the imaginary votes are eventually drowned out by real ones.

In this system, votes don't cause the rating to fluctuate wildly. Instead, they merely perturb it a bit in some direction.

When there are zero votes, only imaginary votes exist, and all of them are C. Thus, each item begins with a rating of C.
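The formula translates directly into code. A minimal Python version:

```python
def weighted_rating(votes, m, C):
    """IMDb-style Bayesian estimate: pretend there are m extra
    (imaginary) votes, each of value C, then take the plain average."""
    v = len(votes)
    R = sum(votes) / v if v else 0.0
    return (R * v + C * m) / (v + m)

weighted_rating([], m=5, C=3.75)            # no votes yet: the rating starts at C
weighted_rating([5, 5, 5, 5], m=4, C=3.75)  # four 5-star votes pull it up to 4.375
```

Note how the second call lands between C and 5: the four real 5-star votes are averaged with four imaginary 3.75-star ones.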


Maximize the number of Elements in the Array divisible by M

This task boils down to the well-known Knapsack problem, a classic dynamic-programming algorithm, after a couple of simple manipulations of the given array.

This approach doesn't require sorting and is advantageous when k is much smaller than n.

We can address the problem in the following steps:

  • Iterate over the given array and count all the numbers that are already divisible by m (this number is stored in the variable count in the code below).

  • While iterating, for every element not divisible by m, calculate the difference between m and the remainder of dividing this element by m, which equals m - currentElement % m. If the difference is less than or equal to k (i.e. k can cover this difference), add it to the list (differences in the code below) and also accumulate it in a variable meant to store the total difference (totalDiff). All elements producing a difference that exceeds k are omitted.

  • If the total difference is less than or equal to k, we are done: the return value is the number of elements divisible by m plus the size of the list of differences.

  • Otherwise, we need to apply the logic of the Knapsack problem to the list of differences.
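The counting and filtering steps above can be sketched as follows (Python used here for brevity; the names mirror the Java code later in the answer):

```python
def preprocess(arr, k, m):
    """Split arr into: the count of elements already divisible by m,
    and the list of per-element costs (m - x % m) not exceeding k."""
    count = 0
    differences = []
    for x in arr:
        if x % m == 0:
            count += 1                      # already divisible, nothing to spend
        elif m - x % m <= k:
            differences.append(m - x % m)   # cost of making x divisible by m
    return count, differences

count, diffs = preprocess([1, 8, 3, 9, 5], k=3, m=3)
# count == 2 (the elements 3 and 9), diffs == [2, 1, 1], as in the worked example
```

If `sum(diffs) <= k`, the answer is simply `count + len(diffs)`; otherwise the Knapsack step below decides which differences to spend k on.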

The idea behind the method getBestCount() (an implementation of the Knapsack problem) boils down to generating a 2D array (a nested array of length equal to the size of the list of differences + 1, in which every inner array has length k + 1) and populating it with the maximum values achievable for the various states of the Knapsack.

Each element of this array represents the maximum number of elements that can be adjusted to become divisible by m, for a given number of items available from the list of differences and a given Knapsack capacity (in the range from 0 to k inclusive).

The best way to understand how the algorithm works is to draw the table on a piece of paper and fill it with numbers manually (follow the comments in the code; some intermediate variables were introduced only to make it easier to grasp, and see also the Wikipedia article on the Knapsack problem).

For instance, suppose the given array is [1, 8, 3, 9, 5], k=3 and m=3. We can see 2 elements divisible by m: 3 and 9. The numbers 1, 8, 5 give the following list of differences: [2, 1, 1]. Applying the logic of the Knapsack algorithm, we should get the following table:

[0, 0, 0, 0]
[0, 0, 1, 1]
[0, 1, 1, 2]
[0, 1, 2, 2]

We are interested in the value in the right-most column of the last row, which is 2; adding 2 (the number of elements divisible by 3) gives us 4.
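The table can be reproduced with a few lines of Python using the textbook 0/1-Knapsack recurrence (which also compares against the previous-row value; for this example it yields the same table):

```python
def best_count(differences, k):
    """0/1 knapsack maximizing the NUMBER of items whose weights
    (the differences) fit into capacity k; returns the full DP table."""
    rows = len(differences) + 1
    tab = [[0] * (k + 1) for _ in range(rows)]
    for r in range(1, rows):
        w = differences[r - 1]                  # weight of the next item
        for size in range(1, k + 1):
            # best so far: previous column in this row, or same column one row up
            tab[r][size] = max(tab[r][size - 1], tab[r - 1][size])
            if w <= size:                       # item fits: try taking it
                tab[r][size] = max(tab[r][size], tab[r - 1][size - w] + 1)
    return tab

table = best_count([2, 1, 1], 3)
# table[-1][-1] == 2, matching the bottom-right cell of the table above
```

Adding the 2 elements already divisible by 3 gives the final answer of 4.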

Note: the code provided below can deal only with positive numbers. I don't want to shift the focus from the algorithm to such minor details. If the OP or a reader of the post is interested in making the code work with negative numbers as well, I'm leaving the task of adjusting the code to them as an exercise. Hint: only a small change in countMultiplesOfM() is required for that purpose.

That's how it might be implemented:

public static int countMultiplesOfM(int[] arr, int k, int m) {

    List<Integer> differences = new ArrayList<>();
    int count = 0;
    long totalDiff = 0; // accumulator for the early exit - case when `totalDiff <= k`

    for (int next : arr) {

        if (next % m == 0)
            count++; // number is already divisible by `m`: increment the count, and from that moment we are no longer interested in it

        else if (m - next % m <= k) {
            differences.add(m - next % m);
            totalDiff += m - next % m;
        }
    }

    if (totalDiff <= k) { // early exit - `k` is large enough to adjust all numbers in the `differences` list
        return count + differences.size();
    }
    return count + getBestCount(differences, k); // fall through to the Knapsack logic
}

// Knapsack Algorithm implementation
public static int getBestCount(List<Integer> differences, int knapsackSize) {

    int[][] tab = new int[differences.size() + 1][knapsackSize + 1];
    for (int numItemAvailable = 1; numItemAvailable < tab.length; numItemAvailable++) {
        int next = differences.get(numItemAvailable - 1); // next available item which we're trying to place in the knapsack to maximize the current total

        for (int size = 1; size < tab[numItemAvailable].length; size++) {
            int prevColMax = tab[numItemAvailable][size - 1]; // maximum result for size - 1 in the current row of the table
            int prevRowMax = tab[numItemAvailable - 1][size]; // previous maximum result for the current knapsack size

            if (next <= size) { // if it's possible to fit the next item in the knapsack
                int prevRowMaxWithRoomForNewItem = tab[numItemAvailable - 1][size - next] + 1; // maximum result from the previous row for size = `current size - next` (i.e. the closest knapsack size which guarantees that there is space for the new item)
                tab[numItemAvailable][size] = Math.max(prevColMax, prevRowMaxWithRoomForNewItem);
            } else {
                tab[numItemAvailable][size] = Math.max(prevRowMax, prevColMax); // either the value in the previous row or the value in the previous column of the current row
            }
        }
    }
    return tab[differences.size()][knapsackSize];
}

main()

public static void main(String[] args) {
    System.out.println(countMultiplesOfM(new int[]{17, 8, 9, 1, 4}, 3, 4));
    System.out.println(countMultiplesOfM(new int[]{1, 2, 3, 4, 5}, 2, 2));
    System.out.println(countMultiplesOfM(new int[]{1, 8, 3, 9, 5}, 3, 3));
}

Output:

3   // input array [17, 8, 9, 1, 4], m = 4, k = 3
4   // input array [1, 2, 3, 4, 5], m = 2, k = 2
4   // input array [1, 8, 3, 9, 5], m = 3, k = 3



