Rank Items in an Array Using Python/Numpy, Without Sorting Array Twice

Use advanced indexing on the left-hand side in the last step:

import numpy

array = numpy.array([4, 2, 7, 1])
temp = array.argsort()
ranks = numpy.empty_like(temp)
ranks[temp] = numpy.arange(len(array))

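argsort returns the permutation that sorts the array. Assigning arange through that permutation inverts it in a single linear pass, so the array is sorted only once instead of calling argsort twice.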
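
The inverse-permutation step can be wrapped in a small helper. This is only a minimal sketch: the name rank_ascending is made up for illustration, and a stable sort is requested so that equal values are ranked in order of appearance.

```python
import numpy as np

def rank_ascending(values):
    """Return 0-based ranks (smallest value gets rank 0) using a single sort."""
    values = np.asarray(values)
    # order[i] is the index of the i-th smallest element.
    order = values.argsort(kind="stable")
    ranks = np.empty_like(order)
    # Invert the permutation: the element at position order[i] gets rank i.
    ranks[order] = np.arange(len(values))
    return ranks

print(rank_ascending([4, 2, 7, 1]))  # [2 1 3 0]
```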

Rank elements of numpy list in descending order

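You want the rank each element would have if the array were sorted in descending order. Negating the array before calling argsort gives exactly that:

import numpy as np

a = np.array([2, 3, 1, 4, 5])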
sorted_indices = np.argsort(-a)
ranks = np.empty_like(sorted_indices)
ranks[sorted_indices] = np.arange(len(a))

And result:

>>> ranks
array([3, 2, 4, 1, 0])
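
Negating the array is fine for signed numeric data, but it can give wrong ranks for unsigned integers (where negation wraps around) and does not apply to strings. A sketch of one alternative, which reverses the sort order instead of negating; rank_descending is an illustrative name, not a NumPy function:

```python
import numpy as np

def rank_descending(values):
    """Return 0-based ranks where the largest value gets rank 0."""
    values = np.asarray(values)
    # Indices from largest to smallest; no negation of the data is needed.
    # Note: reversing also reverses the order of tied values.
    order = values.argsort()[::-1]
    ranks = np.empty_like(order)
    ranks[order] = np.arange(len(values))
    return ranks

print(rank_descending(np.array([2, 3, 1, 4, 5])))             # [3 2 4 1 0]
print(rank_descending(np.array([0, 3, 1], dtype=np.uint8)))   # [2 0 1]
```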

Get rankings from numpy array

You can use np.argsort, which returns the indices that would sort the array in ascending order. Reversing them puts the indices of the largest values first.

indices = np.argsort(values)[::-1]
print(indices)

The [::-1] reverses the index array, which is necessary because argsort returns the indices in increasing order of value. This gives:

[1, 2, 5, 3, 4, 0]

Then you can use

values[indices[n]]

to retrieve the n-th largest value (n counts from 0, so n = 0 is the maximum).
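
A self-contained version of the same idea. The values array here is made up for illustration (the original data is not shown); it was chosen so that the indices come out in the same order as above.

```python
import numpy as np

# Hypothetical data, not taken from the original question.
values = np.array([10, 50, 40, 20, 15, 30])

# argsort sorts ascending, so reverse to get largest-first indices.
indices = np.argsort(values)[::-1]

print(indices)             # [1 2 5 3 4 0]
print(values[indices[0]])  # 50, the largest value
print(values[indices[2]])  # 30, the third largest value
```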

Trying to rank two arrays at the same time

To rank the two arrays jointly, concatenate CI_SUM_1 and CI_SUM_2 before calling argsort, for example:

>>> np.concatenate([CI_SUM_1, CI_SUM_2]).argsort().argsort()
array([2, 1, 5, 7, 4, 0, 6, 3], dtype=int64)
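
If you then need the ranks per input array, the joint result can be split at the length of the first array. A sketch with made-up inputs (the real CI_SUM_1 and CI_SUM_2 are not shown in the question; these values were picked so the joint ranks match the output above):

```python
import numpy as np

# Hypothetical inputs, chosen for illustration only.
CI_SUM_1 = np.array([0.3, 0.2, 0.6, 0.8])
CI_SUM_2 = np.array([0.5, 0.1, 0.7, 0.4])

# Rank all eight values together (0 = smallest).
joint = np.concatenate([CI_SUM_1, CI_SUM_2]).argsort().argsort()

# Split the joint ranks back into one rank array per input.
ranks_1 = joint[:len(CI_SUM_1)]
ranks_2 = joint[len(CI_SUM_1):]

print(ranks_1)  # [2 1 5 7]
print(ranks_2)  # [4 0 6 3]
```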

Rank each number by position in a list of arrays - Python

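You can also apply numpy.argsort twice along the column axis. The second argsort turns the sort order into 0-based ascending ranks, and subtracting those from the number of rows makes the largest value in each column rank 1.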

import numpy as np

x = np.array([[12, 7, 3],
              [4, 5, 6],
              [7, 8, 9]])

y = x.shape[0] - np.argsort(np.argsort(x, axis = 0), axis = 0)

Output:

In [111]: y
Out[111]:
array([[1, 2, 3],
       [3, 3, 2],
       [2, 1, 1]])
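
The same computation can be written for either axis. A minimal sketch, assuming no ties; descending_ranks is an illustrative helper name, not part of NumPy:

```python
import numpy as np

def descending_ranks(x, axis=0):
    """1-based ranks along `axis`; the largest value gets rank 1."""
    x = np.asarray(x)
    # Double argsort gives 0-based ascending ranks along the chosen axis.
    ascending = np.argsort(np.argsort(x, axis=axis), axis=axis)
    # Subtract from the axis length so the largest value becomes 1.
    return x.shape[axis] - ascending

x = np.array([[12, 7, 3],
              [4, 5, 6],
              [7, 8, 9]])

print(descending_ranks(x, axis=0))  # rank within each column
print(descending_ranks(x, axis=1))  # rank within each row
```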

Ranking data in python without numpy

If I understand correctly, you can build a rank dictionary from the sorted unique values and then look up each element of array in it. Equal values share a rank:

>>> array = [4,2,7,1,1,2]
>>> rankdict = {v: k for k,v in enumerate(sorted(set(array)))}
>>> rankdict
{1: 0, 2: 1, 4: 2, 7: 3}
>>> ranked = [rankdict[a] for a in array]
>>> ranked
[2, 1, 3, 0, 0, 1]
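
The dictionary approach gives tied values the same rank. If you instead want every element to get a distinct rank, still without numpy, one possible sketch is below; ordinal_ranks is a made-up name, and ties are broken by order of appearance.

```python
def ordinal_ranks(values):
    """0-based ranks; equal values are ranked in order of appearance."""
    # sorted() is stable, so tied values keep their original relative order.
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0] * len(values)
    # The element at position order[rank] receives that rank.
    for rank, index in enumerate(order):
        ranks[index] = rank
    return ranks

print(ordinal_ranks([4, 2, 7, 1, 1, 2]))  # [4, 2, 5, 0, 1, 3]
```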

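If you want to sort array1 (a second list of the same length) by this ranking, there are several ways to do it. A common one is to zip the two lists together and sort the pairs. In Python 3, zip returns an iterator, so wrap it in list() to display it:

>>> list(zip(ranked, array))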
[(2, 4), (1, 2), (3, 7), (0, 1), (0, 1), (1, 2)]
>>> sorted(zip(ranked, array))
[(0, 1), (0, 1), (1, 2), (1, 2), (2, 4), (3, 7)]
>>> sorted(zip(ranked, array1))
[(0, 23423423), (0, 123423423), (1, 1232), (1, 23423421), (2, 1934), (3, 345453)]
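
Sorting the pairs directly also orders tied ranks by the second element. If tied items should keep their original order instead, sort on the rank only. A sketch with a hypothetical second list (labels stands in for array1, whose original contents are not shown):

```python
ranked = [2, 1, 3, 0, 0, 1]
labels = ["d", "e", "f", "c", "a", "b"]  # hypothetical stand-in for array1

# Sort on the rank only; sorted() is stable, so ties keep input order.
pairs = sorted(zip(ranked, labels), key=lambda pair: pair[0])
sorted_labels = [label for _, label in pairs]

print(sorted_labels)  # ['c', 'a', 'e', 'b', 'd', 'f']
```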

Check if 2 or more items in array are the same and delete one of them according to sorting number - Python

Use the df_companies dataframe to map each code to its company, then keep only the best-ranked row (lowest Final Sorted Number) for each company.

Input data:

>>> df
       Code  Volume  Trade  Volume Order  Trade Order  Max Ordered Number  Final Sorted Number
0     ApplA     500   1000           2.0          2.0                 2.0                  4.0
1    Amazon    1000    500           1.0          4.0                 4.0                  2.0
2  Facebook     250    750           3.0          3.0                 3.0                  3.0
3     ApplE     100   1500           4.0          1.0                 4.0                  1.0

>>> df_companies
      Codes      Code  Shares
0     ApplA     Apple       A
1    Amazon    Amazon   Empty
2  Facebook  Facebook   Empty
3     ApplE     Apple       E
4     AmazA    Amazon       A
5   AmazonB    Amazon       B

out = df.sort_values('Final Sorted Number') \
        .merge(df_companies[['Code', 'Codes']], how='left',
               left_on='Code', right_on='Codes', suffixes=('', '2')) \
        .drop_duplicates('Code2') \
        .drop(columns=['Code2', 'Codes'])

Output result:

>>> out
       Code  Volume  Trade  Volume Order  Trade Order  Max Ordered Number  Final Sorted Number
0     ApplE     100   1500           4.0          1.0                 4.0                  1.0
1    Amazon    1000    500           1.0          4.0                 4.0                  2.0
2  Facebook     250    750           3.0          3.0                 3.0                  3.0

Step by step:

# Reduce number of columns for readability
>>> df = df[['Code', 'Final Sorted Number']]
       Code  Final Sorted Number
0     ApplA                  4.0
1    Amazon                  2.0
2  Facebook                  3.0
3     ApplE                  1.0

# Sort rows by 'Final Sorted Number'
>>> df = df.sort_values('Final Sorted Number')
       Code  Final Sorted Number
3     ApplE                  1.0
1    Amazon                  2.0
2  Facebook                  3.0
0     ApplA                  4.0

# Merge dataframes on a common key: 'Code' for left, 'Codes' for right
>>> df = df.merge(df_companies[['Code', 'Codes']], how='left',
...               left_on='Code', right_on='Codes', suffixes=('', '2'))
       Code  Final Sorted Number     Code2     Codes
0     ApplE                  1.0     Apple     ApplE
1    Amazon                  2.0    Amazon    Amazon
2  Facebook                  3.0  Facebook  Facebook
3     ApplA                  4.0     Apple     ApplA

Now we have two new columns from df_companies: Code (renamed to Code2 by the suffix, because df already has a Code column) and Codes. Let's continue:

# Keep only the first row for each 'Code2' name
>>> df = df.drop_duplicates('Code2')
       Code  Final Sorted Number     Code2     Codes
0     ApplE                  1.0     Apple     ApplE
1    Amazon                  2.0    Amazon    Amazon
2  Facebook                  3.0  Facebook  Facebook

# Remove added columns from the other dataframe
>>> df = df.drop(columns=['Code2', 'Codes'])
       Code  Final Sorted Number
0     ApplE                  1.0
1    Amazon                  2.0
2  Facebook                  3.0
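
To try the steps end to end, here is a self-contained sketch that rebuilds both dataframes from the tables above (only the Code and Final Sorted Number columns of df are kept for brevity) and runs the same chain. Because the rows are sorted before duplicates are dropped, the first row kept per company is the one with the lowest Final Sorted Number.

```python
import pandas as pd

# Data copied from the tables above; other df columns are omitted for brevity.
df = pd.DataFrame({
    "Code": ["ApplA", "Amazon", "Facebook", "ApplE"],
    "Final Sorted Number": [4.0, 2.0, 3.0, 1.0],
})
df_companies = pd.DataFrame({
    "Codes": ["ApplA", "Amazon", "Facebook", "ApplE", "AmazA", "AmazonB"],
    "Code": ["Apple", "Amazon", "Facebook", "Apple", "Amazon", "Amazon"],
    "Shares": ["A", "Empty", "Empty", "E", "A", "B"],
})

out = (
    df.sort_values("Final Sorted Number")           # best rank first
      .merge(df_companies[["Code", "Codes"]], how="left",
             left_on="Code", right_on="Codes", suffixes=("", "2"))
      .drop_duplicates("Code2")                     # one row per company
      .drop(columns=["Code2", "Codes"])             # remove helper columns
)
print(out)
```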

