Pandas DENSE RANK
Use pd.Series.rank with method='dense':
df['Rank'] = df.Year.rank(method='dense').astype(int)
df
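As a minimal sketch of what method='dense' changes, assuming a small made-up Year column (the data is hypothetical, not from the question), the dense ranking can be compared with method='min':

```python
import pandas as pd

# Hypothetical data: the Year values are made up for illustration.
df = pd.DataFrame({"Year": [2012, 2013, 2013, 2014]})

# Dense rank: tied values share a rank and no rank numbers are skipped.
df["Rank"] = df["Year"].rank(method="dense").astype(int)

# For comparison, method='min' also shares a rank on ties but leaves a gap after them.
df["MinRank"] = df["Year"].rank(method="min").astype(int)

print(df)
```

With dense ranking, the value after a tie gets the next consecutive rank, which is what SQL's DENSE_RANK() does.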
How to get dense rank in each partition window in pandas
This is built in: call rank on the groupby object:
df['dense_rank'] = (df.groupby('Dominant_Topic')['appearance']
.rank(method='dense', ascending=False)
.astype(int)
)
Output:
   Dominant_Topic            word  appearance  dense_rank
0         Topic 0         aaaawww          50           3
1         Topic 0            aacn         100           2
2         Topic 0           aaren          20           4
3         Topic 0    aarongoodwin         200           1
4         Topic 1  aaronjfentress          10           3
5         Topic 1     aaronrodger          20           2
6         Topic 1      aasmiitkap          30           1
7         Topic 2      aavqbketmh          10           1
8         Topic 2              ab          10           1
9         Topic 2         abandon           1           2
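To try this end to end, the input frame can be rebuilt from the first three columns of the output above. This is only a sketch: the construction of df is an assumption, since the question's original loading code is not shown.

```python
import pandas as pd

# Input rebuilt from the printed output (how the original df was created is not shown).
df = pd.DataFrame({
    "Dominant_Topic": ["Topic 0"] * 4 + ["Topic 1"] * 3 + ["Topic 2"] * 3,
    "word": ["aaaawww", "aacn", "aaren", "aarongoodwin", "aaronjfentress",
             "aaronrodger", "aasmiitkap", "aavqbketmh", "ab", "abandon"],
    "appearance": [50, 100, 20, 200, 10, 20, 30, 10, 10, 1],
})

# Dense rank within each topic, highest appearance first.
df["dense_rank"] = (
    df.groupby("Dominant_Topic")["appearance"]
      .rank(method="dense", ascending=False)
      .astype(int)
)

print(df)
```

Note how the two words tied at 10 in Topic 2 both get rank 1 and the next value gets rank 2.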
SQL to pandas: DENSE_RANK() OVER (PARTITION BY )
Use groupby.rank:
esg_group_dist['rank'] = (esg_group_dist.groupby(['company', 'topic'])
['distance'].rank(method='dense', ascending=False)
)
However, looking at your entire query, it looks like you're trying to extract the rows where distance
is at its maximum or minimum within each group. You can do so faster with
(esg_group_dist[['company', 'topic', 'probability', 'distance', 'sentence']]
.sort_values('distance') # sort values
.drop_duplicates(['company','topic'], keep='last') # keep the last row per group, i.e. the largest distance
.query('topic=="green energy"') # filter topic
)
Note: to find the minimum rows instead, remove ascending=False from the rank call and keep='last' from drop_duplicates. There is also the groupby().idxmin()/idxmax() option.
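A minimal sketch of the idxmax route, assuming a small made-up frame (the column names follow the snippet above, the values are hypothetical):

```python
import pandas as pd

# Hypothetical sample: column names follow the answer, values are made up.
esg_group_dist = pd.DataFrame({
    "company": ["A", "A", "A", "B", "B"],
    "topic": ["green energy", "green energy", "water", "green energy", "green energy"],
    "distance": [0.2, 0.7, 0.4, 0.9, 0.1],
})

# Index label of the row with the largest distance in each (company, topic) group.
idx = esg_group_dist.groupby(["company", "topic"])["distance"].idxmax()

# Select those rows; use idxmin() instead to get the smallest distance per group.
top_rows = esg_group_dist.loc[idx]
print(top_rows)
```

Unlike a dense rank filter, idxmax returns exactly one row per group, so tied maxima are not all kept.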
Pandas - dense rank but keep current group numbers
Replace the 0 values with missing values, then use GroupBy.transform with 'first' to give every row of a name that name's existing id. The names that still have no id get new ones from Series.rank with method='dense', offset by the current maximal id, and the result is converted to integers:
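import numpy as np
# treat 0 as "no id assigned yet"
df = df.replace({'id':{0:np.nan}})
# copy the first existing id of each name to all of its rows
df['id'] = df.groupby('name')['id'].transform('first')
# names still without an id get new ids above the current maximum
s = df.loc[df["id"].isna(), 'name'].rank(method='dense') + df['id'].max()
df['id'] = df['id'].fillna(s).astype(int)
print(df)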
name id
0 Andrew 3
1 Andrew 3
2 James 1
3 James 1
4 Mary 4
5 Andrew 3
6 Michael 2
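The input frame is not shown, so here is a self-contained sketch with a hypothetical input chosen to be consistent with the printed result (0 meaning "no id yet"). It replaces the zeros on the id column directly, which is assumed to be equivalent to the nested-dict replace for this data:

```python
import numpy as np
import pandas as pd

# Hypothetical input, picked so that it reproduces the result printed above.
df = pd.DataFrame({
    "name": ["Andrew", "Andrew", "James", "James", "Mary", "Andrew", "Michael"],
    "id": [3, 0, 1, 0, 0, 0, 2],
})

# 0 -> NaN, so that 'first' skips the placeholder values.
df["id"] = df["id"].replace(0, np.nan)
df["id"] = df.groupby("name")["id"].transform("first")

# Only Mary has no id left; her dense rank (1) is shifted past the current max id (3).
new_ids = df.loc[df["id"].isna(), "name"].rank(method="dense") + df["id"].max()
df["id"] = df["id"].fillna(new_ids).astype(int)

print(df)
```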
How to get top 5 dense ranked rows in Pandas?
You can create an array that has the 3 smallest distinct values of your Score column (change 3 to the number of ranks you need), and then use isin
to filter your dataframe:
some_vals = pd.Series(df['Score'].unique()).nsmallest(3).values
df[df['Score'].isin(some_vals)]
Name Score
0 John 1
1 Mark 2
2 Perry 2
3 Dion 3
This way you ensure that every name whose Score equals any of the 3 smallest values is returned.
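The same filter can also be written directly with a dense rank. This is a sketch on hypothetical data: the first four rows match the output above, the last two names and scores are made up.

```python
import pandas as pd

# Hypothetical data: rows for Ann and Zed are invented to have something to filter out.
df = pd.DataFrame({
    "Name": ["John", "Mark", "Perry", "Dion", "Ann", "Zed"],
    "Score": [1, 2, 2, 3, 4, 5],
})

n = 3  # number of distinct rank levels to keep

# Keep every row whose dense rank is within the first n levels.
top = df[df["Score"].rank(method="dense") <= n]

# The isin-based filter should select the same rows.
some_vals = pd.Series(df["Score"].unique()).nsmallest(n).values
via_isin = df[df["Score"].isin(some_vals)]

print(top)
```

Both versions keep all tied rows, so the result can contain more than n rows.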
pyspark - Dense-Rank ties method first
The distributed nature of Spark prevents it from implicitly identifying the order of appearance. If your input dataset contains a column like line_number or row_number, then the equivalent of rank(method='first') can be achieved.
Working Example
The following example relies on the dataframe from the pd.rank documentation, with an added Line_Number
field to provide an explicit ordering.
The dataframe is repartitioned to simulate random ordering after reading data.
import pyspark.sql.functions as F
from pyspark.sql import Window
data = [{"Line_Number": 1, "Animal": "cat", "Number_legs": 4}, {"Line_Number": 2, "Animal": "penguin", "Number_legs": 2},
{"Line_Number": 3, "Animal": "dog", "Number_legs": 4}, {"Line_Number": 4, "Animal": "spider", "Number_legs": 8},
{"Line_Number": 5, "Animal": "snake", "Number_legs": None}]
# assumes an active SparkSession bound to `spark`; repartition scrambles the row order
df = spark.createDataFrame(data).repartition(8)
window_spec = Window.orderBy(F.col("Number_legs").asc_nulls_last(), F.col("Line_Number"))
df.withColumn("rank", F.when(F.col("Number_legs").isNull(), F.lit(None)).otherwise(F.row_number().over(window_spec))).show()
Output
+-------+-----------+-----------+----+
| Animal|Line_Number|Number_legs|rank|
+-------+-----------+-----------+----+
|penguin|          2|          2|   1|
|    cat|          1|          4|   2|
|    dog|          3|          4|   3|
| spider|          4|          8|   4|
|  snake|          5|       null|null|
+-------+-----------+-----------+----+
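For comparison, this is a sketch of the pandas behaviour that the Spark window is meant to reproduce, using the same animals data. In pandas the row order itself breaks the ties, so no Line_Number column is needed.

```python
import pandas as pd

# Same data as the Spark example, without Line_Number: pandas keeps the row order.
df = pd.DataFrame({
    "Animal": ["cat", "penguin", "dog", "spider", "snake"],
    "Number_legs": [4, 2, 4, 8, None],
})

# method='first' breaks ties by order of appearance; missing values stay unranked (NaN).
df["rank"] = df["Number_legs"].rank(method="first")

print(df)
```

Here cat and dog both have 4 legs, and cat gets the lower rank because it appears first, which is what ordering by Line_Number achieves in Spark.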
Python Equivalent to SQL Rank
If I understand correctly, use GroupBy.rank:
df['date'] = pd.to_datetime(df['date'])
df['rank'] = df.groupby('id')['date'].rank(method='dense').astype(int)
print (df)
id date rank
0 12 2021-06-01 1
1 12 2021-06-15 2
2 12 2021-06-21 3
3 34 2021-06-05 1
4 87 2021-06-19 1
5 53 2021-06-05 1
If the datetimes are sorted within each group (and have no duplicates), GroupBy.cumcount also works:
df = df.sort_values(['id','date'])
df['rank'] = df.groupby('id')['date'].cumcount().add(1)
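The two approaches only agree when dates are unique within each id. A small sketch on hypothetical data (id 12 deliberately has the same date twice) shows the difference:

```python
import pandas as pd

# Hypothetical data: id 12 has a repeated date to expose the difference.
df = pd.DataFrame({
    "id": [12, 12, 12, 34],
    "date": pd.to_datetime(["2021-06-01", "2021-06-01", "2021-06-15", "2021-06-05"]),
})
df = df.sort_values(["id", "date"])

# Dense rank: the repeated date shares rank 1.
df["dense"] = df.groupby("id")["date"].rank(method="dense").astype(int)

# cumcount: a plain running counter per id, so the repeated date gets 1 and 2.
df["cumcount"] = df.groupby("id")["date"].cumcount().add(1)

print(df)
```

So cumcount behaves like SQL's ROW_NUMBER() rather than DENSE_RANK() when ties are present.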