Creating Subgroup Ranks in Power Query Using Table.AddRankColumn Function
Table.AddRankColumn is the latest addition to the M language. This post shows how you can use it to create subgroup rank column
Table.AddRankColumn is the latest addition to the M language which lets you create ranks in Power Query directly. No more buffer-sort-add index column trick which was computationally expensive and could be less accurate. With this function you can specify one or more columns to rank by and also the order (ascending/descending). This is not a replacement for the RANKX
function in DAX. If your rank order is not going to change or need it for validation purposes, this is a great function.
Read the official documentation for all the function parameters. But Reza Rad has the most comprehensive overview of the function with examples so definitely check it out.
In the example below, we have three products, their colors and the Units. Rank1
column shows the rank by number of units for all the products in the ascending order. What we are looking for is the subgroup rank, i.e rank within each product type as shown in Rank2
column.
import pandas as pd
df = (pd.read_clipboard()
.assign(Units = lambda s:s['Units'].astype('int'))
.assign(Rank1 = lambda s:s['Units'].rank().astype('int'))
.assign(Rank2 = lambda s:s.groupby('Product')['Units'].transform('rank').astype('int'))
.sort_values(['Product','Rank2'])
)
df
Here is my walkthrough of how you can create subgroup ranks in Power Query.
Important Notes:
-
As I mention in the video, this function is not query foldable so be sure to plan your steps to utilize folding first. If you don't know what is query folding, watch my in-depth session on query folding.
-
When you create the rank column, it will always sort the table by the newly created rank column. This may not always be desired. For example if your table is sorted by date or a key column when imported, that sort order will be disrupted when you create the rank column. I am not sure if there is a way to prevent that from happening. This also has an unintended effect that vertipaq compression may not be optimal. As you know, cardinality of the column matters in vertipaq compression but distribution of values matters as well. If the sort order is disrupted it may increase the size of the dataset. Be sure to check before and after.
-
I haven't tested this on a large dataset but be sure to check the refresh time and dataset size after creating the rank column.
-
Look at the
RankKind
parameter to specify how the ranking should be done.