Creating Custom Summary Statistics Table in Power Query
Table.Profile() provides descriptive statistics for the entire table. This method allows you to obtain descriptive statistics for every column based on a category or categories and also add custom statistics.
Introduction
A Reddit user posted a question on r/PowerBI forum few days ago about "... Table.Profile that respects slicers and deals with strings..." I answered the question by noting that it would be a easier to do it with DAX using SWITCH(). But coincidentally, I had a similar requirement for a project where the user wanted to get summary statistics for each numerical column and filterable by a slicer. The summary statistics in my case also included some custom functions. I could have created calculated columns, but Power Query offers a faster & memory-efficient solution. The reddit user had question about strings but I am going to focus on numerical columns only for simplicity. This is a quick way to get summary statistics for exploratory data analysis or checking data quality.
Table.Profile()
As this user mentioned in the question, the easiest and fastest way to get descriptive statistics is to wrap the table with Table.Profile()
. Read the documentation here. By default, Table.Profile()
returns min, max, average, standard deviation, count, null count, distinct count. But you can add more aggregates or custom functions by specifying the second parameter additionalAggregates as nullable list
. Lars Schreiber has an excellent blog post on using that second parameter.
In the example below, I added median and a custom function to count number of outliers in each column. This second parameter can be specified as ColumnName
#hide_output
(mylist as list, optional limit as number) =>
let
// Function to count number of outlier value based on how far it is from the average.
// By default values that are 2 standard deviations away form the mean are counted as outliers
// Sandeep Pawar
// PawarBI.com
// Date: 7/2/2020
mean = List.Average(mylist),
sd = List.StandardDeviation(mylist),
limit = 2,
outliers = List.Select( mylist, each _ > (mean + limit*sd)),
outcount = List.Count(outliers)
in
outcount
To use this custom function fxoutlier(list,limit)
in Table.Profile()
, I can write:
#hide_output
Table.Profile(TableName, {{"Outliers", each Type.Is(_, nullable number), each fxoutlier(_,null)}})