DAX Formulae For Creating Statistical Distributions in Power BI
Here I share the DAX code for creating various statistical distributions in Power BI
- Why create statistical distributions?
As a simulation engineer, one of my responsibilities is to create statistical experiments to simulate various scenarios and understand the outcomes. These outcomes have probabilities associated with them which help the decision makers to identify key risks, uncertainties and make informed decisions to leverage risk. For example, if the company wants to invest a million dollars in a capital project, there are many uncertainties involved such as labor costs, scheduling uncertainty, project delays, technical risks etc. A statistical simulation experiment, i.e Monte Carlo simulation, will assign probabilities with these risks and simulate thousands of outcomes to compute the probability of success. The company can then decide if it wants to invest the $1M or what decisions it can take proactively to de-risk the uncertainties.
There are many statistical packages and Excel add-ons available to create such experiments but none that I know of in Power BI. In Power BI, only uniform distribution and normal distributions are available natively in DAX. In this blog post I share the DAX codes I use to create different distibutions.
If you are new to Monte Carlo / Discrete Event simulation, these distributions may not make sense. In a future blog post I will show when to use these distributions and how to create sophisticated simulations in Power BI. The beauty of creating it in Power BI vs Excel is that the stakeholders can interact with the simulation parameters and understand the probable outcomes much more clearly. The DAX codes given here can be used to create calculated tables for simulations.
If you are interested in this topic, please also take a look at my other blog on generating random numbers in Power BI
LogNormal distribution with mean = 80, variance = 225
LogNormal = VAR _m = 80 // mean VAR _v = 225 // variance (not std deviation) VAR _phi = SQRT ( _v + _m ^ 2 ) VAR _mu = LN ( ( _m ^ 2 ) / _phi ) // scaled mean VAR _sigma = SQRT ( LN ( ( _phi ^ 2 ) / _m ^ 2 ) ) RETURN // scaled s.d EXP ( NORM.INV ( RAND (), _mu, _sigma ) )
Beta PERT with: Min likely value = 100 Med likely value = 300 Max likely value = 800
Beta Pert = VAR _a = 100 VAR _b = 300 VAR _c = 800 VAR _alpha = ( 4 * _b + _c - 5 * _a ) / ( _c - _a ) VAR _beta = ( 5 * _c - _a - 4 * _b ) / ( _c - _a ) RETURN BETA.INV ( RAND (), _alpha, _beta, _a, _c )
Probability of Choice 1 = 50% , Probability of Choice 2 = 30% , Probability of Choice 3 = 20%
Discrete = IF ( 'Table'[random_chocie] <= 0.5, "Choice 1", IF ( ( 'Table'[random_chocie] > 0.5 && 'Table'[random_chocie] < 0.8 ), "Choice 2", IF ( ( 'Table'[random_chocie] > 0.8 ), "Choice 3" ) ) )
Truncated Normal Distribution
Truncated normal distribution with: lower limit = 1 higher limit = 4 mean = 3 s.d = 0.9
DAX: truncatednormal =
VAR _a = 1 VAR _b = 4 VAR _mu = 3 VAR _sigma = 0.9 RETURN NORM.INV ( NORM.DIST ( _a, _mu, _sigma, TRUE ) + RAND () * ( NORM.DIST ( _b, _mu, _sigma, TRUE ) - NORM.DIST ( _a, _mu, _sigma, TRUE ) ), _mu, _sigma )
You may find below visual helpful in determining which distribution to use: