Random Number Generation in Power BI
Quick post on some things to keep in mind when using the RAND(), RANDBETWEEN() in Power BI to generate random numbers
- Random Number Generation
- What are random numbers?
- How to generate random numbers?
- Tip 1: DAX and M RandomBetween behave differently
- Tip 2: Multiple Random numbers in Power Query
- Tip 3: You can use RAND() to create decimal random numbers between two numbers
- Tip 4: IMPORTANT Be careful with using ROUND() with RAND()
- Tip 5: DAX & M do not allow random seed
- Tip 6: Randomly subsample the data by using SAMPLE() in DAX
- Tip 7: Randomly generate letters or sequence of letters
- Tip 8: Changing column type to decimal
- Tip 9: Limit the use of calculated tables and columns with RAND() and RANDBETWEEN()
- Tip 10: Generating random dates
A quick post on creating random numbers in Power BI. While it's very easy to use, there are couple of things to keep in mind to get correct results. Typically the random numbers aren't needed in Power BI reports but I use them very often for simulations, statsitical functions and synthetic data generation.
What are random numbers?
As series of values is said to contain random numbers if all the values have equal probability of occurence and there is no correlation, i.e. the numbers are uniformly distributed.
How to generate random numbers?
In DAX, you can use RAND()
function to generate random numbers. Similarly, to generate random numbers between two numbers, use RANDBETWEEN()
.
In Power Query, List.Random()
creates a list of random numbers, while Number.Random()
creates a single random number. You can add an index column before using the Number.Random()
to get different random numbers on each row. You can delete the index column after the Number.Random()
step. There isn't a List.RandomBetween()
but Number.RandomBetween()
can be used along with index column to generate numbers between two specified values.
Tip 1: DAX and M RandomBetween behave differently
If you generate random numbers between (1,100) using DAX RANDOMBETWEEN(1,100)
, it will generate integer/whole numbers between (1,100) and will include numbers 1 and 100. In constrast to that, in Power Query Number.RandomBetween(1,100)
will generate decimal numbers between 1 and 100 and will exclude 100. Thus, if you are using M and want to include 100, use Number.RandomBetween(1,101)
.
To generate integer random values using M, first add an index column and then create a custom column using Int16.From(Number.RandomBetween(1,100))
. This includes numbers 1 and 100.
Tip 2: Multiple Random numbers in Power Query
To create multiple columns containing random numbers, add an index column each time before using the Number.Random()
. You can delete the columns later.
You can alse wrap the step before the Number.Random()
step with Table.Buffer()
to generate different number on each row. For example, if you added a custom column in the step previous to adding random number column, your step would look like Table.Buffer(Table.AddColumn(...))
. In this case you will not need to add an index column. If you need to add multiple random number columns, you can buffer the previous step each time. But be careful of the performance and transformation implications of Table.Buffer()
. Read more abou it here. Thanks to Gil Raviv for this tip..
Tip 3: You can use RAND() to create decimal random numbers between two numbers
As RANDBETWEEN()
creates integers, you can use RAND()*N
to create decimal numbers between (0,N). In this case, the numbers will not include 0 and N.
Similarly, to generate decimal random numbers between (N1,N2) use N1 + RAND() *(N2-N1)
. N1 & N2 are not included.
You can divide RANDBETWEEN()
by 10^i, to create numbers with 'i' decimal digits
Tip 4: IMPORTANT Be careful with using ROUND() with RAND()
As mentioned above, all random numbers should have equal probability of occurence otherwise it violates the uniform distribution assumption. If you use ROUND()
with RAND()
( e.g.ROUND(RAND()*50,0)
) to generate integers between 0 and 50, the end numbers will have half the probability of occurence compared to rest of the numbers. This is because numbers greater than or equal to 0.5 will be rounded to the next integer. e.g. 49.25 will be rounded down to 49 whereas 49.68 will be rounded up to 50. Thus, the end values have half the probability of occurence. The correct way is to use INT()
i.e INT(RAND()*50)
. While the error resulting from this will be minimal, it's better to use the correct formula when creating synthetic data to avoid bias.
Tip 5: DAX & M do not allow random seed
In Python and R, you can specify the seed number to generate reproducible random numbers. Both in DAX & M do not allow seed, so with each refresh new random numbers are generated. In many cases, this is not desirable. A work around for that is to generate random numbers in Power Query using M, indexing those and disabling the refresh for that query.
-
Create an index column and random number column using
Table.FromColumns( { {1..n_samples}, List.Random(n_samples)}, {"Index", "NRandom"} )
where n_samples >>> number of random values you need. For example, if you are going to need 3 sets of random numbers 1000 each, use n_samples >> 3000 ,say 5000.
- Uncheck "Include in report refresh" for that query
- Now you can use this list by using
FILETR()
in DAX by passingTable[Index] < 1001
and so forth
Tip 6: Randomly subsample the data by using SAMPLE() in DAX
I use this often to check robsutness/sensitivity of the simulation results with respect to the features used. For example, some columns may show correlaton or importance when used entirely but when sub-sampled may or may not.
To sample a column, use SAMPLE(sample_size, 'Table', 'Table'[Column])
Ruth provides a good overview of the SAMPLE()
function below:
Tip 7: Randomly generate letters or sequence of letters
You can use UNICHAR()
and RANDBETWEEN()
to generate random letters
To generate upper case latters: UNICHAR(RANDBETWEEN(65,90)
To generate lower case latters: UNICHAR(RANDBETWEEN(97,122)
To generate upper & lower case latters: UNICHAR(RANDBETWEEN(97,122) & UNICHAR(RANDBETWEEN(65,90)
In addition you can nest them with RANDBETWEEN()
to generate a random words, passwords, symbols etc. For a list of unicode chart, see this
Similarly, to generate characters using M, first create an index column and then use:
Character.FromNumber(Int16.From(Number.RandomBetween(65,90)))
for upper case letters. Thanks to Alex Powers for the tip.
Tip 8: Changing column type to decimal
When you create the custom column with random numbers using Number.Random()
in Power Query, it will be mixed type (123/ABC). If you convert it to Decimal, randomness is lost and all the numbers in the column will be the same. Solution to that is just add another index column ! You can always delete it later.
Tip 9: Limit the use of calculated tables and columns with RAND() and RANDBETWEEN()
The DAX Maestro Marco Russo has a blog on "volatile" DAX functions. These are the DAX functions that return differrent values when the model is refreshed. Using too many of these volatile functions in the model can cause slower refreshes. Use RAND()
AND RANDBETWEEN()
but use them judicously and be aware of the potential performance impact.
Tip 10: Generating random dates
To generate random dates, we can convert the random numbers generated by RAND()
AND RANDBETWEEN()
to date. For example, to generate random dates between years 2000 and 2030, we can use CONVERT(RANDBETWEEN(36526,45658), DATETIME)
. Note that in DAX 0 is 12/30/1899
, so just calculate accordingly. In Power Query, the equivalent would be Date.From (Number.RandomBetween(36526,45658))
.