Overview

When designing models and queries in Power BI dekstop, you often need to measure the individual dataset refresh duration. This is particularly important for large datasets where you are trying to optimize the queries to reduce the refresh time during development and in service. Matthew Roche from Power BI CAT team asked a question recently on Twitter about measuring refresh duration. Most pointed to Phil Seamark's Visualize Your Power BI Refresh blog and tool. I use that tool and it was the only way I was aware of. But Power BI user Eric Beery responded with an interesting solution that I didn't know about. In this blog I am exploring that solution.

All credit goes to Eric for this solution

Great idea. My current workaround:

Hovering over the table in the field list shows the date/time the refresh completed. So I use the difference in completion time between a small table (like date) and the large fact table in interested in to figure out how long it took.
— Eric Beery (@eric_beery11) January 6, 2022

Relative Refresh

Solution 1 : The Easy Way

Hovering over the table name

If you hover the table in report view or model view, you can get the "Data refreshed" date and time. Unfortunately it doesn't give you the data refresh start time so it's not possible to get the exact duration. We can get the relative refresh time by looking at when the refresh finished for a table with little/no data. For example, the small table below finished refreshing at 11:51:39 AM and the large table finished at 11:52:15 AM. So the refresh time for the large table was roughly 45 seconds longer than the small table. This obvisouly doesn't help us a lot but that's the basic idea.

Note: This assumes the refreshes happen in parallel. If you have disabled "Allow parallel loading of tables" under options, tables will be refreshed sequentially in which case this doesn’t help.

We can improvise this by adding a blank table by clicking on "Enter data" on the home tab. This will create a blank table without any rows. The idea here is that the blank table should take the least amount of time and thus should be close to the start of refresh time. We can use that to get a better estimation of relative refresh duration.

As you can in the image below, the empty table loaded at 12:51:25PM and the large table finished refreshing at 12:51:52PM, thus the large table takes roughly ~25 seconds to load. It took a bit longer in the first attempt because it was the first refresh. I am guessing when I refreshed again, the connection & data were already cached so it was faster.

Solution 2 : Selective Refresh Duration

Refresh duration for a specific table

As noted above, the limitation of this method is that we it won't work withour enabling parallel load. Also, even when parallel load in enabled, the query might wait for other queries to connect/refresh which will give false duration. We can get around this by refreshing only select few tables. This is not possibel in the report view, but if we switch to model view, we can CNTRL + Select the tables to refresh. This will provide slightly more accurate relative refresh time and we don't have to wait for all the queries to refresh. This is a better way when working with many tables or when you have a large table.

Solution 3 : Using DMV

Getting refresh times using DMV

Since this is all metadata, I thought I could query the DMV to get the same information. If we query TMSCHEMA_PARTITIONS using DAX Studio, we can get the same refresh times.

Note in this case, there are two times- ModifiedTime and RefreshedTime. ModifiedTime tells you when the table was last saved/modified and RefreshedTime tells you when the data was refreshed. It is Also note that the times are in UTC time zone and since we are using MDX, there is no way to convert it to local time zone in DAX Studio, at least I am not aware of it.

Using DMV gives us partition level refresh time. This helped in a project where I had to partition a massive dataset and wanted to know how well it was working. Querying the DMV allowed me to tune the size/number of partitions.

DAX Studio can connect to published datasets in service. So using this method, we can even check the refresh times for tables in a published datasets. Below I am using Python to connect to a published dataset and query the DMV.

import adodbapi as ado



conn = adodbapi.connect("Provider=MSOLAP.8; Data Source = 'powerbi://api.powerbi.com/v1.0/myorg/<workspace>';initial catalog='<dataset name>'")
cstring = "Provider=MSOLAP.8; Data Source = 'powerbi://api.powerbi.com/v1.0/myorg/<workspace>';initial catalog='<dataset name>'"
cur = conn.cursor()


with ado.connect(cstring) as con:
    with con.cursor() as cur:
        dax_str = ''' select [Name]
    ,[ModifiedTime]
    ,[RefreshedTime] 

from $SYSTEM.TMSCHEMA_PARTITIONS '''
        cur.execute(dax_str)
        data = cur.fetchall()

Convert the DMV data to a pandas dataframe

def get_dmv(data):
    '''
    Original script: https://stackoverflow.com/questions/56075159/convert-listadodbapi-apibase-sqlrow-to-pd-dataframe
    
    Modified to make changes to retrieve the DMV data
    
    Input: Pandas dataframe
    Return: Pandas dataframe with columns convrted to datetime 
    
    Author: Sandeep Pawar
    
    
    '''
    cols = pd.DataFrame(np.array(data.ado_results)).iloc[0]
    values =  pd.DataFrame(np.array(data.ado_results)).iloc[1:]
    for col in list(values.columns):
        values[col]=pd.to_datetime(values[col]).dt.tz_convert("UTC")
    values.columns = cols    
    df = values.T
    df.columns = ['ModifiedTime','RefreshedTime']
    
    
    return df

df = get_dmv(data)
((df[['RefreshedTime']]-df[['RefreshedTime']].min())/np.timedelta64(1,'m')).plot.barh(title = "Refresh Durations", xlabel='Datsets');

For the same dataset published to service, I deducted the refresf time of the empty table from rest of the times to get relative refresh duration. Yo can use SSMS/Tabular Editor to selectively refresh datasets (just like above) and get more relatively accurate refresh durations.

In the same twitter thread, Imke Feldmann showed that she has incorporated a similar approach in her Power BI Cleaner tool. Take a look at it below:

Same here - what a great find. 👍
I've created a report on that which will be included in the next version of my #PowerBI Cleaner tool.
Join the exclusive preview of it tomorrow in my chat with Reid from @HavensBI : https://t.co/78pwTQVhVv pic.twitter.com/H7D5aecL1Z
— Imke Feldmann (@TheBIccountant) January 6, 2022

Absolute Refresh

Solution 4 : Using SQL Server Profiler with "Visualize My Refresh"

The most accurate method

The most accurate and reliable way to capture refresh durations are by using the SQL Server Profiler and Phil Seamark's "Visualize My Refresh" report. Phil will explain it better than I can so watch below:

If you have any thoughts on improving these solutions, please let me know.