Selecting Columns by Data Type in Power Query
This blog shows how to select columns by data type. This is helpful if you have hundreds of columns and want to see, for example, only numerical or date columns.
It's not uncommon to have a table with hundreds of columns with different column types such as numerical, text, date, percentage etc. You can select columns by using the UI option "Choose Columns" and then selecting which columns to keep. But if you have have tens of columns of the same type, you will have to manually go through the list of columns and select the columns to keep. As far as I know, there is no direct way to only keep, for example, numerical columns. In Pandas, you can use select_dtypes
method and pass which columns to choose based on the data types. No such option or function is available in Power Query.
In this example, I will show how I achieved it. You can customize it or turn it into a function based on your use case.
I am using two M functions to achieve the desired output - Value.Type
and Type.Is
.
Value.Type
returns the data type of the value. For example, Value.Type(123)
returns number
as type, while Value.Type("sandeep")
returns text
as type. There is not much about it in the official documentation here
Similarly, Type.Is
is a logical function that you can use to get a True/False if the type equals defined column type.
For example, to check if number 123 is a text you would write Type.Is(Value.Type(123), Text.Type)
. This will return FALSE
while Type.Is(Value.Type(123), Number.Type)
returns TRUE
. Here we first obtained the type of the value using Value.Type
and then compared it with Number.Type
. Since both match, the result is TRUE
. You can see a list of all data types in M here.
We can combine these two functions with parameters to dynamically select columns based on selected data type.
First create a parameter list as shown below. Include all data types you want to select. In this case, I am only selecting text, numeric and date type. I have also included 'All' to return the entire table.
You can see the code below, but at a high level here are the steps:
- Create a parameter list
- Ensure all columns have correct data types assigned. This is important because that's how M will idenify the columns to select
- Unpivot the table to create two columns - 'ColumnName' & 'Value'
- Use
Value.Type
andType.Is
and create a new column that shows the data type of the value - Link parameter list with the filter to select data type
- Get filtered column list and pass it to
Table.SelectColumns
You can see it in action below:
To apply this to your table, you only need to make sure SourceTable2
is your table with all columns with correctly assigned data types. You could convert it to a function to take table and data type as arguments. I will leave it to you.
While thinking more about ways to make this function better, I actually found two more ways. One is to use Table.ColumnsofType
(documentation) and other is to use Table.Schema
. Table.Schema
gives you metadata of the entire table along with the column types. I can use the same approach as above, but just use Table.Schema
instead to get data type of each column. This is faster, better, cleaner!!! Here is the code: