I thought I’d put together a short series of posts with some quick and simple tips that we can all adhere to. The aim is to help us improve our DAX queries and the general speed of our Power BI and/or SSAS tabular models. In this first tip, I’d like to talk about the use of the datetime data type in your tabular and Power BI models.
Before we get to the tip itself, I’d like to talk about how tabular models work without getting too technical…
First a quick note on data compression
For those who have compared our source data size to the size of a tabular database or Power BI file, you’ll know that the data is heavily compressed.
This is one of the roles of the VertiPaq engine that runs Power BI and SSAS Tabular. You can find out more by having a look here.
In simple terms, one of the main methods used for compression is that the engine builds a data dictionary in the background and, in doing so, assigns a unique numerical value to each value it finds in a column.
For example, let’s say you have sales table with a million rows. Let’s say it has an order number, a customer number, and an item number. Maybe, over that million rows if you break down the three columns you only have 100,000 order numbers, 100 customers, and 10 items. The unique values in the three columns is not that many in the grand scheme of things. The VertiPaq engine doesn’t store each row as a unique set of values, it just stores the uniqueness of the values in each column.
This is obviously a very simplistic example. This is meant to be a quick tip after all, so I can save the VertiPaq engine for another time. However, knowing this helps us with understanding our quick tip!
Reduce the uniqueness
Now that you know the above, think about a datetime field. In effect, it is storing two pieces of data in one field. Both date, and time. If the “time” part of your datetime is always midnight, which can often be the case, then no need to worry as column is already as unique as it can be.
But, if you are importing an unusual datetime where the time is quite specific (see the image at the top of this page for some examples), then the number of unique values in that column will be many more.
As such, on importing our datetime column, we should split out the date and time portions into separate columns.
By doing so, we reduce the number of unique values in both the date field and time field; reducing the size of the database and making our filtering quicker at the same time.
Hopefully that has given you some food for thought on data model optimisation. Perhaps you can think of ways that you can make your model with other values than datetime.