Reverse engineering summarised data
Or how to make a proper fact table with what you have...
Those who use Power BI, Analysis Services, or any other data modelling tool that uses DAX, will know that most of the time you will be taking very detailed data and in one way or another turning it into useful summarised data, as such is why we use reporting or dashboarding tools.
In most cases the source data you will have is relatively detailed, making it easy to create a data set that can be sliced by a particular data range, or a chosen dimension. But what do you do when the data you get isn’t detailed enough? I was presented with one such predicament. The user wanted to be able to add summarised budget data alongside detailed data into their data set, and then be able to slice by varying periods. The only problem was the way the data was available.
Usually, at the very least, for budget data you would be provided with a transaction per month. As you can see from the same data above, the data was provided as a single row (per customer and role) with a start and end date and then a budget for that date range. This didn’t lend itself to being able to produce a data set that would be useful to the user and their report or dashboard. I needed to somehow break the data down into useful chunks.
The reporting solution I’m using here is Power BI, but because I’m using DAX, the principle could be applied to Excel (using the data modelling functionality) or Analysis Services Tabular.
There are three logical steps needed to get down to the detailed level.
Step 1. After I’ve imported my budget data which for this, we will call BudgetTable, I need to calculate how many days there are between the start date and end date. That is, calculate the duration of each budget line. Create a new calculated column in your imported table.
Duration := ( [Roll End Date] - [Roll Start Date] ) + 1
That step is pretty straight forward. Due to the way the date field is stored as a datetime, with 0:00 as the time, the Roll End Date wasn’t being counted as a full day, so I forced an extra day on their which is why there’s a + 1 at the end of that formula.
Step 2. You’ve worked out the duration, you can now convert that into a daily budget as a new calculated column. For the sake of simplicity I’m taking the full duration calculated in the previous step. You may want to remove weekends or public holidays from your duration to work out a true number of working days, which could be achieved with a good date dimension table.
DailyBudget := [Budget] / [Duration]
Step 3. Now that we have the more data that wasn’t in our imported data, we need to create a whole new calculated table based on this data that’s been imported and created (with the two calculated columns). The following assumes you’ve already got a date dimension or calendar table of some sort that has each date. It won’t work without one.
Then I’m going to use the Generate function in my table. The generate function creates a table with the Cartesian product between two sets of data. In this scenario, I want to create a table that gives me an entry for every instance of customer and role from the budget table, and individual date from the calendar/date table, but only within the confines of the date range of each budget start/end date.
Clear as mud, right?
So now I create a new table with the following DAX code.
NewBudgetData := SELECTCOLUMNS ( GENERATE ( BudgetTable, FILTER ( ALLNOBLANKROW ( 'Calendar' ), AND ( 'Calendar'[Date] >= BudgetTable[Role Start Date], 'Calendar'[Date] <= BudgetTable[Role End Date] ) ) ), "Date", 'Calendar'[Date], "Customer", BudgetTable[Customer Name], "Role", BudgetTable[Role], "Budget", BudgetTable[DailyBudget] )
I can then hide the original BudgetTable and the report can be based on the NewBudgetData table which will have the required detail to filter to a particular day, week, month, or whatever date range you require.