Video Transcript
Hello everyone, welcome back to This Not That where we compare a common mistake in the business intelligence world with a best practice. Today we’re going to be looking at stage 2 in the BI process, specifically the ETL (extract, transform, load) procedure.
Welcome back, my name is Hobbs and today we’re going to talk about stage 2 of the business intelligence process, cleaning your data. Specifically, a conversation I had with a manager during a Power BI training course I lead. This manager and an analyst came to the class together to learn more about the Power BI tool. In talking with them more about their unique business challenges, I learned that the manager had been frustrated with the analysts’ work because she was trying to interpret the code that he was using for his calculated fields. The difficulty she was running into is they were very difficult to read. Too often she was trying to verify if the data was telling her what she thought was.
My suggestion for the analyst was instead of trying to fit everything into a single formula, a single calculated field, break out each step and give it a clearly legible name. I call this best practice guideline, use explicit not implicit measures. Here’s an example:
Let’s say that you have a column of sales data. You’ve got the numbers going down one column and then next to it the items sold: apples, oranges, bananas, pineapples. Let’s say you want to know what percent of all your sales is made up of pineapples. Its not all that difficult to go through in a single calculation and say give me the total sales divided by sales where the label is pineapple. But what I would encourage you to do instead, because that’s an implicit way of defining these measures, be explicit about it. Go ahead and make a measure that says sales of apples and then make another one that is sales of bananas and then another one that is sales of oranges. And when you do your percent of total the formula is going to look like apples plus oranges plus pineapples divided by…and so on. This way instead of having to interpret what’s going on and reading through the code, you have very easy, legible, explicitly named measures to go back to.
Now like most guidelines there are some places where there are exceptions, so the exception in this case is if you’re doing something and you’re pretty sure no one is EVER EVER going to see it again, you don’t necessarily have to break things out into steps. However, you never know, you might move on to some other position or some other project and if you don’t want the person who comes in behind you to hate you for all of eternity, it is best practice to write things out as explicitly as you possibly can.
So a quick recap, make sure if you are writing measures that you are explicitly naming the individual steps instead of implicitly assuming that people will be able to ready what it is you’re writing.
If you enjoyed today's episode, we encourage you to follow us on social media for additional episodes. If you’ve got comments or if you’ve got a best practice that has been stewing around in your head that you would like the BI community to know about, we would love for you to comment that below. You can also check us out on our website.