First, let's look at these tables.
|Total Due for 2006 with Sales Reason|
This is the Total Due for all Orders in 2006. Now, let's see what happens if we join the tables together.
|Joined Total Due for 2006 with Sales Reason|
As you can see, this total is much higher than the actual total. Why? When you join across a M2M, you introduce duplicates into your data set. Now, on to the business question. We want to know how much money is generated from each Sales Reason for each Month in 2006. Let's see how Power Pivot would solve this.
|Many-to-Many Relationship (Power Pivot)|
In order to report on the Month, we needed to add a Date dimension. The rest of the model remains the same. Now, let's make our pivot table.
|Total Due for 2006 by Month and Sales Reason (Power Pivot)|
As you can see, the same value is replicated for each column. The totals are way too high as well. This is what happens when you try to query across a M2M. Now, let's fix it.
|Total Due (Proper) (Power Pivot)|
Remember this formula from Tableau vs. Power Pivot Part 12: Introductory Data Modeling? This is the magic formula that allows Power Pivot to query across a M2M. Let's see the results.
|Total Due for 2006 by Month and Sales Reason (Proper) (Power Pivot)|
The grand total now equals what it was supposed to. If we look at the totals for the first 6 months, we see the issue. The Manufacturer and Quality columns were complete duplicates. However, it doesn't matter because Power Pivot realized this and calculated the totals accordingly. Now, let's see how Tableau deals with this.
|Simple Many-to-Many Blend (Tableau)|
As we saw in our previous post on this topic, blending can take care of simple M2M relationships. Namely, if you are reporting on the same fields you are blending on, then the totals add up perfectly. However, when you try to report on columns that you aren't blending on, you run into problems.
|Many-to-Many Blend with Extra Dimension (Tableau)|
As you can see, adding the Name field doesn't change the granularity, yet still breaks the totals. This will be a major problem if we try to blend in this scenario. For kicks, here's what it would look like.
|Total Due for 2006 by Month and Sales Reason (Blend) (Tableau)|
There are so many things wrong with this chart, we won't even talk about it. Let's move on to the proposed solution. Since we can't blend in the dimensions, we will need to join them in.
|Total Due for 2006 by Month and Sales Reason (Naive Join) (Tableau)|
The chart looks great, except that the totals are wrong. In fact, we showed that this would happen already. However, now that we have the dimensions, can we alter the measures to remove the duplicates? To see this, let's break this chart up into 4 areas.
|Chart Areas (Tableau)|
|Total Due and Rank by Sales Order and Sales Reason (Tableau)|
Using a ranking method, we can see that every value with an Index greater than 1 is a duplicate. However, we did this with a table calculation which would cause a mess of other issues. So, we need to push this ranking method directly into the data source. We can do this by adding a single piece to Custom SQL SELECT statement.
RANK() OVER (
PARTITION BY [SalesOrderHeaderSalesReason].[SalesOrderID]
ORDER BY [SalesOrderHeaderSalesReason].[SalesReasonID]
) AS Duplicate
This piece of code will give us the same values we saw using the Index, without using a table calculation.
|Total Due for 2006 by Name and Month (Filtered Join) (Tableau)|
By filtering out the duplicates, we were able to fix our totals. However, doing this also removed those values from the Primary and Name Total areas, which is not what we wanted. We want the duplicates in the Primary and Name Total areas, but don't want them in the Month Total and Grand Total areas. Remember that each of these names also corresponds to an ID number? We can use that to our advantage.
|Total Due (Fixed) (Tableau)|
Each value in the Primary and Name Total areas corresponds to a single Sales Reason Name. Therefore, they must also correspond to a single Sales Reason ID. Also, when you only have one value, then MIN() = MAX(). So, the first section of the IF statement only affects the Primary and Name Totals areas, while the second section of the IF statement affects the Month Totals and Grand Total areas. Now, when we are in the Primary and Name Totals areas, we don't want to change our values at all. However, when we are in the Month Totals and Grand Total area, we want to remove the duplicate. The [Duplicate] = 1 statement returns True/False. When you wrap a True/False value in the INT() function, you get a 1/0 result, respectively. Therefore, we can remove the duplicate values by multiplying by this 1/0 result. Now, let's see if this works.
|Total Due for 2006 by Month and Name (Fixed) (Tableau)|
As you can see, we have fixed our calculation. However, this took a significant amount of work and knowledge, as well as a good amount of ingenuity. Comparing this to Power Pivot, which was a 1 line measure, it was an easy decision.
Winner: Power Pivot
Thanks for reading. We hope you found this informative.
Data Analytics Consultant