Monday, December 16, 2013

Tableau vs. Power Pivot Part 13: Many-to-Many Relationships

Today, we will talk about resolving slightly more complex many-to-many (M2M) relationships.  In case you missed our introductory post on this topic, you can find it here.  In this example, we have a set of sales orders.  Also, each order can be assign to one or more sales reasons.  This also means that each sales reason can be assigned to one or more sales orders.  The data set we are using comes from the AdventureWorks database.

First, let's look at these tables.
Many-to-Many Relationship
You  might ask, "Why can't we just join these tables together since we have keys?"  We'll show you why.
Total Due for 2006 with Sales Reason
This is the Total Due for all Orders in 2006.  Now, let's see what happens if we join the tables together.
Joined Total Due for 2006 with Sales Reason
As you can see, this total is much higher than the actual total.  Why?  When you join across a M2M, you introduce duplicates into your data set.  Now, on to the business question.  We want to know how much money is generated from each Sales Reason for each Month in 2006.  Let's see how Power Pivot would solve this.
Many-to-Many Relationship (Power Pivot)
In order to report on the Month, we needed to add a Date dimension.  The rest of the model remains the same.  Now, let's make our pivot table.
Total Due for 2006 by Month and Sales Reason (Power Pivot)
As you can see, the same value is replicated for each column.  The totals are way too high as well.  This is what happens when you try to query across a M2M.  Now, let's fix it.
Total Due (Proper) (Power Pivot)
Remember this formula from Tableau vs. Power Pivot Part 12: Introductory Data Modeling?  This is the magic formula that allows Power Pivot to query across a M2M.  Let's see the results.
Total Due for 2006 by Month and Sales Reason (Proper) (Power Pivot)
The grand total now equals what it was supposed to.  If we look at the totals for the first 6 months, we see the issue.  The Manufacturer and Quality columns were complete duplicates.  However, it doesn't matter because Power Pivot realized this and calculated the totals accordingly.  Now, let's see how Tableau deals with this.
Simple Many-to-Many Blend (Tableau)
As we saw in our previous post on this topic, blending can take care of simple M2M relationships.  Namely, if you are reporting on the same fields you are blending on, then the totals add up perfectly.  However, when you try to report on columns that you aren't blending on, you run into problems.
Many-to-Many Blend with Extra Dimension (Tableau)
As you can see, adding the Name field doesn't change the granularity, yet still breaks the totals.  This will be a major problem if we try to blend in this scenario.  For kicks, here's what it would look like.
Total Due for 2006 by Month and Sales Reason (Blend) (Tableau)
There are so many things wrong with this chart, we won't even talk about it.  Let's move on to the proposed solution.  Since we can't blend in the dimensions, we will need to join them in.
Total Due for 2006 by Month and Sales Reason (Naive Join) (Tableau)
The chart looks great, except that the totals are wrong.  In fact, we showed that this would happen already.  However, now that we have the dimensions, can we alter the measures to remove the duplicates?  To see this, let's break this chart up into 4 areas.
Chart Areas (Tableau)
We will refer to these areas by the following names: Red = Primary, Orange = Month Totals, Green = Name Totals, Blue = Grand Total.  Now, we can see that the Red and Green areas sum up correctly.  However, the Orange and Blue Areas are being inflated due to the duplicates.  So, the first step is to identify the duplicates.
Total Due and Rank by Sales Order and Sales Reason (Tableau)
Using a ranking method, we can see that every value with an Index greater than 1 is a duplicate.  However, we did this with a table calculation which would cause a mess of other issues.  So, we need to push this ranking method directly into the data source.  We can do this by adding a single piece to Custom SQL SELECT statement.

RANK() OVER ( 
     PARTITION BY [SalesOrderHeaderSalesReason].[SalesOrderID] 
     ORDER BY [SalesOrderHeaderSalesReason].[SalesReasonID] 
) AS Duplicate

This piece of code will give us the same values we saw using the Index, without using a table calculation.
Total Due for 2006 by Name and Month (Filtered Join) (Tableau)
By filtering out the duplicates, we were able to fix our totals.  However, doing this also removed those values from the Primary and Name Total areas, which is not what we wanted.  We want the duplicates in the Primary and Name Total areas, but don't want them in the Month Total and Grand Total areas.  Remember that each of these names also corresponds to an ID number?  We can use that to our advantage.
Total Due (Fixed) (Tableau)
Each value in the Primary and Name Total areas corresponds to a single Sales Reason Name.  Therefore, they must also correspond to a single Sales Reason ID.  Also, when you only have one value, then MIN() = MAX().  So, the first section of the IF statement only affects the Primary and Name Totals areas, while the second section of the IF statement affects the Month Totals and Grand Total areas.  Now, when we are in the Primary and Name Totals areas, we don't want to change our values at all.  However, when we are in the Month Totals and Grand Total area, we want to remove the duplicate.  The [Duplicate] = 1 statement returns True/False.  When you wrap a True/False value in the INT() function, you get a 1/0 result, respectively.  Therefore, we can remove the duplicate values by multiplying by this 1/0 result.  Now, let's see if this works.
Total Due for 2006 by Month and Name (Fixed) (Tableau)
As you can see, we have fixed our calculation.  However, this took a significant amount of work and knowledge, as well as a good amount of ingenuity.  Comparing this to Power Pivot, which was a 1 line measure, it was an easy decision.

Winner: Power Pivot

Thanks for reading.  We hope you found this informative.

Data Analytics Consultant
Mariner, LLC
brad.llewellyn@mariner-usa.com
http://www.linkedin.com/in/bradllewellyn
http://breaking-bi.blogspot.com

9 comments:

  1. Hi, Brad. First, thanks for using your time to write the posts. They are great.

    I have a problem related to your Many-to-Many Relationships article. Our company processes loan applications. As part of the result, an application can have observations such as "Requested amount higher than allowed by product".

    Basically, there are two categories of observations. The first group is related to the application itself: "Requested ammount higher than allowed" or "Repayment time must be 6 month or longer", etc

    The second group pertains to the applicants: "Income lower than the minimum" or "Age 65 or higher"

    There could be zero or more observations. In addition, if there are more than one applicants, there will one be one set of observations for each applicant.

    Various useful analyses can be done with these observations.
    1) Returns to the user the top 5 observations for each categories filtered by some dimensions (date, branch, etc)
    2) Analyze applications that have certain combinations of observations
    3) See the observations of one application

    I have been able to answer these questions using complex SQL queries, especially the first question. I wonder how Power Pivot and Tableau will help to simplify it.

    ReplyDelete
    Replies
    1. Tak,

      That sounds like an interesting problem. Any chance you could send me a mock-up data set so that I have something to work with?

      Delete
  2. Sorry, Brad. I did not see your response and didn't get a notification of it, so I totally forgot about the question. I was reading this article again today and found my question. My apologies.

    I will prepare the mock up data and the data model I created to present the data in Reporting Service.

    How can I upload attachments, in case is needed?

    ReplyDelete
  3. A simplified database schema
    Request table (1) -> Customer table (N) -> Observation table (N)
    (1) = One relationship
    (N) = Many relationship


    This is the fake data
    http://speedy.sh/EK944/Fake-data.xlsx

    The rows are repeated for each person and observation

    ReplyDelete
    Replies
    1. Something's wrong with the download. Can you email me the file?

      Delete
  4. job interview
    Hi! Thanks for the great information you have provided! You have touched on crucial points.

    ReplyDelete