Monday, October 21, 2013

Tableau vs. Power Pivot Part 6: Performance

Today, we will look at how each of these tools handles very large data sets.  We are going to examine them on how compactly they store the information as well as how quickly they can retrieve it.  Unfortunately, we cannot give any information on the source of this data set, we can only say that it is an enterprise fact table with 53.5 million rows, 22 columns, and takes up almost 8GB as a csv text file.

Now's the time to note some limitations in the tools.  Tableau and Power Pivot both have limitations on the data they can connect to.  Power Pivot is unable to connect to any data source larger than 2GB.  This was a design choice by the developers in order to keep Power Pivot in the "Self-Service BI" freeware niche.  If you want to work with data sources larger than 2GB, you should upgrade to BISM Tabular.  BISM Tabular is basically Power Pivot all grown up.  It is a much more powerful version of Power Pivot that you would need to buy from Microsoft.

On the other hand, Tableau is designed as an enterprise tool as well as a self-service tool.  Therefore, when your file reaches the limit, which we think is also 2GB, then you still have the option of creating a "Tableau Data Extract."  This is a proprietary file format which includes columnar compression and storage, similar to Power Pivot.  Also, we have never seen a limit to the size of Tableau data extract.  The only limitation we have seen is of the processing power of your machine.

Now, let's talk about our examination.  We uploaded the data into SQL Server and typecasted the top ten million rows from strings into numbers to keep the size under the limitations of the tools.  Please note that these tests are done using a single laptop PC with 8GB RAM.  Your results may vary.  A recap table will be provided at the end of this post.

Category 1: Initial Data Upload

The first step to our process was to upload the data into our tools.  In Power Pivot, this involved loading the data into the Power Pivot window.  In Tableau, this involved creating a Tableau Data Extract.  Here are the times:

Power Pivot: 2 minutes
Tableau: 3 minutes 43 seconds

As you can see, Power Pivot uploaded the data much more quickly.  Data Upload is separated into two steps: Retrieval and Compression.  Power Pivot was about 30 seconds quicker at retrieving the data than Tableau was.  Moreover, Power Pivot compressed the data in a matter of seconds; whereas Tableau took about a minute to complete this process.  All in all, Power Pivot wins this part.

Winner: Power Pivot

Category 2: Data Refresh

Now that the data is in the tool, how quickly does it refresh?  Let's find out.

Power Pivot: 2 minutes 10 seconds
Tableau: 4 minutes 2 seconds

This is the same story as before.  No commentary needed.

Winner: Power Pivot

Category 3: Saving the Workbook

An important part of a business analyst's job is saving their work.  Let's see how these tools fare.

Power Pivot: 15 seconds
Tableau: 7 seconds

We're not sure why Tableau saves faster than Power Pivot.  It probably has something to do with the fact that Excel workbooks contain many more features than Tableau workbooks.  Regardless, this is an easy call.

Winner: Tableau

Category 4: Opening the Workbook

When you arrive at work the next morning, you need to be able to resume your work.  Let's check them out now.

Power Pivot: 15 seconds
Tableau: 4.5 seconds

Tableau takes this one as well.  This victory is probably caused by the same features as the quicker saving was.

Winner: Tableau

Category 5: Compressed Data Size

Now, let's look at how large these workbooks are when they only contain the compressed data.  This should be a good benchmark to compare the different compression algorithms.

Power Pivot: 247 MB
Tableau: 73 MB

It seems we have found the reason why Tableau saves and opens faster.  Its workbooks are much smaller.  This is likely due to the many extra features contained within Excel.

Winner: Tableau

Category 6: Tabular Chart Refresh

Next, let's see how quickly these tools can refresh a very large tabular chart.  Note that these charts were made using a Pivot Table in Excel versus a Text Table in Tableau.

Power Pivot: 22 seconds
Tableau: 43 seconds

We see that Tableau is much slower at refreshing this table than Power Pivot.  This must be the performance that everyone talks about when they talk about Power Pivot.

Winner: Power Pivot

Category 7: Bar Chart Refresh

Finally, we're getting to the good stuff.  We saw that Excel was fast at creating Pivot Tables.  The question is, "Will Power View fare as well as Excel?"  Let's see how it goes.

Power Pivot: 6.5 seconds
Tableau: <1 second

The Tableau chart loaded so quickly that we couldn't even time it with a stopwatch.  This is easy to call.

EDIT: It should be noted that the timing methods were somewhat different.  We could not find a reliable method for refreshing Power View and Tableau from scratch.  Therefore, we chose to open up the workbooks to a blank Tableau worksheet and a blank Power View worksheet.  Then, we opened up the sheet we wanted to display.  The time it took for the populated sheet to open is what we recorded here.  The same logic applies to the dashboard refresh as well.

Winner: Tableau

Category 8: Dashboard Refresh

Lastly, we get to the main event.  How quickly can these tools load a complete dashboard.  Note that the dashboard consisted of four varying types of charts created to stretch the tools.

Power Pivot: 9 seconds
Tableau: 3 seconds

It seems that Tableau has won again...Wait a minute!  Notice how the difference between times for Tableau and Power Pivot for the Bar Chart and the Dashboard are 6 seconds for each tool?  Turns out, this is actually caused because switching between Power View sheets in Excel causes a little bit of overhead.  If we remove that 6 second overhead, the tools seem to fare equally well at this point.  Alas, winning is winning.

Winner: Tableau


Summary

Let's take a look our results to see how the tools matched up.
Performance Results
Looking at quantity of category wins, it seems that Tableau is the winner.  However, if we look at the categories that really matter, Initial Data Upload and Data Refresh, Power Pivot takes this one.  This puts us in a bit of quandary.  Which of these tools is the winner?  If we consider our original idea of the 53 million row data set, we see that Power Pivot was completely unable to deal with this quantity of data, while Tableau had no issues at all.  So, we can finish with a two-part decision.  If you are using small data sets by yourself for a side project, you will probably see better results with Power Pivot.  However, once you reach a certain point, you must use a different tool.  This means that for larger data sets, Tableau wins by forfeit.  Thanks for reading.  We hope you found this informative.

Brad Llewellyn
Associate Data Analytics Consultant
Mariner, LLC
brad.llewellyn@mariner-usa.com
http://www.linkedin.com/in/bradllewellyn
http://breaking-bi.blogspot.com

10 comments:

  1. Hi Brad,

    I'm not sure I agree with your analysis. You state the 'categories that really matter' are data upload and refresh - this isn't true.

    Data upload is a one time task - which can often be passed off to a server (Tableau server could have done the refresh over night for example, so I experience zero load time).

    What's important is performance on tasks that you do repeatedly. I would expect that in building a dashboard, the user may go through 500-1000 iterations of a chart or series of chart. Waiting 6 seconds for each iteration is not even close to acceptable, <1 second is the only acceptable answer for a task with very high frequency.

    Cheers

    Tom

    ReplyDelete
    Replies
    1. Tom,

      I agree with you 100%. I apologize for not being clear enough. That six second delay was on the initial loading of the charts. It's an overhead caused by starting up Power View, which doesn't open up automatically when you open Excel. Once you are inside the dashboard, the response time is just as fast in both of the tools. Thanks for commenting!

      Delete
  2. So how many rows was your test in the end? I see about a three second response time to most actions with 150M rows in a Tableau extract (on a similar 8gb laptop).

    To test this performance in more detail, we need to ask more complex questions like count distinct etc.

    I'd encourage you to do a more comprehensive review of the performance of both tools, I'd love to see it!

    Also, perhaps the next test could be on the mapping capabilities of each tool? This is a specialism for us at The Information Lab, so we could offer a lot of insight into the capabilities of Tableau from this perspective. We'd be happy to contribute to the Tableau half of the review if you decide to do something on mapping.

    ReplyDelete
    Replies
    1. Tom,

      Thanks so much for your interest. I think the data set ended up being like 15 million rows and 10 columns. The Mapping portion is a great idea as well! We'd already planned on doing a post about mapping, perhaps we'll see how the performance matches up across these tools as well.

      I also agree with your statement that this test was very basic. If you, or anyone else would like to propose a more comprehensive test plan, we're more than happy to hear it. Also, if you were to create your own performance analysis, we would be happy to link to it in one of the posts. We've really learned quite a bit about both of these tools during this process and we're eager to keep testing and seeing what comes out.

      Thanks!

      Delete
  3. I think that Tableau takes more time upload and refresh data partly because the compression ratio is higher (it would make sense as more computation might be needed).

    Do you know about any incremental refresh in PowerPivot?

    I am really surprised it took twice as much time to show the tabular chart with Tableau. It is nice to know this limitation. Could you tell how was the tabular structured? How many rows? Columns? Dimension? Measures? Showing totals or not? Were you using the OpenGL mode for the benchmark?

    ReplyDelete
    Replies
    1. Damien,

      I think the text table had two dimensions on the rows, which created a few thousand rows and 4 different measures across the columns. All other defaults were kept in both tools. We did not use OpenGL.

      As far as the incremental refresh goes, we've found a fair number of complaints that Power Pivot does not have incremental refresh capabilities. Tableau does have incremental refresh, but it is an "Insert-Only" refresh.

      Delete
    2. Thanks for the reply.

      I just found out about this white paper about designing efficient workbooks: http://www.tableausoftware.com/learn/whitepapers/designing-efficient-workbooks

      Quite interesting to improve Tableau workbook performance.

      Delete
    3. Thanks for sharing this! It was a very interesting read.

      Delete
  4. Brad, I checked PP capacity specification. Here is it: http://technet.microsoft.com/en-us/library/gg413465.aspx. Are you sure PP can't handle big amounts? Sure, we are talking about XL/PP x64.

    ReplyDelete
    Replies
    1. I think there may be a difference between x64 and x86. All I'm sure of is that my PP failed because it the file was bigger than 2GB. I remember hearing from someone else that x64 removed this limit. Perhaps this is what they were talking about?

      Thanks for sharing!

      Delete