data discovery

Is Tableau the New Netscape?

Tableau has done for the discovery of data what Netscape, with the first mass-market web browser, did for the discovery of information – empowered the masses. For data discovery, Tableau makes it simple to connect to some data, slice and dice it, and create some cool visualizations. It more than satisfies a simple equation for a software product:

Love = Results – Effort

That is, if the results for your users far outweigh the effort they put in, you have a winning solution – and Tableau kills it. Tableau’s timing was perfect: end-user empowerment and the proliferation of data arrived just as traditional command-and-control analytics was reaching a user-frustration tipping point. Tableau provides an incredible level of interactivity to “play” with the data, without requiring IT.

And there is one other timing aspect that Tableau has continued to capitalize on: a sustained vacuum of analytics vision from Microsoft, which had been asleep at the wheel. For a long time, Pivot Tables and Microsoft Analysis Services were the last great analytics innovations from Microsoft, and those introductions disrupted vendors (I worked at a vendor on the receiving end, and it sucked). But after those introductions, it was a nuclear winter. That absence enabled Tableau to spawn a new industry – empowering users to explore data – and to thrive.

The Browser Wars of the Mid 90s

Similarly, when Netscape first appeared with the growth of the Internet, Microsoft was essentially asleep at the wheel too. At its peak, Netscape held an 80%+ share of the browser market. Fearful that Microsoft was late to the Internet, Bill Gates led the call to arms with a memo urging the company to focus on the Internet tidal wave. One of the areas: Netscape. The strategy was to throw Microsoft’s full weight at breaking Netscape’s dominance with (love it or hate it) Internet Explorer. Netscape quickly lost share as IE simply became the default – dropping to less than 1% share by 2006.

[Chart: Netscape's share of the browser market from the 90s to the 00s]

Gates’s Internet is Nadella’s Cloud and Data. One of the cornerstones of Microsoft’s strategy is not just the cloud with Azure (now second only to AWS) – empowering developers to create cloud services – but also tools and services that empower users to work with data.

The announcements around analytics have come thick and fast: PowerBI; PowerBI Desktop; PowerBI Mobile; PowerQuery; Azure Stream Analytics; Azure HDInsight; Azure Machine Learning; and Cortana Analytics. For the PowerBI suite, the price is right – PowerBI is free, and PowerBI Pro is $9.99 per user per month, which gets you more data, more refreshes, on-premise connectivity, and more collaboration features.

The Coming Data Discovery War

So I tried out the web flavor of PowerBI a few months ago, bringing some data from Salesforce into a prepackaged web dashboard, and it was cool, but to be honest the results were too limited – you couldn’t really play with the data enough. It was definitely a threat to some cloud dashboard providers, but no threat to Tableau for real empowered data discovery. It’s more for consumption of analytics than for playing with data; it fits into a data discovery framework, but isn’t the whole solution.

Fast forward to last week, where I tried out PowerBI Desktop. PowerBI Desktop is basically the equivalent of Tableau Desktop. And the interplay is similar, where users create rich analytics with the client, and then publish to the web to share the results.

But what blew me away was how PowerBI Desktop stacks up....

Let’s start with the data sources. They’ve done a great job of adding a huge number of sources – the usual suspects like Excel, text files, and database sources, but also a wide range of big data, social, ERP, and CRM sources. It looks like they’re working with ISVs to add sources at a frightening rate. Getting access to data is often one of the big stumbling blocks for data discovery (and, I think, one of Tableau’s weaker areas) – and it looks like Microsoft is really focused on cracking the code here.

So then I thought I’d get my hands dirty and give it a little test drive with my favorite old-time schema – Northwind (which I was pleased to see Microsoft still uses for on-stage demos!). It’s a relational schema, and PowerBI Desktop did the automapping for me, then let me easily make some changes to the joins. Nice, straightforward, very usable, and easy to visualize the relationships.
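For the curious, the relationships PowerBI Desktop auto-detected are just Northwind’s foreign-key joins. A minimal sketch using a hypothetical two-table slice of the schema (table and column names follow Northwind conventions, but the sample rows are made up) shows the kind of join-and-aggregate the tool builds for you behind the scenes:

```python
import sqlite3

# Tiny, illustrative slice of the Northwind schema held in memory.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE Customers (
    CustomerID  TEXT PRIMARY KEY,
    CompanyName TEXT,
    Country     TEXT
);
CREATE TABLE Orders (
    OrderID    INTEGER PRIMARY KEY,
    CustomerID TEXT REFERENCES Customers(CustomerID),
    OrderDate  TEXT
);
INSERT INTO Customers VALUES
    ('ALFKI', 'Alfreds Futterkiste', 'Germany'),
    ('ANATR', 'Ana Trujillo', 'Mexico');
INSERT INTO Orders VALUES
    (10248, 'ALFKI', '1996-07-04'),
    (10249, 'ANATR', '1996-07-05');
""")

# The "automapped" relationship: Orders.CustomerID -> Customers.CustomerID.
# A chart of orders by country is effectively this join plus a group-by.
rows = con.execute("""
    SELECT c.Country, COUNT(o.OrderID) AS OrderCount
    FROM Orders o
    JOIN Customers c ON o.CustomerID = c.CustomerID
    GROUP BY c.Country
    ORDER BY c.Country
""").fetchall()
print(rows)  # [('Germany', 1), ('Mexico', 1)]
```

The value of the automapping is that the tool infers these `REFERENCES` relationships for you, so an analyst never has to write the join by hand.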

Finally, for the really fun bit, some data discovery. And this is where it was shockingly good. From soup to nuts, from data to dashboards, I built the quick example below in about 20 minutes. And it checks all the boxes. On the right is an easy field selector; there’s a rich array of visualizations – traditional charts, heatmaps, gauges, geospatial charts (and more can be added by third parties). All of the visualizations have strong data flexibility, so I could easily change the data I’m seeing in the chart, filter it, use TopN/BottomN, and so on. I found myself easily slicing around the data, trying out different views, just like Tableau.

Some of the cooler stuff is how the dashboard components automatically snap together, with no effort at all – for example, when I click on a region on the map, my other charts automatically re-orient. It’s also easy to create a book of dashboards, calculated measures, and so on.

Oh, and publishing is simple too.

So, is Tableau the New Netscape?

Which brings me back to the comparison at the start of all this. PowerBI Desktop does 90% of what people need from discovery tools, it’s free, and it’s nicely integrated with Office. So why use Tableau? Sure, Tableau is still better in some areas – more visualizations, it chooses the right chart automatically, Mac support – and I’d say it still has a slight edge in intuitiveness for data discovery. But here’s the kicker: Tableau is 10+ years old; PowerBI is at 1.0 – and it ties into Microsoft’s broader strategy around Azure, Office365, and Cortana. Brutal.

I’m sure there’s chatter going on in the halls of Tableau about PowerBI. The threat from PowerBI may mean considering additional options around predictive analytics, or moving toward an applications strategy beyond tools.

Of course, if I were to take the Netscape analogy to its ultimate ending, out of the ashes of Netscape rose Firefox – which came to haunt Microsoft. I’m not sure this story will end in the same way.

Data Discovery: Warning Batteries Not Included

There were few things worse than the Christmas disappointment of frantically tearing open a present only to find it was dead in the water – no batteries. Worse still, back when I was a kid, there weren’t any stores open on Christmas Day. So absent some forward planning with on-hand batteries (usually unlikely), it meant a grindingly slow wait until the following day to get any satisfaction. From anticipation to disappointment in a few short seconds. These days toy manufacturers are smarter – thankfully, they just include them.

Sometimes software is prone to the same issue, and most recently data discovery tools in particular. Data discovery has been one of the fastest growing segments within analytics, growing substantially faster than its traditional Business Intelligence counterparts. And with good reason: data discovery adoption typically starts as a bottom-up, business-user-driven initiative. It begins with a frustrated but enterprising analyst looking to explore or share some insight. Caught between spreadsheets and the absence of a useful (or even existent) analytics initiative – usually too costly, too rigid, or just sitting on the shelf – data discovery just makes sense as a way to get success quickly.

The great thing about data discovery tools is that they provide near-instant satisfaction: quick and easy setup, then data visualization and exploration capabilities ranging from easy ad-hoc analysis to cool geospatial visualizations and heat maps. With tools like Tableau you can get eye-catching results incredibly quickly against spreadsheets, a database, or a cloud source like Salesforce. A business user can typically go from data to dashboard significantly faster than with traditional Business Intelligence tools, which require complex mappings, semantic layers, and IT setup before delivering any joy.

In contrast to traditional BI tools, data discovery tools eschew centralized data integration, metrics layers, and IT-maintained business mapping layers. That’s the unglamorous stuff that, once it’s all done (which takes a lot of time!), is often too rigid to accommodate new ad-hoc data requirements, or misses the mark in helping analysts answer the questions they need to ask when the need arises. The simple fact is that it is difficult to design an analytics initiative a priori, because you don’t necessarily know all the questions analysts will ask. That’s why data discovery has been so successful and adopted so quickly.

What About Those Batteries?

It’s true: setting up all of that data integration, and the semantic layers users interact with, slows traditional BI deployments down. And having to prepare data or optimize database schemas to get decent query performance is just plain thankless. Analysts want to answer the questions they have, right now, and all of that plumbing just gets in the way of speed and autonomy.

So data discovery tools typically dispense with all that, but in doing so they throw the baby out with the bathwater – and there are consequences. Their value proposition is simply to point the tool at a spreadsheet, a text file, a simple database, or perhaps a cloud source like Salesforce, and start analyzing. The problem is that life in the long run is rarely that simple, and that nice shiny product demo often hides the real data integration complexity it takes to get to that place. Even spreadsheets and text files often need cleansing; opportunities or accounts in Salesforce need de-duping. Never mind joining accounts across CRM or ERP systems, or resolving complex joins across multiple tables (or databases). In emphasizing speed and autonomy, what’s lost is reuse, repeatability, and sharing clean data.
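To make the hidden work concrete, here is a minimal sketch of the kind of de-duping a demo glosses over: merging account extracts from two systems where the same company appears under slightly different names. The field names and matching rule are purely illustrative assumptions, not taken from any real CRM or ERP extract – real-world entity matching is far messier.

```python
def normalize(name: str) -> str:
    """Crude matching key: lowercase, keep only letters and digits."""
    return "".join(ch for ch in name.lower() if ch.isalnum())

def dedupe_accounts(crm_rows, erp_rows):
    """Merge two extracts, keeping one record per normalized account name."""
    merged = {}
    for row in crm_rows + erp_rows:
        key = normalize(row["name"])
        if key not in merged:
            merged[key] = dict(row)
        else:
            # Naive conflict rule for the sketch: keep the larger revenue.
            merged[key]["revenue"] = max(merged[key]["revenue"], row["revenue"])
    return list(merged.values())

# Hypothetical extracts: "Acme Corp." and "ACME Corp" are the same account.
crm = [{"name": "Acme Corp.", "revenue": 100}, {"name": "Globex", "revenue": 50}]
erp = [{"name": "ACME Corp", "revenue": 120}]
accounts = dedupe_accounts(crm, erp)
print(len(accounts))  # 2 -- the two Acme variants collapse into one record
```

Every analyst who skips this step (or reinvents it with their own matching rule) produces a slightly different version of the truth – which is exactly the problem discussed below.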

It’s Like Making a Battery Run to the Store. Daily.

What often happens, especially when data discovery tools get virally deployed across departments, is that IT, or the administrator of the data source in question (e.g. the Salesforce or ERP admin), gets left carrying the bag. It means repeated requests for ad-hoc data extracts, or the analyst repeatedly grabbing an updated extract and then trying to join it with other sources and cleanse it in spreadsheet hell. Over, and over again.

The organization turns into a culture of one-offs – a one-off extract of a few periods of data for some win-loss analysis, another extract for some product discounting analysis. Analysts may end up performing weekly or monthly data prep and cleansing just for their own activities, with no shared benefit for the rest of the organization. The business ends up with multiple data silos and a lot of redundant effort. Multiple versions of the truth get created, with every data discoverer using his or her own logic to cleanse, transform, and visualize the data.

Everyone ends up with cool visualizations to share (and impress the management team with!), but the organizational cost is high, with wasted time and redundant sets of conflicted data.

But things can be different with a little planning ahead.

Three Steps to Building a Batteries-Included Approach to Data Discovery

1)     Create a sustainable Data Discovery strategy

I’m not advocating building old-school centralized BI (though it does have a role as part of a broader analytics strategy – more on that later), because data discovery tools fill a need to understand and explore data quickly. But organizations need to create a strategy around data, and encourage sharing of not just dashboards but data too – to optimize for reuse. Then, when the organization hits an inflection point in data discovery adoption, it is ready to roll out user-driven data prep tools like Paxata and Alteryx. These tools provide relief by enabling business users not just to prepare their own data and automate common preparation activities, but to share the results with others too. The outcome is shared pools of data that have been refined to handle common business questions. Better yet, compared to traditional data warehouse initiatives, when data is prepared from the bottom up and shared, you’ll often end up with much more pragmatic and useful data for real-world business questions, based on a more democratic (and continually improving) process for refining the data pool.

2)     Identify data sources that need to be frequently analyzed and optimize for re-use.

One of the other keys is to identify which data requests have descended into inefficiency and dysfunction. For example, run a quick poll amongst app administrators – ask the Sales Ops Salesforce or Dynamics GP admins which data pulls for business users have become onerous. Perhaps there is a month-end extract from multiple ERPs that requires merging every month and is sucking up cycles in finance or ops. It’s also worth polling analysts to understand what kinds of recurring transformation and merging they’re performing – and which ones are duplicated across team members. The answers reveal which data tasks are candidates for consolidation across teams, or opportunities for automation.

3)     Think Holistically about Analytics, Create a Journey

As we've seen, while laissez-faire adoption of discovery tools can create results quickly, it’s often not sustainable as adoption scales up. The truth is that there typically needs to be some ownership and data stewardship. In mid-size organizations, that may mean an analytics strategy led by finance, perhaps consisting of analytics embedded in the transactional apps, some centralized BI/reporting (for hardened shared metrics and reports), collaborative data pools, and data discovery tools. In larger organizations, it’s a prime area for IT to lay the foundation for a sustainable bottom-up data discovery strategy.

So before you go out shopping for that shiny new data discovery tool for the holidays, and think about rolling it out across your organization, consider stocking up on batteries first – so your team spends more time playing with visualizations, and less time stepping over each other around data.