Metabase Data



Metabase is a popular open-source BI tool that you can quickly install in your local environment to get a simple BI system up and running.

Yet when your analytics needs grow, you might face some of these pain points when using Metabase:

  • Have your business users ever found Metabase’s “Ask a question” too limiting for complex queries, and ended up coming back to the analytics team to ask for custom reports?
  • Metabase only works well with a single SQL data source. If you have data from multiple sources, Metabase likely won't work well for you, since it doesn't let you join data across them.
  • With Metabase, you can only work directly with your database tables, because everything in Metabase is designed for simplicity.

In this post, we'll share a few alternatives and explain which of Metabase's pain points each one addresses. Depending on your needs, you should be able to pick the right option to replace Metabase or to use alongside it.

Metabase Data Model

The options below are all tools that ultimately offer a drag-and-drop interface to end users. At the end of the post, we also list a few tools that are designed only for technical users and translate SQL into charts.

Holistics is a solid BI alternative to Metabase. It works similarly to Metabase in that it lets you map your database tables into models and relationships, and then expose them to business users for self-service exploration.
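
To make the idea of a modeling layer more concrete, here is a minimal Python sketch of how a tool in this category might compile a business user's field selection into SQL, using model and relationship definitions. The `Model` and `Relationship` classes, the table names, and the output format are all hypothetical. This is not Holistics's (or anyone's) actual modeling syntax.

```python
from dataclasses import dataclass

@dataclass
class Model:
    """A business-friendly view over one table (hypothetical structure)."""
    name: str          # alias used in generated SQL
    table: str         # physical table in the warehouse
    dimensions: dict   # field name -> SQL expression
    measures: dict     # field name -> SQL aggregate expression

@dataclass
class Relationship:
    """Tells the tool how two models join (hypothetical structure)."""
    from_model: str
    from_column: str
    to_model: str
    to_column: str

def compile_query(base, joined, rel, dimensions, measures):
    """Turn a drag-and-drop field selection into a single SQL query."""
    fields = {}
    for model in (base, joined):
        for field_name, expr in {**model.dimensions, **model.measures}.items():
            fields[f"{model.name}.{field_name}"] = expr
    select = [f"{fields[f]} AS {f.replace('.', '_')}" for f in dimensions + measures]
    sql = (
        f"SELECT {', '.join(select)}\n"
        f"FROM {base.table} AS {base.name}\n"
        f"LEFT JOIN {joined.table} AS {joined.name}\n"
        f"  ON {rel.from_model}.{rel.from_column} = {rel.to_model}.{rel.to_column}"
    )
    if dimensions:
        # Group by every selected dimension, by position.
        sql += "\nGROUP BY " + ", ".join(str(i + 1) for i in range(len(dimensions)))
    return sql

users = Model("users", "raw.users", {"country": "users.country"}, {"user_count": "COUNT(DISTINCT users.id)"})
orders = Model("orders", "raw.orders", {"status": "orders.status"}, {"revenue": "SUM(orders.amount)"})
rel = Relationship("orders", "user_id", "users", "id")

sql = compile_query(orders, users, rel, ["users.country"], ["orders.revenue"])
print(sql)
```

Business users pick fields by name. The model, which the data team defines once, decides which joins and SQL expressions are used.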

Similarities/differences to Metabase:

  • Both are built on top of a SQL querying engine, and both offer a drag-and-drop experience for non-technical users.
  • Metabase fits only at the visualization layer, while Holistics offers additional ELT capabilities (data preparation).

Pros (compared to Metabase):

  • A stronger data modeling layer that allows handling sophisticated raw data.
  • Business users can ask more sophisticated questions using their Explore interface.
  • Works well with non-SQL data sources, as it offers a mini-ETL experience for common sources (MongoDB, Google Analytics, etc.)
  • Has a built-in DAG-like transformation layer, so you can transform raw data into aggregated datasets before exploration
  • Cloud-based, so it doesn't take time to set up.
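
A DAG-like transformation layer essentially means that each derived dataset declares its inputs, and the tool runs the transformations in dependency order. The sketch below illustrates that general idea with Python's standard `graphlib.TopologicalSorter` (Python 3.9+). The dataset names and transform functions are hypothetical, and the sketch says nothing about how Holistics actually implements its layer.

```python
from graphlib import TopologicalSorter

def clean_orders(datasets):
    # Drop hypothetical test orders from the raw data.
    return [o for o in datasets["raw_orders"] if o["status"] != "test"]

def daily_revenue(datasets):
    # Aggregate cleaned orders into revenue per day.
    totals = {}
    for order in datasets["clean_orders"]:
        totals[order["day"]] = totals.get(order["day"], 0) + order["amount"]
    return totals

# Each derived dataset: (transform function, names of the datasets it depends on).
TRANSFORMS = {
    "clean_orders": (clean_orders, {"raw_orders"}),
    "daily_revenue": (daily_revenue, {"clean_orders"}),
}

def run_pipeline(raw_datasets):
    datasets = dict(raw_datasets)
    graph = {name: deps for name, (_, deps) in TRANSFORMS.items()}
    # static_order() yields every node only after all of its dependencies.
    for name in TopologicalSorter(graph).static_order():
        if name in TRANSFORMS:  # raw sources appear in the order but need no transform
            transform, _ = TRANSFORMS[name]
            datasets[name] = transform(datasets)
    return datasets

raw = {
    "raw_orders": [
        {"day": "2019-01-01", "amount": 10, "status": "paid"},
        {"day": "2019-01-01", "amount": 5, "status": "test"},
        {"day": "2019-01-02", "amount": 7, "status": "paid"},
    ]
}
print(run_pipeline(raw)["daily_revenue"])  # {'2019-01-01': 10, '2019-01-02': 7}
```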

Cons (compared to Metabase):

  • Might not look as visually appealing as Metabase

Pricing:

  • Free plan available; paid plans start from $50-$500 per month.

Tableau is widely considered the best tool for visualization (prettiness), since that is its primary focus. Tableau was also recently acquired by Salesforce.

Similarities/differences to Metabase:

  • While Metabase translates everything to SQL, Tableau uses its own in-memory datastore by default. This makes it more difficult to debug when things go wrong, because you can't look at the SQL query to troubleshoot.

Pros (compared to Metabase):

  • Pretty visualizations (best in class)
  • Friendly for business users building their own charts
  • Works with a wide range of data sources

Cons (compared to Metabase):

  • To design charts effectively you need to use their Desktop version

Pricing:

  • Based on user roles (Creator, Explorer, Viewer), with a minimum commitment required.
  • There is a free desktop version if you publish your reports publicly.

Coming out of Microsoft, with a strong history in Excel and PowerPivot, PowerBI is a fine choice to replace Metabase. It can also load custom visualizations.

Similarities/differences to Metabase:

  • While Metabase translates everything to SQL, PowerBI uses its own in-memory datastore and its proprietary language, DAX, by default. This makes it more difficult to debug when things go wrong, because you can't look at the SQL query to troubleshoot.

Pros (compared to Metabase):

  • Its explorer interface is comprehensive enough for business users to work with.
  • It covers everything from loading data from multiple sources, to a drag-and-drop transformation UI, to visualization.

Cons (compared to Metabase):

  • We suspect that if you prefer SQL-backed data reporting like Metabase's, you might not like the Microsoft-style, Excel-like, proprietary approach of PowerBI.
  • The PowerBI editor only runs on Windows desktops (that's why we wrote a post on how to use PowerBI on Mac devices).
  • Their best practice requires you to host your data on PowerBI servers, i.e., duplicating your data onto their servers.

Pricing:

  • Free for a single user (desktop)
  • $10 per user for small-scale shared deployments
  • Gets fairly expensive for medium-to-large deployments (starting at USD 5K a month, per the listed price on the website).

Looker (now part of Google) is a good BI tool to replace Metabase, but only if you're a big organization with a large budget.

Pros (compared to Metabase):

  • Has a custom-built DSL layer (called LookML) that maps database tables to business logic, making it more flexible and customizable.
  • Has a built-in transformation layer, so you can transform raw data into aggregated datasets before exploration
  • Cloud-based, so it doesn't take time to set up (compared to Metabase)

Cons (compared to Metabase):

  • Since it uses its own DSL to model data, the learning curve to get started is steep.
  • It's also expensive and meant for large-scale deployments

Pricing:

  • Quite expensive, starting from $3000/month.

If you don't need the self-service capabilities that Metabase offers business users, you can also check out these tools:

  • Redash (open-source SQL-to-chart tool)
  • Cluvio (SQL-to-chart tool; paid offering with a free plan)
  • Superset (open-source SQL-to-chart tool that came out of Airbnb)
  • Mode Analytics (SQL-to-chart tool; paid offering)

How do you begin combining data from cloud applications with your internal databases to gain insight into your business? Maybe your organization has already standardized on Metabase as your analytics tool, but you're still learning about using it with multiple data sources.

To analyze data from diverse sources, you need a data warehouse that consolidates all of your data in a single location. Most businesses take advantage of cloud data warehouses such as Amazon Redshift, Google BigQuery, or Snowflake.

You can extract data that you have stored in SaaS applications and databases and load it into the data warehouse using an ETL (extract, transform, load) tool. Once the data is available, your analysts can use it to create reports. In this post, we'll look at how to start from scratch and create a report using Metabase, an open source business intelligence (BI) tool that's free to download and use.
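
As a rough illustration of the extract, transform, load flow, the Python sketch below pulls rows from two hypothetical sources, normalizes them, and loads them into an in-memory SQLite database that stands in for a warehouse such as Redshift, BigQuery, or Snowflake. The extractor functions, IDs, and column names are made up for illustration. A real ETL tool handles the source API calls and scheduling for you.

```python
import sqlite3

def extract_billing_accounts():
    """Hypothetical stand-in for a SaaS billing/CRM API."""
    return [
        {"AccountId": "A1", "PlanName": "Starter"},
        {"AccountId": "A2", "PlanName": "Enterprise"},
    ]

def extract_support_tickets():
    """Hypothetical stand-in for a support tool's API."""
    return [
        {"ticket": 101, "account_id": "a1"},
        {"ticket": 102, "account_id": "a1"},
        {"ticket": 103, "account_id": "a2"},
    ]

def transform_accounts(rows):
    # Normalize key casing so account IDs match the support tool's IDs.
    return [(r["AccountId"].lower(), r["PlanName"]) for r in rows]

def transform_tickets(rows):
    return [(r["ticket"], r["account_id"].lower()) for r in rows]

def load(conn, accounts, tickets):
    conn.execute("CREATE TABLE accounts (account_id TEXT PRIMARY KEY, plan TEXT)")
    conn.execute("CREATE TABLE tickets (ticket_id INTEGER PRIMARY KEY, account_id TEXT)")
    conn.executemany("INSERT INTO accounts VALUES (?, ?)", accounts)
    conn.executemany("INSERT INTO tickets VALUES (?, ?)", tickets)
    conn.commit()

# An in-memory SQLite database stands in for the cloud data warehouse.
warehouse = sqlite3.connect(":memory:")
load(
    warehouse,
    transform_accounts(extract_billing_accounts()),
    transform_tickets(extract_support_tickets()),
)

# With both sources in one place, a plain SQL join across them works.
rows = warehouse.execute(
    """
    SELECT a.plan, COUNT(t.ticket_id) AS tickets
    FROM accounts AS a
    JOIN tickets AS t ON t.account_id = a.account_id
    GROUP BY a.plan
    ORDER BY a.plan
    """
).fetchall()
print(rows)  # [('Enterprise', 1), ('Starter', 2)]
```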

Metabase's visual query builder lets you generate simple charts and dashboards, or you can use SQL to create more complex visualizations, as we'll do here. Each query starts by clicking a button to 'ask a question.' Metabase is simpler than tools like Tableau and Power BI and offers fewer features, but it's correspondingly easier to learn.

Reporting, data warehouses, and ETL

Metabase can run in the environment of your choice: via a Docker image, on AWS Elastic Beanstalk, or on Heroku. You can also run it as a native application on macOS or as a Java JAR file. I used the macOS app, Metabase version 0.31.1.

Per its FAQ, 'Metabase is primarily meant to work with actual databases.' If you want to analyze data in SaaS platforms, the developers 'suggest that you use other tools to build a data warehouse with the data you need.'

That's what we'll do with Stitch, a simple, powerful ETL service for businesses of all sizes, up to and including the enterprise. It can move data from dozens of data sources. Sign up and you can be moving data to a data warehouse in five minutes. Choose at least one of Stitch's integrations as a data source. Most businesses get more value out of their data as they integrate more data sources; for our example, we'll look at data from two separate sources.

Setting up a data warehouse

I used some of Stitch's real data to build a data visualization for this post. Specifically, I was curious to see how many support conversations came from each of Stitch's pricing tiers. To find out, I'd need to use billing data stored in Salesforce and join it with support conversation data from Intercom to create a visualization of support conversations.

My first step was to set up a BigQuery data warehouse to store the data from the two SaaS platforms. Following our documentation, I granted my Google user the permissions that Stitch would need to load data into BigQuery.

Using Stitch makes extracting data from a source and loading it into a data warehouse easy. I logged in to Stitch and added new integrations for Salesforce and Intercom, following the instructions in our documentation. From Salesforce I selected the Account table to replicate. Stitch's Intercom integration automatically extracts data from all available tables. From both, I chose the period from which to replicate data.

Once I'd set up my integrations I added BigQuery as my destination and specified a BigQuery project name. Within a few minutes, Stitch extracted the data I specified and loaded it into BigQuery.

Building a sample report

Next I turned to Metabase.

To start, I clicked Metabase's settings icon, then Admin. I chose Add a Database and entered my BigQuery project ID and other information, which I obtained from the BigQuery dashboard. As part of that process, I had to generate an OAuth 2.0 Client ID and Client Secret. This was a simple process that involved entering some basic information on a screen reached through a Click here link. Similarly, I generated an Auth Code by providing my Google authentication information.

Once you have a database to work with, you can 'ask a question' in one of three ways:

  • Metrics
  • Custom
  • Native Query

If you choose Metrics, Metabase directs you to create segments and metrics from its admin panel. You can choose Custom for slightly more complex queries on a single table. I wanted to join data from two tables, but a Metabase blog post says:

Our goal with Metabase has always been to provide a way for non-technical users to answer their own questions in a self-serve manner. While joins are a great tool that a skilled analyst or programmer might reach for, we will be trying to add features that expose a highly specific, easily understood operation that someone who isn't SQL fluent would understand.

Fortunately, we're SQL-fluent around here. I chose Native Query, which lets you paste in SQL code.

A look at the Intercom schema showed me that to associate the number of conversations with the Stitch customer ID, I'd need to join the Intercom Users and Conversations tables. The Users table contains a field called companies that contains information about the companies a given user is associated with. That field is a list that could contain multiple values – a nested data structure, in other words. Many data warehouses don't support nested data structures, but BigQuery does. It stores each list in an array.

Retrieving data from a nested data structure is tricky if you're used to working with fully normalized data. My colleague Erin, Stitch's senior technical documentation manager, set me on the right path with this query:

The SQL UNNEST operator takes an array and returns a table, with one row for each element in the array.
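
To show what UNNEST does, here's a small Python sketch that emulates the common BigQuery pattern `FROM users, UNNEST(companies) AS company`, in which each user row is repeated once for every element of its `companies` array. The field names are hypothetical and don't necessarily match Stitch's Intercom schema. This is an illustration of the operator, not the query used for this report.

```python
def cross_join_unnest(rows, array_field, alias):
    """Yield one output row per element of row[array_field], like FROM t, UNNEST(t.arr) AS alias."""
    for row in rows:
        # A missing or empty array produces no output rows (cross-join semantics).
        for element in row.get(array_field) or []:
            out = {key: value for key, value in row.items() if key != array_field}
            out[alias] = element
            yield out

# Hypothetical Intercom-style users with a nested list of companies.
users = [
    {"user_id": "u1", "companies": [{"company_id": "c1"}, {"company_id": "c2"}]},
    {"user_id": "u2", "companies": [{"company_id": "c1"}]},
    {"user_id": "u3", "companies": []},
]

flat = [(r["user_id"], r["company"]["company_id"]) for r in cross_join_unnest(users, "companies", "company")]
print(flat)  # [('u1', 'c1'), ('u1', 'c2'), ('u2', 'c1')]
```

Like the comma (cross) join, the sketch drops users whose array is empty. In BigQuery, a `LEFT JOIN UNNEST(...)` keeps such rows, with a NULL element.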

To get the plan tier information from our Salesforce Account table, I created another query:

I used those queries, along with data from Intercom's Conversations table, to retrieve the data I wanted by joining the Intercom and Salesforce data. I used a CASE expression to consolidate monthly and annual billing plans into a single bar for each pricing tier:
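
To make the join-plus-CASE logic concrete, here's a minimal sketch that runs a similar query with Python's built-in `sqlite3` module. The tables are tiny stand-ins: `user_companies` plays the role of the Intercom users after UNNEST, and `accounts` stands in for the Salesforce Account table. The plan names, tier names, and columns are hypothetical, not Stitch's actual tiers or schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Tiny stand-in tables with hypothetical names and values.
conn.executescript(
    """
    CREATE TABLE accounts (company_id TEXT, plan TEXT);
    CREATE TABLE user_companies (user_id TEXT, company_id TEXT);
    CREATE TABLE conversations (conversation_id INTEGER, user_id TEXT);
    INSERT INTO accounts VALUES
        ('c1', 'Basic Monthly'), ('c2', 'Basic Annual'), ('c3', 'Premium Monthly');
    INSERT INTO user_companies VALUES ('u1', 'c1'), ('u2', 'c2'), ('u3', 'c3');
    INSERT INTO conversations VALUES (1, 'u1'), (2, 'u1'), (3, 'u2'), (4, 'u3');
    """
)

# CASE folds monthly and annual variants of a plan into one tier (one bar per tier).
query = """
    SELECT
      CASE
        WHEN a.plan IN ('Basic Monthly', 'Basic Annual') THEN 'Basic'
        WHEN a.plan IN ('Premium Monthly', 'Premium Annual') THEN 'Premium'
        ELSE 'Other'
      END AS tier,
      COUNT(c.conversation_id) AS conversation_count
    FROM conversations AS c
    JOIN user_companies AS uc ON uc.user_id = c.user_id
    JOIN accounts AS a ON a.company_id = uc.company_id
    GROUP BY tier
    ORDER BY tier
"""
result = conn.execute(query).fetchall()
print(result)  # [('Basic', 3), ('Premium', 1)]
```

One caveat with this design: if a user belongs to several companies, each of that user's conversations is counted once per company. You may want to choose a primary company per user before counting.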

Running this query in the BigQuery console gave me the data I wanted. Pasting it into Metabase gave me a table of data. To turn it into a visualization, I selected Bar from the Visualization drop-down and chose tier as the X axis. The result was a useful visualization.

Metabase lets you customize bar colors and axis labels. When you're satisfied with your visualization, you can save it in a collection and publish it to a dashboard that other users can share.

So there you have it – a quick walk through the process of using an ETL tool like Stitch to move data from multiple sources into a data warehouse, then report on it using Metabase. Sign up for a free trial of Stitch and start creating your own.