Business Data Visualization - a fast and easy way

12 min readMay 27, 2023

Many organizations need to visualize their business data, which is often difficult to do. Thanks to the popularity of microservices architecture, data are usually stored in multiple databases. Executing queries for data visualization can be slow and causes high pressure on the business database - especially if we’re not using read replicas - thus slowing down the application. It is also challenging to link data together from different databases. The required data can be dirty, their structure and format may not be suitable for visualization.

There are several options available to solve this problem, some of which include:

Developing a custom analytics system
This entails creating at least a new service that retrieves the necessary data from the microservices - or directly from their databases. Then it has to normalize the data to create the required views for displaying the charts. Depending on the quantity and quality of the data, this can be a complex task. Additionally, UI needs to be created, which is capable of rendering the appropriate charts. The advantage of this solution is it can be highly customized, but it requires significantly more development time.
Cloud-based analytics system
Most of the Cloud Providers offer their own solution for data analysis and visualization. For example Azure Analysis Services or AWS Analytics and Data Lake Products. Using these services - depending on the provider - we can quickly and easily build our solution. The downside is that it might be more expensive and also be challenging to implement unique requirements.
Component-based analytics system
There are many pre-made services which we can combine for building our analytics platform. Essentially, this one is the combination of the previous two approaches. Cheaper than the cloud-based option and faster to do than building a custom solution. The individual building blocks can also be replaced with custom implementation as needed. This approach helps achieve the best balance between cost, effort, and development time. However, compromises need to be made among these three.

In this article, we are going to build a component-based analytics system. Each component can be run locally, on-premise or on cloud-based platforms. Additionally, these components can used in form of Software-as-a-Service.

Components

The following list includes the software components we are going to use:

PostgreSQL, Business and BI database.
MongoDB, Business database.
Docker, Containerization platform.
Metabase, Data visualization tool.
Airbyte, ELT system for copying data.
Data Injector, Scheduled job for inserting test data.

System Overview

The test data is stored in two different databases. The scheduled job will continuously inset new rows into them to simulates two microservices. The first database is a PostgreSQL, which stores the users of the system. The second one is a MongoDB, where the orders of the users and - for the sake of simplicity - the products are stored.

Our goal is to create a simple dashboard where we can display the total revenue from sales, the quantity of sold products, the current price of the products and their price changes over time on line chart. Additionally, we want to show the amount spent by users on each product.

The demo databases contain ten users and thirty products. The scheduled job will insert new orders - using random users and products - into this in every minute. Furthermore, the job modifies the prices of a few products to randomly generated values in each run. The source code for the job is available on this link.

The dashboard we build can be seen in the following image:

Platform architecture

In the first step, we transfer the necessary data for analytics from the business databases into the BI db. For this, we utilize an ETL/ELT system - in our case, Airbyte - which is capable to copy the data from different sources to the destination based on specified rules. Multiple alternative solutions are available to do this, such as Fivetran or Dataddo.

As a business intelligence (BI) database, we are using PostgreSQL in the demo, which is suitable for analyzing smaller datasets, typically up to a few hundred GBs. There are dedicated data warehouses available for real world analytics, such as Google BigQuery or Amazon Redshift. If the relational database becomes the bottleneck of our system, it is worth considering to migrate the system to these solutions.

Next, we have to transform the raw data into the appropriate format for visualization. We will use DBT (Data Build Tool) modelling tool, which can be integrated into most of the ELT/ETL systems.

After the data was loaded and transformed, we just need to visualize it. For this, we are going to use Metabase. The charts can be created using no-code editor or SQL queries with some additional configurations. We can then add these charts to the desired dashboard. We can use complex queries for the charts, although in most cases it is advisable to transform the data into the appropriate format using DBT (Data Build Tool) transformations. This is especially important when dealing with queries involving multiple table joins as the dashboard loading time can be significantly slower compared to a single-table query, where filtering conditions may not even be necessary. Metabase supports dashboard embedding into websites. The application can be run in containerized environments or we can also use through online service subscription.

Before we build the platform, let’s review the business data tables.

Users are stored in a PostgreSQL database. The table contains the user’s ID, name, and email address. The orders and products are stored in MongoDB collections. The orders document stores the ordered product’s ID, the customer’s ID, the timestamp of the order and a sub-document containing the order details. To make building the platform more interesting, we store the products in an Event Sourcing-based data table. For each modification, a new document will be inserted into the collection. By aggregating these modifications in chronological order, we obtain the current state of the products. In this example, we’ll omit the creation of snapshots from these aggregates.

Now that we understand the architecture, let’s start assembling the platform.

Deploy the Platform

To start the platform, we need running Docker. The compose file and the source code of the job can be downloaded from here. The components are defined in theairbyte-compose.yml and platform-compose.yml manifest files. It can be deployed by the following command:

The installation may take a few minutes as it needs to download the containers and build a new one from the Job source code.

Airbyte configuration

Once the platform has started, we need to configure the database connections. Airbyte UI runs on localhost:8000 url. The default username and password are airbyte/password. After login, we can add new data sources and destinations. First, let’s connect the Postgres and MongoDB business databases as sources, and the BI Postgres database as destination.
Here are their parameters:

Postgres Business:

Source database
Port: 10010
DB Name: postgres

MongoDB Business:

Source database
Port: 10020
DB Name: admin

Postgres Business:

Destination database
Port: 10030
DB Name: postgres

The username and password for the databases are admin/admin123. The image below shows an example of a filled form. We can finalise the connection by clicking on the Set up source/destination button. The destination database needs to be created only once, then we can reuse it.

After we have connected the databases, we need to specify the run frequency of the synchronization. For this demo, let’s choose the Cron type and set the value to */1 * * * * ? - copy-paste not always works. For testing purposes, it is good to have frequent synchronization, but in a real environment, this can be modified according to specific needs. We can select which tables do we want to synchronize from the source databases and set the synchronization mode as well. Let’s use Incremantal Append so only new data will be copied in each synchronization cycle. The value of the Cursor field has to be the primary key - id. Finally, on the Normalization & Transformation card we can specify what should happen to the data after the copy. For now, let’s use raw data option. The Normalized tabular data setting would try to restore the original table structure. It works quite well with relational databases but not as well with document-based ones. We will address this issue later using DBT.

The raw data tables should be in the BI database now.

In addition to the Airbyte metadata, the content of the business tables are represented in JSON format.

Each synchronized table contains three columns: a unique identifier, a JSONB column for the data and one holds the timestamp of the synchronization.

DBT transformation

In order to work with the data quickly and conveniently, we have to transform them into the appropriate format. For this, we will use DBT to automate the process. The scripts run after each data synchronization and transform the data applying the specified rules. The files used for the demo can be found here.

The goal of the demo is not to build up comprehensive analytic database or provide a detailed overview of DBT, but we will take a look at some examples to illustrate how transformations look in practice.

In the examples, we extract the price changes of products from the raw data table called airbyte_raw_* and create a new table that contains the historical information. From this table, we generate another one for represent the current state of the products. Finally, we create a view that includes the product names and the number of associated orders for it.

We extract the necessary data from the JSONB object and transform it accordingly. PostgreSQL has built-in functions for handling these JSON objects.
Using the is_incremental() macro provided by DBT we can ensure that only newly synchronized data are processed. In the configuration header, we can specify the indices for the table and define the unique key, which allows for updating existing records instead of insertion.

Historical data of products

After that, we generate the products table from the historical data. This table contains the latest price for each product. Since the job - after initializing product data including their names - only inserts the identifier and the current price of the products, we need to preserve the name from the first sync, while taking the price from the latest one.

Products table

After creating the tables for products and orders - which can be found following the link mentioned earlier - we also create a view that represents the quantity of sold products.

Number of sold items per product

As we have created the models, all we need to do is associate the transformations with an Airbyte connection. It can download the DBT project from a public Git repository and running it. The configuration can be seen on the following picture.

After saving - we have to save in the popup and on the card as well - the tables specified in the descriptors will be created during the next synchronization.

Metabase

In the last step, we will assemble the charts. Metabase runs on port 10000. After registration, we have to establish the database connection. Since Metabase and the BI database are connected to the same Docker network, we can use the container name as host.

BI database connection parameters:

host: bi-db-pg
port: 5432
db name: postgres
user/pw: admin/admin123

In the administrator settings - top right - on the Data Model tab we can see the available data models and set the field types. For example, we can set the type of the Price column in Products table as Currency, thus the values appears in the charts will be formated automatically.

Schema changes are automatically updated at certain intervals but if we want them to show up immediately, we can manually initiate synchronization on the Databases tab.

In this example, DBT creates all tables in the public schema. In a real environment, it is good practice separating intermediate and fact tables into different schemas and grant access to Metabase only the data it needs for visualization. Let’s leave the administrator panel now.

We can create new dashboards containing our charts using the New button. The built-in no-code editor - called questions - or SQL can be used for createing the individual charts.

But why do we need to transform the data if we could write the queries directly in Metabase? There are several reasons for this:

Working with cleaned and consolidated data is easier.
Queries on optimized dataset are faster, so as the rendering time of the charts.
Well-organized data allows even non-technical users to create their dashboards.

If possible, it is good to merge the necessary data into separate fact tables. This way, each chart can use a single table. However, it is important to be cautious as overly optimized data can also cause issues.

The table containing the quantity of products sold is a good example for this, as it does not allow filtering by timestamps.

Let’s create our first chart. By selecting the Question option after pressing the New button, we can choose the data source and the working table. In the no-code editor, we can join multiple tables for the selected one, group the data by columns, perform well known SQL functions, filter and sort the data, as shown in the image below.

By clicking on the Visualize button a table will be generated containing the results of the composed query. Here, we can select the chart type and also add other settings for it like goals, stack it and so on.

Available types for charts:

We can configure which columns to be displayed on the axes, break down the data based on selected columns and choose from many other options to customize the chart. By clicking on the Save button, we can finalize it.

Using the created charts, we can build up our dashboard where we are able to adjust their size and position.

We can also create charts using the SQL editor. Additionally, we can modify the code of the existing charts created by no-code editor.

We have the option to define filters for the dashboards allowing users to further refine the displayed results. The charts are automatically refreshed based on the selected filter values. Note that while filters work automatically for charts created by no-code editor, using SQL we have to manually add template variables for them. Detailed instructions can be found here.

Data used for the given chart can be also download in the selected format.

We can embed the dashboards into our own web applications. Additionally, we have the option to generate unique links to the dashboards via the Metabase API. This allows us, for example, to specify fix, unmodifiable filters for tenants, users… to only see their own data, but they can still use the dashboard’s filter for dates or product IDs and so on. Detailed instructions for this can be found here.

We have seen how easily we can visualize business data. Of course, it is not necessary to manually set up the entire environment. All the applications used here are available as online services, where we can configure our system within minutes. This option is faster but, of course, it comes with additional costs.