Snowflake’s new Unistore workload and Hybrid Tables represent a seismic shift in database technology. For the past 40 years, data has been stored in independent data silos with transactional data (also known as Online Transaction Processing systems or OLTP) used to support operational workloads and analytic (Operational Data Stores, Data Warehouses, or Data Lake) solutions to support Online Analytic Processing (OLAP) workloads.
With Snowflake Unistore, both of these workloads can be seamlessly managed on a single cloud platform.
Snowflake has now delivered a single unified platform (Unistore), which supports both transaction and analytic workloads at a massive scale.
What problem are we trying to solve?
It may not be obvious, but the transactional workloads have wildly different characteristics to analytic processing and these are summarised below:
Transactional systems typically handle very short, fast updates to single rows with transaction completion averaging 50 milliseconds, most analytic queries process billions of rows, taking seconds to minutes to complete.
The diagram below illustrates the typical workload profile of a transactional system. It shows a reasonably repeatable pattern with a higher rate of transactions during the working day with a dip around lunchtime until the evening batch processes start.
Compare the above workload with a typical analytic workload in which there's no pattern to usage at all aside from a wild swing as huge analytic queries are submitted and completed.
The diagram below illustrates the resulting system architecture in which operational systems are kept on entirely separate hardware platforms from analytic queries to avoid contention between these hugely different workloads.
On the left, you have the operational (transactional) systems that run the business, while on the right, the data warehouse (analytical) systems support business intelligence, ad hoc reporting, and data scientists.
While this architecture has dominated the IT landscape for 40 years, it does have several significant drawbacks, including:
Complexity: Because transactional and analytical platforms are separated, we need to integrate multiple disparate technologies from different vendors.
Extract, Transform, and Load: To produce useful insights from analytic systems, we need to perform periodic extracts, loads, and data transformations. These data pipelines are often complex, fragile, and expensive to maintain.
Latency: Despite the best efforts to produce near real-time data loading and automated pipelines, an in-built latency is involved in extracting and loading data to the analytics platform.
Data Silos: These are the result of workload separation, which often leads to further complexity in blending operational and analytic data.
Difficulty closing the loop: There's an increasing need to close the loop between operational and analytic solutions. For example, retailers can use analytic results to identify cross-sell and up-sell opportunities based upon machine learning models. Still, operational systems need to act quickly upon these. This becomes increasingly difficult with independently deployed solutions.
To address some of these concerns, some architects implemented an Operational Data Store (ODS) between the transactional and analytic platforms, and while this partly addresses the latency issue, it adds further to system complexity and overall cost.
Want to learn more from a Snowflake Expert? Click on the image below for training from Analytics Today.
The Snowflake Solution - Unistore
Snowflake was initially built to deliver a scalable solution to analytic workloads, but this capability has now been extended to transactional workloads with Unistore.
Hybrid tables deliver an entirely new database processing engine built from the ground up to support both transactional and analytic workloads at scale. These build upon the existing Snowflake framework and include additional features, including:
Low latency lookups: The ability to execute sub-second single-row fetches on using a unique key
Primary Key Constraints: Primary key constraints are enforced
Foreign Key Constraints: To enforce referential integrity
Unique and Non-Unique Indexes: To provide alternate access to data
Row Level Locking: Support for transactions with row level locking and Read Committed isolation level
The SQL code below illustrates the code needed to create a hybrid table:
create hybrid table customer (
customer_id integer primary key,
full_name varchar(255),
email varchar(255),
customer_info variant
);
Of course, because it’s Snowflake you will also get a huge list of additional features including:
Cloning - Not yet available
Unistore and Hybrid Tables Explained
Snowflake Training by an Expert.
Click in the image below for more detail
A hybrid analytic and transactional workload
The ability to host transactional and analytic workloads on a single database platform would by itself be a significant achievement, but Snowflake has gone further. As data is loaded into Snowflake hybrid tables, the data is stored to support fast single-row transactions. However, the same data is also stored in columnar format which can be up to 50 times faster at analytic queries.
This approach is often referred to as Hybrid Transactional and Analytical Processing (HTAP) and was first described in 2009 by Dr. Hasso Plattner is illustrated in the diagram below:
The underlying Unistore technology is based upon the ability to manage both workloads, so data written using the transaction processing engine is also available via the analytic engine. This means the same platform can be used to support both workloads without the challenges associated with Extract, Transform and Load operations.
What do customers think?
Adobe has been using Hybrid Tables in support of it’s cross-channel marketing campaign. They said:
"Running Adobe Campaign on Snowflake has enabled us to offer unparalleled speed and scale to our customers, who can now leverage our best-in-class cross-channel campaign management functionality with performance that can't be matched," - Nick Hall. (Head of Adobe Campaign).
A new target architecture
With the new Snowflake Unistore technology, transactional systems can be deployed directly upon Snowflake without the need to extract and load to a separate platform. This leads to a huge architectural simplification and avoids the need to support multiple platforms and skill sets. Furthermore, data loaded into Snowflake hybrid tables is seamlessly available for analysis which avoids the need to support both a transactional and operational data stores or an ODS.
The diagram below illustrates the new target architecture.
The Benefits of using Unistore and Hybrid Tables
The benefits of using Hybrid Tables include
A single, consistently enforced set of data governance and security controls, including Tagging and Dynamic Masking.
All the existing Snowflake benefits include Time Travel. Additional features, including Zero Copy Clones and Data Sharing, will be added over time.
Cloud performance at scale, including the ability to dynamically resize the warehouse and scale up to larger workloads.
Finally, hybrid Tables also deliver excellent performance for analytic-type queries from a columnar data store without the need to extract data from a transactional system and load and process it on an analytics platform. This means whenever Snowflake executes a query, it uses either the row store or column store, depending on which is most efficient.
Conclusion
For the past 40 years, database workloads have been separately deployed on transactional or analytic platforms to avoid contention and balance the needs of these two opposed solutions. Finally with Snowflake Unistore and hybrid tables, we can achieve transactional query performance in the milliseconds range with concurrency of up to 10,000 queries per second on the same analytic platform used to query billions of rows.
Clearly, this is a seismic shift in database industry and will help cement Snowflake's leadership in delivering innovative solutions.
Snowflake Training from an Expert
Click on the image below for more information.