Monitor Snowflake Usage & Cost

One of the most frequently asked questions is how to monitor Snowflake usage and costs. This article briefly summarizes the charges on the Snowflake platform and explains how to monitor Snowflake usage and, therefore, spending over time.

Snowflake Virtual Warehouse Sizes

Unlike many other cloud-based analytic platforms, which charge by the hour, Snowflake charges per second (with a 60-second minimum) for its compute resources, called Virtual Warehouses.

A Virtual Warehouse consists of a cluster of machines (CPUs, memory, and SSD) arranged as a Massively Parallel Processing (MPP) server with between 1 and 128 nodes. These are offered in a series of “T-shirt” sizes, with usage billed in “credits” and costing from as little as approximately $3.00 per hour, as illustrated below:

Warehouse Size | Servers         | Virtual CPUs    | Cost per Hour
---------------|-----------------|-----------------|--------------
XSMALL         | 1               | 8               | $3
SMALL          | 2               | 16              | $6
MEDIUM         | 4               | 32              | $12
LARGE          | 8               | 64              | $24
XLARGE         | 16              | 128             | $48
X2LARGE        | 32              | 256             | $96
X3LARGE        | 64              | 512             | $192
X4LARGE        | 128             | 1,024           | $384
X5LARGE        | Double Capacity | Double Capacity | $768
X6LARGE        | Double Capacity | Double Capacity | $1,536
💡
Note: The cost per hour varies by geographic location and cloud provider, but approximately $3 per node per hour is a reasonable estimate. Also, the X5LARGE and X6LARGE warehouses don't simply double the number of servers; instead, each size delivers roughly twice the power of the previous one. Snowflake doesn't publish internal hardware details.
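
For example, because charges accrue per second, a MEDIUM warehouse billed at approximately $12 per hour that suspends after running for just 15 minutes would cost roughly $12 × 15/60 = $3, rather than a full hour's charge.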

It’s therefore relatively easy to estimate the cost of running a warehouse by simply taking:

  • Warehouse Size: A Medium warehouse has four nodes and enough power to summarize and report on gigabytes of data.

  • Hours per Day: The warehouse is assumed to run for around 8 hours per working day. Note that a virtual warehouse will automatically suspend and resume when not in use to avoid unnecessary charges.

This should give a reasonable cost estimate, and the size can be dynamically adjusted depending on the expected data volumes and performance required.
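
For example, a MEDIUM warehouse at approximately $12 per hour, running for 8 hours a day across roughly 20 working days, would cost in the region of $12 × 8 × 20 = $1,920 per month, assuming it suspends automatically outside those hours.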

Controlling Spend

Other usage should be relatively small by comparison and includes the cost of storage, which is a simple pass-through charge from the cloud provider. This is typically less than $25 per terabyte per month, and data is automatically compressed at a ratio of 4-10 times, giving additional savings.
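
As an illustration, 10 terabytes of raw data compressed at roughly 5:1 would occupy around 2 terabytes and cost in the region of 2 × $25 = $50 per month.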

Finally, sophisticated features, including near-real-time data ingestion using Snowpipe, automatic data clustering, and materialized view refreshes, use internal Snowflake resources. These are charged per second per CPU core and may add a further element to the overall bill.

However, you should typically expect around 80% of the spending to be on virtual warehouses, which should be the initial focus of analysis.

Custom Monitoring

Snowflake provides two locations to monitor usage:

  • Information Schema: This provides a set of system-defined views and metadata with real-time statistics about queries. This data is typically retained for up to 14 days and is most helpful in monitoring ongoing queries.

  • Snowflake Account Usage: Provides a rich set of historical metadata for up to a year and will be the primary source of usage analysis.

Both areas provide a rich set of metadata that can be analyzed; however, the potential is easier to demonstrate with the example queries and results below.
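
For example, a quick look at recently executed queries can be taken from the Information Schema using the QUERY_HISTORY table function. The sketch below simply lists queries from the last hour; the time window and column list are illustrative and can be adjusted as needed.

-- Queries executed over the last hour (Information Schema)

select query_id
,      query_text
,      warehouse_name
,      total_elapsed_time
from table(information_schema.query_history(
         end_time_range_start => dateadd(hour, -1, current_timestamp()),
         result_limit => 1000))
order by start_time desc;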

Snowflake Cost: Warehouse Credits Over Time

The diagram below shows the monthly warehouse credits and the cumulative cost during the year.

Snowflake Warehouse Usage Over Time

-- Warehouse Credits over time

select to_char(start_time,'YYYY-MM') as month
,      sum(credits_used) as credits_used
from snowflake.account_usage.warehouse_metering_history wmh 
where wmh.start_time >= dateadd(month, -12, current_date())
group by to_char(start_time,'YYYY-MM') 
order by 1;

The above query can be used to monitor the monthly spending by all warehouses on the account over the past 12 months and indicate growth over time.

Snowflake Cost: Credit by Type

In addition to warehouse costs (which typically amount to 80% of total spending), Snowflake provides several serverless features. These use internal Snowflake compute resources and are billed per second.

The graph below shows the breakdown by month for each type.

Monthly Credits by Type

-- Monthly Credits By Type

select to_char(usage_date,'YYYYMM') as month 
,      sum(decode(service_type, 
                  'WAREHOUSE_METERING', credits_billed)) as warehouse_credits,
       sum(decode(service_type,
                 'PIPE', credits_billed)) as pipe_credits,
       sum(decode(service_type,
                 'MATERIALIZED_VIEW', credits_billed)) as mview_credits,
       sum(decode(service_type,
                 'AUTO_CLUSTERING', credits_billed)) as clustering_credits,
       sum(decode(service_type,
                 'WAREHOUSE_METERING_READER', credits_billed)) as reader_credits
,      sum(credits_billed) as total
from snowflake.account_usage.metering_daily_history wmh
where wmh.usage_date >= dateadd(month, -12, current_date())
group by month
order by month;

The above query shows a breakdown of the total monthly credits, indicating the type of spend. This includes:

  • Warehouse Credits: Credits consumed by virtual warehouses executing queries and loading data.

  • Pipe Credits: Credits consumed by Snowpipe for near-real-time data ingestion.

  • Materialized View Credits: Credits used to automatically refresh materialized views.

  • Clustering Credits: Credits consumed by automatic data clustering on clustered tables.

  • Reader Credits: Credits consumed by warehouses running in reader accounts.

Snowflake Cost: Credits by Warehouse

When trying to understand warehouse spending, it's sensible to start with this graph, which shows the total expenditure by warehouse. The query can be adapted to limit the results to the current month or year, as shown in the variation below.

Snowflake Usage - Credits used by Warehouse

-- Credits used by warehouse

select warehouse_name, sum(credits_used) as credits_used
from snowflake.account_usage.warehouse_metering_history wmh 
group by warehouse_name
order by 2 desc;
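
The same query can be restricted with a simple filter on start_time; the variation below, for example, limits the results to the current month, and truncating to 'year' instead would report the current year.

-- Credits used by warehouse for the current month

select warehouse_name
,      sum(credits_used) as credits_used
from snowflake.account_usage.warehouse_metering_history wmh
where wmh.start_time >= date_trunc('month', current_date())
group by warehouse_name
order by 2 desc;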

Snowflake Cost: Credits by Hour

The graph below indicates the time of day at which queries were executed and credits consumed. This may be useful for identifying peak times during the day and spotting unexpectedly high spending during quiet periods overnight.

Credits Used by Hour

-- Credits by hour of the day

select to_char(start_time,'HH24') as hour
,      sum(credits_used) as credits_used
from snowflake.account_usage.warehouse_metering_history wmh 
where wmh.start_time >= dateadd(month, -1, current_date())
group by to_char(start_time,'HH24') 
order by 1;

Notably, any warehouse running 24x7 is potentially expensive and relatively unusual. While some use cases genuinely need compute resources around the clock, you should verify that these are being used efficiently. If, for example, you find more than one warehouse of the same size running, there may be an opportunity to consolidate workloads.
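
One simple lever, assuming a warehouse named REPORTING_WH purely for illustration, is to shorten the auto-suspend delay so an idle warehouse stops accruing credits sooner:

-- Suspend an idle warehouse after 60 seconds of inactivity (illustrative name)

alter warehouse reporting_wh set auto_suspend = 60;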

Data Storage Cost by Month

Storage costs are a relatively small part of the overall Snowflake cost. However, tuning opportunities are often easy to identify, and I usually find Snowflake customers spending thousands of dollars on unnecessary storage.

Data Storage over Time

-- Data Storage by Month and Type

select to_char(usage_date,'YYYYMM') as sort_month 
,      to_char(usage_date,'Mon-YYYY') as month 
,      avg(storage_bytes) as storage
,      avg(stage_bytes) as stage
,      avg(failsafe_bytes) as failsafe
from snowflake.account_usage.storage_usage
group by month, sort_month
order by sort_month;

The report above shows the monthly storage usage broken down by type. This indicates the cost of:

  • Storage: This shows the data stored in Snowflake tables.

  • Stages: This indicates the data stored in files within Snowflake stages.

  • Failsafe: This shows the storage in the 7-day failsafe area to protect against accidental data loss.

The graph below illustrates a typical situation I've frequently seen. In this case, the customer loaded data into a Snowflake internal stage before using a COPY command to load the data warehouse. However, they didn't remove the old data files, which led to the situation below.
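
A hedged sketch of the fix, assuming an internal stage named my_stage and a target table named sales_raw (both names purely illustrative): the PURGE option removes files from the stage after a successful load, while REMOVE clears down files that are already sitting there.

-- Remove staged files automatically after a successful load (illustrative names)

copy into sales_raw
from @my_stage
purge = true;

-- Or clear down files already left in the stage
remove @my_stage pattern = '.*\.csv\.gz';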

Another common mistake is loading data from a stage into a staging table, which is deleted before the next load. This leads to very high volumes of Failsafe Storage as, by default, Snowflake holds data for 7 days in Failsafe. This is easily fixed by converting the staging tables to TRANSIENT.
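
As a sketch, assuming a staging table named stg_sales purely for illustration, recreating it as a TRANSIENT table (optionally with zero days of Time Travel, if the data is disposable) avoids the Fail-safe charge entirely:

-- Transient staging table: no Fail-safe storage (illustrative table and columns)

create or replace transient table stg_sales (
    sale_id    number,
    sale_date  date,
    amount     number(12,2)
)
data_retention_time_in_days = 0;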

My article outlines more ways to cut Snowflake Data Storage Costs by avoiding issues with zero-copy clones.

Snowflake Cost Monitoring: Free Dashboards

There are also several freely available dashboards for the major Business Intelligence vendors below:

If you know of another vendor or an alternative dashboard, please get in touch with me on LinkedIn or comment below.