Tuning Snowflake Query performance is often a top priority for data engineers. In this article, I’ll describe how an entirely new breed of artificial intelligence Agents and AI Teammates can collaborate with data engineers to proactively identify query performance issues, help maximize query performance, and control or even reduce Snowflake spending.
Artificial Intelligence - in Plain English
Before describing an AI Teammate, let’s define some jargon. A lot of confusing nonsense on the web assumes we’re all Artificial Intelligence (AI) experts, so let’s first define the basics.
The diagram above provides a conceptual view of Artificial Intelligence, which includes the following sub-domains:
Machine Learning: This uses statistical techniques to analyze data, identify patterns, and improve their performance over time without explicit programming. For example, to help identify that queries seem to run slower at month end.
Deep Learning: Uses multi-layered neural networks (modeled on the human brain) to interpret complex data, recognizing intricate patterns beyond traditional machine learning. For example, it could identify that month-end queries are slower because several month-end batch jobs run concurrently with end-user queries, impacting query performance.
Generative AI: Uses large language models (LLMs) to generate human-like dialogue, write summaries, and compose creative text based on questions (aka prompts). This could be used, for example, to generate an explanation complete with supporting statistics to help data engineers identify poorly written SQL leading to performance issues.
In summary, Artificial Intelligence refers to various solutions that simulate human intelligence. Abilities include analyzing and learning from data, recognizing patterns, making decisions, understanding language, and solving complex problems.
The underlying techniques include Machine Learning (ML), which uses algorithms to scan historical data to identify patterns and make predictions based on historical data. ML models improve accuracy when exposed to more (better quality) data and improve over time without explicit programming.
“More data beats clever algorithms, but better data beats more data” - Peter Norvig (Google Research).
Deep Learning is a subset of ML that uses an interconnected, multi-layer network inspired by the human brain. This enables it to tackle complex tasks, including image recognition, natural language processing, and speech-to-text translation.
Generative AI serves as the human interface to Artificial Intelligence. Whereas ML and Deep Learning might focus on specific tasks like classification (is this query part of a long-running batch routine or an end-user analysis), Gen AI extends these capabilities to support modeling more complex patterns. This means it’s capable of more sophisticated analysis and can produce (I.E., generate) a text explanation of a given insight.
For example, identify the month-end batch jobs that run during working hours on a Virtual Warehouse that are too small and negatively impact end users’ query performance. It may also recommend moving the long-running jobs (a set of sequentially executed queries) to a larger warehouse and monitoring the performance and throughput with follow-up recommendations.
Does this sound like science fiction? Read on to find out more.
What is a Snowflake AI Agent?
In Snowflake terms, an AI Agent is an artificially intelligent service designed to handle a specific task. For example, Snowflake Cortex Analyst generates SQL text from natural language requests. Unlike an AI Teammate, an AI Agent has a narrower task focus.
For example, Cortex Analyst includes several AI agents, including:
Question Classification Agent: To verify whether a question can be answered and suggest alternatives to avoid ambiguity. For example, “What’s the most valuable product line we sell?” is potentially ambiguous as it could refer to the most expensive product or the one that generates the largest profit.
SQL Generation Agent: Converts the natural language request into Snowflake SQL.
Error Correction Agent: This agent checks the generated SQL for syntactic or semantic errors, correcting them where needed to avoid AI hallucinations.
Each agent operates autonomously yet contributes to a coordinated process, allowing the Cortex Analyst to generate precise, actionable insights from natural language queries without human intervention.
What is a Snowflake AI Teammate?
During the past seven years, I’ve written thousands of hand-crafted SQL statements to analyze the query metrics in the Snowflake ACCOUNT_USAGE views, including countless queries on QUERY_HISTORY, to understand customer workloads and understand and gain insights into why customers struggle with performance and cost on Snowflake.
The problem is that it’s like trying to find a needle in a field full of haystacks. It’s incredibly time-consuming and requires years of experience to ask the right question - let alone write the necessary SQL.
Finally, when you find a helpful insight (like identifying an inefficient cluster key that should be removed), it is challenging to get the System Administrator to accept the analysis and make the necessary changes.
Imagine that instead of a highly paid Snowflake consultant, you could deploy an Artificially Intelligent expert to query your system autonomously, automatically identify performance or cost issues, recommend remedial action and (assuming it’s authorized), even make system configuration changes.
Does this sound like science fiction? You may be surprised just how close we are.
Unlike the AI Agent, which takes on a specific responsibility, an AI Teammate performs the role of an expert Snowflake advisor. Features include analyzing query performance, suggesting cost optimization changes, and providing recommendations, real-time insights, and alerts.
How do AI Agents and AI Teammates Work Together?
The diagram below provides a conceptual view of AI Teammates and Agents and how they build upon the artificial intelligence capabilities.
It shows how AI Agents provide specialized services to monitor or autonomously adjust the Snowflake configuration as workload intensity varies over time.
By contrast, the AI Teammates perform the more high-level tasks, including analyzing query performance or virtual warehouse workload and suggesting cost or performance optimizations.
AI Teammates collaborate directly with Data Engineers and Administrators, monitoring, analyzing, and alerting users of potential issues and recommending remedial action. AI Agents take on more direct responsibilities, including automating virtual warehouse configuration changes and capturing the metadata needed to identify possible performance issues.
How can AI Agents and AI Teammates help improve Snowflake Query Performance?
I’ve written extensively about techniques to maximize Snowflake query performance, potentially using Query Acceleration Service and Snowflake Cluster Keys. I’ve even prepared a Snowflake Best Practices training course to help promote these best practices.
However, it still takes considerable effort and experience to fully understand best practices and how to get the most out of Snowflake, which is why AI Teammates can help by working alongside Data Engineers and providing expert advice and support.
Features supported can include:
Query Optimization Advice: An AI Agent can quickly scan SQL queries with 1000’s of lines and highlight potential issues such as wrapping database columns in functions that impact micro-partition pruning.
Virtual Warehouse Deployment: One of the most frequently asked questions I receive is, “What warehouse size should I run this query on?”. An AI Teammate could proactively monitor query workloads and identify queries (or entire batch jobs) better executed on a larger or smaller warehouse. In addition to improving throughput for large tasks, moving workloads to a smaller warehouse can help reduce Snowflake spend.
Dynamic Table Configuration: AI Agents can proactively monitor Snowflake Dynamic Tables to adjust the TARGET_LAG and Warehouse Size to ensure the optimum balance of performance and cost. This could mean (for example) refreshing on a larger warehouse when appropriate or reducing the
TARGET_LAG
during off-peak times to reduce Snowflake spend. In one case, I helped a customer in Munich, Germany, reduce their Snowflake spend by 50% by suspending a real-time pipeline during out-of-hours.Proactive Alerting: Although Snowflake supports spend tracking using budgets and a simple query timeout feature, an AI Agent could continuously scan QUERY_HISTORY to identify execution time anomalies. For example, I worked with one customer, and a query started at 6 pm on a Friday afternoon by a system administrator performed a cartesian join running on an X4LARGE warehouse, leading to a $20,000 bill. An AI Agent could help identify the difference between a high-risk end-user query and a predefined batch routine that took longer to run because of an increased data load.
Improving Data Load Efficiency: An AI Teammate could monitor Snowflake data loading metrics and check against data loading best practices. This could recommend (for example) splitting huge data files into smaller 100MB chunks on a larger virtual warehouse size to maximize throughput or loading smaller loads on an XSMALL warehouse to reduce spending.
Are Snowflake AI Agents Science Fiction?
“The future is already here, it's just not evenly distributed” - William Gibson.
You may be surprised to find that AI agents and AI teammates are already available and help thousands of Snowflake Data Engineers improve efficiency and effectiveness. For example, the Power User for DBT extension delivers AI-driven documentation. Analyzing the database metadata and column lineage it generates detailed descriptions of dbt models.
Altimate can even reduce thousands of lines of SQL code into a simple English explanation to help Data Engineers understand complex code and diagnose performance issues. For example, the following text was generated by analyzing a 5,000-line SQL statement.
The query is aggregating and calculating various metrics
related to sales, revenue, profit, and distribution cost
for different customers and stores.
It also includes various conditions and filters such as the
customer type being either 'HIGH' or 'VERY HIGH', the
sale stage being 'ACCEPTED', the completed status
being true, and the year of the product date available
being 2020 or later.
The screenshot below illustrates how An AI agent constantly monitors Virtual Warehouse workloads to identify potential cost savings and automatically adjusts warehouse configuration to eliminate waste.
The screenshot below demonstrates how an AI Agent can help System Administrators identify potential security flaws. In this case, the administrator requests the list of users with the ACCOUNTADMIN role.
Conclusion
The pessimistic view of AI is that it will put IT specialists out of a job. However, it’s more likely to make existing data engineers more effective by providing an artificially intelligent Snowflake expert to work alongside them, providing expertise and advice based on Snowflake best practices.
The surprising thing is, however, that it’s not science fiction. We’re already close to building fully intelligent AI teammates and AI Agents are already in production.
“Steve Jobs famously said ‘A computer is a bicycle for the mind’. Artificial Intelligence is a motorbike”