InfiniSynapse Practitioner Guide

SQL for Data Analysis in 2026: Patterns, Worked Examples, and Where AI Agents Fit

A working analyst's guide to the SQL patterns that actually move the needle — aggregation, joins, CTEs, window functions — plus worked examples on a sample schema and an honest read on where AI agents change the workflow versus replace it.

AuthorInfiniSynapse Research, applied analytics team
Published2026-06-28 · Last verified 2026-06-28 · Next review 2026-09-28
Evidence basePostgreSQL and MySQL official documentation, BIRD and Spider benchmarks, Anthropic agent research, NIST AI RMF, and worked examples on a public-style analytics schema.
Disclosure: This page is published by InfiniSynapse, an enterprise AI data analyst that runs SQL across PostgreSQL, MySQL, Snowflake, Supabase, S3, and CSV sources. The SQL patterns and worked examples are written to be useful even if you never use our product — including for teams evaluating us against other options.
TL;DR

Direct answer: what SQL do analysts actually use?

Most analyst SQL reduces to four pattern families: aggregation with GROUP BY and HAVING, joins across fact and dimension tables, common table expressions for readable multi-step logic, and window functions for ranks, running totals, and period comparisons. Fluency in those four on a real schema covers the majority of daily requests — and frames where AI agents augment the workflow without replacing analyst judgment.

Why SQL is still the analyst's primary tool in 2026

Three forces keep SQL central to data analysis. First, the analytical warehouses analysts touch every day — PostgreSQL, MySQL, Snowflake, BigQuery, Databricks — all expose a SQL interface as the contract. Second, SQL is declarative: you describe the result you want, the engine plans the execution. That separation survived a generation of "SQL is dead" claims because writing readable queries scales better than writing imperative pipelines for analytical work. Third, every AI assistant that touches a database generates SQL under the hood. Reading the generated query is how an analyst confirms the answer is real.

The companion AI database query pillar guide covers the broader shift in how questions are asked of a database. This page stays on the SQL itself — what an analyst actually types, where each pattern earns its keep, and where an agent steps in to draft.

SQL for data analysis framework — four query pattern families (aggregation, joins, CTEs, window functions) feeding an AI agent that plans, runs, and verifies SQL

A sample analytics schema for the examples

The worked examples below use a small e-commerce-style schema. Four tables, joinable by clear keys, broad enough to cover the patterns without becoming a textbook:

TableGrainKey columnsTypical analysis use
customersOne row per customercustomer_id, signup_date, regionCohorts, retention, segmentation
ordersOne row per orderorder_id, customer_id, order_date, total_amountRevenue, AOV, repeat purchase
order_itemsOne row per item in an orderorder_id, product_id, quantity, unit_priceMix, category share, cross-sell
productsOne row per productproduct_id, category, costMargin, category rollups

Two important properties of this schema show up in real datasets: order_items fans out from orders (one-to-many), and customers sometimes have no orders (left-join territory). Both will bite an unprepared analyst on the third example.

The four query pattern families

Pattern 1 — Aggregation with GROUP BY and HAVING

This is the workhorse. Roll up rows, summarize numeric columns, filter aggregates. The shape every analyst writes ten times a day:

SELECT region, COUNT(*) AS customers, SUM(total_amount) AS revenue
FROM customers c
JOIN orders o ON o.customer_id = c.customer_id
WHERE o.order_date >= DATE '2026-01-01'
GROUP BY region
HAVING SUM(total_amount) > 100000
ORDER BY revenue DESC;

Two things to remember. WHERE filters rows before aggregation. HAVING filters groups after. Mix them up and you either lose rows you wanted or compute over rows you should have dropped.

Pattern 2 — Joins across fact and dimension tables

Analytics joins are usually between a fact table (orders, events) and one or more dimension tables (customers, products). Three rules cover most cases:

-- Customers who signed up but never placed an order
SELECT c.customer_id, c.signup_date
FROM customers c
LEFT JOIN orders o ON o.customer_id = c.customer_id
WHERE o.order_id IS NULL;

Pattern 3 — Common table expressions (CTEs) for layered logic

The moment an analytic question has two transformation stages, reach for a CTE. They name intermediate results, document intent, and survive code review better than nested subqueries.

WITH monthly_revenue AS (
  SELECT DATE_TRUNC('month', order_date) AS month,
         customer_id,
         SUM(total_amount) AS rev
  FROM orders
  GROUP BY 1, 2
),
ranked AS (
  SELECT month, customer_id, rev,
         RANK() OVER (PARTITION BY month ORDER BY rev DESC) AS r
  FROM monthly_revenue
)
SELECT month, customer_id, rev
FROM ranked
WHERE r <= 10
ORDER BY month, r;

That query — top 10 customers by month — is also the first useful piece of evidence for picking a CTE over a subquery. Read the version with the CTE, then imagine the subquery version. The CTE wins on every dimension that matters to a reviewer.

Pattern 4 — Window functions for ranks, running totals, and period comparisons

Window functions look at a group of rows without collapsing them. Five shapes cover most needs:

Window functionShape it answersExample
RANK() / DENSE_RANK()Top-N per groupTop 5 products per region
ROW_NUMBER()First / latest per partitionFirst order per customer
SUM() OVER (... ORDER BY ...)Running totalCumulative revenue by month
LAG() / LEAD()Compare to previous / next rowMonth-over-month change
NTILE(n)Bucket into n equal groupsCustomer revenue deciles
-- Month-over-month revenue change per region
SELECT region, month, revenue,
       revenue - LAG(revenue) OVER (PARTITION BY region ORDER BY month) AS mom_change
FROM region_monthly_revenue;

Four worked examples on the sample schema

Example A — Monthly revenue by signup cohort

WITH cohorts AS (
  SELECT customer_id, DATE_TRUNC('month', signup_date) AS cohort_month
  FROM customers
)
SELECT c.cohort_month,
       DATE_TRUNC('month', o.order_date) AS order_month,
       SUM(o.total_amount) AS revenue
FROM cohorts c
JOIN orders o ON o.customer_id = c.customer_id
GROUP BY 1, 2
ORDER BY 1, 2;

That output feeds a triangle-shaped cohort chart with months on both axes. The analyst's judgment call is choosing the cohort grain (signup month vs week vs quarter) and the value (revenue vs orders vs items).

Example B — Top 5 products per region

WITH product_region_rev AS (
  SELECT c.region, p.product_id, p.category,
         SUM(oi.quantity * oi.unit_price) AS rev
  FROM customers c
  JOIN orders o ON o.customer_id = c.customer_id
  JOIN order_items oi ON oi.order_id = o.order_id
  JOIN products p ON p.product_id = oi.product_id
  GROUP BY 1, 2, 3
)
SELECT region, product_id, category, rev
FROM (
  SELECT region, product_id, category, rev,
         RANK() OVER (PARTITION BY region ORDER BY rev DESC) AS r
  FROM product_region_rev
) ranked
WHERE r <= 5
ORDER BY region, r;

Example C — Retention by signup month

WITH first_order AS (
  SELECT customer_id, MIN(order_date) AS first_order_date
  FROM orders GROUP BY customer_id
),
joined AS (
  SELECT c.customer_id,
         DATE_TRUNC('month', c.signup_date) AS cohort_month,
         f.first_order_date,
         CASE WHEN f.first_order_date IS NULL THEN 0 ELSE 1 END AS converted
  FROM customers c
  LEFT JOIN first_order f ON f.customer_id = c.customer_id
)
SELECT cohort_month,
       COUNT(*) AS signups,
       SUM(converted) AS converted,
       ROUND(100.0 * SUM(converted) / COUNT(*), 2) AS conversion_pct
FROM joined
GROUP BY cohort_month
ORDER BY cohort_month;

The LEFT JOIN is the load-bearing detail. Without it, customers who never ordered drop out of the COUNT and your conversion percentage inflates silently.

Example D — Same-store growth using LAG

WITH region_monthly AS (
  SELECT c.region,
         DATE_TRUNC('month', o.order_date) AS month,
         SUM(o.total_amount) AS revenue
  FROM customers c
  JOIN orders o ON o.customer_id = c.customer_id
  GROUP BY 1, 2
)
SELECT region, month, revenue,
       LAG(revenue, 12) OVER (PARTITION BY region ORDER BY month) AS revenue_yoy,
       ROUND(100.0 * (revenue - LAG(revenue, 12) OVER (PARTITION BY region ORDER BY month))
                   / NULLIF(LAG(revenue, 12) OVER (PARTITION BY region ORDER BY month), 0), 2)
         AS yoy_pct
FROM region_monthly
ORDER BY region, month;

NULLIF guards against divide-by-zero in the first 12 months where there is no prior-year value. Junior queries often forget that and crash the dashboard the day the chart renders the first month.

Dialect differences that matter for SQL analysis

TopicPostgreSQLMySQLSnowflake / BigQuery
Date truncationDATE_TRUNC('month', x)DATE_FORMAT(x, '%Y-%m-01')DATE_TRUNC('month', x) / DATE_TRUNC(x, MONTH)
String concata || bCONCAT(a, b)a || b / CONCAT(a, b)
Top-N per groupWindow + filterWindow + filter (8.0+)Window + QUALIFY
JSON pathj ->> 'key'JSON_EXTRACT(j, '$.key')j:key / JSON_VALUE
Boolean typeNativeTinyint(1)Native

For deeper warehouse-specific patterns see the PostgreSQL data analysis tools guide, the MySQL data analysis tools guide, and the Snowflake data analyst guide. Each one walks the dialect-specific edges.

Where AI agents change the SQL workflow

AI data agents do not erase SQL — they restructure the workflow around it. On the BIRD benchmark, human engineers reach 92.96% execution accuracy and large models trail by a wide margin without a retrieval and verification loop. That gap is why an agent is structured as a system that dynamically directs its own processes and tool usage, not a one-shot text-to-SQL prompt.

What the analyst actually does day-to-day shifts in three places:

PhaseWithout an AI agentWith an AI agent
DraftingType the join, the GROUP BY, the window function from scratchDescribe the question; agent drafts a plan and the SQL
ReviewingSelf-review and a peer review of the SQLReview the plan, the join keys, and the SQL the agent generated
VerifyingSpot-check the result; ad hoc sanity queriesThe agent runs verification queries automatically; analyst challenges anomalies

The AI data analyst explained page goes deeper on the role split. The AI data analyst job description page describes how teams hire for the new shape.

4
Query pattern families that cover most analyst SQL work: aggregation, joins, CTEs, window functions.
92.96%
Human engineer execution accuracy on the BIRD text-to-SQL benchmark — the bar agents are still working toward without retrieval and verification. Source: BIRD
3
Phases of the analyst workflow that shift with an AI agent in the loop: drafting, reviewing, verifying.

What an analyst still owns end-to-end

Three things stay with the human and will stay with the human for the next planning cycle: framing the question, choosing the grain (which counts as a customer, which orders count, which timezone), and defending the number. An agent can draft the SQL that answers "monthly active users by region" in seconds; an analyst still decides whether MAU means a logged-in session, an order placed, or an event fired.

The knowledge base binding moat, briefly

The newest differentiator inside the AI category is what InfiniSynapse calls database + knowledge base binding. Each connection is paired with a curated knowledge base of business definitions — what active customer means, which status codes represent a paid order, which keys roll up to which business cluster. The agent retrieves from the bound knowledge base as a tool call before drafting SQL, so the generated query is grounded in the team's actual definitions, not the model's prior.

Common SQL mistakes on real datasets

A practical SQL-plus-AI workflow for 2026

  1. Frame the question in plain English. Define the metric, the time range, the segmentation. Write it down before any SQL.
  2. Ask the agent for a draft plan. Read the join keys it picked, the tables it touched, the assumptions it made about your business definitions.
  3. Approve or correct the plan. If a join key is wrong, push back before any SQL runs. Plan mode exists for this exact moment.
  4. Execute under a read-only role. Scoped grants protect production from a mistaken UPDATE the agent should never have generated in the first place.
  5. Review the SQL the agent ran. Not the answer — the SQL. Confirm WHERE clauses, GROUP BY columns, and any window partitions.
  6. Demand an evidence trail. Plan + SQL + result + verification queries + source list. If any of those is missing, the number is not defensible.

SQL fluency stops being about typing speed and starts being about review judgment. Read the query, defend the join key, own the number.

When this guide applies

  • You are an analyst sharpening day-to-day SQL pattern fluency
  • You are deciding which SQL patterns to teach a new hire first
  • You want a fair read on where AI agents fit a SQL-first workflow

When it does not

  • You need a SQL syntax reference — read the official docs
  • You are tuning query plans on a single warehouse — read the engine guide
  • You have not picked a database yet — see the tool category guides instead

See SQL drafted, executed, and verified by an AI data analyst

Connect a PostgreSQL, MySQL, Snowflake, or Supabase database read-only, seed a small knowledge base of business definitions, and ask one open-ended question. Watch the plan, the SQL, and the verification step before deciding whether agentic SQL belongs in your stack.

Try InfiniSynapse online

FAQ

What SQL skills matter most for data analysis?
Four families cover the majority of analyst work: filtering and aggregation with GROUP BY and HAVING, joins across fact and dimension tables, common table expressions for layered logic, and window functions for ranking, running totals, and period comparisons. If you are fluent in those four families on a real schema, you can handle most analytics requests without ever touching a stored procedure.
Should I learn SQL if AI can write SQL for me?
Yes. AI agents draft SQL faster than a human, but the analyst still owns the question, the schema interpretation, and the verification of the result. Without SQL fluency you cannot read the agent plan, cannot spot a wrong join key, and cannot defend a number to a finance reviewer. Treat AI as a drafting partner and a second pair of eyes, not as a replacement for SQL literacy.
What is the difference between a CTE and a subquery for analysis?
A subquery is inlined inside another query and is harder to read once nesting passes two levels. A common table expression names a result set and lets later steps reference it by name, which makes multi-step analysis readable and testable. For analytics work the rule of thumb is to reach for CTEs as soon as you need more than one transformation stage.
When should I use a window function instead of a GROUP BY?
Use GROUP BY when you want to collapse rows into one row per group. Use a window function when you want to keep every row but attach a calculation that looks across a group, such as a running total, a rank within a partition, or the previous row value. The clue is the word per: per customer running revenue, rank per region, change from previous quarter.
How do AI data agents change the SQL workflow for analysts?
An AI data agent retrieves business context and schema, drafts a query plan, runs SQL, verifies the output, and returns an evidence trail. The analyst shifts from typing every join to reviewing plans, approving execution, and challenging anomalous numbers. This raises the throughput on routine asks and frees the analyst to invest time in framing harder questions.
What are common SQL mistakes analysts make on real datasets?
The top three are: joining on keys that fan out and double-count rows, applying WHERE filters that silently drop NULLs you cared about, and grouping on a non-deterministic column when the dialect tolerates it. Each one produces a number that looks credible and is wrong. A review checklist plus a small set of saved sanity queries catches most of them before the dashboard ships.
Do dialect differences matter for SQL analysis?
Yes for date arithmetic, string functions, and array handling, but less for the core analytical patterns. Aggregation, joins, CTEs, and window functions work across PostgreSQL, MySQL, Snowflake, BigQuery, and most warehouses with only syntax tweaks. Picking a primary dialect, then learning a small list of differences when you switch, is more efficient than memorizing every variation up front.
How does InfiniSynapse use SQL behind the scenes?
InfiniSynapse is an enterprise AI data analyst that retrieves business context from a bound knowledge base, plans the analysis, generates SQL through its InfiniSQL component, executes against the connected source, and returns an answer with the plan, the query, and a verification step. The SQL is reviewable in the trace, which is the audit posture regulated teams require.

Methodology and review notes

Last updated: 2026-06-28 · Next scheduled review: 2026-09-28

The patterns and worked examples on this page are grounded in PostgreSQL and MySQL official documentation, public benchmarks (BIRD, Spider), Anthropic's published agent research, the NIST AI Risk Management Framework, and field experience running SQL across real analytics schemas. Examples were tested against PostgreSQL 16 and Snowflake; MySQL and BigQuery equivalents are noted where syntax diverges.

Conflict of interest: InfiniSynapse publishes this guide and sells an enterprise AI data analyst that generates SQL. To reduce bias, the page emphasizes patterns and worked examples first, treats AI agents as a workflow shift rather than a SQL replacement, and links to external benchmarks and standards for every numeric claim.

Update cadence: Reviewed every 90 days for syntax accuracy, dialect changes, benchmark figures, and schema consistency.

Sources and references

  1. [Vendor] The PostgreSQL Global Development Group. PostgreSQL documentation. postgresql.org/docs.
  2. [Vendor] Oracle / MySQL. MySQL Reference Manual. dev.mysql.com/doc.
  3. [Vendor] Snowflake. SQL reference. docs.snowflake.com.
  4. [Independent] BIRD-SQL: A Big Bench for Large-Scale Database Grounded Text-to-SQL Evaluation. BIRD benchmark.
  5. [Independent] Yu et al. Spider: A large-scale human-labeled dataset for text-to-SQL. Spider benchmark.
  6. [Vendor] Anthropic. Building Effective Agents. anthropic.com/research/building-effective-agents.
  7. [Independent] Yao et al. ReAct: Synergizing Reasoning and Acting in Language Models. arxiv.org/abs/2210.03629.
  8. [Standard] NIST. AI Risk Management Framework (AI RMF 1.0, 2023). nist.gov/itl/ai-risk-management-framework.

Related guides