How to Use Claude for SQL Queries: 2026 Guide
An 8-step workflow for analysts and engineers. Load your full schema into a Claude Project, write multi-CTE queries with window functions on the first try, debug by walking through CTEs, and optimize for warehouse cost.
SQL with Claude in 2026 is a different category of useful than SQL with ChatGPT or Copilot. The difference is the 200K token context window. Claude can hold an entire production warehouse schema in working memory: 200 tables of CREATE TABLE statements, primary and foreign key relationships, column comments, sample rows, naming conventions, and business definitions of fuzzy terms like active user or completed order. Every query Claude writes is grounded in that schema, not guessed from training data, and the column-name hallucination rate that frustrates analysts on smaller models drops to near zero.
The 8-step workflow below is built for production analytical work: cohort analysis, retention curves, attribution windows, gap-and-island problems, recursive hierarchies, dbt models that ship to dashboards. The first three steps are upstream investments (load schema, declare dialect, describe analytical intent) that pay back inside the first week. The middle steps (Artifacts iteration, four-turn cycle, performance optimization) are how you turn a draft into production SQL. The final two steps (CTE-by-CTE debugging, documentation) are what separate throwaway queries from queries that hold up in production for years. Every step has tool-specific patterns that lean on Claude's strengths rather than fighting the model.
Who this guide is for
- Data analysts at SaaS, e-commerce, fintech, or marketplace companies who write 30 to 60 SQL queries a week against a warehouse with 50+ tables
- Analytics engineers building and maintaining dbt models, dimensional layers, or semantic models that ship to BI tools
- BI developers writing queries for Looker, Tableau, Power BI, or Mode dashboards where queries need to be both correct and performant
- Data engineers who write SQL alongside Python and need help with complex window functions, recursive CTEs, or warehouse migration
- Founders and operators at early-stage startups doing their own analytics against a Postgres or BigQuery instance without a dedicated data team
- Database administrators handling query optimization, index tuning, or migration projects between warehouses (Redshift to Snowflake, on-prem to cloud)
Why Claude specifically (vs. ChatGPT, Copilot, or Gemini)
For SQL work, Claude has four specific advantages over alternatives. First, the 200K token context window is the single biggest technical differentiator. A full schema dump for a 200-table warehouse lands at 80,000 to 150,000 tokens, well inside Claude's window. ChatGPT's 128K context fits a similar load but Claude's needle-in-haystack recall on long context is materially better, which matters when Claude needs to find the right join key 90,000 tokens deep in your DDL. Second, Projects let you load the schema, style guide, and naming conventions once and inherit them across every conversation; the time saved over a single quarter exceeds the Pro subscription cost by 10x. Third, Artifacts gives you an editable code panel where Claude updates the query in place across turns rather than rewriting from scratch every message; this cuts query development time by 50 to 70% on multi-turn debugging sessions. Fourth, Claude's reasoning on multi-CTE queries with window functions is consistently stronger than competitors, especially on cohort analysis, retention curves, gap-and-island problems, and recursive hierarchies.
Where Claude loses: Microsoft Copilot wins when your work lives inside an Excel-bound database GUI or SQL Server Management Studio. ChatGPT's Code Interpreter is better for one-off CSV analysis where you do not have warehouse access. Gemini integrates natively with BigQuery and Google Sheets if your stack is fully Google. GitHub Copilot is better for in-IDE autocomplete inside dbt projects. The realistic answer for an analyst is to use Claude as the primary SQL collaborator (especially for analytical work) and reach for the dialect-native or environment-native tool when the task is a fit.
The 8 steps below are tuned for Claude but the underlying logic translates to any major LLM with a long context window. The patterns that matter (schema loading, dialect declaration, four-turn cycle, CTE-by-CTE debugging) are model-agnostic; the specific UX advantages (Projects, Artifacts) are Claude-specific in 2026. For paired workflows, see our Claude for coding guide and the general how to use Claude guide.
The 8-Step Workflow
Load your schema into a Claude Project
The single highest-leverage upstream activity is loading your full schema into a Claude Project once. Dump your information_schema (or equivalent) into a markdown or SQL file containing CREATE TABLE statements, primary and foreign key relationships, column comments, 3 to 5 sample rows per important table for data-shape context, naming convention notes, and business definitions of fuzzy terms (active user, completed order, MRR). For a 200-table warehouse this file lands at 80,000 to 150,000 tokens, well inside Claude's 200K context window. Save the file in a Claude Project, and every conversation in that Project inherits the schema as background context without re-pasting. The setup takes 30 to 60 minutes once and pays back inside the first week. Without this step, Claude will hallucinate column names and join keys 15 to 25% of the time on a real warehouse.
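One way to produce that schema file is a single query against information_schema. The sketch below is PostgreSQL-specific and assumes a `public` schema; adapt the catalog joins for your warehouse. The output pastes cleanly into the markdown schema file for the Project.

```sql
-- Compact per-column schema summary, including column comments where
-- they exist (Postgres; other warehouses expose comments differently).
SELECT
    c.table_name,
    c.column_name,
    c.data_type,
    c.is_nullable,
    pgd.description AS column_comment   -- NULL when no comment is set
FROM information_schema.columns c
LEFT JOIN pg_catalog.pg_statio_all_tables st
    ON st.relname    = c.table_name
   AND st.schemaname = c.table_schema
LEFT JOIN pg_catalog.pg_description pgd
    ON pgd.objoid   = st.relid
   AND pgd.objsubid = c.ordinal_position
WHERE c.table_schema = 'public'
ORDER BY c.table_name, c.ordinal_position;
```

Pair the output with `pg_dump --schema-only` (or your warehouse's GET_DDL equivalent) for the full CREATE TABLE statements, then append the business definitions by hand.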
Always declare the SQL dialect in the first message
Dialect drift is the second most common source of broken Claude SQL output. Even small differences (DATE_TRUNC argument order, NULL handling in window functions, identifier quoting, array functions) waste analyst time when you assume Claude guessed right. The discipline: open every conversation with an explicit dialect declaration as the first sentence. Snowflake, BigQuery Standard SQL, PostgreSQL 16, MySQL 8, Redshift, Databricks SQL, DuckDB, MS SQL Server. Include the version number where it matters; PostgreSQL 16's MERGE syntax differs from earlier versions, and Snowflake QUALIFY behavior shifts across releases. For multi-warehouse environments (a Postgres operational store plus a Snowflake warehouse), specify which dialect for each query. Claude's dialect handling is excellent when you tell it explicitly; it degrades sharply when it has to guess.
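The DATE_TRUNC argument-order difference mentioned above is a concrete example of why the declaration matters. The snippet assumes a hypothetical `orders` table with a `created_at` timestamp:

```sql
-- The same "truncate to month" intent in three dialects.

-- PostgreSQL / Redshift / Snowflake: unit first, quoted.
SELECT DATE_TRUNC('month', created_at) FROM orders;

-- BigQuery Standard SQL: column first, unit second, unquoted.
SELECT DATE_TRUNC(created_at, MONTH) FROM orders;

-- MySQL 8 has no DATE_TRUNC; a common workaround.
SELECT DATE_FORMAT(created_at, '%Y-%m-01') FROM orders;
```

Without the dialect declared, any of these is a plausible guess, and two of them will fail or silently misbehave on your warehouse.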
Describe the analytical intent, not the query structure
When asking Claude for a query, describe what you want to know in business terms first, then let Claude propose the structure. The pattern that works: state the question ("weekly retention curve for users who signed up in March 2026, by acquisition channel, with cohort sizes"), the output shape (column names, types, granularity), the dialect, and any optimization constraints (partition keys, clustering, expected row counts). The pattern that fails: trying to dictate the CTE structure or join order yourself; you spend more time describing the query than writing it. Claude's reasoning produces better-structured SQL than most analysts when given the analytical intent and a clean schema. The exception is when you have a known-good pattern you want to extend; in that case, paste the existing query and ask Claude to extend it.
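For the retention question used as the example above, the kind of structure Claude typically proposes looks like the following simplified sketch. It assumes hypothetical `users(id, signup_at, channel)` and `events(user_id, occurred_at)` tables and PostgreSQL syntax; a production version would also add cohort sizes and NULL handling:

```sql
-- Weekly retention for March 2026 signups, by acquisition channel.
WITH cohort AS (
    SELECT id AS user_id,
           channel,
           DATE_TRUNC('week', signup_at) AS cohort_week
    FROM users
    WHERE signup_at >= DATE '2026-03-01'
      AND signup_at <  DATE '2026-04-01'
),
activity AS (
    SELECT DISTINCT user_id,
           DATE_TRUNC('week', occurred_at) AS active_week
    FROM events
)
SELECT c.channel,
       c.cohort_week,
       -- weeks elapsed since the cohort week (604800 s = 1 week)
       FLOOR(EXTRACT(EPOCH FROM a.active_week - c.cohort_week) / 604800) AS week_number,
       COUNT(DISTINCT c.user_id) AS retained_users
FROM cohort c
JOIN activity a USING (user_id)
WHERE a.active_week >= c.cohort_week
GROUP BY 1, 2, 3
ORDER BY 1, 2, 3;
```

Note that the prompt that produced this describes the question and output grain, not the CTE layout; the layout is Claude's job.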
Iterate in Artifacts for multi-step query development
For queries that need more than one round of refinement (which is most production-grade SQL), ask Claude to put the query in an Artifact. Artifacts gives you an editable code panel where Claude updates the query in place across turns instead of rewriting from scratch every message. This is materially faster for debugging, optimization, and structural rewrites because you and Claude share a single source of truth for the current query state. The pattern: first turn produces the draft in an Artifact; subsequent turns request specific changes ("add a filter for paid users only", "swap the correlated subquery for a join", "add NULL handling to the divide"); Claude updates the Artifact with diff-style summaries. For a complex 100-line query, Artifacts cuts the development time by 50 to 70% compared to chat-only iteration.
Run the query, paste results back, and iterate against real data
The four-turn cycle that produces production SQL: (1) Claude drafts the query, (2) you run it against your warehouse, (3) you paste the result (or the error, or the unexpected count) back to Claude, (4) Claude iterates. This loop catches the bugs that schema loading alone misses: subtle NULL handling, time-zone mismatches, join fanout that doubles row counts, off-by-one in window frames, business-logic edge cases. Always paste 5 to 20 sample output rows back to Claude rather than just the row count; the model spots issues in the data shape that count alone hides. For aggregation queries, paste both the aggregate result and a few rows of the underlying data so Claude can verify the math. The four-turn cycle takes 10 to 30 minutes per query and produces SQL that holds up in production for years.
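One check worth running in step 2 before pasting results back is a grain check, since join fanout is the bug this cycle most often catches. The sketch assumes the query under test is expected to return one row per `(user_id, order_date)`; both names are hypothetical:

```sql
-- Grain check: any rows returned here mean a join is fanning out.
SELECT user_id, order_date, COUNT(*) AS dupes
FROM my_result          -- the query under test, inlined or materialized
GROUP BY user_id, order_date
HAVING COUNT(*) > 1
LIMIT 20;
```

Pasting these duplicate rows back to Claude, rather than just reporting "row count doubled", usually gets the offending join identified in one turn.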
Optimize for performance after correctness is locked in
Correctness and performance are two different prompts. Once you have a working query that returns the right results on test data, ask Claude to optimize. Provide the table sizes (rough row counts), partition or clustering keys, indexes, and the expected query frequency. For warehouse SQL (Snowflake, BigQuery, Redshift, Databricks), Claude rewrites for partition pruning, pushes filters into CTEs, replaces correlated subqueries with joins, swaps DISTINCT for GROUP BY where appropriate, and reorders joins by selectivity. For OLTP (Postgres, MySQL), Claude focuses on index usage, EXISTS vs IN, and avoiding sequential scans. Always run EXPLAIN ANALYZE (or the warehouse equivalent: Snowflake Query Profile, BigQuery Execution Details, Databricks Spark UI) and paste the output back to Claude for a second optimization pass; Claude often spots issues from the actual plan that the original query alone did not surface.
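As an illustration of one rewrite named above, here is a before/after for the correlated-subquery-to-join swap, on hypothetical `orders` and `payments` tables:

```sql
-- Before: the subquery is evaluated once per outer row.
SELECT o.id,
       (SELECT MAX(p.paid_at)
        FROM payments p
        WHERE p.order_id = o.id) AS last_paid_at
FROM orders o;

-- After: one aggregation, joined once. Same result, one scan of payments.
SELECT o.id, p.last_paid_at
FROM orders o
LEFT JOIN (
    SELECT order_id, MAX(paid_at) AS last_paid_at
    FROM payments
    GROUP BY order_id
) p ON p.order_id = o.id;
```

The LEFT JOIN preserves orders with no payments, matching the NULL the correlated subquery would have returned.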
Debug broken queries by walking through CTE by CTE
When a query runs but returns wrong results (the most common bug class with SQL), the debugging workflow that works: paste the query, the expected vs actual results, 10 sample input rows, and ask Claude to walk through the query CTE by CTE explaining what each CTE produces and why. The act of explanation surfaces where the logic diverges from intent. For each CTE, Claude states: input row count, transformation applied, output row count, output schema, and edge cases handled. Mismatches between expected and actual at any CTE pinpoint the bug. For really subtle bugs (NULL handling, duplicate rows from join fanout, time-zone mismatches, off-by-one in window frames), this CTE walkthrough finds the issue 3 to 5x faster than re-reading the query yourself. Always run the fix against the same test data and verify the corrected behavior before shipping.
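The mechanical counterpart to the walkthrough is inspecting each CTE's output directly. The sketch below uses placeholder CTE bodies (`...`) and hypothetical names; the point is the swap of the final SELECT:

```sql
WITH raw_events AS ( ... ),        -- step 1: verify row count and grain here
     sessionized AS ( ... ),       -- step 2: then here
     daily_rollup AS ( ... )       -- step 3: then here
-- SELECT * FROM daily_rollup;     -- original final select, disabled
SELECT * FROM sessionized LIMIT 10;   -- inspect the suspect CTE directly
```

Paste those 10 rows back alongside Claude's stated expectation for that CTE; the first mismatch is the bug.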
Document the final query for the next analyst
The final step that separates throwaway SQL from production SQL: documentation. Ask Claude to add inline comments to every CTE explaining what it does and why, a header block summarizing the query's purpose, the source tables and any assumptions, the expected output shape, edge cases handled, and known limitations. For dbt models, ask Claude to generate the YAML schema definition with column descriptions and tests. For dashboard queries, ask for a 1-paragraph summary suitable for the dashboard's description field. The next analyst (often you, six months later) will spend 20 to 60 minutes re-understanding an undocumented query; the 5 minutes Claude spends documenting it saves that time every time the query is touched. Treat documentation as a non-negotiable part of the production SQL workflow, not an afterthought.
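The header block Claude produces typically follows a shape like this sketch (the query name, sources, and limits shown are hypothetical):

```sql
-- ============================================================
-- Query:   weekly_retention_by_channel
-- Purpose: weekly retention curve per acquisition channel,
--          feeding the Growth dashboard
-- Sources: users, events (event-level, UTC timestamps)
-- Grain:   one row per (channel, cohort_week, week_number)
-- Assumes: "active" = any event; signups outside Mar 2026 excluded
-- Limits:  channels with < 50 signups are noisy
-- ============================================================
WITH cohort AS (
    -- March 2026 signups, bucketed to the week of signup
    ...
)
...
```

Asking Claude to fill this template is faster and more consistent than asking for "some comments".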
Common Mistakes That Break Claude SQL Output
1. Asking for SQL without loading the schema first
The single biggest source of broken queries. Claude will produce plausible-looking SQL with column names that do not exist, join keys that do not match, and assumptions about data types that fail at runtime. Load your DDL, sample rows, and business definitions into a Project once and inherit them across every conversation.
2. Not declaring the dialect explicitly
Even small differences (DATE_TRUNC argument order, NULL handling, identifier quoting, array functions) waste hours when you assume Claude guessed correctly. Open every conversation with the dialect as the first sentence: Snowflake, BigQuery Standard SQL, PostgreSQL 16, MySQL 8, Redshift, Databricks SQL.
3. Treating Claude as a one-shot generator instead of a collaborator
The 4-turn cycle (draft, run, paste results, fix) produces production-grade SQL; the 1-turn cycle produces SQL you cannot trust. Always run the query against test data, paste the results back, and let Claude iterate. The cycle takes 10 to 30 minutes per query and produces SQL that holds up for years.
4. Optimizing for performance before correctness is locked in
Get the query returning right results on test data first. Then optimize. Asking Claude to write a fast query before you have verified the logic produces queries that are fast at being wrong. Two passes (correctness, then performance) are reliably better than one combined pass.
5. Skipping EXPLAIN output paste-back
Claude cannot see your actual query plan. For production-critical queries, run EXPLAIN ANALYZE (or warehouse equivalent) and paste the output back to Claude for a second optimization pass. Claude often spots issues from the actual plan (incorrectly estimated rows leading to bad join strategy, filters not pushed past aggregations) that the original query alone did not surface.
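On Postgres, the paste-back step looks like this; the red flags listed are the kinds of plan issues worth asking Claude about (the query itself is elided):

```sql
-- Run the real plan with timing and buffer stats, paste the full output back.
EXPLAIN (ANALYZE, BUFFERS)
SELECT ...;   -- the query under review

-- Red flags to point Claude at in the pasted plan:
--   rows=1 estimated vs rows=480000 actual  -> stale statistics, bad join choice
--   Nested Loop over a large inner relation -> join strategy driven by the misestimate
--   Seq Scan where an index exists          -> filter not sargable as written
```

Snowflake's Query Profile and BigQuery's Execution Details expose the same information as structured output; paste the relevant stage stats rather than a screenshot.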
6. Pasting production data with PII into the conversation
Never paste actual production data with PII into a public LLM. Paste schema and DDL freely, use synthetic or anonymized sample rows for context (5 to 10 fake rows showing data shape), and run any generated SQL against your warehouse yourself. Check your company AI policy before pasting anything that touches customer data.
7. Trusting comments and column descriptions Claude invented
When Claude lacks column comments in your schema, it will sometimes invent plausible business definitions in its output (e.g., commenting that orders.status = 1 means paid when it actually means draft). Always verify any business-logic assumption Claude states; the model is confident even when wrong on these.
8. Not documenting the final query for the next analyst
Production SQL without comments is a tax on every analyst who touches it later (often you, six months on). Always ask Claude to add a header block, inline CTE comments, and assumptions list. The 5 minutes it takes saves 20 to 60 minutes every time the query is revisited.
Pro Tips (What Most Analysts Miss)
Build a separate Claude Project per warehouse. If your company runs both an operational Postgres and an analytical Snowflake, give each its own Project with its own schema and dialect notes. Cross-warehouse drift is a common source of bugs when the schemas blur in one Project.
Load 5 to 10 known-good queries from your codebase as style examples. Beyond the schema and style guide, paste 5 to 10 real queries your team considers exemplary. Claude inherits not just the syntax style but the structural patterns (how you use CTEs, where you put filter conditions, how you handle NULLs).
Use Opus 4.6 for the hardest queries; Sonnet 4.6 for the daily 30 to 60. Opus is materially better on cohort analysis, attribution windows, recursive hierarchies, and gap-and-island problems. Sonnet is roughly 90% as accurate at faster response times for simpler analytical work. Match the model to the difficulty.
Always paste 10 to 20 sample output rows back to Claude, not just row counts. Claude spots issues in data shape that count alone hides: unexpected NULLs, suspicious clustering, time-zone mismatches, off-by-one in window frames. The act of pasting the rows is what makes the four-turn cycle work.
For dbt projects, paste your dbt_project.yml into the Project. Claude inherits naming conventions, default materializations, target databases, and on-run hooks. Every model Claude generates respects the project conventions instead of producing generic dbt code.
Use Claude for warehouse migration, not just query writing. Loading both source DDL and target dialect notes into a Project, then asking Claude to convert each table, view, and stored procedure systematically, compresses a 2-to-4 month migration to 1-to-3 weeks. The full pattern is in the Claude SQL prompt library below.
For window function bugs, ask Claude to walk through the window frame line by line. Window function bugs (off-by-one in ROWS BETWEEN, wrong partition keys, missing ORDER BY) are the hardest SQL bugs to spot by re-reading. Claude's CTE-by-CTE walkthrough surfaces them in 2 to 5 minutes; manual debugging often takes 30 to 60.
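The ROWS BETWEEN off-by-one mentioned above is worth seeing concretely. The sketch assumes a hypothetical `daily_revenue(day, revenue)` table with one row per day:

```sql
-- A "7-day rolling sum" that silently covers 8 rows, next to the fix.
SELECT day,
       SUM(revenue) OVER (
           ORDER BY day
           ROWS BETWEEN 7 PRECEDING AND CURRENT ROW   -- 8 rows: wrong
       ) AS rolling_sum_8d_bug,
       SUM(revenue) OVER (
           ORDER BY day
           ROWS BETWEEN 6 PRECEDING AND CURRENT ROW   -- 7 rows: correct
       ) AS rolling_sum_7d
FROM daily_revenue;
```

The frame includes the current row, so an N-day window needs `N - 1 PRECEDING`; asking Claude to enumerate which rows the frame covers for one sample day makes the off-by-one obvious.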
Always re-verify schema accuracy after major warehouse changes. When your team adds or renames tables, drops columns, or restructures the schema, refresh the schema file in your Claude Project. Stale schemas are worse than no schema; they look authoritative but lead Claude to confidently produce broken queries.
Claude SQL Prompt Library (Copy-Paste)
25 production-tested prompts organized by SQL task. Replace bracketed variables with your specifics. Always run prompts inside a Claude Project with your schema file loaded for ground-truth column names.
Schema loading and Project setup
Analytical queries (cohorts, retention, attribution)
Dialect translation
Performance optimization
Debugging wrong results
dbt models
Window functions and complex aggregations
Migration and refactoring
Documentation
Want more Claude prompts for technical workflows? See our how to use Claude (full guide), Claude for coding, Claude for research, and Claude for PDF analysis. For comparable data workflows on other tools, see ChatGPT for data analysis and ChatGPT for Excel.