Before you read anything: make a call
The task: return all products that have never been ordered.
The schema has two tables: products (id, name) and order_items (order_id, product_id).
The sample data has 3 products, all of which have at least one order. Both queries below return an empty result set on the test data. But only one is correct in the general case.
Query A:
SELECT p.id, p.name
FROM products p
LEFT JOIN order_items oi ON p.id = oi.product_id
WHERE oi.product_id IS NULL;
Query B:
SELECT p.id, p.name
FROM products p
WHERE p.id NOT IN (SELECT product_id FROM order_items);
Both return the same output on the sample data. Which one is correct, and what condition causes the other to fail?
See the answer
Query A is correct. Query B is wrong, and it will fail silently in production.
NOT IN with a subquery breaks when the subquery returns any NULL values. If any row in order_items has a NULL product_id (which happens in real data with optional foreign keys or incomplete imports), then NOT IN (...) can never evaluate to TRUE: it is FALSE for products that match a listed value and NULL (unknown) for everything else. The entire query returns zero rows, even for products that genuinely have no orders.
The sample data has no NULL product_id values, so the bug is invisible during testing. On clean test data, Query B looks fine. On production data, it silently drops valid results.
Query A uses the LEFT JOIN ... WHERE IS NULL pattern: the standard safe approach for "rows in A with no match in B." It explicitly checks for the absence of a match rather than relying on subquery behavior that NULL can corrupt.
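The failure is easy to reproduce. Here is a minimal sketch using Python's built-in sqlite3 module, with the article's schema and invented sample data; the NULL product_id simulates an incomplete import:

```python
# Reproduce the NOT IN NULL trap with sqlite3 (stdlib).
# Table and column names follow the article's schema; data is made up.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE products (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE order_items (order_id INTEGER, product_id INTEGER);
    INSERT INTO products VALUES (1, 'widget'), (2, 'gadget'), (3, 'gizmo');
    INSERT INTO order_items VALUES (100, 1), (101, NULL);  -- products 2 and 3 never ordered
""")

# Query B: NOT IN. The NULL in the subquery makes the comparison unknown
# for every unmatched product, so no row ever satisfies the filter.
not_in_rows = conn.execute(
    "SELECT p.id, p.name FROM products p "
    "WHERE p.id NOT IN (SELECT product_id FROM order_items)"
).fetchall()
print(not_in_rows)  # [] -- products 2 and 3 silently dropped

# Query A: LEFT JOIN ... WHERE IS NULL finds the unordered products.
left_join_rows = conn.execute(
    "SELECT p.id, p.name FROM products p "
    "LEFT JOIN order_items oi ON p.id = oi.product_id "
    "WHERE oi.product_id IS NULL"
).fetchall()
print(left_join_rows)  # [(2, 'gadget'), (3, 'gizmo')]
```

The two queries agree only while order_items is free of NULL product_id values; one bad row flips Query B to an empty result.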
This is the hardest class of SQL bug to catch: a query that executes without error, returns plausible results on test data, and produces wrong results on real data. Your job as an annotator is to reason about what a query does, not just whether it runs.
SQL evaluation is about semantics, not syntax
Most SQL annotation mistakes come from treating queries the same way a linter does: checking whether they'll execute. That's the wrong frame.
AI-generated SQL frequently executes without error and returns the wrong answer. A query that returns 847 rows when the correct answer is 1,000 rows produces no error message. A LEFT JOIN silently replaced with INNER JOIN will drop rows and say nothing about it. NOT IN with a subquery will return zero results if the subquery contains a NULL, and the only way to catch it is to know how NULL behaves in SQL.
Your job is to read a query and reason about what it will return, including what it gets wrong quietly.
Fundamentals you must know without reference
SELECT, WHERE, GROUP BY, HAVING
The most common AI errors happen at the GROUP BY/HAVING boundary. The rule:
- WHERE filters rows before aggregation
- HAVING filters after aggregation, on aggregate results like COUNT, SUM, AVG
Common AI error:
-- WRONG: aggregate function in WHERE clause
SELECT department, COUNT(*) AS headcount
FROM employees
WHERE COUNT(*) > 10
GROUP BY department;
-- CORRECT: use HAVING for post-aggregation filters
SELECT department, COUNT(*) AS headcount
FROM employees
GROUP BY department
HAVING COUNT(*) > 10;
The first query raises a syntax error in most databases. The AI generates it anyway. You need to spot it, and more importantly, explain why it's wrong in your rationale.
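The contrast is quick to verify. This sketch runs both versions against SQLite (via Python's stdlib sqlite3 module) with invented employee data; SQLite rejects the aggregate-in-WHERE form with a "misuse of aggregate" error:

```python
# WHERE vs HAVING with an aggregate, demonstrated on sqlite3 (stdlib).
# Schema and data are invented for the example.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employees (id INTEGER, department TEXT, salary REAL);
    INSERT INTO employees VALUES
        (1, 'eng', 100), (2, 'eng', 110), (3, 'sales', 90);
""")

# The aggregate-in-WHERE version errors out instead of running.
raised = False
try:
    conn.execute(
        "SELECT department, COUNT(*) FROM employees "
        "WHERE COUNT(*) > 1 GROUP BY department"
    )
except sqlite3.OperationalError as e:
    raised = True
    print(e)  # SQLite reports a "misuse of aggregate" error

# HAVING filters after aggregation, which is where COUNT(*) is valid.
rows = conn.execute(
    "SELECT department, COUNT(*) AS headcount FROM employees "
    "GROUP BY department HAVING COUNT(*) > 1"
).fetchall()
print(rows)  # [('eng', 2)]
```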
JOIN types and their semantics
| JOIN type | Returns |
|---|---|
| INNER JOIN | Only rows with matches in both tables |
| LEFT JOIN | All rows from left table; NULLs for unmatched right rows |
| RIGHT JOIN | All rows from right table; NULLs for unmatched left rows |
| FULL OUTER JOIN | All rows from both tables; NULLs where no match |
| CROSS JOIN | Cartesian product: every combination of rows |
AI models frequently use INNER JOIN when the task semantics require LEFT JOIN. The symptom: the query silently drops rows for customers, orders, or users that have no matching record in the joined table. If the task says "show all customers and their order counts," customers with no orders should appear with a count of 0. That requires LEFT JOIN, not INNER JOIN.
NULL semantics
NULL in SQL means unknown. It does not mean zero, empty string, or false. This has consequences that catch annotators constantly.
Any comparison with NULL returns NULL:
NULL = 5 -- NULL (not FALSE)
NULL != 5 -- NULL (not TRUE)
NULL = NULL -- NULL (not TRUE)
To test for NULL, you must use IS NULL or IS NOT NULL.
The WHERE clause implication: WHERE status != 'cancelled' will silently exclude rows where status is NULL, because NULL != 'cancelled' evaluates to NULL, not TRUE. If those rows should be included, the correct filter is:
WHERE status != 'cancelled' OR status IS NULL
This is one of the most common silent correctness errors in AI-generated SQL.
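A small sqlite3 sketch (stdlib Python, invented data) shows the row disappearing and the fix restoring it:

```python
# WHERE inequality vs NULL, demonstrated on sqlite3 (stdlib).
# Table and data are invented for the example.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (id INTEGER, status TEXT);
    INSERT INTO orders VALUES (1, 'shipped'), (2, 'cancelled'), (3, NULL);
""")

# Naive filter: NULL != 'cancelled' evaluates to NULL, not TRUE,
# so order 3 is silently excluded.
naive = conn.execute(
    "SELECT id FROM orders WHERE status != 'cancelled' ORDER BY id"
).fetchall()
print(naive)  # [(1,)]

# Adding the IS NULL branch includes the unknown-status row.
fixed = conn.execute(
    "SELECT id FROM orders "
    "WHERE status != 'cancelled' OR status IS NULL ORDER BY id"
).fetchall()
print(fixed)  # [(1,), (3,)]
```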
Try It: INNER JOIN vs. LEFT JOIN
The task: "Write a SQL query that returns each customer's name and the total number of orders they have placed. Include customers who have placed no orders; show 0 for them."
The AI generates this query against a customers table and an orders table, where orders.customer_id references customers.id:
SELECT c.name, COUNT(o.id) AS order_count
FROM customers c
INNER JOIN orders o ON c.id = o.customer_id
GROUP BY c.name;
Is this query correct? If not, what is the specific error and how should it be fixed?
See answer
The query is incorrect. INNER JOIN will silently exclude any customer who has no orders, because INNER JOIN only returns rows where a match exists in both tables. Customers with no orders have no rows in orders, so they are dropped from the result entirely.
The task explicitly requires including customers with zero orders. No error message will appear; the query simply returns fewer rows than required. This is a semantic correctness error.
Correct query:
SELECT c.name, COUNT(o.id) AS order_count
FROM customers c
LEFT JOIN orders o ON c.id = o.customer_id
GROUP BY c.name;
LEFT JOIN preserves all rows from the left table (customers). Where no matching order exists, o.id is NULL, and COUNT(o.id) correctly returns 0 because COUNT ignores NULLs.
One subtlety worth noting in your rationale: COUNT(*) would return 1 for unmatched rows, not 0. COUNT(o.id) is correct here because it counts non-NULL values only. If the AI used COUNT(*) instead, that's a second error on top of the JOIN type.
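Both points can be checked side by side. A sketch using sqlite3 (stdlib Python, invented customer data, one customer with no orders):

```python
# INNER vs LEFT JOIN, and COUNT(*) vs COUNT(o.id), on sqlite3 (stdlib).
# Names and data are invented; Ben has no orders.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER);
    INSERT INTO customers VALUES (1, 'Ada'), (2, 'Ben');
    INSERT INTO orders VALUES (10, 1), (11, 1);
""")

# INNER JOIN: Ben has no matching order row, so he vanishes.
inner = conn.execute(
    "SELECT c.name, COUNT(o.id) FROM customers c "
    "JOIN orders o ON c.id = o.customer_id "
    "GROUP BY c.name ORDER BY c.name"
).fetchall()
print(inner)  # [('Ada', 2)]

# LEFT JOIN + COUNT(o.id): Ben appears with 0, since COUNT ignores NULLs.
left_count_id = conn.execute(
    "SELECT c.name, COUNT(o.id) FROM customers c "
    "LEFT JOIN orders o ON c.id = o.customer_id "
    "GROUP BY c.name ORDER BY c.name"
).fetchall()
print(left_count_id)  # [('Ada', 2), ('Ben', 0)]

# LEFT JOIN + COUNT(*): the unmatched NULL row still counts as 1.
left_count_star = conn.execute(
    "SELECT c.name, COUNT(*) FROM customers c "
    "LEFT JOIN orders o ON c.id = o.customer_id "
    "GROUP BY c.name ORDER BY c.name"
).fetchall()
print(left_count_star)  # [('Ada', 2), ('Ben', 1)]
```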
Window functions
Window functions are a common source of AI errors and a reliable signal of query sophistication:
SELECT
name,
department,
salary,
RANK() OVER (PARTITION BY department ORDER BY salary DESC) AS dept_rank
FROM employees;
Key distinctions between the three ranking functions:
- RANK(): tied rows get the same rank; the next rank is skipped (1, 1, 3)
- DENSE_RANK(): tied rows get the same rank; no ranks skipped (1, 1, 2)
- ROW_NUMBER(): always unique sequential numbers regardless of ties (1, 2, 3)
An AI that uses RANK() when the task requires unique row numbering (or DENSE_RANK() when the task requires gap-skipping) produces a semantically wrong result even though the query executes without error.
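The three functions diverge exactly at ties. A sketch using sqlite3 (stdlib Python; window functions require SQLite 3.25+, which ships with modern Python builds) on invented salary data with one tie:

```python
# RANK vs DENSE_RANK vs ROW_NUMBER on tied salaries, via sqlite3 (stdlib).
# Data is invented: 'a' and 'b' tie at 300.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employees (name TEXT, salary INTEGER);
    INSERT INTO employees VALUES ('a', 300), ('b', 300), ('c', 200);
""")

# ROW_NUMBER gets an explicit tie-breaker (name) so the output is
# deterministic; without one, its assignment within a tie is arbitrary.
rows = conn.execute("""
    SELECT name,
           RANK()       OVER (ORDER BY salary DESC)       AS rnk,
           DENSE_RANK() OVER (ORDER BY salary DESC)       AS drnk,
           ROW_NUMBER() OVER (ORDER BY salary DESC, name) AS rn
    FROM employees
    ORDER BY salary DESC, name
""").fetchall()
for r in rows:
    print(r)
# ('a', 1, 1, 1)
# ('b', 1, 1, 2)   <- same RANK and DENSE_RANK, unique ROW_NUMBER
# ('c', 3, 2, 3)   <- RANK skips 2; DENSE_RANK does not
```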
Try It: what's wrong with WHERE COUNT(*)?
A task asks for "all departments with more than 5 employees." The AI writes:
SELECT department, COUNT(*) AS headcount
FROM employees
WHERE COUNT(*) > 5
GROUP BY department;
Explain precisely what is wrong with this query and write the corrected version.
See answer
The error: WHERE COUNT(*) > 5 is invalid SQL. The WHERE clause filters individual rows before GROUP BY runs. Aggregate functions like COUNT() don't exist yet at the point WHERE is evaluated. Using an aggregate in WHERE causes a syntax error in virtually all SQL dialects.
The fix:
SELECT department, COUNT(*) AS headcount
FROM employees
GROUP BY department
HAVING COUNT(*) > 5;
HAVING filters after aggregation, which is where aggregate functions are valid.
Why the rationale matters: A rationale that says "this query has a syntax error" is less useful than one that says "aggregate functions cannot appear in WHERE clauses because WHERE operates before GROUP BY; HAVING is the correct clause for post-aggregation filters." The first tells a reviewer the query breaks. The second tells the model why it breaks and what the correct pattern is. This is what training data is for.
Schema reading
When evaluating AI SQL, read the schema before reading the query. Common schema-related errors:
Wrong table: The AI joins orders to products when it should join order_items to products, because orders doesn't have a direct product foreign key; the relationship goes through order_items.
Hallucinated column names: AI generates customer_name when the schema has first_name and last_name as separate columns. The query references a column that doesn't exist.
Aggregation without grouping: SELECT department, AVG(salary) FROM employees without GROUP BY department is invalid in most SQL dialects, or returns a single row with an arbitrary department value.
Reading the schema takes 30 seconds. It catches a class of errors that no amount of SQL knowledge will help you spot without it.
Data analysis patterns: Python and pandas
Data science tasks often pair SQL with Python pandas code. The evaluation principles are the same: reason about what the code will actually return, not just whether it runs.
groupby with aggregation:
# Common AI mistake: not resetting the index
result = df.groupby('department')['salary'].mean()
# Returns a Series with department as index, not a clean DataFrame
# Correct for most downstream use:
result = df.groupby('department')['salary'].mean().reset_index()
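To see the difference concretely, a small sketch (assumes pandas is installed; the department data is invented):

```python
# groupby with and without reset_index; assumes pandas is available.
# The data is invented for the example.
import pandas as pd

df = pd.DataFrame({
    "department": ["eng", "eng", "sales"],
    "salary": [100, 110, 90],
})

# Without reset_index: a Series whose index is the department.
as_series = df.groupby("department")["salary"].mean()
print(type(as_series).__name__)  # Series

# With reset_index: a flat DataFrame with ordinary columns, which is
# what most downstream merges, exports, and plots expect.
as_frame = df.groupby("department")["salary"].mean().reset_index()
print(type(as_frame).__name__)   # DataFrame
print(list(as_frame.columns))    # ['department', 'salary']
```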
Merging DataFrames: The pandas equivalent of SQL JOINs. pd.merge(df1, df2, on='id', how='left'): the how parameter mirrors SQL JOIN semantics. An AI that defaults to how='inner' (the pandas default) when the task requires preserving all rows from one table is making the same semantic error as the INNER JOIN case above.
Handling missing data: df.dropna() versus df.fillna(0) have very different effects on aggregate statistics. An AI that silently drops NaN rows before computing a mean may produce a misleading result. The task might require acknowledging that data is missing rather than removing it.
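The two strategies produce different means on the same column. A sketch with invented data (assumes pandas is installed):

```python
# dropna vs fillna(0) and their effect on a mean; assumes pandas.
# The score column is invented: two observed values, two missing.
import pandas as pd

df = pd.DataFrame({"score": [10.0, 20.0, None, None]})

# dropna(): the mean is computed over the 2 observed values only.
mean_drop = df.dropna()["score"].mean()
print(mean_drop)  # 15.0

# fillna(0): the NaNs become zeros and pull the mean down.
mean_fill = df.fillna(0)["score"].mean()
print(mean_fill)  # 7.5

# Note: pandas' mean() skips NaN by default (skipna=True), so the plain
# column mean silently matches the dropna() result -- another quiet choice.
print(df["score"].mean())  # 15.0
```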
Pandas is covered here at the level youβll encounter in most annotation tasks. If a task requires deeper pandas work (complex transformations, time series, multi-index operations), treat it as Specialist-tier and evaluate accordingly.
Common SQL evaluation mistakes by annotators
Accepting a query that looks right. SQL that executes and returns rows is not necessarily correct. Always reason about what the query should return and whether the approach achieves it.
Ignoring NULL handling. NULL semantics are non-obvious and AI models get them wrong frequently. Check any WHERE clause involving inequality or NOT IN for potential NULL issues.
Not reading the schema. An AI may write a perfectly syntactic query against a different table structure than the one provided. Check column names, foreign key paths, and table relationships before evaluating the query logic.
Accepting the sample output as proof. As the Entry Simulation showed: a query can return the correct result on clean test data and produce wrong results on real data. Sample output is a starting point, not a verdict.
Quick Reference
- The silent failure pattern: AI SQL errors that produce no error message and return plausible-but-wrong results are the hardest to catch. INNER JOIN silently drops rows, NOT IN silently returns nothing when NULLs are present, and WHERE on an aggregate raises a syntax error; only the last one is visible without reasoning through the query.
- NULL is unknown, not zero: Any comparison with NULL returns NULL, not TRUE or FALSE. WHERE col != 'x' excludes NULL rows. NOT IN with a subquery returns zero results if the subquery contains any NULL. Always check NULL handling in inequality filters and subqueries.
- Read the schema before the query: Column names, foreign key paths, and table structure must match the query. A syntactically valid query that references the wrong table or a hallucinated column name is a correctness failure, and you won't catch it without reading the schema first.