Prerequisites: This module assumes you can read Python code and understand basic programming concepts: variables, loops, functions, conditionals. If you've never written Python before, work through any free beginner Python course before coming back. This module won't teach you to program from scratch. It will teach you to evaluate code the way a professional reviewer does.
Before you read anything: make a call
A user asked an AI: "Write a Python function that returns the second largest number in a list."
The AI produced this code and this explanation:
```python
def second_largest(nums):
    nums.sort(reverse=True)
    return nums[1]
```
AI's explanation: "This function sorts the list in descending order and returns the element at index 1, which is the second largest number."
Is the AIβs explanation correct? Is the code correct?
See the answer
The explanation is accurate for the happy path. The code has three bugs.
The AI's explanation correctly describes what the code does when given a well-formed list of distinct numbers with at least two elements. That's the happy path. Reviewers who only check the happy path miss what gets code rejected.
Bug 1: Empty or single-element list. If `nums` is `[]` or `[5]`, then `nums[1]` raises an `IndexError`. The function has no guard for this. A task asking for "the second largest" implicitly requires handling the case where no second largest exists.
Bug 2: Duplicate values. If `nums = [5, 5, 3]`, sorting gives `[5, 5, 3]` and `nums[1]` returns `5`. But the second largest distinct value is `3`. Whether this is a bug depends on the task spec, but it's the kind of ambiguity that must be flagged, not assumed away.
Bug 3: Input mutation. `nums.sort()` sorts the list in place, modifying the caller's original list. This is an undocumented side effect. The function should sort a copy: `sorted(nums, reverse=True)`.
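A quick way to see the side effect, using the buggy version above:
```python
scores = [3, 1, 2]
second_largest(scores)   # returns 2, but also...
print(scores)            # [3, 2, 1]: the caller's list was silently reordered

# sorted() returns a new list and leaves the original untouched
top_two = sorted(scores, reverse=True)[:2]
```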
A correct implementation:
```python
def second_largest(nums: list) -> int | None:
    unique = sorted(set(nums), reverse=True)
    return unique[1] if len(unique) >= 2 else None
```
This is what code review annotation looks like. The AI's prose explanation wasn't wrong. It accurately described the code's behavior. But "accurately describes a buggy function" is not a passing rationale.
What coding tasks look like on AI training platforms
Coding annotation work falls into two categories, and they require different skills.
Algorithms tasks: You solve or evaluate algorithmic problems: sorting, searching, dynamic programming, graph traversal. The AI generates a solution, and you verify whether it's correct, optimal, and handles edge cases. Or you write a reference solution from scratch that the model trains on. These tasks require genuine Python fluency and pay at the Specialist tier.
Code review sessions: You receive AI-generated code and evaluate it across multiple dimensions: correctness, time and space complexity, code style, edge case handling, and sometimes security. Less writing, more reading. These appear across Specialist and Subject Matter Expert tiers depending on the domain.
Both task types require genuine Python fluency. Not tutorial-level familiarity, but the ability to mentally execute code, identify failure modes, and reason about performance.
Algorithmic complexity: what you must know cold
Every algorithms task touches Big O notation. You need to recognize common complexities on sight:
| Complexity | Example pattern |
|---|---|
| O(1) | Hash table lookup, array index access |
| O(log n) | Binary search, balanced BST operations |
| O(n) | Single loop over input, linear scan |
| O(n log n) | Merge sort, heap sort, the best a comparison sort can achieve |
| O(n²) | Nested loops, bubble sort, insertion sort (worst case) |
| O(2ⁿ) | Recursive subset generation, naive Fibonacci |
When evaluating AI-generated code, check both time complexity and space complexity. An AI solution that solves a problem in O(n²) when O(n log n) is achievable is not wrong, but it is suboptimal, which matters at scale and should be noted in your rationale. An AI that implements "find duplicates in a list" using two nested loops gives O(n²) when a hash set achieves O(n) time at the cost of O(n) space. Your annotation should flag the suboptimal complexity and explain the better approach.
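A sketch of the two approaches that paragraph describes; the function names here are chosen for illustration:
```python
# Quadratic: compares every pair of elements.
def find_duplicates_slow(nums):
    dupes = []
    for i in range(len(nums)):
        for j in range(i + 1, len(nums)):
            if nums[i] == nums[j] and nums[i] not in dupes:
                dupes.append(nums[i])
    return dupes

# Linear time, at the cost of O(n) extra space: one pass with a seen-set.
def find_duplicates_fast(nums):
    seen, dupes = set(), set()
    for n in nums:
        if n in seen:
            dupes.add(n)
        seen.add(n)
    return list(dupes)
```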
Try It: O(n²) vs. O(n): is it wrong?
You're reviewing two AI implementations of a function that checks whether any two numbers in a list sum to a target value. Both produce correct output on all test cases.
Response A:
```python
def has_pair_sum(nums, target):
    for i in range(len(nums)):
        for j in range(i + 1, len(nums)):
            if nums[i] + nums[j] == target:
                return True
    return False
```
Response B:
```python
def has_pair_sum(nums, target):
    seen = set()
    for num in nums:
        if target - num in seen:
            return True
        seen.add(num)
    return False
```
Is Response A wrong in a code review context? How should you score it relative to Response B?
See answer
Response A is not wrong, but it is suboptimal and should be flagged.
Response A is O(n²) time due to the nested loops. For small inputs this is unnoticeable, but at scale it degrades significantly. Response B is O(n) time using a hash set. It trades O(n) space for a linear-time solution.
In a code review annotation context, correctness and complexity are separate criteria. Response A passes the correctness criterion (produces correct output) but fails the complexity criterion (O(n²) when O(n) is straightforwardly achievable).
Your rationale should: confirm correctness, identify the nested-loop pattern and its O(n²) complexity, explain that a hash-set approach achieves O(n) by trading O(n) space, and note that Response B demonstrates the preferred approach.
Do not mark Response A as "incorrect"; that misrepresents the evaluation. Mark it as correct but suboptimal, and score it lower on the complexity criterion. These are different rubric dimensions.
Writing clean Python: the standards code reviewers apply
PEP 8 is the official Python style guide and the baseline for every code review rubric. The rules most commonly violated in AI-generated code:
- 4-space indentation, not tabs
- `snake_case` for variables and functions (`my_variable`, not `myVariable`)
- `PascalCase` for class names
- Maximum line length of 79 characters (79–99 is acceptable in many projects)
- Two blank lines between top-level function or class definitions
Naming matters more than most annotators flag it. `for i in range(len(lst))` should almost always be `for item in lst`. Variable names like `x`, `tmp`, and `data` in functions longer than three lines are a signal that the AI generated plausible-looking code without thinking about readability.
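An illustrative before-and-after, with hypothetical names, showing the kind of rewrite a style rationale might point to:
```python
# Flagged: camelCase name, index-based loop, cryptic accumulator.
def getTotals(data):
    tmp = 0
    for i in range(len(data)):
        tmp += data[i]
    return tmp

# Preferred: snake_case, direct iteration, names that say what things are.
def get_total(prices):
    total = 0
    for price in prices:
        total += price
    return total
```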
Anti-patterns to flag on sight, and what goes wrong with each:
`except:` with no exception type catches everything, including `KeyboardInterrupt` and `SystemExit`. The program can no longer be interrupted cleanly. Almost always wrong; the correct form specifies what to catch: `except ValueError:`.
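A minimal illustration of the difference:
```python
# Anti-pattern: the bare except also swallows Ctrl+C (KeyboardInterrupt),
# so this loop cannot be interrupted cleanly.
def parse_all(values):
    results = []
    for v in values:
        try:
            results.append(int(v))
        except:
            pass
    return results

# Correct: name the failure you actually expect.
def parse_all_fixed(values):
    results = []
    for v in values:
        try:
            results.append(int(v))
        except ValueError:
            pass
    return results
```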
Mutable default arguments (like `def f(lst=[]):`) create a shared object across all calls. The first call that appends to `lst` will see those items on every subsequent call. It's one of Python's most counterintuitive bugs and shows up regularly in AI-generated code.
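The bug in action, with the standard `None`-sentinel fix:
```python
def append_item(item, lst=[]):    # the [] is created once, at definition time
    lst.append(item)
    return lst

print(append_item(1))  # [1]
print(append_item(2))  # [1, 2] <- the "empty" default remembered the first call

# Fix: use None as a sentinel and build a fresh list per call.
def append_item_fixed(item, lst=None):
    if lst is None:
        lst = []
    lst.append(item)
    return lst
```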
`is` for value comparison (like `if x is 5:`) checks object identity, not equality. It may pass for small integers (CPython caches them) but fails unpredictably for larger values or strings. The correct form is `==`.
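A short demonstration; the result of the first comparison is a CPython implementation detail, which is exactly why relying on it is a bug:
```python
a = 5
b = int("5")
print(a is b)   # True, but only because CPython caches small integers

a = 1000
b = int("1000")
print(a is b)   # False: same value, different objects
print(a == b)   # True: == compares values, which is what the code means
```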
Spotting bugs: a systematic approach
Don't mentally run code only on the happy path. Apply this checklist:
- Off-by-one errors: Loop bounds, slice indices, length checks. `range(n)` stops at n-1; `lst[:-1]` excludes the last element.
- Empty input: What happens when the input list is empty? When a string is empty? When n=0?
- Type errors: The function expects an int but might receive a string. The AI may not validate types.
- Mutation of input: Does the function modify its arguments without documenting it? Often it shouldn't.
- Implicit None returns: A function that returns `None` on an unhandled branch, without documenting it, is a common AI bug.
- Performance inside a loop: `in` checks on lists, repeated `.sort()` calls, string concatenation in a loop. Each is O(n) inside a loop, making the whole function O(n²). See the sketch after this list.
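A hedged sketch of that last item, with hypothetical function names:
```python
# O(n^2): `w not in seen` scans a list, and += copies the whole string
# accumulated so far, on every iteration.
def dedupe_slow(words):
    seen = []
    out = ""
    for w in words:
        if w not in seen:
            seen.append(w)
            out += w + " "
    return out.strip()

# O(n): set membership is O(1), and join builds the string once.
def dedupe_fast(words):
    seen = set()
    kept = []
    for w in words:
        if w not in seen:
            seen.add(w)
            kept.append(w)
    return " ".join(kept)
```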
Try It: spot the bugs
You're reviewing this AI-generated Python function. The task was: "Write a function that takes a list of words and returns a dictionary mapping each word to the number of times it appears. Words should be case-insensitive."
```python
def word_count(words):
    counts = {}
    for word in words:
        if word in counts:
            counts[word] += 1
        else:
            counts[word] = 1
    return counts
```
Identify all bugs or issues in this implementation.
See answer
There are two issues:
1. Case-insensitivity is not implemented. The task explicitly required case-insensitive counting: `"Hello"` and `"hello"` should map to the same key. The function compares words as-is, so `"Hello"` and `"hello"` produce separate entries. The fix: normalize on ingestion with `word.lower()`.
2. The manual dictionary pattern is unnecessary. The logic is correct but verbose; Python's `collections.Counter` or the `dict.get()` pattern handles this more cleanly. This is a style/quality issue, not a correctness bug, but worth noting in the rationale.
A correct implementation:
```python
def word_count(words: list[str]) -> dict[str, int]:
    counts = {}
    for word in words:
        word = word.lower()
        counts[word] = counts.get(word, 0) + 1
    return counts
```
Or more concisely:
```python
from collections import Counter

def word_count(words: list[str]) -> dict[str, int]:
    return Counter(word.lower() for word in words)
```
The case-insensitivity bug is the one that matters. It's a direct spec violation. The verbosity is a quality note. Score them on separate criteria and explain both in your rationale.
Security issues in code review
For code review sessions involving backend or scripting code, security awareness matters. You don't need to be a security engineer. You need to recognize the pattern, name the vulnerability class, and explain why it's dangerous.
SQL injection is the most common critical flag. Any code that builds SQL queries with string formatting is vulnerable:
```python
query = f"SELECT * FROM users WHERE name = '{name}'"
```
A crafted input can manipulate the query logic entirely. Parameterized queries are the required fix. The value is passed separately, treated as data, not executable SQL.
Hardcoded credentials are a critical violation on sight:
```python
API_KEY = "sk-abc123secretkey"
DB_PASSWORD = "hunter2"
```
Keys and passwords in source code get committed to version control, shared, and leaked. Flag immediately. No context needed.
Arbitrary code execution via `eval()` or `exec()` on user-supplied input is almost always dangerous. Flag any use of these on untrusted data.
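When the input is supposed to be a data literal, `ast.literal_eval` is the standard safe substitute:
```python
import ast

user_input = "[1, 2, 3]"

# eval(user_input) would execute arbitrary code, e.g.
# "__import__('os').system(...)". Never run it on untrusted data.
value = ast.literal_eval(user_input)  # accepts only Python literals
print(value)  # [1, 2, 3]
```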
Path traversal: file operations using user-supplied paths without sanitization can allow an attacker to read or write files outside the intended directory. `open(user_input)` with no validation is the pattern to catch; a sketch of the defensive pattern follows.
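A minimal defensive sketch, assuming a hypothetical upload directory; `Path.is_relative_to` requires Python 3.9+:
```python
from pathlib import Path

BASE_DIR = Path("/srv/app/uploads").resolve()  # hypothetical allowed root

def read_upload(filename: str) -> str:
    # Resolve symlinks and ".." segments, then verify the result is still
    # inside BASE_DIR; this rejects inputs like "../../etc/passwd".
    target = (BASE_DIR / filename).resolve()
    if not target.is_relative_to(BASE_DIR):
        raise ValueError("path escapes the allowed directory")
    return target.read_text()
```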
Try It: spot the security vulnerability
You're reviewing a Python backend function that queries a database. The task asked the AI to write a function that retrieves a user record by username.
```python
def get_user(username):
    conn = get_db_connection()
    cursor = conn.cursor()
    query = f"SELECT * FROM users WHERE username = '{username}'"
    cursor.execute(query)
    return cursor.fetchone()
```
Identify the security vulnerability, name its class, and explain what a correct implementation looks like.
See answer
Vulnerability: SQL Injection.
The function constructs the SQL query by directly embedding a user-supplied string (`username`) into the query using an f-string. An attacker can pass a crafted username to manipulate the query logic.
If `username = "admin' --"`, the executed query becomes:
```sql
SELECT * FROM users WHERE username = 'admin' --'
```
The `--` comments out the rest of the query. If this lookup backs a login check, the attacker retrieves the admin record without knowing the password.
Correct implementation using parameterized queries:
```python
def get_user(username):
    conn = get_db_connection()
    cursor = conn.cursor()
    cursor.execute("SELECT * FROM users WHERE username = ?", (username,))
    return cursor.fetchone()
```
The `?` placeholder passes the value separately, ensuring it's treated as data, not executable SQL. The database driver handles the escaping.
This is a mandatory flag in any code review task. Mark it as a critical security defect, not a style issue or a complexity issue: it is a separate rubric dimension, and a hard fail.
What "code review" means as an annotation task
In a code review annotation session, you receive a prompt describing what the code should do, AI-generated code attempting to fulfill it, and a rubric with criteria: correctness, complexity, style, edge case handling, documentation. Your job is not to rewrite the code. It is to evaluate it on each criterion with specific, evidence-based rationale.
"The code is not very efficient" fails the rationale standard. "The code uses a nested loop (lines 8–12) giving O(n²) time complexity; a hash-set approach would achieve O(n)" meets it.
The best code reviewers think like engineers: they run the code mentally on normal inputs, boundary inputs (empty, single element, maximum size), and adversarial inputs (duplicates, negatives, None). They score each criterion separately, because passing one doesn't excuse failing another.
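That discipline, turned into a quick probe; this assumes the corrected second_largest from the opening exercise is in scope:
```python
cases = [
    ([3, 1, 2], 2),        # normal input
    ([], None),            # boundary: empty
    ([7], None),           # boundary: single element
    ([5, 5, 3], 3),        # adversarial: duplicates
    ([-1, -2, -3], -2),    # adversarial: negatives
]
for nums, expected in cases:
    result = second_largest(list(nums))  # pass a copy to catch mutation too
    assert result == expected, f"{nums}: expected {expected}, got {result}"
```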
Quick Reference
- Correctness before everything else: Check that the code produces correct output for normal inputs, empty inputs, and edge cases before evaluating style or complexity. A beautifully formatted function with a bug in the empty-input case fails the task.
- Complexity and correctness are separate criteria: An O(n²) solution that produces correct output is not "wrong". It is correct and suboptimal. Score it that way. Do not conflate the two rubric dimensions.
- The bugs that appear most often in AI-generated Python: Missing edge case guards (empty list, single element), input mutation (sorting in place without documenting it), and O(n) operations inside loops that make the whole function O(n²).