STEM & LaTeX (Math Solving)
⏱️ 2.5 hours 🎯 Advanced

Learn to write and evaluate mathematical reasoning with LaTeX, verify AI-generated STEM solutions step by step, and understand what olympiad-level and quantitative reasoning tasks require.

Test Your Knowledge
Before you read anything: make a call

A task asks the AI to solve: Find $\int x e^x \, dx$.

The AI responds:

$$ \int x e^x \, dx = e^x \cdot \frac{x^2}{2} + C $$

“Using the product rule for integration, we multiply $e^x$ by the integral of $x$, which gives $\frac{x^2}{2}$.”

The answer is wrong, but before reading why, identify where the reasoning breaks down.

See the answer

This fails on reasoning, not just the final answer. That distinction is the whole point of STEM annotation.

The AI applied a rule that doesn’t exist. There is no “product rule for integration.” The AI appears to have confused integration with differentiation (where the product rule is a real technique) and invented an integration analogue.

What happened: the AI treated $e^x$ as if it were a constant, then integrated $x$ alone to get $\frac{x^2}{2}$, and multiplied the two. But $e^x$ is a function of $x$; it can’t be pulled out of the integral. The correct technique here is integration by parts:

Let $u = x$, $dv = e^x dx$, so $du = dx$, $v = e^x$.

$$ \int x e^x \, dx = uv - \int v \, du = xe^x - \int e^x \, dx = xe^x - e^x + C = e^x(x - 1) + C $$

Why this matters for annotation: A correct final answer reached by invalid reasoning is still a training failure. The model learns the reasoning chain, not just the result. An AI that reaches the right answer by inventing a nonexistent rule trains the model to apply that rule on future problems where it will produce wrong answers.

When you evaluate STEM tasks, verify each step’s justification, not just the conclusion. A confident-sounding wrong argument is harder to catch than a numerical error, and more damaging when it gets through.


Right answer, wrong reasoning: still a failure

Mathematical annotation is demanding not because the math is always hard, but because it requires a specific kind of attention most people don’t default to: verifying reasoning steps, not just outcomes.

An AI can reach a correct numerical answer through an invalid process. It can apply a real technique to the wrong problem. It can skip steps that conceal an error. It can state a correct conclusion from premises that don’t support it. In all of these cases, the final answer might look fine, but the training signal is wrong.

This module covers two skills: LaTeX (the formatting language for math) and mathematical reasoning evaluation (verifying that AI solutions are not just numerically correct but logically sound). Both are required for STEM annotation at the Specialist and Subject Matter Expert tiers.


LaTeX syntax: the essentials

LaTeX is the standard for rendering mathematical notation in AI training contexts. Inline math uses single dollar signs; display (block) math uses double dollar signs or \[ \].

Inline vs. display mode

Inline: The formula is $E = mc^2$ and it appears within the sentence.

Display mode (centered, larger):
$$
E = mc^2
$$

Common math constructs

Fractions:

$$\frac{numerator}{denominator}$$
$$\frac{d}{dx}\left(\frac{x^2+1}{x-1}\right)$$

Exponents and subscripts:

$$x^{n+1}$$           % exponent with multiple characters needs braces
$$a_i$$               % single-character subscript
$$\sum_{i=1}^{n} i^2$$   % summation with bounds: braces required for multi-char bounds

Square roots and nth roots:

$$\sqrt{x^2 + y^2}$$
$$\sqrt[3]{8} = 2$$

Greek letters: \alpha, \beta, \gamma, \delta, \epsilon, \theta, \lambda, \mu, \sigma, \pi, \omega, \Sigma (uppercase), \Delta (uppercase)

Operators: \cdot (dot product), \times (cross product), \leq, \geq, \neq, \approx, \equiv, \in, \subset, \cup, \cap, \infty

Matrices:

$$
A = \begin{pmatrix} 1 & 2 \\ 3 & 4 \end{pmatrix}
$$

Common LaTeX errors to catch in AI output

Missing braces on multi-character exponents or subscripts: x^n+1 renders as $x^n + 1$, not $x^{n+1}$. The exponent only applies to the single character immediately following ^ unless braces wrap the full expression. The correct form is x^{n+1}.

Wrong operator: x * y in LaTeX renders literally as x * y. The correct forms are x \cdot y (dot product) or x \times y (cross product) depending on context.

Mismatched delimiters: \left( must be paired with \right). An AI that opens a \left[ and closes with \right) produces a rendering error.

Forgetting math mode: Writing theta in prose instead of $\theta$ means the Greek letter is not rendered. It appears as the literal word “theta.”

Incorrect spacing: In LaTeX, spaces inside math mode are ignored. Use \, for thin space, \quad for medium space, \qquad for large space where readability requires it.
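Some of these errors are mechanically checkable. As an illustration (not a real linter), here is a minimal Python sketch that flags mismatched `\left`/`\right` pairs in a LaTeX string; the function name and pairing table are my own, chosen for this example:

```python
import re

# Map each \left delimiter to the \right delimiter that must close it.
PAIRS = {"(": ")", "[": "]", "\\{": "\\}", "|": "|"}

def check_left_right(latex: str) -> list[str]:
    """Return a description of every mismatched \\left/\\right pair."""
    issues = []
    stack = []
    # Find every \left or \right together with its delimiter token.
    for m in re.finditer(r"\\(left|right)\s*(\\[{}|]|[\(\)\[\]|.])", latex):
        kind, delim = m.group(1), m.group(2)
        if kind == "left":
            stack.append(delim)
        elif not stack:
            issues.append(f"\\right{delim} with no matching \\left")
        else:
            opener = stack.pop()
            expected = PAIRS.get(opener, ".")
            if delim not in (expected, "."):  # \right. is always legal
                issues.append(f"\\left{opener} closed by \\right{delim}")
    issues.extend(f"\\left{d} never closed" for d in stack)
    return issues
```

On the example from above, `check_left_right(r"\left[ x \right)")` reports that `\left[` was closed by `\right)`, while a balanced expression returns an empty list.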


Try It: spot the LaTeX error

An AI is writing up the solution to a summation problem. It produces:

The total is given by $\sum_i=1^n x_i^2 + 1$.

Identify every LaTeX error in this expression. What did the AI intend to write, and what will actually render?

See answer

There is one error, but it’s worth understanding precisely what’s wrong and what isn’t.

Error: Missing braces on the subscript. \sum_i=1^n uses _i as the subscript, so only the single character i is treated as the lower bound. The =1 renders as literal text outside the subscript, floating next to the summation symbol. The correct form is \sum_{i=1}^{n} with the full lower bound i=1 enclosed in braces.

What about x_i^2 + 1? This is not an error; it’s ambiguous, and the resolution depends on intent. If the intent is $x_i^2 + 1$ (the square of $x_i$, plus one), the current form is correct: ^2 applies only to the 2, and + 1 is a separate term. If the intent were $x_i^{2+1} = x_i^3$, braces would be needed: x_i^{2+1}. Read the surrounding context to determine which meaning is intended before flagging.

What will render: The summation displays with only i as the lower index (not i=1) and =1 appears as stray text. The upper bound n is correct.

Corrected form (assuming intent is $\sum_{i=1}^{n} x_i^2 + 1$):

$\sum_{i=1}^{n} x_i^2 + 1$

Evaluating mathematical reasoning step by step

The verification protocol

For each step in an AI’s mathematical solution:

  1. Identify the operation: What mathematical rule or technique is being applied? (chain rule, integration by parts, completing the square, polynomial long division, etc.)
  2. Verify the algebra: Does the arithmetic check out? Substituting numbers into symbolic steps is often the fastest verification method.
  3. Check for skipped steps: An AI that jumps from step 2 to step 5 may be concealing an error in the omitted steps. If you cannot reconstruct the missing steps yourself, flag the gap.
  4. Verify the conclusion: Does the final answer follow from the final step? Does it have the right units (for physics problems)? Is it in the required form?

Example: correct reasoning vs. flawed reasoning

Problem: Solve $\int x e^x \, dx$

Correct solution: $$ \int x e^x \, dx = x e^x - \int e^x \, dx = x e^x - e^x + C = e^x(x-1) + C $$

The AI applied integration by parts with $u = x$, $dv = e^x dx$, giving $du = dx$, $v = e^x$. Each step follows from the technique. The conclusion is correct.

Flawed solution (from the opening exercise): $$ \int x e^x \, dx = e^x \cdot \frac{x^2}{2} + C $$

The AI treated $e^x$ as a constant and integrated $x$ alone: a technique that doesn’t exist. The error isn’t numerical; it’s a misapplication of a rule. Your rationale should name the specific mistake: “The AI treated $e^x$ as a constant factor, which is invalid since $e^x$ is a function of $x$. The correct technique is integration by parts.”
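Step 2 of the verification protocol (substitute numbers) catches this mechanically: differentiate each candidate antiderivative numerically and compare the result to the integrand $x e^x$ at a few sample points. A minimal sketch using only the Python standard library:

```python
import math

def integrand(x):
    return x * math.exp(x)

def correct_candidate(x):   # e^x (x - 1), from integration by parts
    return math.exp(x) * (x - 1)

def flawed_candidate(x):    # e^x * x^2 / 2, from the invented rule
    return math.exp(x) * x**2 / 2

def numeric_derivative(f, x, h=1e-6):
    # Central-difference approximation of f'(x).
    return (f(x + h) - f(x - h)) / (2 * h)

for x in [0.5, 1.0, 2.0]:
    good = abs(numeric_derivative(correct_candidate, x) - integrand(x))
    bad = abs(numeric_derivative(flawed_candidate, x) - integrand(x))
    # The correct candidate's residual is tiny (floating-point noise);
    # the flawed candidate's residual is large at every nonzero x.
    print(f"x={x}: correct residual {good:.2e}, flawed residual {bad:.2e}")
```

This is exactly the check a human annotator performs by plugging in a number: the derivative of the claimed antiderivative must reproduce the integrand.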


Try It: right answer, missing steps

The task asks the AI to solve: Find the derivative of $f(x) = \ln(x^2 + 1)$.

The AI responds:

“Using the chain rule, $f'(x) = \dfrac{2x}{x^2+1}$.”

The answer is correct. Is this response a reasoning failure? How should you evaluate it?

See answer

It depends on what the task specifies, but a one-line answer with no intermediate steps is generally insufficient for a STEM demonstration task.

The correct answer is $\dfrac{2x}{x^2+1}$. That’s not in dispute. But the AI skipped the explicit chain rule decomposition. A complete solution shows the reasoning chain:

Let $u = x^2 + 1$, so $f(x) = \ln(u)$.

By the chain rule:

$$f'(x) = \frac{d}{du}[\ln(u)] \cdot \frac{du}{dx} = \frac{1}{u} \cdot 2x = \frac{2x}{x^2+1}$$

Why skipped steps matter in AI training: The model learns from the reasoning chain, not just the final answer. A solution that jumps to the conclusion trains the model to produce conclusions without justification. A reviewer also can’t verify correctness for the skipped portion: what if the AI reached the right answer through incorrect intermediate reasoning?

How to score it: If the rubric has a reasoning or steps-shown criterion, this fails it, regardless of the correct final answer. If the task only asked for the result with no requirement to show work, it may pass. Evaluate against the stated criteria, not against your general preference for complete solutions. The distinction matters because it’s not your job to add requirements the rubric doesn’t specify.
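The chain rule decomposition can itself be spot-checked numerically: compute each factor, multiply them, and compare against a finite-difference derivative of $f$. A minimal sketch (standard library only):

```python
import math

def f(x):
    return math.log(x**2 + 1)

def claimed_derivative(x):
    return 2 * x / (x**2 + 1)

def numeric_derivative(g, x, h=1e-6):
    # Central-difference approximation of g'(x).
    return (g(x + h) - g(x - h)) / (2 * h)

x = 1.5
u = x**2 + 1
# Chain rule factors: d/du[ln u] = 1/u and du/dx = 2x.
chain_product = (1 / u) * (2 * x)

# The product of the factors matches the claimed derivative exactly,
# and both match the numeric derivative of f to within discretization error.
print(chain_product, claimed_derivative(x), numeric_derivative(f, x))
```

The same two-line check (reconstruct the factors, compare to a numeric derivative) works for any single-variable chain rule claim.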


Olympiad-level and competition math

High-end STEM annotation tasks often involve competition mathematics (AMC, AIME, Putnam, IMO level). These require proof-based reasoning, not just computation.

Proof structure: A valid proof must establish a claim for all cases, not just example cases. An AI that “proves” a universal statement by checking three examples has produced an invalid proof.

Common proof techniques:

  • Mathematical induction: Base case + inductive step. Both must be established. A common AI error is proving only the base case, or stating the inductive hypothesis without proving the inductive step follows from it.
  • Proof by contradiction: Assume the negation, derive a genuine contradiction. The contradiction must be explicit. Saying “this is absurd” without showing why is not a proof.
  • Constructive proof: Exhibit the object whose existence is claimed. The AI must verify the object satisfies all stated conditions, not just assert it does.
  • Pigeonhole principle: If $n+1$ items are placed into $n$ containers, at least one container holds more than one item. Recognizing when this applies is the skill; applying it correctly requires identifying the “items” and “containers” precisely.

Combinatorics: Watch for AI errors in overcounting (not dividing by symmetries when objects are indistinguishable) or undercounting (treating distinguishable objects as identical).
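The overcounting correction can be verified by brute force for small cases. For example, arrangements of the multiset {A, A, B}: naively $3! = 6$, but that treats the two A's as distinct; dividing by $2!$ for their symmetry gives 3. A quick sketch confirming both counts agree:

```python
import math
from itertools import permutations

letters = "AAB"
naive = math.factorial(len(letters))        # treats the two A's as distinct: 6
corrected = naive // math.factorial(2)      # divide by 2! symmetries of the A's: 3
distinct = len(set(permutations(letters)))  # brute-force count of distinct orderings

print(naive, corrected, distinct)  # prints: 6 3 3
```

When an AI's combinatorial answer is small enough to enumerate, this kind of brute-force cross-check is the fastest way to catch an over- or undercount.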


Quantitative reasoning: what screening assessments test

Estimation: Fermi problems (“How many gas stations are in the US?”). The skill is structured decomposition and reasonable assumptions, not knowing the answer. Evaluate whether the AI breaks the problem into tractable sub-estimates and combines them coherently.

Probability: Basic Bayes’ theorem, conditional probability, expected value. AI errors here often involve confusing $P(A|B)$ with $P(B|A)$: the base rate fallacy. Watch for it explicitly in medical and legal probability problems.
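The base rate fallacy is easy to demonstrate with concrete numbers (the figures below are hypothetical, chosen for illustration). Suppose a test has 99% sensitivity and 95% specificity for a condition with 1% prevalence. Then $P(\text{positive} \mid \text{condition}) = 0.99$, but $P(\text{condition} \mid \text{positive})$ is far lower:

```python
# Hypothetical numbers for illustration only.
p_condition = 0.01   # base rate: 1% prevalence
sensitivity = 0.99   # P(positive | condition)
specificity = 0.95   # P(negative | no condition)

# Law of total probability: true positives + false positives.
p_positive = (sensitivity * p_condition
              + (1 - specificity) * (1 - p_condition))

# Bayes' theorem: P(condition | positive).
p_condition_given_positive = sensitivity * p_condition / p_positive

print(f"P(positive | condition) = {sensitivity}")
print(f"P(condition | positive) = {p_condition_given_positive:.3f}")
```

With these numbers the posterior is about 0.167: despite the 99% sensitivity, five out of six positives are false positives, because the condition is rare. An AI that reports "99%" here has committed exactly the $P(A|B)$ vs. $P(B|A)$ confusion described above.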

Logic: Syllogisms, valid vs. sound arguments, common fallacies. Relevant for evaluating AI reasoning quality across domains.

Dimensional analysis: In physics and engineering tasks, checking that units are consistent is a first-pass sanity check on any formula. An AI that produces a velocity in kgΒ·m is wrong regardless of the numerical value.


Try It: verify with dimensional analysis

A physics task asks: “A car accelerates from rest at 3 m/s². How far does it travel in 4 seconds?”

The AI responds:

“Using the kinematic equation $s = \frac{1}{2}at^2$: $s = \frac{1}{2} \times 3 \times 4^2 = \frac{1}{2} \times 3 \times 16 = 24$. The car travels 24 m.”

Verify this answer using dimensional analysis. Are both the numerical answer and the units correct?

See answer

Both are correct.

Numerical check: $\frac{1}{2} \times 3 \times 16 = \frac{48}{2} = 24$ ✓

Dimensional analysis:

  • $a$ has units of m/sΒ²
  • $t^2$ has units of sΒ²
  • $\frac{1}{2}$ is dimensionless
  • Therefore: $[s] = \text{m/s}^2 \times \text{s}^2 = \text{m}$ ✓

The units work out correctly to meters, confirming the kinematic equation was applied to the right quantities.

Why dimensional analysis is a useful first-pass tool: If the AI had used $s = at$ instead (missing the $\frac{1}{2}$ factor and one power of $t$), the units would have been $\text{m/s}^2 \times \text{s} = \text{m/s}$: a velocity, not a distance. The dimensional mismatch would flag the formula error immediately, before you even checked the arithmetic.

Use dimensional analysis on every physics answer. If the units of the result don’t match the units of the requested quantity, there’s a formula error regardless of the numerical value.
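The unit check can be mechanized by tracking exponents of base units. A minimal sketch (not a real units library; the tuple representation is my own, with units as (exponent of meters, exponent of seconds), so multiplication adds exponents):

```python
# Units as (exponent of m, exponent of s).
METER = (1, 0)
SECOND = (0, 1)
ACCEL = (1, -2)  # m/s^2

def mul(a, b):
    # Multiplying quantities adds their unit exponents.
    return (a[0] + b[0], a[1] + b[1])

# Correct formula s = (1/2) a t^2: dimensionless * m/s^2 * s^2.
units_half_a_t2 = mul(ACCEL, mul(SECOND, SECOND))
print(units_half_a_t2)  # (1, 0): meters, a distance

# Wrong formula s = a t: m/s^2 * s.
units_a_t = mul(ACCEL, SECOND)
print(units_a_t)        # (1, -1): m/s, a velocity, not a distance
```

The wrong formula is flagged by its units alone, before any arithmetic, which is exactly the first-pass role dimensional analysis plays in annotation.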


Physics and engineering notation

For physics and engineering annotation tasks:

  • Vectors are written in bold (\mathbf{F}) or with an arrow (\vec{F}), not as plain scalars
  • Units belong outside math mode or inside \text{}: write $5 \text{ m/s}$, not $5 m/s$ (where m and s are interpreted as variables, not unit symbols)
  • Standard derivatives: $\frac{dy}{dx}$; partial derivatives: $\frac{\partial f}{\partial x}$
  • Scientific notation: $6.022 \times 10^{23}$, not 6.022e23, which is code syntax, not math notation

Quick Reference

  • Right answer, wrong reasoning is still a failure: The model learns the reasoning chain, not just the result. Verify each step’s justification (what technique is applied, whether the algebra follows, whether any skipped steps conceal an error) before evaluating the conclusion.
  • The LaTeX brace rule: Any exponent or subscript longer than one character requires curly braces. x^n+1 renders as $x^n + 1$; x^{n+1} renders as $x^{n+1}$. Same for summation bounds: \sum_{i=1}^{n}, not \sum_i=1^n.
  • Dimensional analysis as a first-pass check: If the units of the AI’s result don’t match the units of the requested quantity, there’s a formula error regardless of the numerical value. Check units before checking arithmetic.

Test Your Knowledge

Question 1 of 8

1. Which LaTeX command correctly renders a fraction with numerator a and denominator b?