aitrainer.work - AI Training Jobs Platform
Markdown & JSON Formatting Mastery
0%
F7: Markdown & JSON Formatting Mastery
F7 Core
⏱️ 1 hour 🎯 Intermediate

Markdown & JSON Formatting Mastery

Learn the formatting standards that AI training platforms enforce: proper Markdown, valid JSON, structured outputs for tool use. Discover how formatting errors degrade model training.

Test Your Knowledge
Before you read anything β€” make a call

A platform asks you to write a demonstration response to this prompt:

β€œSummarize the Q3 revenue results for two products, and include the Python function used to calculate growth rate.”

Here are two AI responses. Which one would a model learn from?

Response A:

The results are:
Product A - Revenue: $1.2M, Growth: 12%
Product B - Revenue: $0.8M, Growth: -3%

Note: these figures are from Q3 only.

def calculate_growth(current, previous):
  return (current - previous) / previous * 100

Response B:

## Q3 Revenue Results

| Product   | Revenue | Growth |
|-----------|---------|--------|
| Product A | $1.2M   | +12%   |
| Product B | $0.8M   | -3%    |

**Note:** These figures reflect Q3 only and are not annualized.

```python
def calculate_growth(current, previous):
    return (current - previous) / previous * 100
```

Which is the better training signal?

See the answer

Response B. The gap matters more than it looks.

Response A contains the right information. But it’s ambiguous noise to a model. The β€œtable” is plain text dashes and pipes that won’t render as a table in any Markdown parser. The code is pasted as plain text with no language specifier, so the model has no signal that this is Python or that it should be in a code block. The note is just a line of text with no visual distinction from surrounding content.

Response B teaches structure. The ## heading signals β€œthis is a major section.” The pipe table renders as an actual table. The python specifier on the code block tells the model this is code, this is Python, and it belongs in a fenced block. The bold note has semantic weight.

When a model trains on Response A, it learns making a mess is fine. When it trains on Response B, it learns how to produce structured, navigable, correctly-formatted output.

Formatting isn’t presentation. It is part of what the model learns.


Formatting is a training signal

Every demonstration response you write, and every AI response you evaluate, teaches the model something about what good output looks like. If expert demonstrations consistently use ## for major sections and fenced code blocks for all code, the model learns that pattern. If demonstrations are inconsistent, the model learns inconsistency and reproduces it. Do not substitute bold text for headings, paste code as plain text, or make fake tables.

The failures that matter most are silent ones. Markdown that looks fine as raw text but renders incorrectly, or JSON that passes a visual scan but breaks a parser downstream, are huge problems. Both are covered here.


Markdown: what platforms enforce

Headers

Use ATX-style headers (hash marks), not underline-style. Use heading levels semantically:

# Page Title (H1) β€” typically one per document
## Major Section (H2)
### Subsection (H3)
#### Minor subdivision (H4) β€” use sparingly

The most common mistake is using bold text (**Section Name**) as a substitute for a heading. It looks similar when rendered but has no semantic meaning. It won’t appear in a table of contents, won’t be parsed as a section by downstream tools, and teaches the model that bold equals heading. If the rubric calls for a structured response with sections, use actual headings.


Try It: bold headers β€” violation or style choice?

A task instruction says: β€œWrite a structured response with three sections: Summary, Key Findings, and Recommendations.” The AI produces:

**Summary**
The analysis covers Q3 performance across five product lines...

**Key Findings**
Revenue increased 12% year-over-year...

**Recommendations**
Prioritize investment in product lines 2 and 4...

Is this a formatting violation, a style issue, or acceptable? Does it matter which rendering environment this displays in?

See answer

This is a formatting violation, not merely a style choice.

The task explicitly requires β€œsections.” This is a structural term. Sections in a Markdown document are created with heading syntax (##, ###), not bold text. The distinction matters for three reasons:

Semantic structure: ## headings are parsed by renderers as headings. They appear in tables of contents, are navigable by screen readers, and are recognized by downstream text processing tools. Bold text is purely visual.

Model training signal: If this is a demonstration response, the model learns that bold text equals section header. It will then produce bold text headers in contexts where real headings are expected, passing on the mistake.

Rendering consistency: Bold text renders consistently, but it won’t create a heading element in HTML or a navigable section in any document renderer.

The correct format:

## Summary
...
## Key Findings
...
## Recommendations
...

If the style guide had explicitly specified β€œuse bold for section labels, not headings,” then bold would be compliant. Otherwise, using bold as a heading substitute is a structural formatting error.


Code blocks

Always use fenced code blocks with a language specifier for any code:

```python
def hello(name):
    return f"Hello, {name}"
```

Never use inline backticks for multi-line code. Never paste code as plain text. The language specifier (python, sql, json, bash) enables syntax highlighting and signals to the model what kind of code it is β€” omitting it is a weaker training signal even when the code itself is correct.

Inline code (single backticks) is for short references within prose: β€œUse the groupby() method to aggregate.” Not for code blocks.

Tables

Markdown tables require a header row, a separator row, and data rows:

| Column A | Column B | Column C |
|----------|----------|----------|
| Value 1  | Value 2  | Value 3  |
| Value 4  | Value 5  | Value 6  |

Alignment is controlled by colons in the separator row: |:---| (left), |:---:| (center), |---:| (right). Omitting the separator row entirely breaks table rendering in most parsers. The content falls through as plain text with no indication that anything went wrong.

Lists and emphasis

For lists, use consistent indentation (2 or 4 spaces, but consistent throughout the document). Don’t use a numbered list for items with no inherent order, and don’t use bullets for a sequence of steps that must be followed in order. The list type carries meaning.

For emphasis: **bold** for strong emphasis or key terms, *italic* for titles or mild emphasis. Don’t use ALL CAPS. It is inaccessible and not Markdown.


JSON: the rules that break parsers

JSON is the standard format for structured AI outputs, particularly for tool use tasks. An invalid JSON object breaks the entire pipeline downstream silently. There is no warning, just a failed parse. Three categories of errors cover almost everything you’ll encounter.

Quoting and key rules: All strings and all keys must use double quotes. {name: "Alice"} is invalid because the key is unquoted. {'name': 'Alice'} is invalid because single quotes aren’t JSON. {"name": "Alice"} is correct.

Value rules: Booleans are true and false β€” lowercase, no quotes, not Python’s True/False. Null values are null β€” not Python’s None, not JavaScript’s undefined. These aren’t style preferences; they’re specification requirements. Any conforming JSON parser rejects the alternatives.

Structural rules: No trailing commas after the last item in an object or array. No comments of any kind. They’re valid in JavaScript but will cause a JSON parse error. If you see comments in AI output, flag it as invalid.


Try It: spot the JSON errors

You’re reviewing an AI tool-use response. The AI was asked to return a structured JSON object representing a user profile. It produced:

{
  'name': 'Jordan Lee',
  'age': 29,
  'active': True,
  'tags': ['finance', 'analyst'],
}

Identify every JSON error in this snippet. How many are there, and what are they?

See answer

There are three errors:

1. Single quotes instead of double quotes on all string keys and values. JSON requires double quotes: "name", "Jordan Lee", "finance".

2. Python-style True instead of JSON true. JSON boolean values are lowercase: true and false. Capitalized True is Python syntax and will cause a JSON parse error.

3. Trailing commas. You can’t put them after the last array element ("analyst",) or after the last object property ('tags': [...],). JSON does not permit trailing commas.

Corrected version:

{
  "name": "Jordan Lee",
  "age": 29,
  "active": true,
  "tags": ["finance", "analyst"]
}

Mental checklist: double quotes everywhere, lowercase booleans, no trailing commas, no comments, null not None.


Nested structures

AI tool use tasks often produce nested JSON. Verify that nesting is correct and that arrays contain consistent types:

{
  "results": [
    {
      "id": 1,
      "tags": ["finance", "accounting"],
      "metadata": {
        "confidence": 0.92,
        "reviewed": false
      }
    }
  ]
}

Watch for: inconsistent array element types (["a", 1, null, true] β€” valid JSON but often a semantic error), deeply nested objects when a flatter structure was requested, missing required keys.


Structured outputs for tool use

In tool use annotation tasks, the AI produces structured outputs that a downstream system parses and acts on. A correct tool-use response looks like this:

{
  "tool": "search_database",
  "parameters": {
    "query": "annual revenue 2023",
    "filters": {
      "industry": "technology",
      "min_employees": 500
    }
  }
}

When evaluating these, check four things: syntactic validity (would this pass a JSON parser?), correct tool name from the schema provided, all required parameters present with the right types, and optional parameters either used correctly or omitted cleanly β€” not present with a null value when the schema says to omit them.


Try It: Python None in a tool response

A tool-use annotation task requires the AI to return a JSON object with this schema: {"result": <string or null>, "success": <boolean>}. The AI returns:

{"result": None, "success": True}

What is wrong with this response? Classify the error type (style issue vs. correctness error) and explain the downstream impact.

See answer

This is a hard correctness error, not a style issue.

The response contains two JSON violations:

None is Python syntax, not JSON. JSON represents null values as null (lowercase). Any JSON parser will reject None as an invalid token and throw a parse error. The downstream system consuming this response β€” a database writer, an API handler, another pipeline step β€” will fail immediately.

True is Python syntax, not JSON. JSON booleans are true and false (lowercase). Same result: parse failure.

Corrected response:

{"result": null, "success": true}

This is not a preference (β€œsome systems accept True”). It is a strict specification violation. The JSON standard does not permit Python literals, and no conforming JSON parser will accept them. The error class is β€œinvalid JSON” β€” not β€œslightly informal JSON.”

When evaluating tool-use AI responses, mentally run the output through a JSON parser before accepting it as valid. If it would fail json.loads() in Python, flag it.


Quick Reference

  • Formatting is a training signal: Inconsistent or incorrect formatting in demonstration responses teaches the model to reproduce that inconsistency. Use ## headings (not bold), fenced code blocks with language specifiers (not plain text), and pipe tables with separator rows (not dashes).
  • JSON hard rules: Double-quoted keys and strings, lowercase true/false/null, no trailing commas, no comments. Any of these broken means the JSON is invalid and will fail parsing downstream β€” not a style issue.
  • Bold β‰  heading, None β‰  null, True β‰  true: These are the three substitutions that appear most often in AI output and produce silent failures. Catch them every time.

Test Your Knowledge

Answer question 1 to preview. Sign in to complete all 8 questions.

Question 1 of 8

1. Which of the following is valid JSON?

easy

Was this worth your time?

Rate this module β€” takes two seconds.