Alignerr WorldSim Eval: How to Pass and Get to $90/hr

Alignerr occasionally opens a highly selective engineering track: a Senior Software Engineer — AI Evaluation role that pays $90/hr on a long-term contract. The gate into this track is the WorldSim Eval Batch.

The WorldSim Eval is different from Alignerr's standard written assessment. It's a multi-stage simulation that tests your ability to reason about software systems, evaluate AI-generated code, and catch subtle errors that an AI model would not catch itself. Candidates who complete it sooner get reviewed sooner — the queue is rolling, not batched.

This guide walks you through exactly how to find the eval, what it's testing, how to score well, and what comes next.

What is the WorldSim Eval?

WorldSim is Alignerr's code name for a project that involves evaluating AI responses in simulated real-world software engineering scenarios. Your job as an evaluator is to act like a senior engineer reviewing AI output: catch bugs, rate code quality, flag misleading explanations, and rewrite weak responses.

The "Eval Batch" is the qualifying step. It's a timed sample of the actual work — a handful of real tasks from the project pipeline — and your performance on it determines whether you get onboarded for the full role.

Why does Alignerr use a paid eval instead of a free test?

Standard free assessments filter for effort, not skill. By running the eval inside the real tooling (Labelbox), Alignerr gets a direct signal on your work quality, and you see whether you like the tasks before committing to months of work.

How to Find It on Your Dashboard

The WorldSim Eval Batch does not appear in your dashboard by default. You need to have an approved Alignerr account, and the project must be open for new candidates (batches open on a rolling basis).

1. Log into your Alignerr dashboard. If you don't have an account yet, sign up first.
2. Click "Go to Projects" in the main navigation.
3. Scroll the project list and locate the card labeled "WorldSim Eval Batch". If you don't see it, the current batch may be full; check back in a few days.
4. Click "View" on the project card. Read the full instructions on that page carefully, because there are specific rules about what counts as a valid task response.
5. Scroll to the bottom of the instructions page and acknowledge that you've read them. Then click "Start".

Re-attempt available?

If you previously attempted the WorldSim Eval but did not complete it or did not pass, a re-attempt is sometimes made available. The project card will show a "Re-attempt" button instead of "Start." The re-attempt uses a different set of tasks.

What the Evaluation Actually Tests

Each task in the WorldSim Eval gives you an AI-generated software engineering response (code, explanation, or debug walkthrough) and asks you to evaluate it. The work typically falls into three categories:

1. Code Review & Bug Detection

You're shown a coding prompt and an AI-generated solution. Your job: identify whether the code is correct, explain any bugs you find, and rate the overall quality. These tasks are not trick questions — they test whether you can actually read and reason about code, not just run a linter.

Common languages: Python, JavaScript/TypeScript, SQL. Occasional Go and Java.
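
To make this concrete, here is a minimal sketch of the kind of bug a linter will not flag but a careful reader will. The function names and the scenario are hypothetical illustrations, not taken from the actual eval. The buggy version passes when the input length is a multiple of the chunk size and silently drops data otherwise.

```python
# Hypothetical example: not taken from the actual eval.

def chunk_buggy(items, size):
    """Plausible AI output: split items into lists of length `size`."""
    # Bug: the range stops early, so a trailing partial chunk is silently dropped.
    return [items[i:i + size] for i in range(0, len(items) - size + 1, size)]


def chunk_fixed(items, size):
    """Corrected version: iterate over every start index up to len(items)."""
    return [items[i:i + size] for i in range(0, len(items), size)]


even = [1, 2, 3, 4]
odd = [1, 2, 3, 4, 5]

print(chunk_buggy(even, 2))  # [[1, 2], [3, 4]]
print(chunk_buggy(odd, 2))   # [[1, 2], [3, 4]]  (the 5 is lost)
print(chunk_fixed(odd, 2))   # [[1, 2], [3, 4], [5]]
```

A strong review of the buggy version would name the exact expression (the `len(items) - size + 1` stop value), give an input that fails, and state what the output should have been.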

2. Explanation Quality Rating

The AI explains a technical concept or system design decision. You rate how accurate, complete, and useful the explanation is. You also flag specific sentences that are misleading or wrong. Vague ratings with no justification are penalized.

3. Response Rewriting

For tasks where the AI's response is weak, you're asked to write a better version. This is the highest-weight task type. Reviewers are looking for concise, technically precise language — not padding.

How It Is Scored

The WorldSim Eval is calibrated — meaning your responses are compared against a gold-standard answer prepared by Alignerr's internal team. There is no partial credit for being "close." Either your bug identification matches the rubric or it doesn't.

What reviewers look for	Common failure mode
Specific bug identification	Saying "this code might have issues" instead of pointing to the exact line and explaining why it fails.
Justified ratings	Giving a quality score of 3/5 with no explanation. Always write a sentence explaining what held the score back.
Rewrite quality	Rewrites that are longer but not more correct. Reviewers check for precision, not word count.
Completion rate	Abandoning tasks mid-batch. Finishing all tasks (even imperfectly) scores better than submitting half.

Tips to Pass

Always run the code mentally before rating it

The most common mistake is rating code as "correct" without tracing through edge cases. Ask yourself: what happens if the input is empty? What if it's null? What if it's a very large number? AI models frequently produce code that works on the happy path but breaks on edge cases.
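
The edge-case questions above can be sketched as a small probe. This is only an illustration under assumed names: `average` stands in for a hypothetical AI-generated function, and `probe` is a helper invented here to record what each input does. In the eval itself you would typically trace these cases in your head instead of running them.

```python
# Hypothetical example: `average` stands in for AI-generated code under review.

def average(values):
    """Works on the happy path, but has no handling for edge cases."""
    return sum(values) / len(values)


def probe(func, cases):
    """Call func on each labeled input and record the result or the exception type."""
    results = {}
    for label, arg in cases.items():
        try:
            results[label] = ("ok", func(arg))
        except Exception as exc:
            results[label] = ("raises", type(exc).__name__)
    return results


cases = {
    "happy path": [1, 2, 3],
    "empty": [],
    "null": None,
    "very large": [1e308, 1e308],
}

report = probe(average, cases)
for label, outcome in report.items():
    print(label, outcome)
# happy path ('ok', 2.0)
# empty ('raises', 'ZeroDivisionError')
# null ('raises', 'TypeError')
# very large ('ok', inf)
```

Note that the "very large" case does not raise: the float sum overflows to infinity and the function quietly returns a wrong answer, which is the kind of failure that is easy to miss when you only check for crashes.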

Don't be generous with ratings

This is AI evaluation work — the whole point is to be critical. If you rate everything 4–5 out of 5, you are not providing useful signal and your eval will be flagged as low quality. A strong evaluator finds the specific thing wrong and names it clearly.

Read the full instructions before starting

Alignerr's project instructions define specific rubric terms (like "factual accuracy" vs. "helpfulness"). Using those exact terms in your justifications signals that you understand the rubric — which reviewers look for.

Complete it sooner rather than later

Alignerr reviews completed evals on a rolling basis. Candidates who finish early are reviewed and advanced earlier. If you wait a week, you may still pass but miss the current onboarding group — and the next one could be months away.

What Happens After You Pass

Passing the WorldSim Eval is step one of a three-step pipeline before you start earning:

WorldSim Eval ← You are here

Timed sample tasks inside Labelbox. Reviewed by Alignerr's QA team. Results typically within 3–5 business days.

Zara AI Interview

A 15–30 minute real-time video interview with Zara, Alignerr's AI interviewer. For an SWE role, this will include technical verbal questions: expect to explain system design choices and debug scenarios out loud.

Background Check

Standard contractor background check. Alignerr uses a third-party provider and the process typically takes 2–5 business days.

Onboarding — $90/hr, Full-time

You are added to the WorldSim project in Labelbox and can begin taking tasks. The role is long-term (several months), full-time, remote. Payment via Deel on a bi-weekly schedule.