Older listing — position may have been filled

This listing is no longer actively promoted, but you're still welcome to apply — platforms often reopen roles or keep applications on file.

Coding

Evaluation Scenario Writer - AI Agent Testing Specialist

Mindrift • Remote • Posted 113 days ago

Education

Any

Type

hourly

Pay Rate

$80/task

Role Overview

As an Evaluation Scenario Writer - AI Agent Testing Specialist, you will solve complex coding challenges to create "Golden Data" for training advanced code-generation models. You will likely engage in "red-teaming" (adversarial testing) to find logic gaps in AI-generated code, write unit tests for generated snippets, and provide expert-level refactoring. This is purely technical work with no meetings, just deep-dive problem solving in languages such as Python and C++.

Requirements

Fluent proficiency in English (Written & Verbal)
Reliable high-speed internet connection
Bachelor's degree or equivalent professional experience
Demonstrated expertise in Coding
Proficiency in at least one programming language
Understanding of algorithms and software design patterns

Compensation Analysis

Join the workforce powering the AI revolution. With a competitive rate of $80/hr and remote flexibility, this role lets you balance professional growth with personal freedom. Previous AI experience is usually not required; your domain expertise is what matters.

Related Jobs

Freelance Agent Evaluation Engineer

mindrift • STEM

$80

Freelance Earth Science Expert - AI Trainer

mindrift • STEM

$55

Application Form - University Students and Alumni

mindrift • Generalist

AI Agent Evaluation Analyst - AI Trainer

mindrift • Generalist

Frequently Asked Questions

Who is Mindrift for?

Mindrift (built by data-labeling giant Toloka) is best suited for freelance writers, editors, and generalist AI tutors. If you have strong English fluency, solid grammar, and good research skills — but no specialized tech degree — Mindrift is designed for you. Specialized domain experts (cybersecurity, medicine, law) can also access higher-paying projects once verified.

Why are the rates lower than other platforms?

General evaluation tasks pay around $15–$30/hr because they are high-volume, lower-complexity work (basic fact-checking, tone evaluation) that does not require an advanced degree. If you are a verified domain expert, however, rates on specialized projects scale up to $40–$100+/hr. Start as a generalist, build your profile, and unlock specialist tracks.

What does a typical task look like?

Most tasks follow this pattern: read a context or scenario → write a short prompt for the AI (~100 words) → evaluate two AI responses to that prompt → fact-check the outputs → write a brief explanation (~50 words) of which response is better, citing the project rubric. The focus is on clarity, safety, and strict rule-following, not on creative writing or length.

What equipment do I need?

For voice or audio roles at this pay level, you typically need a professional home studio setup (XLR microphone, treated room). Phone recordings or laptop mics are usually rejected by quality control.

How is my work used?

You are providing high-quality "ground truth" data. For writers, this means generating creative content; for voice actors, it often means training Text-to-Speech models. Check the specific contract details regarding usage rights for your voice or likeness.

Is creative freedom allowed?

Yes and no. While you are hired for your talent, you must often follow strict style guides (e.g., "Speak in a neutral tone" or "Write in the style of a technical manual"). The goal is consistency for the dataset.