Software Engineer – AI Model Evaluator
Alignerr • Remote • Posted 27 days ago
Education
Any
Pay Rate
$75/task
This position is hosted on an external talent platform. Please only apply for this position if it fits your skills and interests.
About this Role
What You'll Do
- Evaluate the performance of frontier language models on complex, real-world software engineering tasks
- Identify bugs, logical errors, hallucinations, and reliability issues in AI-generated code and reasoning
- Design and review prompts, test cases, and evaluation scenarios that stress-test advanced coding workflows
- Provide precise, well-reasoned written feedback explaining model strengths, weaknesses, and edge cases
- Work across multiple programming languages and codebases to assess generalization, correctness, and robustness
- Think critically about model behavior — not just whether code runs, but whether it's right
About the Role
What if your years of engineering experience could directly influence how the world's most advanced AI systems write and reason about code? We're looking for experienced software engineers to evaluate frontier AI models — hunting down bugs, exposing failure modes, and helping ensure that AI-generated code actually holds up under real-world scrutiny. This is a fully remote, flexible contract role built for engineers who love digging into hard problems. You set your own schedule, work across cutting-edge projects, and make a tangible impact on the AI tools that millions of developers will rely on.
- Organization: Alignerr
- Type: Hourly Contract
- Location: Remote
- Commitment: 10–40 hours/week
Who You Are
- 3+ years of professional software engineering experience
- Strong proficiency in at least one of: TypeScript, Ruby, Java, or C++
- Sharp debugger — you spot non-obvious issues and can articulate exactly why something is broken
- Excellent written and spoken English; you communicate technical findings clearly and precisely
- Comfortable reasoning about complex systems, edge cases, and unexpected failure modes
- Familiarity with modern development tooling — Git, CLI workflows, testing frameworks, and similar
- You critically evaluate outputs rather than taking them at face value
Nice to Have
- Experience across multiple programming languages or paradigms
- Background in QA, code review, or software reliability engineering
- Familiarity with AI or LLM tools and how they generate code
- Interest in AI safety, alignment, or model evaluation research
Why Join Us
- Work on cutting-edge AI projects alongside leading research labs
- Fully remote and flexible — work when and where it suits you
- Freelance autonomy with the structure of meaningful, high-impact technical work
- Make a direct, tangible impact on how AI writes, reasons about, and understands code
- Potential for ongoing work and contract extension as new projects launch
Requirements
- Fluent proficiency in English (Written & Verbal)
- Reliable high-speed internet connection
- Bachelor's degree or equivalent professional experience
- Demonstrated expertise in Software Engineering
Eligible Languages
Fluent proficiency in English
Frequently Asked Questions
What is the assessment actually like?
Notoriously strict. Alignerr uses TestGorilla for role-specific timed tests — a blank coding environment for engineers, rigorous grammar and fact-checking for writers. There is almost no hand-holding. The critical catch: this is essentially a one-shot process. Fail or abandon the assessment, and you are typically locked out of that role permanently with no option to retake.
How quickly can I start earning after I pass?
Not immediately. Even after passing the assessment and completing identity verification (via Persona) and billing setup (via Deel), you may sit in a waiting pool for weeks or months. You only start earning when a project matching your specific skills launches and you are officially assigned. Do not plan around Alignerr income until you are actively on a project.
Is there a community?
Yes — and it is one of Alignerr's genuine strengths. Once assigned to a project, you are added to Slack channels where you can ask questions, get rubric clarifications from admins, and talk to other AI trainers. This is rare in AI training and makes a real difference when guidelines are ambiguous or change mid-project.
What does the work actually look like?
It is practical, hands-on data work. You might be recording short videos, categorizing images, rating text responses, or analyzing data. The tasks are designed to be short and self-contained, typically 5–60 minutes each.
How flexible is the schedule?
Extremely. This is true "log in and work" flexibility. You can usually work for 20 minutes or 4 hours depending on your availability. There are rarely minimum hour requirements, making it ideal for side income.
Is there an interview?
Usually, no. Hiring for these roles is almost entirely based on passing an automated assessment or "qualification" task. If you pass the test, you get access to the work.
What is the barrier to entry?
Alignerr is known for difficult technical assessments. You must pass a timed test in your specific domain (e.g., Python, Physics, or Language) before you are eligible for any paid projects.