Senior Software Engineer – AI Evaluation

Alignerr • Remote • Posted 9 days ago

Education: Any
Pay Rate: $/task

This position is hosted on an external talent platform. Please only apply for this position if it fits your skills and interests.

About this Role

What You'll Do

Design and build scalable evaluation pipelines and frameworks for assessing AI model performance across diverse tasks and domains
Develop automated testing harnesses, scoring systems, and benchmarking tools for large language models and other AI systems
Write clean, production-quality code to process, analyze, and visualize evaluation datasets at scale
Create and maintain APIs, dashboards, and internal tools that enable research teams to run, track, and compare evaluations efficiently
Collaborate with AI researchers and data scientists to translate evaluation methodologies into reliable, repeatable software
Identify edge cases, failure modes, and reliability issues in AI outputs through systematic engineering approaches
Optimize system performance, data processing speed, and infrastructure costs
Contribute to the architecture and technical direction of the evaluation platform
Write clear documentation and participate in code reviews to maintain high engineering standards

About the Role

What if your engineering skills could directly determine whether the world's most advanced AI systems are actually working? We're looking for Senior Software Engineers to design, build, and scale the evaluation infrastructure that measures AI performance — the critical layer between raw model output and real-world trust.

This is high-impact, technically challenging work at the intersection of software engineering and AI research. You'll build the tools, pipelines, and frameworks that help leading research teams understand what their models can do, where they fail, and how to make them better. If you love building robust systems and care deeply about quality and measurement, this role puts you at the center of the AI revolution.

Organization: Alignerr
Type: Hourly Contract
Location: Remote
Commitment: 20–40 hours/week

Who You Are

5+ years of professional software engineering experience, with a track record of building and shipping production systems
Strong proficiency in Python — including experience with data processing libraries (pandas, NumPy) and web frameworks (FastAPI, Flask, or Django)
Solid understanding of software architecture, design patterns, and engineering best practices
Experience working with large datasets and building data pipelines
Comfortable with cloud infrastructure (AWS, GCP, or Azure) and containerized deployments
Familiarity with version control (Git), CI/CD workflows, and testing frameworks
Strong problem-solving skills and the ability to work through ambiguity independently
Excellent written communication skills — you can document your work clearly and collaborate asynchronously
Self-motivated and reliable when working independently in a remote environment

Nice to Have

Experience with ML/AI evaluation, benchmarking, or model testing
Familiarity with LLMs, prompt engineering, or AI safety and alignment concepts
Background in building developer tools, internal platforms, or data infrastructure
Experience with distributed systems, message queues, or workflow orchestration (Airflow, Prefect, etc.)
Knowledge of statistical methods for measuring and comparing model performance
Prior experience in a remote-first or async-first engineering culture
Contributions to open-source projects related to AI, ML, or evaluation tooling

Why Join Us

Work on cutting-edge AI evaluation projects alongside world-class research labs
Directly influence how AI quality and safety are measured at scale — your code shapes the standard
Fully remote and flexible — work when and where you're most productive
Freelance autonomy with access to deeply meaningful, technically stimulating work
Collaborate with a global team of engineers and researchers pushing the boundaries of AI
Exposure to the latest developments in AI research, model capabilities, and evaluation science
Potential for ongoing work and contract extension as the platform and project scope grow

Requirements

Fluency in English (written and verbal)
Reliable high-speed internet connection

Skills & Categories

Explore other opportunities in related specializations:

Software Engineering, Coding, Python

Related Jobs

Computer Science - Domain experts • turing • Software Engineering • $70
Atlassian Jira Admin • turing • Software Engineering • $55
Machine Learning Engineer (Talent Network) • mercor • Software Engineering • $250
Backend Engineer • mercor • Software Engineering • $150

Frequently Asked Questions

What is the assessment actually like?

Notoriously strict. Alignerr uses TestGorilla for role-specific timed tests — a blank coding environment for engineers, rigorous grammar and fact-checking for writers. There is almost no hand-holding. The critical catch: this is essentially a one-shot process. Fail or abandon the assessment, and you are typically locked out of that role permanently with no option to retake.

How quickly can I start earning after I pass?

Not immediately. Even after passing the assessment and completing identity verification (via Persona) and billing setup (via Deel), you may sit in a waiting pool for weeks or months. You only start earning when a project matching your specific skills launches and you are officially assigned. Do not plan around Alignerr income until you are actively on a project.

Is there a community?

Yes — and it is one of Alignerr's genuine strengths. Once assigned to a project, you are added to Slack channels where you can ask questions, get rubric clarifications from admins, and talk to other AI trainers. This is rare in AI training and makes a real difference when guidelines are ambiguous or change mid-project.

What does the work actually look like?

It is practical, hands-on data work. You might be recording short videos, categorizing images, rating text responses, or analyzing data. The tasks are designed to be short and distinct, typically 5 to 60 minutes each.

How flexible is the schedule?

Extremely. This is true "log in and work" flexibility. You can usually work for 20 minutes or 4 hours depending on your availability. There are rarely minimum hour requirements, making it ideal for side income.

Is there an interview?

Usually, no. Hiring for these roles is almost entirely based on passing an automated assessment or "qualification" task. If you pass the test, you get access to the work.

What is the barrier to entry?

Alignerr is known for difficult technical assessments. You must pass a timed test in your specific domain (e.g., Python, Physics, or Language) before you are eligible for any paid projects.
