Senior Software Engineer — AI Evaluation & Benchmarks

Alignerr • Remote

Education: Any
Type: Hourly
Pay Rate (by country): $80–$100/hr
Listed: 72d ago

✅ Applying through this link supports our platform at no cost to you.

This position is hosted on an external talent platform. Please only apply for this position if it fits your skills and interests.

What We Know About This Role

Contract length: 13 weeks

About this Role

From the Alignerr listing

What You'll Do

Design and implement coding benchmarks used to evaluate frontier AI models across real-world programming tasks
Build and maintain scalable data pipelines for AI evaluation workflows
Analyze AI-generated code for correctness, reliability, and edge-case failures
Create structured evaluation scenarios that rigorously test reasoning, debugging, and code quality
Work with large code repositories and multi-language environments
Collaborate on systems that improve how AI models understand and generate software
Provide detailed technical feedback on model performance and failure patterns
Contribute to the design of evaluation frameworks that set industry standards
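
To make the benchmark work described above more concrete, here is a minimal sketch of one small piece of it: scoring a piece of AI-generated code against a set of test cases, including an edge case. It is only an illustration under stated assumptions. The task, the `Case` and `score_candidate` names, and the stand-in functions are all hypothetical and do not come from the Alignerr listing.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class Case:
    """One benchmark case: positional arguments and the expected result."""
    args: tuple
    expected: Any


def score_candidate(candidate: Callable[..., Any], cases: list[Case]) -> float:
    """Return the fraction of cases the candidate passes (0.0 if no cases)."""
    if not cases:
        return 0.0
    passed = 0
    for case in cases:
        try:
            if candidate(*case.args) == case.expected:
                passed += 1
        except Exception:
            # A crash counts as a failed case rather than aborting the run.
            pass
    return passed / len(cases)


# Hypothetical task: return the largest element of a list, or None if empty.
CASES = [
    Case(args=([3, 1, 2],), expected=3),
    Case(args=([-5, -2],), expected=-2),
    Case(args=([],), expected=None),  # edge case: empty input
]


def model_output(xs):
    # Stand-in for AI-generated code; it raises on the empty list.
    return max(xs)


def reference(xs):
    # Hand-written reference solution for the same hypothetical task.
    return max(xs) if xs else None


if __name__ == "__main__":
    print("reference pass rate:", score_candidate(reference, CASES))
    print("model pass rate:", score_candidate(model_output, CASES))
```

In this sketch the stand-in model output passes the two ordinary cases and fails the empty-list case, which is the kind of edge-case failure the role asks engineers to surface. A real harness would also need sandboxing and timeouts, which are omitted here.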

About the Role

What if the code you write could determine how smart the next generation of AI truly is? We're looking for experienced Software Engineers to design and build the coding benchmarks and data pipelines used to evaluate frontier AI models — the systems that decide whether an AI can actually reason, debug, and write production-quality software. This is high-impact, technically demanding work at the intersection of software engineering and AI research. You'll work with large codebases, multiple programming languages, and scalable infrastructure to create evaluation systems that push the boundaries of what AI can do. This is a fully remote contract role. If you thrive in fast-paced engineering environments and want your work to directly shape the trajectory of AI — this is the role.

Organization: Alignerr
Type: Hourly Contract
Location: Remote
Contract Length: 3 Months
Commitment: Full-time availability preferred

Who You Are

4+ years of professional software engineering experience — this is non-negotiable
Experience working at a high-growth tech company or top-tier software organization
Expert proficiency in Python — you write clean, performant, well-tested Python code
Hands-on experience with code repositories and working in large, complex codebases
Proven experience designing and implementing LLM coding benchmarks and data pipelines
Track record of working in high-performance engineering environments with large-scale products or platforms
Strong command of version control systems (Git) and modern development workflows
Native or bilingual English speaker with strong written communication skills
Self-directed, technically rigorous, and comfortable operating with autonomy

What Makes a Perfect Match

Candidates with these additional qualifications have the highest chance of success:

Senior or Lead-level engineering profiles with a history of technical ownership
Bachelor's or Master's degree in Computer Science, Machine Learning, or a related field — or equivalent professional experience
Proficiency in one or more additional languages: JavaScript, Go, C++, or other relevant languages
Experience with CI/CD pipelines and writing robust unit tests (pytest, Mocha, JUnit)
Background in security engineering or significant open-source contributions
Familiarity with AI/ML evaluation methodologies or model benchmarking

Why Join Us

Work on cutting-edge AI evaluation projects alongside world-class research teams
Fully remote — work from anywhere with a reliable internet connection
Your benchmarks directly influence how the most advanced AI systems in the world are measured and improved
Freelance autonomy with meaningful, high-stakes engineering work
Collaborate with a global community of elite engineers and researchers
Potential for contract extension and ongoing engagement as new evaluation challenges emerge

Requirements

Fluency in English (written and verbal)
Reliable high-speed internet connection
Bachelor's degree or equivalent professional experience
Demonstrated expertise in Software Engineering

Eligible Languages

English (fluent proficiency)

Skills & Categories

Explore other opportunities in related specializations:

Software Engineering, JavaScript, Python, English, Coding, Bilingual

Related Jobs

Engineering & Data tools Specialist • micro1 • Software Engineering • $80/hr
QA / Software Engineering Reviewer – Browser Test Validation • mercor • Software Engineering • $60/hr
Head of AI & Engineering Expert • ethos • Software Engineering • $150/hr
Senior Machine Learning Engineer / Model Evaluations Expert • ethos • Software Engineering • $125/hr

Frequently Asked Questions

How hard is the Alignerr assessment?

Hard, and unforgiving. Alignerr uses TestGorilla for timed, role-specific tests: a blank coding environment for engineers, strict grammar and fact-checking for writers. Treat it as one shot. Failing or abandoning it typically locks you out of that role permanently, with no retake.

How soon can I start earning on Alignerr after passing the assessment?

Not right away. After passing, you still complete identity verification through Persona and billing setup through Deel, then wait in a pool for weeks or months. You only start earning once a project matching your specific skills launches and assigns you. Don't count on Alignerr income until you're actively placed on a project.

Does Alignerr have a trainer community?

Yes, and it's a genuine strength. Once you're assigned to a project, you join Slack channels where you can get rubric clarifications from admins and talk to other trainers. That kind of support is rare in AI training and matters most when guidelines are ambiguous or shift mid-project.

What does task-based AI training work actually look like?

Practical, hands-on data work: recording short videos, categorizing images, rating text responses, or analyzing data. Tasks are designed to be short and distinct, typically 5 to 60 minutes each.

What does asynchronous AI training work mean in practice?

No set hours, no check-ins, no meetings. You log in when you want, pick up an available task, complete it, and submit; nobody is waiting on you in real time. That's different from remote employment, where you're expected online during business hours. The tradeoff: you're competing with others for available tasks, so an empty queue means there's simply nothing to do until more work is released.

What does Software Engineering work look like for a Senior Software Engineer — AI Evaluation & Benchmarks?

Tasks here are scoped to Software Engineering, not generic labeling. As a Senior Software Engineer — AI Evaluation & Benchmarks, expect to draw on real domain judgment (evaluating outputs, correcting errors, or providing expert reasoning specific to Software Engineering) rather than following a one-size-fits-all rubric. If you don't have hands-on Software Engineering background, this is likely not the right listing to start with.

Do I need to be fluent in English?

Yes. This role specifically requires English proficiency. You will likely be evaluated on written fluency during the assessment, not just at a conversational level. If you are not professionally fluent in English, this is not the right role. Filter for your native language to find better-matched listings.

What happens when I click Apply on this listing?

You'll be taken to Alignerr's external site to complete your application there. This listing links through a referral, but the process is identical to applying directly; the link just routes you correctly. Create an account on their site and follow their onboarding steps.

What is the barrier to entry for Alignerr?

A difficult, timed technical assessment in your specific domain, such as Python, physics, or a language. You must pass it before you're eligible for any paid projects.
