aitrainer.work - AI Training Jobs Platform

Senior Software Engineer – AI Evaluation

Alignerr · Remote · Posted 9 days ago

Education: Any
Pay Rate: $/task
Posted: 9d ago

What You'll Do

  • Design and build scalable evaluation pipelines and frameworks for assessing AI model performance across diverse tasks and domains
  • Develop automated testing harnesses, scoring systems, and benchmarking tools for large language models and other AI systems
  • Write clean, production-quality code to process, analyze, and visualize evaluation datasets at scale
  • Create and maintain APIs, dashboards, and internal tools that enable research teams to run, track, and compare evaluations efficiently
  • Collaborate with AI researchers and data scientists to translate evaluation methodologies into reliable, repeatable software
  • Identify edge cases, failure modes, and reliability issues in AI outputs through systematic engineering approaches
  • Optimize system performance, data processing speed, and infrastructure costs
  • Contribute to the architecture and technical direction of the evaluation platform
  • Write clear documentation and participate in code reviews to maintain high engineering standards

About the Role

What if your engineering skills could directly determine whether the world's most advanced AI systems are actually working? We're looking for Senior Software Engineers to design, build, and scale the evaluation infrastructure that measures AI performance — the critical layer between raw model output and real-world trust.

This is high-impact, technically challenging work at the intersection of software engineering and AI research. You'll build the tools, pipelines, and frameworks that help leading research teams understand what their models can do, where they fail, and how to make them better. If you love building robust systems and care deeply about quality and measurement, this role puts you at the center of the AI revolution.

  • Organization: Alignerr
  • Type: Hourly Contract
  • Location: Remote
  • Commitment: 20–40 hours/week

Who You Are

  • 5+ years of professional software engineering experience, with a track record of building and shipping production systems
  • Strong proficiency in Python — including experience with data processing libraries (pandas, NumPy) and web frameworks (FastAPI, Flask, or Django)
  • Solid understanding of software architecture, design patterns, and engineering best practices
  • Experience working with large datasets and building data pipelines
  • Comfortable with cloud infrastructure (AWS, GCP, or Azure) and containerized deployments
  • Familiarity with version control (Git), CI/CD workflows, and testing frameworks
  • Strong problem-solving skills and the ability to work through ambiguity independently
  • Excellent written communication skills — you can document your work clearly and collaborate asynchronously
  • Self-motivated and reliable when working independently in a remote environment

Nice to Have

  • Experience with ML/AI evaluation, benchmarking, or model testing
  • Familiarity with LLMs, prompt engineering, or AI safety and alignment concepts
  • Background in building developer tools, internal platforms, or data infrastructure
  • Experience with distributed systems, message queues, or workflow orchestration (Airflow, Prefect, etc.)
  • Knowledge of statistical methods for measuring and comparing model performance
  • Prior experience in a remote-first or async-first engineering culture
  • Contributions to open-source projects related to AI, ML, or evaluation tooling

Why Join Us

  • Work on cutting-edge AI evaluation projects alongside world-class research labs
  • Directly influence how AI quality and safety are measured at scale — your code shapes the standard
  • Fully remote and flexible — work when and where you're most productive
  • Freelance autonomy with access to deeply meaningful, technically stimulating work
  • Collaborate with a global team of engineers and researchers pushing the boundaries of AI
  • Exposure to the latest developments in AI research, model capabilities, and evaluation science
  • Potential for ongoing work and contract extension as the platform and project scope grow

Requirements

  • Fluency in English (written and verbal)
  • Reliable high-speed internet connection

Frequently Asked Questions

What is the assessment actually like?

Notoriously strict. Alignerr uses TestGorilla for role-specific timed tests — a blank coding environment for engineers, rigorous grammar and fact-checking for writers. There is almost no hand-holding. The critical catch: this is essentially a one-shot process. Fail or abandon the assessment, and you are typically locked out of that role permanently with no option to retake.

How quickly can I start earning after I pass?

Not immediately. Even after passing the assessment and completing identity verification (via Persona) and billing setup (via Deel), you may sit in a waiting pool for weeks or months. You only start earning when a project matching your specific skills launches and you are officially assigned. Do not plan around Alignerr income until you are actively on a project.

Is there a community?

Yes — and it is one of Alignerr's genuine strengths. Once assigned to a project, you are added to Slack channels where you can ask questions, get rubric clarifications from admins, and talk to other AI trainers. This is rare in AI training and makes a real difference when guidelines are ambiguous or change mid-project.

What does the work actually look like?

It is practical, hands-on data work. You might record short videos, categorize images, rate text responses, or analyze data. Tasks are short and self-contained, typically 5 to 60 minutes each.

How flexible is the schedule?

Extremely. This is true "log in and work" flexibility. You can usually work for 20 minutes or 4 hours depending on your availability. There are rarely minimum hour requirements, making it ideal for side income.

Is there an interview?

Usually, no. Hiring for these roles is almost entirely based on passing an automated assessment or "qualification" task. If you pass the test, you get access to the work.

What is the barrier to entry?

Alignerr is known for difficult technical assessments. You must pass a timed test in your specific domain (e.g., Python, Physics, or Language) before you are eligible for any paid projects.