Python Infrastructure Engineer - Model Evaluation
Alignerr • Remote • Posted 0 days ago
Education
Any
Type
Pay Rate
$62.5/task
Posted
0d ago
Applying through this link gives you a verified candidate referral.
Referrals from verified candidates give your profile a visibility boost and help support our platform at no cost to you.
This position is hosted on an external talent platform. Please only apply for this position if it fits your skills and interests.
About this Role
What You'll Do
- Design, build, and optimize high-performance Python systems supporting AI data pipelines and model evaluation workflows
- Develop full-stack tooling and backend services for large-scale data annotation, validation, and quality control
- Build and maintain evaluation harnesses that integrate with ML inference frameworks
- Improve reliability, performance, and safety across existing Python codebases
- Instrument systems with observability and metrics collection to monitor reliability and model performance
- Identify bottlenecks and edge cases in data and system behavior, and implement scalable fixes
- Collaborate with data, research, and engineering teams to support model training and evaluation workflows
- Participate in synchronous design reviews to iterate on architecture and implementation decisions
About the Role
What if your Python expertise could directly shape how the world's most advanced AI models are built, tested, and improved? We're looking for a senior Python engineer to design and build the data pipelines, evaluation harnesses, and annotation tooling that sit at the heart of cutting-edge AI development. This is a fully remote, flexible contract role working alongside leading AI research labs on real production systems. If you're a strong Python engineer who wants to do meaningful, high-impact work at the frontier of AI, this is the role for you.
- Organization: Alignerr
- Type: Hourly Contract
- Location: Remote
- Commitment: 20–40 hours/week
Who You Are
- Native or fluent English speaker with clear written and verbal communication skills
- Full-stack developer with a strong systems programming background
- 3–5+ years of professional experience writing production-grade Python
- Experienced building evaluation harnesses for ML models and integrating with inference frameworks
- Solid background in observability, metrics collection, and monitoring for production systems
- Self-motivated and reliable, able to commit 20–40 hours per week
Nice to Have
- Prior experience with data annotation, data quality, or evaluation systems
- Familiarity with AI/ML workflows, model training, or benchmarking pipelines
- Experience with distributed systems or developer tooling
- Background in MLOps or AI infrastructure
Why Join Us
- Work directly on cutting-edge AI projects alongside leading research labs
- Fully remote and flexible: structure your work week around your life
- Freelance autonomy with the depth and consistency of meaningful, long-term technical work
- Make a tangible impact on how next-generation AI models are evaluated and improved
- Potential for ongoing work and contract extension as new projects launch
Requirements
- Fluent proficiency in English (Written & Verbal)
- Reliable high-speed internet connection
- Bachelor's degree or equivalent professional experience
- Demonstrated expertise in STEM
Eligible Languages
Fluent proficiency in English
Frequently Asked Questions
What is the assessment actually like?
Notoriously strict. Alignerr uses TestGorilla for role-specific timed tests: a blank coding environment for engineers, rigorous grammar and fact-checking for writers. There is almost no hand-holding. The critical catch: this is essentially a one-shot process. Fail or abandon the assessment, and you are typically locked out of that role permanently with no option to retake.
How quickly can I start earning after I pass?
Not immediately. Even after passing the assessment and completing identity verification (via Persona) and billing setup (via Deel), you may sit in a waiting pool for weeks or months. You only start earning when a project matching your specific skills launches and you are officially assigned. Do not plan around Alignerr income until you are actively on a project.
Is there a community?
Yes, and it is one of Alignerr's genuine strengths. Once assigned to a project, you are added to Slack channels where you can ask questions, get rubric clarifications from admins, and talk to other AI trainers. This is rare in AI training and makes a real difference when guidelines are ambiguous or change mid-project.
Is this just labeling data?
No. This is closer to academic research. You will likely be writing or verifying complex proofs, solving advanced equations, or checking the logic of a model's step-by-step reasoning. The goal is to teach AI systems to reason deeply in your field.
Do I need a PhD?
For the highest pay tiers in this category, a PhD (or current enrollment) is usually expected. However, the most important factor is your ability to pass the domain assessment. If you can solve the problems, the degree is secondary.
Is the work continuous?
Work in niche fields is often project-based. A specific "campaign" (e.g., training a model on Quantum Mechanics) might last for a few weeks. It is best to treat this as a high-paying fellowship or grant rather than a permanent daily job.
What is the barrier to entry?
Alignerr is known for difficult technical assessments. You must pass a timed test in your specific domain (e.g., Python, Physics, or Language) before you are eligible for any paid projects.