aitrainer.work - AI Training Jobs Platform
STEM

AI Evaluation Engineer (Python / Java / Web)

Turing Japan, Remote Posted 5 days ago

Education

Any

Pay Rate

$40/task

✅ Applying through this link gives you a verified candidate referral.

Referrals from verified candidates give your profile a visibility boost and help support our platform at no cost to you.

This position is hosted on an external talent platform. Please only apply for this position if it fits your skills and interests.

About this Role

About Turing

Turing is one of the world's fastest-growing AI companies accelerating the advancement and deployment of powerful AI systems.

Turing helps customers in two ways: working with the world's leading AI labs to advance frontier model capabilities and leveraging that work to build real-world AI systems that help businesses solve complex problems and unlock new opportunities.

Role Overview

We are seeking experienced software engineers to join Turing's AI Evaluation team. As an AI Evaluation Engineer, you will design, author, and validate software engineering benchmark tasks that are used to evaluate the capabilities of advanced AI systems across Python, Java/JVM, and Web development environments.

What does day-to-day look like?

  • Design realistic software engineering evaluation tasks for AI agents
  • Write clear, unambiguous instructions that define expected outputs, constraints, and success criteria
  • Create reference solutions that successfully solve the authored tasks
  • Develop verification criteria and automated test descriptions for task validation
  • Author domain-specific skill files that teach workflows, conventions, and best practices without revealing answers
  • Ensure consistency between benchmark variants while maintaining rigorous evaluation standards
  • Review task quality, edge cases, and failure modes to improve benchmark reliability
  • Collaborate with AI researchers, evaluators, and engineering teams to refine benchmark quality
  • Contribute domain expertise in Software Development, Python, Java/JVM, or Web/UI technologies

Requirements

  • Bachelor's degree or higher in Computer Science, Software Engineering, or a related technical field
  • 5+ years of hands-on software development experience
  • Strong expertise in at least one of the following domains: Python Development, Java/JVM Ecosystem, or Web Application Development (Frontend, Backend, or Full Stack)
  • Excellent written English and ability to write precise technical instructions
  • Strong understanding of software engineering workflows, debugging, testing, and code quality practices
  • Ability to think critically about how AI systems interpret instructions and solve technical problems
  • Experience working with structured file formats such as JSON, Markdown, YAML, DOCX, or XLSX

Nice to have:

  • Experience with LLM evaluation, prompt engineering, or AI benchmarking
  • Experience creating technical assessments, coding challenges, or educational content
  • Experience with Docker, containers, or cloud-based development environments

Perks of Freelancing With Turing

  • Work on cutting-edge AI projects with leading AI research organizations
  • Flexible remote work opportunities
  • Opportunity to influence the evaluation of next-generation AI systems
  • Collaborate with a global network of highly skilled professionals

Offer details:

  • Commitments Required: 40 hours per week with overlap of 4 hours with PST
  • Engagement type: Contractor assignment/freelancer (no medical/paid leave)
  • Duration of contract: 2 months; [expected start date is next week]

Evaluation Process

  • One round of technical interview (or) Automated Live coding challenge

Requirements

  • Must be eligible to work in one of: Japan, Remote
  • Fluent proficiency in English (Written & Verbal)
  • Reliable high-speed internet connection

Eligible Languages

Fluent proficiency in English

Compensation Analysis

Work from anywhere. This fully remote position (listed at $40/task) removes geographic barriers, letting you earn US-competitive rates regardless of your local market. It can be a stepping stone for building a career in the data labeling and AI training ecosystem.

Frequently Asked Questions

Do I need to be a software engineer?

Not for every Turing role. Turing built its reputation matching senior engineers with Silicon Valley companies, but it has pivoted heavily into AGI infrastructure and now also hires non-engineering domain experts, technical writers, and researchers for post-training data annotation and RLHF. Those roles require a strong analytical background and excellent English, but no coding. This AI Evaluation Engineer role is different: it asks for 5+ years of hands-on software development experience.

How does matching work?

Turing calls it the 'Intelligent Talent Cloud.' You build a profile and go through deep vetting — automated tests, an AI-powered interview, and practical skill assessments. Once vetted, Turing's algorithm automatically surfaces you to partner companies (Fortune 500s and top AI labs). You don't browse job boards or bid on work — matches come to you.

How does payment work?

You are hired as an independent contractor and are responsible for your own local taxes. Turing collects payment from the client and pays you monthly in USD via Deel, Payoneer, or direct bank/wire transfer. Monthly pay is standard for long-term contract roles, so if you rely on weekly cash flow, plan your budget around a monthly payout.

What does the work actually look like?

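Across the platform, it is practical, hands-on data work. You might be recording short videos, categorizing images, rating text responses, or analyzing data, in short, distinct tasks that typically take 5-60 minutes each. For this role specifically, the work centers on designing benchmark tasks, writing reference solutions, and developing verification criteria.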

How flexible is the schedule?

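For many task-based roles, extremely. Those offer true "log in and work" flexibility: you can usually work for 20 minutes or 4 hours depending on your availability, and there are rarely minimum hour requirements, which makes them ideal for side income. This role is different: it requires 40 hours per week with 4 hours of overlap with PST.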

Is there an interview?

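Usually, no. Hiring for most of these roles is almost entirely based on passing an automated assessment or "qualification" task. If you pass the test, you get access to the work. For this role, the evaluation process is one round of technical interview or an automated live coding challenge.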