STEM

AI Evaluation Engineer (Python / Java / Web)

Turing • Japan, Remote

Education

Any

Type

hourly

Pay Rate (by country)

$25–$55/hr

Listed

55d ago

✅ Applying through this link supports our platform at no cost to you.

This position is hosted on an external talent platform. Please only apply for this position if it fits your skills and interests.

Turing: our referral track record

We've referred 677 candidates to Turing roles. 2% (15) were placed.

Turing: typical time to hire

Based on 15 tracked placements across all Turing roles on our site.

32 days 59 days (median) 85 days

Within 2 weeks: 13%
Within 6 weeks: 40%
Within 13 weeks: 80%

What We Know About This Role

Interview: Required
Weekly hours: 40 hrs/week
Timezone overlap: Partial overlap with US Pacific time required
Contract length: 8 weeks (as stated in the listing: “Duration of contract: 2 months”)

About this Role

From the Turing listing

About Turing

Turing is one of the world's fastest-growing AI companies accelerating the advancement and deployment of powerful AI systems.

Turing helps customers in two ways: working with the world's leading AI labs to advance frontier model capabilities and leveraging that work to build real-world AI systems that help businesses solve complex problems and unlock new opportunities.

Role Overview

We are seeking experienced software engineers to join Turing's AI Evaluation team. As an AI Evaluation Engineer, you will design, author, and validate software engineering benchmark tasks that are used to evaluate the capabilities of advanced AI systems across Python, Java/JVM, and Web development environments.

What does day-to-day look like?

Design realistic software engineering evaluation tasks for AI agents
Write clear, unambiguous instructions that define expected outputs, constraints, and success criteria
Create reference solutions that successfully solve the authored tasks
Develop verification criteria and automated test descriptions for task validation
Author domain-specific skill files that teach workflows, conventions, and best practices without revealing answers
Ensure consistency between benchmark variants while maintaining rigorous evaluation standards
Review task quality, edge cases, and failure modes to improve benchmark reliability
Collaborate with AI researchers, evaluators, and engineering teams to refine benchmark quality
Contribute domain expertise in Software Development, Python, Java/JVM, or Web/UI technologies

Requirements

Bachelor's degree or higher in Computer Science, Software Engineering, or a related technical field
5+ years of hands-on software development experience
Strong expertise in at least one of the following domains: Python Development, Java/JVM Ecosystem, or Web Application Development (Frontend, Backend, or Full Stack)
Excellent written English and ability to write precise technical instructions
Strong understanding of software engineering workflows, debugging, testing, and code quality practices
Ability to think critically about how AI systems interpret instructions and solve technical problems
Experience working with structured file formats such as JSON, Markdown, YAML, DOCX, or XLSX

Nice to have:

Experience with LLM evaluation, prompt engineering, or AI benchmarking
Experience creating technical assessments, coding challenges, or educational content
Experience with Docker, containers, or cloud-based development environments

Perks of Freelancing With Turing

Work on cutting-edge AI projects with leading AI research organizations
Flexible remote work opportunities
Opportunity to influence the evaluation of next-generation AI systems
Collaborate with a global network of highly skilled professionals

Offer details:

Commitments required: 40 hours per week with overlap of 4 hours with PST
Engagement type: Contractor assignment/freelancer (no medical/paid leave)
Duration of contract: 2 months; [expected start date is next week]

Evaluation Process

One round of technical interview (or) automated live coding challenge

Requirements

Python
Java
JavaScript
Must be eligible to work in one of: Japan, Remote

What to Expect

Looking at Turing STEM listings we've tracked, contracts in this domain typically run about 9.2 weeks. Actual length varies by project, but this gives you a realistic baseline going in. This listing's 8-week contract is shorter than the typical 9.2-week Turing STEM engagement.

Based on 19 extracted Turing STEM listings.

Eligible Languages

Fluent proficiency in English

English

Why This Role

No office and no relocation. This AI Evaluation Engineer (Python / Java / Web) role is fully remote and pays $25–$55/hr, though the listing asks for 40 hours per week with partial overlap with US Pacific time. It gives you access to STEM work that would otherwise be limited to a handful of major cities.

Skills & Categories

Explore other opportunities in related specializations:

STEM Java Python English Coding

Related Jobs

Semiconductor Devices & Microelectronics Expert

micro1 • STEM

$130 /task

Mechanical Design / CAD Expert

micro1 • STEM

$130 /task

Electrical & Circuit Design Expert

micro1 • STEM

$130 /task

Mechanical Engineering Subject Matter Expert

micro1 • STEM

$100 /task

Frequently Asked Questions

Do I need to be a software engineer to work for Turing?

No, not anymore. Turing built its name matching senior engineers with Silicon Valley companies, but it has since expanded into AGI infrastructure work and now hires non-engineering domain experts, technical writers, and researchers for post-training data annotation and RLHF. A strong analytical background and excellent English matter more than coding ability.

How does Turing's talent matching work?

Turing calls it the Intelligent Talent Cloud. You build a profile and go through vetting (automated tests, an AI-powered interview, practical skill assessments), and once vetted, Turing's algorithm surfaces your profile directly to partner companies like Fortune 500s and top AI labs. You don't browse listings or bid on work; matches come to you.

How and when does Turing pay contractors?

Monthly, in USD, via Deel, Payoneer, or direct bank transfer. You're engaged as an independent contractor responsible for your own local taxes. Plan your cash flow around a monthly cycle if you're used to weekly payouts elsewhere.

What does task-based AI training work actually look like?

Practical, hands-on data work: recording short videos, categorizing images, rating text responses, or analyzing data. Tasks are designed to be short and distinct, typically 5 to 60 minutes each.

What does asynchronous AI training work mean in practice?

No set hours, no check-ins, no meetings. You log in when you want, pick up an available task, complete it, and submit; nobody is waiting on you in real time. That's different from remote employment, where you're expected online during business hours. The tradeoff: you're competing with others for available tasks, so an empty queue means there's simply nothing to do until more work is released.

What does STEM work look like for an AI Evaluation Engineer (Python / Java / Web)?

Tasks here are scoped to STEM, not generic labeling. As an AI Evaluation Engineer (Python / Java / Web), expect to draw on real domain judgment (evaluating outputs, correcting errors, or providing expert reasoning specific to STEM) rather than following a one-size-fits-all rubric. If you don't have a hands-on STEM background, this is likely not the right listing to start with.

How many hours per week does this role require?

Based on the listing, this role is scoped at about 40 hours per week over a roughly 8-week contract. Treat this as a real commitment expectation, not a loose estimate.

Do I need to be fluent in English?

Yes. This role specifically requires English proficiency. You will likely be evaluated on written fluency during the assessment, not just conversational level. If English is not your first language or you are not professionally fluent, this is not the right role. Filter for your native language to find better-matched listings.

What happens when I click Apply on this listing?

You'll be taken to Turing's external site to complete your application there. This listing links through a referral, but the process is identical to applying directly; the link just routes you correctly. Create an account on their site and follow their onboarding steps.

Can I apply from outside Japan?

This specific role is restricted to Japan. If you are outside Japan, applying is unlikely to result in an offer even if you pass the assessment. Check the full job description for any VPN or tax-residency caveats.

Is there an interview?

Yes. The listing specifies one round of technical interview or an automated live coding challenge.
