Senior Software Engineer – LLM Evaluation & Repository Validation

Turing • Remote, Bangladesh, Egypt, India, Kenya, Mexico, Nigeria, Pakistan, Turkey, Japan

Education: Any
Type: Hourly
Pay Rate (by country): $30–$65/hr
Listed: 123d ago

✅ Applying through this link supports our platform at no cost to you.

This position is hosted on an external talent platform. Please only apply for this position if it fits your skills and interests.

Turing: our referral track record

We've referred 677 candidates to Turing roles. 2% (15) were placed.

Turing: typical time to hire

Based on 15 tracked placements across all Turing roles on our site.

32 days 59 days (median) 85 days

Within 2 weeks: 13%
Within 6 weeks: 40%
Within 13 weeks: 80%

What We Know About This Role

Weekly hours: 20 hrs/week
Timezone overlap: Partial overlap with US Pacific time required
Contract length: 12 weeks

About this Role

From the Turing listing

About Turing: Based in San Francisco, California, Turing is the world’s leading research accelerator for frontier AI labs and a trusted partner for global enterprises deploying advanced AI systems. Turing supports customers in two ways: first, by accelerating frontier research with high-quality data, advanced training pipelines, plus top AI researchers who specialize in coding, reasoning, STEM, multilinguality, multimodality, and agents; and second, by applying that expertise to help enterprises transform AI from proof of concept into proprietary intelligence with systems that perform reliably, deliver measurable impact, and drive lasting results on the P&L.

Role Overview:

As an AI Quality Analyst, you will evaluate a new personalization feature for Gemini. You will assess how well the model uses information from your past Gemini conversations, Gmail, Google Search, and YouTube activity to make responses more relevant and helpful. This role requires a unique blend of creativity and analytical rigor. You will actively design prompts from the perspective of your own personal experiences. You will then use your analytical skills to assess the quality of the model's personalized responses, evaluating dimensions like Grounding, Integration, and Helpfulness.

Key Qualifications

Japanese Proficiency: Ability to read and write in Japanese with a high degree of competence, as Japanese is the focus language for this project.
Personal Account Usage: Willingness to use your primary personal Google account (not a testing account) and enable personal data sources for a genuine assessment.
Schedule Flexibility: Full-time availability in your local time zone is required. We are staffing a global, 24-hour operations team.
Exceptional Analytical Thinking: Demonstrated ability to evaluate nuanced and ambiguous AI responses, specifically assessing personalization quality.
Creative Prompt Engineering: Experience in designing creative, multi-turn starting prompts based on personal context to thoroughly test the model's capabilities.
Strong Evaluation Acumen: Understanding of personalization concepts, including the ability to identify incorrect personalization, poor inferences, and forced connections.
Meticulous Attention to Detail: Ability to review Side-by-Side (SxS) model responses and spot subtle differences in naturalness and overnarrating.
Excellent Written Communication: Superior ability to write clear, concise, and structured rationales for model rankings, explicitly referencing specific turn numbers.
Feedback: Ability to provide constructive feedback and detailed annotations.
Communication: Excellent communication and collaboration skills.
Independence: Self-motivated and able to work independently in a remote setting.
Technical Setup: Desktop or laptop with a good internet connection.

Description:

In this role, you will be part of a dynamic team focused on evaluating the quality of personalized AI interactions. Your day-to-day work will involve:

Designing and executing multi-turn conversational prompts (typically 1-5 turns) that require the AI to utilize your personal information and experiences.
Evaluating model responses based on your intent from the starting prompt, checking if the personalization was appropriately applied.
Analyzing responses for Grounding issues, ensuring claims about you are supported by evidence and not flawed inferences or hallucinations.
Assessing Integration quality to ensure personal data is woven naturally into the response without robotic "overnarrating".
Rigorously evaluating and stack-ranking two model responses side-by-side (SxS) to determine which is overall more helpful, easy to use, and enjoyable.
Writing clear, defensible rationales for your comparisons, explicitly referencing where issues or positive aspects occurred in the conversation.
Extracting and verifying "Debug Info" from the model to confirm that chat summaries and data sources were properly utilized.
Maintaining strict data hygiene by deleting evaluation conversations to prevent them from polluting your future chat history.

Education & Experience

BS/BA degree or equivalent experience in a relevant field (e.g., Policy, Law, Ethics, Linguistics, Journalism, Computer Science, or a related analytical field). Experience in data annotation, AI quality evaluation, content moderation, or a related role is strongly preferred.

Offer Details:

Commitments Required: at least 4 hours per day and up to 20 hours per week, with 4 hours of overlap with PST.
Engagement Type: Contractor
Engagement Length: 3 months
Offered Rate: $15 per hour for this project

Evaluation Process:

Shortlisted candidates will be sent a Job Interest Form. After the profile review, an assessment will be shared, which must be completed within 24 hours. Based on the assessment outcomes, shortlisted candidates will be contacted to discuss the pre‑onboarding requirements.

Requirements

Domain-Specific Languages
Must be eligible to work in one of: Remote, Bangladesh

What to Expect

Looking at Turing Software Engineering listings we've tracked, contracts in this domain typically run about 9.8 weeks. Actual length varies by project, but this gives you a realistic baseline going in. This listing's 12-week contract is longer than the typical 9.8-week Turing Software Engineering engagement.

Based on 50 extracted Turing Software Engineering listings.

Eligible Languages

Fluent proficiency in Japanese

Japanese

Why This Role

No office, no fixed hours, no relocation. This Senior Software Engineer – LLM Evaluation & Repository Validation role pays $47/hr fully remote, giving you access to Software Engineering work that would otherwise be limited to a handful of major cities.

Skills & Categories

Explore other opportunities in related specializations:

Software Engineering, Japanese, Coding

Related Jobs

Software Engineering - Research & Evaluation Studies • terac • Software Engineering • $250 /task
Engineering & Data tools Specialist • micro1 • Software Engineering • $80 /hr
QA / Software Engineering Reviewer – Browser Test Validation • mercor • Software Engineering • $60 /hr
Head of AI & Engineering Expert • ethos • Software Engineering • $150 /hr

Frequently Asked Questions

Do I need to be a software engineer to work for Turing?

No, not anymore. Turing built its name matching senior engineers with Silicon Valley companies, but it has since expanded into AGI infrastructure work and now hires non-engineering domain experts, technical writers, and researchers for post-training data annotation and RLHF. A strong analytical background and excellent English matter more than coding ability.

How does Turing's talent matching work?

Turing calls it the Intelligent Talent Cloud. You build a profile and go through vetting (automated tests, an AI-powered interview, practical skill assessments), and once vetted, Turing's algorithm surfaces your profile directly to partner companies like Fortune 500s and top AI labs. You don't browse listings or bid on work; matches come to you.

How and when does Turing pay contractors?

Monthly, in USD, via Deel, Payoneer, or direct bank transfer. You're engaged as an independent contractor responsible for your own local taxes. Plan your cash flow around a monthly cycle if you're used to weekly payouts elsewhere.

What does task-based AI training work actually look like?

Practical, hands-on data work: recording short videos, categorizing images, rating text responses, or analyzing data. Tasks are designed to be short and distinct, typically 5 to 60 minutes each.

What does asynchronous AI training work mean in practice?

No set hours, no check-ins, no meetings. You log in when you want, pick up an available task, complete it, and submit; nobody is waiting on you in real time. That's different from remote employment, where you're expected online during business hours. The tradeoff: you're competing with others for available tasks, so an empty queue means there's simply nothing to do until more work is released.

What does Software Engineering work look like for a Senior Software Engineer – LLM Evaluation & Repository Validation?

Tasks here are scoped to Software Engineering, not generic labeling. As a Senior Software Engineer – LLM Evaluation & Repository Validation, expect to draw on real domain judgment (evaluating outputs, correcting errors, or providing expert reasoning specific to Software Engineering) rather than following a one-size-fits-all rubric. If you don't have hands-on Software Engineering background, this is likely not the right listing to start with.

How many hours per week does this role require?

Based on the listing, this role is scoped at about 20 hours per week, over roughly a 12-week contract. Treat this as a real commitment expectation, not a loose estimate.

Do I need to be fluent in Japanese?

Yes. This role specifically requires Japanese proficiency. You will likely be evaluated on written fluency during the assessment, not just conversational level. If Japanese is not your first language or you are not professionally fluent, this is not the right role. Filter for your native language to find better-matched listings.

What happens when I click Apply on this listing?

You'll be taken to Turing's external site to complete your application there. This listing links through a referral, but the process is identical to applying directly; the link just routes you correctly. Create an account on their site and follow their onboarding steps.

Can I apply from outside Bangladesh, Egypt, India and 6 other countries?

This specific role is open only to people based in Bangladesh, Egypt, India, Kenya, Mexico, Nigeria, Pakistan, Turkey, and Japan. If you are somewhere else, applying is unlikely to lead to an offer even if you pass the assessment, because the restriction is usually about where the work can legally be contracted rather than your skills. Read the full description for any tax-residency or right-to-work caveats before you apply, since they can differ by country.
