Engineering Manager, Agent Prompts & Evals

Lead the team responsible for prompt engineering, model evaluation, and AI quality assurance. Own the systems that ensure our AI agents produce reliable, accurate, and safe outputs.

Remote · Engineering · Full-time

Role Overview

We're looking for a hands-on engineering manager to lead the team responsible for prompt engineering, model evaluation, and AI quality assurance across all Tactical Edge products. You'll own the systems that ensure our AI agents produce reliable, accurate, and safe outputs.

This is a player-coach role — you set the technical direction, build the evaluation infrastructure, and hold the quality bar while growing a team of specialists.

Key Responsibilities

Team Leadership

Manage and grow a team of prompt engineers and eval specialists.

Set standards for prompt design, evaluation methodology, and quality metrics across all products.

Prompt Engineering at Scale

Build and maintain prompt libraries, templates, and versioning systems.

Establish best practices for system prompts, few-shot examples, chain-of-thought reasoning, and tool-use instructions.

Evaluation Systems

Design and operate eval pipelines that measure accuracy, safety, hallucination rates, and task completion across all AI agents.

Build automated benchmarks and regression suites.

Model Selection & Optimization

Evaluate new models (Claude, GPT, Llama, Qwen, Mistral) for fitness across different use cases.

Run A/B tests, cost-quality tradeoff analysis, and latency benchmarks.

Quality & Safety

Own the quality bar for AI outputs.

Implement red-teaming, adversarial testing, and safety evaluations.

Build dashboards for monitoring AI quality in production.

Cross-functional Collaboration

Work with product, engineering, and customer teams to translate requirements into evaluation criteria and prompt strategies.

Key Traits (Non-Negotiable)

Technical manager who still writes code and reviews prompts hands-on.

Deep understanding of LLM behavior, failure modes, and edge cases.

Data-driven decision maker who builds systems to measure what matters.

Strong communicator who can translate AI quality concepts for non-technical stakeholders.

High bar for quality with a pragmatic approach to shipping.

Preferred Qualifications

5+ years of engineering experience, with 2+ years managing teams.

Deep experience with LLM prompting, evaluation, and optimization.

Familiarity with eval frameworks (Braintrust, LangSmith, or custom).

Production AI systems experience with observability and monitoring.

Understanding of model architectures, tokenization, and inference optimization.

How We Work

Outcome-driven: production over demos.

Enterprise-first: security, governance, scalability.

Agentic by design: systems that reason and act safely.

Small teams, high ownership: autonomy with accountability.

What You'll Get

Work on real, production AI deployments
Enterprise-scale challenges and measurable impact
Cross-functional collaboration and high ownership
Competitive compensation (role/location dependent)
Flexible work setup where applicable

Hiring Process

Intro call: fit + context.

Technical discussion: prompt engineering, eval design, model selection.

Leadership & systems interview: team management, cross-functional collaboration.

Final conversation: alignment + next steps.

We value clarity, ownership, and thoughtful execution over buzzwords.
