Tactical Edge
Engineering Manager, Agent Prompts & Evals

Lead the team responsible for prompt engineering, model evaluation, and AI quality assurance. Own the systems that ensure our AI agents produce reliable, accurate, and safe outputs.

Remote | Engineering | Full-time

Role Overview

We're looking for a hands-on engineering manager to lead the team responsible for prompt engineering, model evaluation, and AI quality assurance across all Tactical Edge products. You'll own the systems that ensure our AI agents produce reliable, accurate, and safe outputs.

This is a player-coach role: you set the technical direction, build the evaluation infrastructure, and hold the quality bar while growing a team of specialists.

Key Responsibilities

Team Leadership

  • Manage and grow a team of prompt engineers and eval specialists.
  • Set standards for prompt design, evaluation methodology, and quality metrics across all products.

Prompt Engineering at Scale

  • Build and maintain prompt libraries, templates, and versioning systems.
  • Establish best practices for system prompts, few-shot examples, chain-of-thought reasoning, and tool-use instructions.

Evaluation Systems

  • Design and operate eval pipelines that measure accuracy, safety, hallucination rates, and task completion across all AI agents.
  • Build automated benchmarks and regression suites.

Model Selection & Optimization

  • Evaluate new models (Claude, GPT, Llama, Qwen, Mistral) for fitness across different use cases.
  • Run A/B tests, cost-quality tradeoff analysis, and latency benchmarks.

Quality & Safety

  • Own the quality bar for AI outputs.
  • Implement red-teaming, adversarial testing, and safety evaluations.
  • Build dashboards for monitoring AI quality in production.

Cross-functional Collaboration

  • Work with product, engineering, and customer teams to translate requirements into evaluation criteria and prompt strategies.

Key Traits (Non-Negotiable)

  • Technical manager who still writes code and reviews prompts hands-on.
  • Deep understanding of LLM behavior, failure modes, and edge cases.
  • Data-driven decision maker who builds systems to measure what matters.
  • Strong communicator who can translate AI quality concepts for non-technical stakeholders.
  • High bar for quality with a pragmatic approach to shipping.

Preferred Qualifications

  • 5+ years of engineering experience, with 2+ years managing teams.
  • Deep experience with LLM prompting, evaluation, and optimization.
  • Familiarity with eval frameworks (Braintrust, LangSmith, or custom).
  • Production AI systems experience with observability and monitoring.
  • Understanding of model architectures, tokenization, and inference optimization.

How We Work

  • Outcome-driven: production over demos
  • Enterprise-first: security, governance, scalability
  • Agentic by design: systems that reason and act safely
  • Small teams, high ownership: autonomy with accountability

What You'll Get

  • Work on real, production AI deployments
  • Enterprise-scale challenges and measurable impact
  • Cross-functional collaboration and high ownership
  • Competitive compensation (role/location dependent)
  • Flexible work setup where applicable

Hiring Process

  1. Intro call: fit + context
  2. Technical discussion: prompt engineering, eval design, model selection
  3. Leadership & systems interview: team management, cross-functional collaboration
  4. Final conversation: alignment + next steps

We value clarity, ownership, and thoughtful execution over buzzwords.