The Tech Lead, AI Platform is the senior technical leader for the platform runtime, AI engine, and agent-orchestration tier of an AI-native retail decisioning platform. The role is accountable for architectural integrity, build-vs-buy decisions, integration with upstream data and knowledge-graph services, and the safety and scale properties of the agent runtime. The role also leads a team of senior software, AI, and ML engineers.
Remote candidates outside of Thailand are welcome to apply.
Key Responsibilities
- Own the reference architecture for the platform's runtime, decisioning, and agent-execution tiers; co-chair the Architecture Review Board; author Architecture Decision Records.
- Design, build, and operate the agent runtime (LangGraph, CrewAI, or the chosen framework), covering deployment, scale, observability, and cost per invocation.
- Design and ship an agent autonomy framework with progressive trust levels (shadow / recommender / executor patterns) and measurable gate criteria; operationalise human-in-the-loop (HITL) patterns for every agent.
- Own the agent registry, in which every agent is catalogued, versioned, assigned an owner, gated, and monitored.
- Define and operationalise the consumption contract with upstream knowledge-graph, semantic-layer, data-product, and event-stream services from the platform team.
- Lead the AI-side decisioning components — orchestrator, trust gate service, agent-side helpers — and coordinate consumption of LLM Gateway and Vector Search services.
- Lead a team of senior software, AI, and ML engineers; mentor on agent-engineering discipline; partner with peer Tech Leads on handoffs into application and experience layers.
- Own runtime SLOs (invocation P95 latency, success rate, HITL response time) and the per-agent cost meter; lead incident response for runtime degradation.
Requirements
- Bachelor's or Master's degree in Computer Science, Engineering, or a related discipline.
- 8+ years of software engineering experience, with 3+ years in a Tech Lead / Staff role owning platform standards.
- Production agentic systems experience — multi-agent orchestration, HITL gates, eval-driven CI; not just RAG demos.
- Strong distributed-systems fundamentals — concurrency, message queues, observability, performance.
- LLM platform depth — at least one major provider (Azure OpenAI, Anthropic, Bedrock, Vertex) in production with cost / latency optimisation.
- API-first design discipline — service contracts, SLOs, versioning, deprecation policies.
- Cloud platform experience (Azure preferred; AWS / GCP transferable).
- Architectural authorship: has written ADRs, chaired an ARB, and made build-vs-buy calls with executive sponsors.
Preferred Qualifications
- Built or led a production multi-agent platform serving multiple business consumers.
- Open-source agent framework contributions (LangChain / LangGraph / AutoGen / DSPy).
- Retail / commerce / fintech domain experience; knowledge-graph production experience (Neo4j, Neptune, TigerGraph).
- Causal inference exposure (DoWhy / EconML).