Location: Bangalore, India
Employment Type: Full-time
Experience: 15+ years
About kAIgentic
kAIgentic is a Singapore-headquartered startup with a presence across Singapore, India, and Japan, on a mission to help enterprises evolve as fast as technology by turning their hidden know-how into safe, AI-powered operations. Most large organizations struggle to transform: their tacit knowledge lives in people’s heads, their systems are fragmented, and their risk appetite is low. Our platform captures how work happens, designs better workflows, and runs them as governed, agentic operations. The result: organizations that continuously improve, instead of changing in slow, risky bursts.
We’re backed by SMBC Group as our founding partner and “customer zero”, and our platform is already being proven in one of the world’s most complex, regulated environments. That gives us access to real problems, real data, and real impact from Day 1.
The Role
As Principal Engineer on the DevOps/MLOps team, you will be the technical authority for how kAIgentic's enterprise platform is deployed, operated, and evolved at global scale—and you will be hands-on in making that happen. This role is about more than defining infrastructure strategy: you are responsible for the operational philosophy of the organization and for delivering against it. You will shape how reliability, cost efficiency, compliance automation, and AI operational excellence are embedded into every engineering team's workflow, and represent that capability externally to partners, customers, and regulators.
What You'll Do
- Define the multi-year infrastructure and MLOps strategy for kAIgentic's global enterprise platform, encompassing multi-region deployment, AI workload operations, and compliance automation
- Set org-wide engineering culture around reliability: SLO philosophy, chaos engineering practice, incident learning, and sustainable on-call
- Own the hardest infrastructure problems: bank-grade multi-tenant isolation at the Kubernetes level, sub-second LLM failover with state preservation, and cost-optimal GPU scheduling for bursty agentic workloads
- Define the MLOps platform strategy covering model lifecycle management, drift detection, and automated retraining pipelines at enterprise scale
- Establish kAIgentic's technical leadership in AI infrastructure operations through community contributions, open-sourced tooling, and industry engagement
- Lead infrastructure due diligence and deployment planning for strategic banking customer onboarding
- Mentor Staff and Lead Engineers across DevOps, MLOps, and platform engineering
- Optionally lead a small embedded infrastructure crew (2–4 engineers) focused on executing your highest-priority platform programs, with you owning technical direction and the engineering manager owning people management
What You'll Bring
- 15+ years in DevOps, SRE, or platform engineering with substantial experience operating AI/ML infrastructure at scale
- Recognized technical authority in cloud-native infrastructure, MLOps, or AI operations—demonstrated through community leadership, publications, or equivalent impact
- AI-native velocity as a default mode of working (mandatory)
- Expert-level proficiency in Python and Go
- Deep expertise in 5+ of the following:
  - Cloud architecture for regulated enterprises at global scale (multi-region, data residency, compliance automation)
  - Kubernetes platform engineering including GPU federation, multi-tenant isolation, and cost optimization
  - MLOps platform architecture covering model versioning, serving, monitoring, and automated retraining
  - LLM operations including multi-provider routing, latency SLOs, cost management, and failover strategies
  - Reliability engineering and SRE practices including chaos engineering and systematic incident elimination
  - CI/CD and GitOps architecture for complex multi-component systems
  - AI observability architecture at the intersection of infrastructure metrics, model behavior, and business KPIs
  - Infrastructure security and compliance automation for banking or similarly regulated industries
- Proven track record of infrastructure decisions that enabled business scale without proportional cost growth
- Experience planning and executing multi-region enterprise deployments in regulated environments
- Strong communication skills across engineering, finance, and executive stakeholders
Why join kAIgentic?
We’re a global team of builders who thrive in ambiguity, care deeply about customers, and believe in the power of AI to reshape enterprise work.
We look for people who:
- Bring technical excellence and customer empathy together.
- Are entrepreneurial and excited to work on zero-to-one problems.
- Lead with ownership, integrity, and collaboration.
- Want to shape not just a product, but a new category of enterprise AI.
Working here means being surrounded by peers who challenge assumptions, celebrate progress, and build with courage and care.
What It Feels Like to Work at kAIgentic
- Innovation at Scale: Combine startup agility with enterprise-grade challenges.
- Ownership from Day One: Your work directly shapes product, culture, and customer outcomes.
- Learning & Growth: Work with seasoned leaders (ex-Microsoft, AWS, UiPath, Wipro, GE, Genpact) who’ve built and scaled global businesses.
- A culture of trust and psychological safety where experimentation is encouraged.
- Global collaboration across Singapore, India, Japan, Europe, and the US.
- A shared mission to build something the world hasn’t seen before: AI Agents that continuously improve how enterprises run.