
DevOps/AI Specialist


Join our team!

We are looking for an AI Infrastructure Engineer who will build, operate, and maintain OpenShift‑based AI environments, manage Kubernetes orchestration, support containerized workloads, and ensure reliable Linux infrastructure performance for on‑prem AI clusters. The role requires strong skills in Docker, Git workflows, GPU‑aware infrastructure, and monitoring tools to enable scalable, secure, and production‑ready AI services.


What is your mission?

You will provide the best service to our partner brands by performing these tasks:

  • Build, operate, and maintain the AI platform environment as it transitions toward Kubernetes on OpenShift as the primary platform direction.
  • Deploy and manage workloads on OpenShift supporting on‑prem AI clusters and customer‑facing AI infrastructure capabilities.
  • Own cluster reliability, ensuring OpenShift‑based services are stable, production‑ready, and consistently available.
  • Support platform evolution, absorb enhancements, and contribute to cloud‑native integrations built on OpenShift.
  • Operate Kubernetes orchestration for AI services, ensuring workloads are properly deployed, scaled, and managed across the cluster.
  • Maintain the infrastructure layers that support AI inference workloads, with strong awareness of how the orchestration layer interacts with monitoring and other platform components.
  • Participate in modernization work as the platform expands from current Linux operations into a full OpenShift/Kubernetes environment.
  • Support container delivery by managing Docker images, packaging deployable units, and ensuring containerized workloads meet AI infrastructure requirements.
  • Provide expert-level operational support across Linux-based infrastructure, ensuring stability, predictable performance, and proper integration of services, drivers, and system dependencies.
  • Handle essential components of AI stacks, including NVIDIA GPU support, driver/kernel interactions, and hardware/software integration.
  • Work with Git (GitHub/GitLab) through pull requests, commits, branching, and repository workflows as part of DevOps-driven infrastructure operations.
  • Ensure infrastructure-as-code and automation changes are properly tracked, reviewable, and production-ready through source control.
  • Use tools such as Grafana and Prometheus for monitoring and observability.
  • Support environments involving ML frameworks, MLOps tools, and AI inference workloads (not model training or research).
  • Apply familiarity with LLM orchestration tools (e.g., LangChain, LlamaIndex) and understand when simpler API-based approaches are appropriate.
  • Contribute to environments using service mesh technologies such as Istio or Linkerd.
  • Apply understanding of DevSecOps, security best practices, vulnerability scanning, API hardening (rate limiting, authentication middleware, input validation, security headers), and production WSGI servers such as Gunicorn or uWSGI.
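To give a flavor of the API‑hardening work described above, here is a minimal sketch of a WSGI application with security headers and naive in‑memory rate limiting. It is an illustrative example only, not this team's actual stack; the app name, limits, and header set are assumptions, and in production the app would run under a WSGI server such as Gunicorn with rate limiting usually handled at the gateway.

```python
# Hypothetical sketch: WSGI app with security headers and a naive
# in-memory, per-client rate limiter. Illustrative only -- real
# deployments typically rate-limit at an API gateway or proxy.
import time
from collections import defaultdict

RATE_LIMIT = 5        # assumed: max requests per client
WINDOW_SECONDS = 60   # assumed: rolling window size

_hits = defaultdict(list)  # client address -> request timestamps


def app(environ, start_response):
    client = environ.get("REMOTE_ADDR", "unknown")
    now = time.time()
    # Keep only timestamps inside the window, then enforce the limit.
    _hits[client] = [t for t in _hits[client] if now - t < WINDOW_SECONDS]
    if len(_hits[client]) >= RATE_LIMIT:
        start_response("429 Too Many Requests",
                       [("Content-Type", "text/plain")])
        return [b"rate limit exceeded"]
    _hits[client].append(now)

    headers = [
        ("Content-Type", "text/plain"),
        # Common hardening headers:
        ("X-Content-Type-Options", "nosniff"),
        ("X-Frame-Options", "DENY"),
        ("Strict-Transport-Security", "max-age=63072000"),
    ]
    start_response("200 OK", headers)
    return [b"ok"]
```

Served under Gunicorn this would be launched as `gunicorn module:app`, with worker count and timeouts tuned to the workload.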


Who are we looking for?

  • Bachelor’s degree in Computer Science, Information Technology, Engineering, or a related technical discipline.
  • Experience building, maintaining, and supporting infrastructure systems within OpenShift, Kubernetes, Linux, and Docker environments.
  • Hands‑on experience supporting AI infrastructure, including hardware awareness (NVIDIA GPUs, drivers, kernels) and workload enablement for inference.
  • Strong knowledge of OpenShift, including cluster operations, workload deployment, platform stability, and cloud‑native enhancements.
  • Practical experience managing Kubernetes orchestration, scaling, production operations, and integration with monitoring services.
  • Experience in container delivery, maintaining Docker-based deployment units, and ensuring operational readiness for AI components.
  • Strong operational expertise in Linux, including system dependencies, services, performance tuning, and stability during modernization.
  • Solid working knowledge of Git, including pull requests, commits, repository workflows, and collaborative DevOps practices.
  • Familiarity with Grafana, Prometheus, and other observability tooling.
  • Exposure to ML frameworks, MLOps tools, and AI infrastructure support (not model training responsibilities).
  • Understanding of LLM orchestration tools, service mesh technologies, and security best practices for production APIs.
  • Certifications in AWS, Azure, GCP, CKA, or CKAD are an advantage.
  • Able to translate goals into AI‑assisted code generation, review generated code, and deploy it into production infrastructure.
  • Strong technical communication skills, able to speak fluently about infrastructure layers, DevOps concepts, and repository/deployment workflows.


Company Perks



Free learning and development courses for your personal and career growth


Comprehensive HMO benefits and insurance since day 1


Dynamic company events


Above-industry salary package and incentives

Opportunities for promotion


Free meals and snacks

Our Values

Worldwide, we strongly uphold our values to be of service to our people, our clients, and our community.

WE PUT PEOPLE FIRST

We consider our people as the foundation of our success.

WE STRIVE FOR EXCELLENCE

Our commitment to quality ensures that we always do our best.

WE EMBRACE INNOVATION

We stay agile and fast, always looking for ways to solve our clients’ needs.

WE DELIVER DELIGHT

We pride ourselves on helping our clients reach their full potential.

WE CREATE REAL IMPACT

We do things right and we get the job done.