About the Role
We are looking for a skilled DevOps Engineer to build, maintain, and optimise our cloud infrastructure and deployment pipelines, enabling fast, reliable, and secure delivery of AI-driven applications. You will play a critical role in bridging the gap between development and operations, driving automation, scalability, and observability across our systems.
What You’ll Do
- Design, implement, and manage cloud infrastructure using tools like Terraform, CloudFormation, or Pulumi, ensuring environments are consistent, reproducible, and secure.
- Build and maintain robust CI/CD pipelines (using tools like GitHub Actions, GitLab CI, Jenkins) to automate code builds, tests, and deployments.
- Operate and optimise multi-region, scalable cloud infrastructure (primarily AWS, GCP, or Azure) for high availability and cost-efficiency.
- Package applications using Docker and deploy them at scale with Kubernetes or other orchestration frameworks.
- Set up comprehensive observability tooling with Prometheus, Grafana, ELK/EFK, Datadog, or cloud-native monitoring to ensure system health and performance.
- Implement DevSecOps best practices including automated vulnerability scanning, secrets management, and identity/access control.
- Work closely with software engineers, ML engineers, and data teams to support infrastructure needs across development and production environments.
- Participate in incident response rotations and postmortem analysis to drive continuous improvement in reliability and performance.
What We're Looking For
- 5+ years of experience in DevOps, Site Reliability Engineering, or Cloud Infrastructure Engineering roles.
- Proficiency with cloud platforms (AWS preferred; GCP/Azure a plus), including experience with services like EC2, S3, Lambda, IAM, and VPCs.
- Strong experience with containerisation (Docker) and Kubernetes (EKS/GKE/AKS) in production environments.
- Hands-on experience with CI/CD tools and automation frameworks for infrastructure and deployments.
- Familiarity with monitoring stacks and incident response best practices.
- Solid scripting skills in Python, Bash, or Go.
- Experience supporting AI/ML workflows (e.g., GPU instances, ML model deployment pipelines, large-scale data workflows) is a strong plus.
- Deep understanding of networking, security, and system design principles in cloud environments.
- Strong communication skills and a collaborative mindset for working with cross-functional engineering teams.
Why Join Us
- Work with a cutting-edge tech stack across infrastructure, MLOps, and DevSecOps.
- Collaborate with talented engineers, researchers, and architects solving meaningful challenges at scale.
- Drive automation and reliability in systems critical to detecting and preventing synthetic media threats.
- Enjoy a flexible, high-trust work culture focused on autonomy, innovation, and growth.
To apply, please send your CV and cover letter to info@datambit.com. For any questions about the position, feel free to reach out at the same address.