ML/Dev Ops Engineer at Radical AI

Radical AI

ML/Dev Ops Engineer

United States

175k - 275k USD

On-site

Full Time

#Engineering

#Slurm

#Kubernetes

#Terraform

#Ansible

#Python

Radical AI is looking for an ML/Dev Ops Engineer


Radical AI, Inc. is an artificial intelligence company that is accelerating scientific research & development. We are at the forefront of innovation in the field of materials R&D, a critical driver for advancing our most cutting-edge industries and shaping the future. Breaking away from the traditionally slow and costly R&D process, Radical AI leverages artificial intelligence and machine learning to pioneer generative materials science. This innovative field blends AI, engineering, and materials science, revolutionizing how materials are created and discovered. Radical AI's approach speeds up R&D and addresses global challenges, setting new benchmarks in technology and sustainability.

The opportunity

As an ML Ops Engineer, you’ll join our AI Research and Development team. You will play a key role in developing our ML and data platform by helping to design, implement, and maintain scalable ML and DevOps infrastructure. You will help stand up and maintain the clusters and pipelines that support the development, training, and deployment of machine learning models for materials research, and you will build, scale, and automate general infrastructure for use across our software stack.

Mission

  • Deploy and manage GPU and CPU clusters for machine learning models and quantum chemistry by employing Slurm and Kubernetes.
  • Enable seamless replication of clusters across various cloud services, including Lambda Labs, IBM, and hyperscalers.
  • Implement and maintain monitoring, logging, and alerting systems using Zabbix, Prometheus, or a similar tool.
  • Develop and implement a CI/CD pipeline to enable safe and reproducible software releases.
  • Use widely used DevOps tools, such as Terraform and Ansible, among others.
  • Optimize computing infrastructure by focusing on enhancing GPU utilization, distributed training, bandwidth efficiency between machines, and VPC connections to maximize system performance.
  • Work closely with the AI research team and cross-functional teams, including engineering, to ensure effective model deployment and integration into production systems.
  • Stay abreast of the latest developments in machine learning and data infrastructure, applying new techniques and methodologies to ongoing projects.
  • Conduct rigorous testing and validation of machine learning models and data pipelines to ensure accuracy, efficiency, and scalability.
  • Maintain comprehensive documentation of models, pipelines, algorithms, and experiments.
  • Troubleshoot and optimize machine learning models and data infrastructure, addressing technical challenges and improving overall performance.
  • Promote engineering best practices throughout the team.
  • Ensure adherence to ethical AI standards and best practices in all aspects of work.

About you

  • 3+ years of experience in a DevOps role, preferably in an AI/ML-focused environment.
  • Strong knowledge of Slurm and Kubernetes.
  • Experience leveraging cloud (AWS/GCP/Azure) and reserved (Lambda Labs) computing platforms for scalable AI model deployment.
  • Proficiency in scripting languages such as Python, JavaScript, Bash, etc.
  • Experience with CI/CD tools such as GitHub Actions, CircleCI, Argo CD, etc.
  • Experience working with and scaling model training across GPU clusters.
  • Experience in building data pipelines and managing data infrastructure.
  • The ability to navigate complex challenges, strategically manage resources, and improve system efficiency.
  • Excellent written and verbal communication skills, with the ability to clearly convey complex technical information.
  • Ability to work effectively in a collaborative team environment.

Pluses

  • Master’s or PhD in Computer Science, AI, Data Science, or related field.
  • Familiarity with infrastructure-as-code tools like Terraform or CloudFormation.
  • Experience deploying and scaling quantum chemistry workloads with VASP.
  • Basic ML knowledge.

Compensation

$175K – $275K + Equity + Benefits; base pay offered may vary depending on job-related knowledge, skills, and experience.

What we offer

Our competitive compensation package also includes comprehensive benefits:

  • Medical, dental, and vision insurance for you and your family
  • Mental health and wellness support
  • Unlimited PTO and 14+ company holidays per year
  • 401(k)
  • Work closely with a team on the cutting edge of AI research.
  • A mission: an opportunity to fundamentally change the way humanity makes progress through materials science discovery.

Radical AI is committed to equal employment opportunity regardless of race, color, ancestry, national origin, religion, sex, age, sexual orientation, gender identity and expression, marital status, disability, or veteran status.
