ML Ops Engineer
175k - 275k USD
On-site
Full Time
#Engineering
#AI
#Kubernetes
#Terraform
#Cloud Services
#Distributed Training
#VPC
#Data
#Learning
#DevOps
#Parallel Computing
Radical AI, Inc. is an artificial intelligence company that is accelerating scientific research & development. We are at the forefront of innovation in the field of materials R&D, a critical driver for advancing our most cutting-edge industries and shaping the future. Breaking away from the traditionally slow and costly R&D process, Radical AI leverages artificial intelligence and machine learning to pioneer generative materials science. This innovative field blends AI, engineering, and materials science, revolutionizing how materials are created and discovered. Radical AI's approach speeds up R&D and addresses global challenges, setting new benchmarks in technology and sustainability.
The opportunity
As an ML Ops Engineer, you’ll join our AI Research and Development team, reporting to the Vice President of Research. You will play a key role in developing our ML and data platform from the ground up: designing, implementing, and maintaining scalable ML infrastructure and pipelines that support the development, training, and deployment of machine learning models for materials research applications. Your work will advance our AI capabilities and contribute to groundbreaking projects in materials science.
Mission
- Deploy and manage advanced machine learning models, with a focus on generative models for materials discovery. Use Kubernetes, Terraform, and cloud services (Lambda) to deploy and scale models efficiently so they can adapt to high-demand scenarios.
- Optimize computing infrastructure to maximize system performance, focusing on GPU utilization, distributed training, bandwidth efficiency between machines, and VPC connections.
- Work closely with the AI research team and cross-functional teams, including engineering, to ensure effective model deployment and integration into production systems.
- Stay abreast of the latest developments in machine learning and data infrastructure, applying new techniques and methodologies to ongoing projects.
- Handle large datasets, perform data preprocessing, and extract meaningful insights relevant to materials science.
- Run, monitor, and maintain business-critical systems.
- Conduct rigorous testing and validation of machine learning models and data pipelines to ensure accuracy, efficiency, and scalability.
- Maintain comprehensive documentation of models, pipelines, algorithms, and experiments.
- Troubleshoot and optimize machine learning models and data infrastructure, addressing technical challenges and improving overall performance.
- Promote engineering best practices throughout the team.
- Ensure adherence to ethical AI standards and best practices in all aspects of work.
About you
- Solid experience with DevOps, cloud infrastructure, and deploying machine learning models. Expertise in network optimization and parallel computing is crucial.
- Experience with Kubernetes, Terraform, and cloud computing platforms for scalable AI model deployment.
- The ability to navigate complex challenges, strategically manage resources, and improve system efficiency.
- Basic ML knowledge, with experience in training generative models at scale.
- Experience working with and scaling model training across GPU clusters.
- Experience in building data pipelines and managing data infrastructure.
- Excellent written and verbal communication skills, with the ability to clearly convey complex technical information.
- Ability to work effectively in a collaborative team environment.
Pluses
- Master’s or PhD in Computer Science, AI, Data Science, or related field.
- Experience with Lambda cloud.
- Experience integrating RAG technologies.
Compensation
$175K – $275K + Equity + Benefits; base pay offered may vary depending on job-related knowledge, skills, and experience.
What we offer
Our competitive compensation package also includes top-tier benefits:
- Medical, dental, and vision insurance for you and your family
- Mental health and wellness support
- Unlimited PTO and 14+ company holidays per year
- 401(k)
- Work closely with a team on the cutting edge of AI research.
- A mission: an opportunity to fundamentally change the way humanity makes progress through materials science discovery.
Radical AI is committed to equal employment opportunity regardless of race, color, ancestry, national origin, religion, sex, age, sexual orientation, gender identity and expression, marital status, disability, or veteran status.





