Deep Learning Performance Engineer, Senior
170k - 237k USD
On-site
Full Time
#Engineering
#AI
#Machine Learning
#CUDA
#Deep Learning
#PyTorch
#Systems
#Networking
#Ray
#TensorRT
Picture a small but fast-moving group of engineers and researchers in San Francisco who believe every developer should be able to scale an AI model from a laptop to a full cluster without first becoming a distributed-systems expert. That belief is what drives Anyscale: we commercialize Ray, the open-source framework already powering production workloads at OpenAI, Uber, Spotify, Instacart, and Cruise. Backed by Andreessen Horowitz, NEA, and Addition, we have raised more than $250 million to turn that vision into reliable, high-performance infrastructure.
The opportunity
As a Senior Deep Learning Performance Engineer you will design and ship the low-level optimizations that keep Anyscale at the leading edge of price-performance for AI workloads. Your work directly influences the speed of our platform, our hosted endpoints, and the open-source tools the community relies on, ensuring customers can train and serve ever-larger models without hidden bottlenecks.
A day in the life
- Partner with product teams to prototype, benchmark, and roll out new performance improvements to the Anyscale platform and open-source projects on a weekly cadence.
- Collaborate with research colleagues on engines such as vLLM and TensorRT-LLM, diagnosing throughput or latency issues and implementing targeted fixes in CUDA and related runtimes.
- Track emerging techniques across conferences, repositories, and pre-print servers, then integrate the most promising ideas into our internal stack and public releases.
Who you are
You bring several years of hands-on experience tuning workloads on GPUs with CUDA, along with a solid grasp of operating-system and networking internals that lets you spot and remove performance barriers. You are comfortable navigating deep-learning frameworks such as PyTorch and can translate model-level requirements into systems-level changes. Strong written and spoken English is essential for working across our engineering and research teams. Experience contributing to ML compilers, training large models, or using Ray is valued but not required.
Why you'll love it here
The market-based salary range for this role is $170,112 to $237,000. In addition, the position includes equity via stock options, medical coverage with 99 percent of premiums paid by Anyscale, a 401(k) retirement plan, wellness and education stipends, paid parental leave, flexible time off, commuter reimbursement, and fully covered in-office meals. Employees work on-site in San Francisco three days each week, giving the team regular face-to-face collaboration while still preserving flexibility.




