Member of Technical Staff, Data Pipeline
On-site
Full Time
#Engineering
#Data
#Machine Learning
#Python
#PyTorch
#Data Collection
#Data Labeling
#Hadoop
#AWS
#Cloud
Boson AI is an early-stage startup dedicated to creating large language tools that are accessible to everyone. Led by founders Alex Smola and Mu Li, our team of experts in deep learning, optimization, and statistics is focused on building high-quality generative AI models as we work toward the development of AGI.
Responsibilities
- Design and build data collection pipelines to gather and prepare diverse datasets.
- Create and maintain processing pipelines that handle data labeling, filtering, cleaning, auditing, and visualization.
- Implement machine learning models to enhance the overall quality and diversity of our data.
Must-haves
- Strong experience building large-scale data processing pipelines and working with distributed workloads.
- Proficiency in Python with a focus on writing clean and maintainable code.
- Experience using deep learning frameworks, specifically PyTorch.
- A Bachelor's degree in computer science or a related field.
- Excellent problem-solving abilities and a sharp eye for detail when identifying data biases or anomalies.
- Fluency in English.
Nice-to-haves
- Familiarity with data collection and labeling tools like Selenium or Label Studio.
- Experience with data processing frameworks and libraries such as Hadoop or Datasketch.
- Hands-on experience with cloud platforms like AWS, Azure, or GCP.
- A background in machine learning projects involving audio, vision, or language.
- Active contributions to open-source projects on GitHub.
- Multilingual skills that can help improve the diversity of our model training data.
- Knowledge of data privacy regulations, fairness, and toxicity mitigation.
Benefits
This is a full-time position based on-site at our office in Santa Clara.





