Made for radiologists, by radiologists.

Staff Machine Learning Engineer – Infrastructure

Full Time, Remote, Team 51-200, Since 2018, H1B Sponsor

Location

United States

Posted

87 days ago

Salary

$200K - $240K / year

Bachelor Degree, 8 yrs exp, English, Airflow, Ansible, AWS, Azure, Cloud, Distributed Systems, Docker, Google Cloud Platform, Grafana, JavaScript, Kubernetes, Python, PyTorch, Terraform, TypeScript

Job Description

• Architect the infrastructure that supports our machine learning applications, services, and workflows
• Architect and maintain our ML platform that supports continuous integration, continuous delivery, and continuous training for our machine learning models
• Develop cloud-native services and serverless architectures to build scalable and resilient systems
• Partner with data scientists to design the data pipelines that enable various machine learning models in production
• Write code that meets our internal standards for security, style, maintainability, and best practices for a high-scale HIPAA web environment
• Design, deploy, and maintain the full ML platform stack including monitoring and observability, data analytics, backend integration with customer-facing products, and the full model R&D lifecycle
• Work with Product Management, Research, and Engineering to iterate on new features and address inefficiencies across our AI/ML infrastructure

Job Requirements

8+ years of industry experience in ML Engineering in cloud-native environments
In-depth knowledge of Python (required), JavaScript/TypeScript (nice to have), or other modern languages in the ML domain
Strong experience with infrastructure and DevOps tools such as Kubernetes, Docker, and Ansible
Strong knowledge of cloud computing platforms such as AWS (preferable), GCP, and Azure
Experience architecting distributed systems, storage systems, and databases
Experience working with machine learning frameworks such as PyTorch and LangGraph
Experience with Airflow (preferable) or other orchestration tools
Experience with infrastructure-as-code tools such as Terraform (preferable), Pulumi, CloudFormation, etc.
Experience with monitoring, tracing, and logging tools such as CloudWatch, New Relic, Grafana, etc.
Excellent communication skills, with a strong sense of ownership and a systematic approach to problem-solving
Proven ability to manage and lead active incidents, address what caused them, and establish systems to avoid them in the future via blameless postmortems

Benefits

Comprehensive Medical, Dental, Vision & Life insurance
HSA (with employer match), FSA, & DCFSA
401(k)
11 Paid Company Holidays
Location Flexibility (Remote-first company!)
Flexible PTO policy
Annual company-wide offsite
Periodic team offsites
Annual equipment stipend
For roles based outside the US, your recruiter can share more details
