Data Engineer at Notable Labs
San Francisco, CA, US
Company Overview:

Changing the way cancer is treated is our personal mission, which starts with putting patients first. We’ve developed an individualized laboratory testing service for cancer patients and their doctors. We screen thousands of FDA-approved drugs against the patient’s own cancer cells to identify drug combinations that can be immediately prescribed by their doctor without a clinical trial. By repositioning treatment as a patient-centered service, we can unlock the power of modern data science and laboratory automation to achieve the promise of combination therapy and personalized medicine.

Our investors include Founders Fund, First Round Capital, Y Combinator, several prominent angels and seed-stage funds, and Accelerate Brain Cancer Cure, a venture philanthropy firm founded by Steve Case. We have offices and a laboratory in San Francisco’s SoMa district.


We’ve developed a high-throughput robotic testing platform that uses a patient’s live tumor cells to predict the safest and most effective treatment combinations. We employ an ex vivo assay to survey thousands of drug combination options. To ensure our translational system is predictive, we’ve created a proprietary method to mimic the microenvironment of the human body. Our drug panel includes all approved chemotherapies and targeted agents, as well as generic non-oncology drugs that have published anti-neoplastic evidence.

We prioritize the safest combinations from our viability assay by estimating a therapeutic index. Successful hits are counter-screened against normal human cells to determine a customized, relative drug scoring system for each patient. The end result of our process is a CLIA-certified report that prioritizes therapeutic options that could be used by the physician and patient.

Position Overview:

We're looking for a data engineer to build a pipeline to automatically identify cancer cells. Our robotic platform generates millions of data points per patient. Currently, this data is manually analyzed by scientists, which is a huge bottleneck. Our only hope to scale the number of patients we help is to automate this analysis.

The day-to-day work would involve working with our science team to build a web app that uses clustering and machine learning to automate their analysis. The ideal candidate understands basic data science approaches but also has the engineering experience to build a web interface around these tools. We're interested in scale, so we're looking to build this tool as a Python-backed web app.

Experience building production data pipelines using Python
Experience with machine learning in production is a plus
Basic DevOps experience (especially storing big data on AWS) is a plus
Basic experience building web interfaces (JavaScript single-page apps) is a plus