Data Scientist at Second Genome
Brisbane, CA, US

We are a fast-paced, venture-backed biotechnology company developing breakthrough therapeutics through innovative microbiome science, and we are looking for a Data Scientist to develop and apply our machine learning capabilities to critical problems in microbiome science and human health.


You are not intimidated by difficult machine learning problems with high-dimensional data. You are an independent thinker and you love sharing your knowledge with others. You have a passion for learning and for advancing human health through data-driven therapeutic discovery.


How You’ll Impact the Company:

  • Expand Second Genome’s data science and machine learning capabilities through research and implementation of state-of-the-art methods to solve our domain-specific problems.
  • Design, implement and execute small to large data science projects in collaboration with other Second Genome program, project, and function leads for data-driven decision support and/or to fulfill criteria as defined in partnership agreements or other externally funded research.
  • Develop and maintain Second Genome’s cloud-configurable machine learning pipeline and operations according to SDLC and MLOps best practices.
  • Collaborate with Platform product and Engineering functions to define requirements needed for timely and successful execution of data science deliverables.
  • Collaborate with Product and Omics functions when evaluating new platform capabilities or modifications thereof, including new and alternative software packages and parameterizations.
  • Participate in reciprocal code reviews with other Informatics functions.
  • Contribute to recognition of Second Genome as an industry leader in microbiome data science through conference presentations, patent applications, peer-reviewed publications and other external communications.


What You Bring to the Role:

  • You have, or are actively pursuing, an advanced degree in computer science, computer engineering, electrical engineering, or equivalent.
  • You have demonstrated experience in developing, implementing and communicating machine learning algorithms or high-dimensional data analyses.
  • You have experience working with small-n-large-p datasets and managing false discovery.
  • You have hands-on experience with model interpretation and visualizing feature importances and interactions.
  • You are comfortable reading and presenting technical papers and conference proceedings in machine learning.
  • You are proficient in Python, Python’s scikit-learn library, and Jupyter notebooks. Ideally you have exposure to MLflow, Metaflow or other experiment tracking systems.
  • You are familiar with SDLC and source code revision control systems such as Git or SVN. Familiarity with AWS or other cloud service providers is a plus.