We are a fast-paced, venture-backed biotechnology company developing breakthrough therapeutics through innovative microbiome science. We are looking for a Data Scientist to develop and apply our machine learning capabilities to critical problems in microbiome science and human health.
You are not intimidated by difficult machine learning problems with high-dimensional data. You are an independent thinker and you love sharing your knowledge with others. You have a passion for learning and for advancing human health through data-driven therapeutic discovery.
How You’ll Impact the Company:
- Expand Second Genome’s data science and machine learning capabilities through research and implementation of state-of-the-art methods to solve our domain-specific problems.
- Design, implement and execute data science projects of all sizes in collaboration with other Second Genome program, project, and function leads, whether to support data-driven decisions or to fulfill criteria defined in partnership agreements or other externally funded research.
- Develop and maintain Second Genome’s cloud-configurable machine learning pipeline and operations according to SDLC and MLOps best practices.
- Collaborate with Platform Product and Engineering functions to define the requirements for timely and successful execution of data science deliverables.
- Collaborate with Product and Omics functions when evaluating new platform capabilities or modifications thereof, including new and alternative software packages and parameterizations.
- Participate in reciprocal code reviews with other Informatics functions.
- Contribute to recognition of Second Genome as an industry leader in microbiome data science through conference presentations, patent applications, peer-reviewed publications and other external communications.
What You Bring to the Role:
- You have, or are actively pursuing, an advanced degree in computer science, computer engineering, electrical engineering, or an equivalent field.
- You have demonstrated experience in developing, implementing and communicating machine learning algorithms or high-dimensional data analyses.
- You have experience working with small-n-large-p datasets and managing false discovery.
- You have hands-on experience with model interpretation and with visualizing feature importances and interactions.
- You are comfortable reading and presenting technical papers and conference proceedings in machine learning.
- You are proficient in Python, Python’s scikit-learn library, and Jupyter notebooks. Ideally you have exposure to MLflow, Metaflow or other experiment tracking systems.
- You are familiar with SDLC and with source code revision control systems such as Git or SVN. Familiarity with AWS or other cloud service providers is a plus.