About the role
We are looking for a meticulous Biology Data Quality Engineer to safeguard the integrity and usability of the varied and complex datasets that are central to our mission. In this critical role, you'll leverage your expertise in biology, data science, and machine learning to maintain the quality and consistency of the biological data used to train and evaluate our foundation models. You'll work in collaboration with the R&D team and our engineers so that our data meets the highest standards.
Responsibilities
- Develop and implement comprehensive data validation protocols for diverse biological datasets (histology, omics, clinical). Ensure data integrity, consistency, and accuracy through rigorous quality checks. Design and implement automated data quality pipelines to streamline data validation and identify potential issues early in the data processing workflow.
- Establish and enforce data standardization practices to facilitate seamless integration and analysis across different data types. Curate datasets to enhance their usability for machine learning.
- Work closely with the R&D team to understand data requirements and address data quality concerns. Communicate data quality findings and recommendations effectively to technical and non-technical stakeholders. Collaborate and synchronize with external data providers.
- Maintain detailed documentation of the data-quality assessment procedures, validation results, and data specifications. Generate regular reports on data quality metrics and trends.
- Evaluate and validate external public data sources, ensuring they meet quality standards and are suitable for inclusion in our foundation model training.
- Stay up to date with the latest data quality best practices and tools in the biological domain. Propose and implement improvements to data-quality assessment processes and pipelines.
Qualifications
- Deep understanding of transcriptomics data types (bulk, single-cell, spatial) and their specific quality considerations. Good knowledge of genomics and proteomics data.
- Proven experience in implementing data quality control procedures and pipelines. Familiarity with data validation tools and techniques.
- Strong analytical and problem-solving skills to identify and resolve data quality issues.
- Proficiency in Python and good knowledge of data visualization libraries (e.g., matplotlib).
- Excellent written and verbal communication skills to effectively convey data quality findings and recommendations.
- MSc in Biology, Computational Biology, or Bioinformatics.
Preferred Experience
- Experience in machine learning analysis of histology images.
- Experience working with AWS.
- Experience with developing and implementing data annotation guidelines and processes. Experience with data ontologies.
- Experience building or contributing to large-scale data collections (e.g., Human Cell Atlas).
- Experience with spatial alignment of multimodal datasets (e.g., alignment between different imaging modalities).
Benefits
- Collaborative and mission-driven work environment.
- Competitive salary and equity package.
- Flexible work arrangements, including remote options.
- Opportunities for professional growth and leadership development.
- Opportunity to shape the future of biology and AI by contributing to groundbreaking work.
We believe that the unique contributions of all Bioptimists create our success. To ensure that our culture continues to incorporate everyone’s perspectives and experience, we never discriminate based on race, religion, national origin, gender identity or expression, sexual orientation, age, marital status, or disability status. Decisions related to hiring are made fairly, and we provide equal employment opportunities to all qualified candidates. We take responsibility for always striving to create an inclusive environment that makes every employee and candidate feel welcome.