Requirements
- BS/MS/PhD in Computer Science or a related field
- 5+ years of industry experience in software development
- Proficiency with bash/Python scripting in Linux environments
- Proficiency in Docker and Infrastructure-as-Code concepts, plus professional experience with at least one major cloud provider (we use GCP)
- Experience with web crawlers and large-scale data processing workflows is a plus
- Ability to handle multiple tasks and adapt to changing priorities
- Strong communication skills, both written and verbal
What the job involves
- We're hiring for the Data side of our AI team at Speechify. This role is responsible for all aspects of data collection to support our model training operations
- We build high-quality datasets at petabyte scale and low cost through a tight integration of infrastructure, engineering, and research work. We are looking for a skilled Software Engineer to join us
- Be scrappy in finding new sources of audio data and bringing them into our ingestion pipeline
- Operate and extend the cloud infrastructure for our ingestion pipeline, currently running on GCP and managed with Terraform
- Collaborate closely with our Scientists to shift the cost/throughput/quality frontier, delivering richer data at larger scale and lower cost to power our next-generation models
- Collaborate with others on the AI Team and Speechify Leadership to craft the AI Team’s dataset roadmap to power Speechify’s next-generation consumer and enterprise products
