Senior Data Scientist (R&D) – AI/ML for Private Credit
5+ years
About the Role
We’re looking for a Senior Data Scientist to lead R&D initiatives at the intersection of LLMs, information retrieval, and private credit analytics. You’ll fine-tune small language models on financial documents, build agentic workflows for multi-step reasoning, and develop production-ready extraction systems that power our AI platform for institutional investors.
This role bridges cutting‑edge research with real‑world deployment. You’ll work closely with Prompt Engineers on hybrid LLM+ML approaches, partner with QA Data on evaluation frameworks, and translate research into detailed specs for our Platform Engineering team. Your models will process thousands of credit agreements daily, requiring both innovation and reliability.
What You’ll Do
- Fine‑tune Small Language Models on a proprietary private credit corpus (credit agreements, indentures, term sheets)
- Develop information retrieval systems: semantic search, ranking algorithms, and context‑aware retrieval
- Build agentic workflows with multi-step reasoning, tool use, reflection, and self‑correction capabilities
- Train classification models for document type identification, section detection, and entity recognition
- Create extraction models: NER for financial entities, relation extraction, structured table parsing
- Partner with Prompt Engineers on prompt optimization strategies and hybrid LLM+ML approaches
- Experiment with the latest techniques: RAG architectures, fine‑tuning methods (LoRA, QLoRA), model distillation
- Present research findings to the engineering team and stakeholders monthly (progress, insights, recommendations)
- Stay current with academic research and industry developments in NLP, LLMs, and financial ML
Production Readiness & Deployment
- Write detailed technical specs for Platform team: model architecture, dependencies, deployment steps, API contracts
- Define production readiness criteria: performance benchmarks, edge case handling, failover mechanisms, rollback procedures
- Create comprehensive model cards: intended use, limitations, bias analysis, performance metrics, monitoring requirements
- Optimize models for production constraints: latency 95%, cost <$0.01/extraction
Evaluation & Quality Assurance
- Work with QA Data Teams on model evaluation frameworks and benchmark dataset creation
- Build evaluation frameworks with offline metrics (accuracy, precision, recall, F1) and online metrics (user feedback, business impact)
- Create benchmark datasets: 1K+ examples per task with expert annotations and inter‑annotator agreement analysis
- Define task‑specific success criteria tied to business outcomes (e.g., covenant extraction accuracy → analyst time savings)
- Investigate performance degradation: is it data drift, concept drift, or infrastructure issues?
- Retrain models quarterly with new data, improved techniques, and expanded coverage of edge cases
- Maintain model performance dashboards and alert systems for critical degradation
Required Qualifications
Technical Expertise
- 5+ years of experience in ML/NLP, with 2+ years focused on LLMs and transformers
- Strong hands‑on experience with fine‑tuning language models (BERT, RoBERTa, GPT-style models, LLaMA/Mistral)
- Expertise in information retrieval: vector databases (Pinecone, Weaviate, Qdrant), embedding models, semantic search
- Proficiency in Python ML stack: PyTorch/TensorFlow, Hugging Face, LangChain, scikit‑learn, pandas
Domain & Problem‑Solving
- Experience with document processing and extraction tasks (OCR pipelines, layout analysis, table extraction)
- Ability to translate vague business requirements into concrete ML problem statements
- Track record of moving models from research/prototype to production with measurable impact
- Strong understanding of evaluation methodology: offline vs online metrics, statistical significance testing
- Experience writing technical documentation for engineering teams (architecture docs, API specs, runbooks)
- Ability to present complex technical concepts to non‑technical stakeholders
- Comfortable working in cross‑functional teams with prompt engineers, platform engineers, and QA analysts
Preferred Qualifications
- Experience in financial services, credit analysis, or FinTech (private credit, leveraged finance, structured products)
- Familiarity with agentic frameworks: LangGraph, AutoGPT, ReAct patterns, tool‑calling workflows
- Knowledge of model compression techniques: quantization, pruning, knowledge distillation
- Experience with MLOps tools: MLflow, Weights & Biases, DVC, feature stores
- Understanding of financial document structures: credit agreements, indentures, term sheets, prospectuses
- Publications or patents in NLP, information extraction, or document understanding
Benefits
Enjoy competitive compensation, a comprehensive benefits package, and opportunities for professional growth. Immerse yourself in an innovative work environment, maintain a healthy work‑life balance, and contribute to a diverse and inclusive culture. Join us to work with cutting‑edge technology, and be part of a team that recognizes and rewards your achievements, all while fostering a fun and engaging workplace culture.
Disclaimer
Alphastream.ai is an equal opportunities employer. We work to provide a supportive and inclusive environment where all individuals can maximize their full potential. Our skilled and creative workforce is composed of individuals drawn from a broad cross section of all communities in which we operate and who reflect a variety of backgrounds, talents, perspectives, and experiences. Our strong commitment to a culture of inclusion is evident through our constant focus on recruiting, developing, and advancing individuals based on their skills and talents.
