Course Outline
Introduction and Diagnostic Foundations
- Overview of failure modes in LLM systems and common Ollama-specific issues.
- Establishing reproducible experiments and controlled environments.
- Debugging tools: local logs, request/response captures, and sandboxing.
Reproducing and Isolating Failures
- Techniques for creating minimal failing examples and pinning random seeds.
- Stateful vs. stateless interactions: isolating context-related bugs.
- Managing determinism: controlling sampling randomness and other sources of nondeterministic behavior.
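Pinning sampling options is the usual first step toward a reproducible failing case. The sketch below builds a non-streaming request for Ollama's `/api/generate` endpoint with a fixed `seed` and `temperature` of 0. It assumes a local server on the default port 11434, and the model name `llama3` is a placeholder for whatever model you have pulled.

```python
import json
import urllib.request

# Assumption: a local Ollama server on its default port.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_payload(model: str, prompt: str, seed: int = 42) -> dict:
    """Build a non-streaming /api/generate request with pinned sampling options."""
    return {
        "model": model,
        "prompt": prompt,
        "stream": False,  # return one JSON object instead of a token stream
        "options": {
            "seed": seed,  # fix the sampler's random seed
            "temperature": 0.0,  # always favour the most likely token
        },
    }


def generate(payload: dict, url: str = OLLAMA_URL) -> dict:
    """POST the payload to a running Ollama server and return the parsed reply."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


if __name__ == "__main__":
    # "llama3" is a placeholder model name; use one you have pulled locally.
    payload = build_payload("llama3", "Summarise: the cat sat on the mat.")
    print(json.dumps(payload, indent=2))
    # With a server running, send the same payload twice and diff the replies:
    # first, second = generate(payload), generate(payload)
```

Even with these options pinned, identical output is not guaranteed across Ollama versions, hardware, or model quantizations. Recording those alongside the seed makes a failing example easier to reproduce.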
Behavioral Evaluation and Metrics
- Quantitative metrics: accuracy, ROUGE/BLEU variants, calibration, and perplexity proxies.
- Qualitative evaluations: human-in-the-loop scoring and rubric design.
- Task-specific fidelity checks and acceptance criteria.
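As a concrete instance of the quantitative metrics and acceptance criteria above, here is a simplified ROUGE-1 F1 (unigram overlap) score paired with a threshold check. It is a sketch that assumes lowercase whitespace tokenization. Published ROUGE implementations add their own tokenization and optional stemming, so scores will not match them exactly. The `accept` helper and its 0.5 threshold are hypothetical.

```python
from collections import Counter


def rouge1_f1(candidate: str, reference: str) -> float:
    """Unigram-overlap F1 between a model output and a reference answer."""
    cand_tokens = candidate.lower().split()
    ref_tokens = reference.lower().split()
    if not cand_tokens or not ref_tokens:
        return 0.0
    # Counter intersection keeps the minimum count per token (clipped overlap).
    overlap = sum((Counter(cand_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(cand_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)


def accept(candidate: str, reference: str, threshold: float = 0.5) -> bool:
    """Hypothetical acceptance criterion: pass if ROUGE-1 F1 meets the threshold."""
    return rouge1_f1(candidate, reference) >= threshold


if __name__ == "__main__":
    score = rouge1_f1("the cat sat on the mat", "the cat is on the mat")
    print(round(score, 3))  # 5 of 6 unigrams overlap, prints 0.833
```

Surface-overlap metrics like this are cheap enough to run on every change, but they reward wording rather than correctness. That is why they are usually combined with the rubric-based human scoring listed above.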
Automated Testing and Regression
- Unit tests for prompts and components; scenario-based and end-to-end tests.
- Creating regression suites and golden example baselines.
- CI/CD integration for Ollama model updates and automated validation gates.
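A golden-example regression suite can be reduced to one loop: run each stored prompt through the candidate model and flag outputs that no longer match the baseline. The sketch below assumes exact matching after light normalization, which suits short factual answers. Free-form outputs would need a similarity metric instead. The canned `fake_model` stands in for a real model client and is purely illustrative.

```python
from typing import Callable, Dict, List


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so trivial formatting changes don't fail."""
    return " ".join(text.lower().split())


def run_golden_suite(generate: Callable[[str], str], goldens: Dict[str, str]) -> List[str]:
    """Return the prompts whose normalized output no longer matches its golden answer."""
    failures = []
    for prompt, expected in goldens.items():
        if normalize(generate(prompt)) != normalize(expected):
            failures.append(prompt)
    return failures


if __name__ == "__main__":
    # Stand-in for the candidate model; in CI this would call the updated model.
    canned = {"What is 2+2?": "4", "Capital of France?": "Paris"}

    def fake_model(prompt: str) -> str:
        return canned.get(prompt, "")

    goldens = {"What is 2+2?": "4", "Capital of France?": " paris "}
    failed = run_golden_suite(fake_model, goldens)
    print("failures:", failed)
    if failed:
        # A non-zero exit status lets a CI job block the model update.
        raise SystemExit(1)
```

Wiring the non-zero exit into a pipeline step turns the suite into an automated validation gate. A model update that breaks a golden example then fails the build instead of reaching production.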
Observability and Monitoring
- Structured logging, distributed traces, and correlation IDs.
- Key operational metrics: latency, token usage, error rates, and quality signals.
- Alerting, dashboards, and SLIs/SLOs for model-backed services.
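The operational metrics above can be derived from fields that Ollama's `/api/generate` response already contains: `total_duration`, `prompt_eval_count`, `eval_count`, and `eval_duration`, with durations in nanoseconds. The sketch emits one structured JSON log line per request, keyed by a correlation ID so it can be joined with traces. The metric key names (`latency_ms`, `tokens_per_s`, and so on) and the sample values are illustrative choices, not a standard schema.

```python
import json
import logging
import uuid
from typing import Optional

logger = logging.getLogger("llm.requests")


def request_metrics(resp: dict) -> dict:
    """Derive operational metrics from an Ollama /api/generate response body.

    Ollama reports durations in nanoseconds.
    """
    output_tokens = resp.get("eval_count", 0)
    eval_ns = resp.get("eval_duration", 0)
    return {
        "latency_ms": resp.get("total_duration", 0) / 1e6,
        "prompt_tokens": resp.get("prompt_eval_count", 0),
        "output_tokens": output_tokens,
        "tokens_per_s": output_tokens / (eval_ns / 1e9) if eval_ns else 0.0,
    }


def log_request(resp: dict, correlation_id: Optional[str] = None) -> dict:
    """Emit one JSON log line keyed by a correlation ID and return the record."""
    record = {
        "correlation_id": correlation_id or str(uuid.uuid4()),
        "model": resp.get("model"),
        **request_metrics(resp),
    }
    logger.info(json.dumps(record))
    return record


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO, format="%(message)s")
    # Hand-written sample response (illustrative values, not real measurements).
    sample = {
        "model": "llama3",
        "total_duration": 2_500_000_000,
        "prompt_eval_count": 12,
        "eval_count": 100,
        "eval_duration": 2_000_000_000,
    }
    log_request(sample, correlation_id="req-123")
```

Passing the caller's correlation ID through, rather than generating a new one, keeps the model call linked to the upstream request across services.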
Advanced Root Cause Analysis
- Tracing through prompt graphs (chained prompts), tool calls, and multi-turn flows.
- Comparative A/B diagnosis and ablation studies.
- Data provenance, dataset debugging, and addressing dataset-induced failures.
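An ablation study removes one component at a time and measures the change in a quality score, which shows which parts of a prompt are actually carrying the behavior. The sketch assumes the prompt is assembled from named text components and that some scoring function exists. The `toy_score` function here is a hypothetical stand-in for a real evaluation over a test set.

```python
from typing import Callable, Dict


def assemble(parts: Dict[str, str]) -> str:
    """Join prompt components in their insertion order."""
    return "\n\n".join(parts.values())


def ablate(components: Dict[str, str], score: Callable[[str], float]) -> Dict[str, float]:
    """Return how much the score drops when each component is removed on its own."""
    baseline = score(assemble(components))
    drops = {}
    for name in components:
        reduced = {k: v for k, v in components.items() if k != name}
        drops[name] = baseline - score(assemble(reduced))
    return drops


if __name__ == "__main__":
    # Toy scorer standing in for a real evaluation over a test set.
    def toy_score(prompt: str) -> float:
        return 2.0 * ("Cite sources" in prompt) + 1.0 * ("Be concise" in prompt)

    prompt_parts = {
        "persona": "You are a support assistant.",
        "citation_rule": "Cite sources for every claim.",
        "style": "Be concise.",
    }
    print(ablate(prompt_parts, toy_score))
```

A component with a near-zero drop is a candidate for removal, while a large drop marks the component to inspect first when that behavior regresses.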
Safety, Robustness, and Remediation Strategies
- Mitigations: filtering, grounding, retrieval augmentation, and prompt scaffolding.
- Rollback, canary, and phased rollout patterns for model updates.
- Post-mortems, lessons learned, and continuous improvement loops.
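Canary and phased rollouts need a routing rule that sends a chosen share of traffic to the new model and keeps each caller on the same side. One common approach, sketched here as an assumption rather than a prescribed pattern, hashes a stable request key (such as a user or session ID) into 10,000 buckets. The model tags `model:v1` and `model:v2` are hypothetical.

```python
import hashlib


def route_model(request_key: str, stable: str, canary: str, canary_percent: float) -> str:
    """Deterministically send about canary_percent % of request keys to the canary model.

    Hashing the key keeps each caller on the same model for the whole rollout phase.
    """
    digest = hashlib.sha256(request_key.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % 10_000  # bucket in 0..9999
    return canary if bucket < canary_percent * 100 else stable


if __name__ == "__main__":
    # Hypothetical model tags for the current and candidate versions.
    for pct in (0, 5, 25, 100):
        share = sum(
            route_model(f"user-{i}", "model:v1", "model:v2", pct) == "model:v2"
            for i in range(10_000)
        ) / 10_000
        print(f"{pct}% target -> {share:.1%} routed to canary")
```

A phased rollout raises `canary_percent` step by step while the regression and monitoring signals stay healthy. A rollback sets it back to 0, which returns every caller to the stable model without a redeploy.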
Summary and Next Steps
Requirements
- Extensive experience in building and deploying LLM applications.
- Familiarity with Ollama workflows and model hosting.
- Proficiency with Python, Docker, and basic observability tools.
Audience
- AI engineers.
- MLOps professionals.
- QA teams responsible for production LLM systems.
35 Hours