EXO: End-to-End Local AI Cluster Deployment Training Course
EXO is an open-source framework that interconnects Apple Silicon and Linux machines into a distributed AI cluster, enabling local inference of frontier models that exceed the memory capacity of a single device.
This instructor-led, live training (available online or onsite) is designed for system administrators and DevOps engineers who want to deploy, configure, and manage EXO clusters to facilitate private LLM inference across multiple Apple Silicon or Linux nodes.
Upon completing this training, participants will be able to:
- Install and configure EXO on both macOS and Linux nodes.
- Activate automatic device discovery and assemble multi-node clusters.
- Enable and verify RDMA over Thunderbolt 5 to achieve ultra-low-latency inter-device communication.
- Deploy frontier models (including DeepSeek, Qwen, and Llama) across the clustered devices.
- Monitor cluster health and troubleshoot common deployment challenges.
Course Format
- Interactive lectures and discussions.
- Extensive exercises and practical application.
- Hands-on implementation within a live-lab environment.
Customization Options
- For customized training requests, please contact us to arrange it.
Course Outline
Introduction to EXO and Local AI Clustering
- Overview of the EXO framework and the exo-explore ecosystem
- Comparison of centralized cloud inference versus distributed local inference
- Architecture: libp2p device discovery, MLX backend, dashboard, and API layers
- Hardware requirements: Apple Silicon (M3 Ultra, M4 Pro/Max), Thunderbolt 5, shared storage
Installing EXO on macOS
- Setting up Xcode, the Metal Toolchain, and macOS prerequisites
- Installing uv, Node.js, and the Rust nightly toolchain
- Installing the pinned macmon fork for Apple Silicon monitoring
- Cloning the repository and building the dashboard using npm
- Running EXO from source and verifying the localhost:52415 dashboard
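The from-source workflow above can be sketched as a short shell session. The repository URL, build commands, and run command are assumptions based on a typical source build of the exo-explore project (not taken from this outline), so the stateful steps are left commented; the project README is authoritative.

```shell
# Hedged sketch of the macOS from-source build described above.
# The clone/build/run commands are assumptions and are left commented;
# consult the exo-explore project README for the exact, current steps.
#
#   git clone https://github.com/exo-explore/exo.git && cd exo
#   (cd dashboard && npm install && npm run build)   # build the web dashboard
#   uv run exo                                       # start a node from source
#
# Once a node is running, the dashboard should answer on the default port:
DASHBOARD_URL="http://localhost:52415"
echo "Verify the node by opening ${DASHBOARD_URL}"
```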
Installing EXO on Linux
- Installing dependencies via apt or Homebrew on Linux
- Configuring uv, Node.js 18+, and the Rust nightly toolchain
- Building the dashboard and running EXO in CPU-only mode
- Directory layout: XDG Base Directory paths for config, data, cache, and logs
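The XDG layout above can be illustrated with standard fallback expansion. Only the use of XDG Base Directory paths comes from the outline; the `exo` subdirectory name is an assumption for illustration.

```shell
# Resolve per-user directories with XDG Base Directory fallbacks.
# The "exo" subdirectory name is an assumption, not confirmed by the outline.
CONFIG_DIR="${XDG_CONFIG_HOME:-$HOME/.config}/exo"      # config
DATA_DIR="${XDG_DATA_HOME:-$HOME/.local/share}/exo"     # data
CACHE_DIR="${XDG_CACHE_HOME:-$HOME/.cache}/exo"         # cache
STATE_DIR="${XDG_STATE_HOME:-$HOME/.local/state}/exo"   # logs
printf 'config: %s\ndata:   %s\ncache:  %s\nlogs:   %s\n' \
  "$CONFIG_DIR" "$DATA_DIR" "$CACHE_DIR" "$STATE_DIR"
```

With no `XDG_*` variables set, the paths fall back to the spec defaults under `$HOME`.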
Automatic Device Discovery and Cluster Formation
- Understanding libp2p-based auto-discovery across local networks
- Configuring custom namespaces using EXO_LIBP2P_NAMESPACE for cluster isolation
- Verifying node membership in the dashboard cluster view
- Handling discovery failures and network segmentation issues
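Cluster isolation with the namespace variable named above can be sketched in two lines. Only the variable name `EXO_LIBP2P_NAMESPACE` comes from the outline; the value and the launch command are illustrative placeholders.

```shell
# Give each cluster its own libp2p namespace so nodes on a shared LAN only
# discover peers started with the same value. The variable name is from the
# course outline; the value below is an illustrative placeholder.
export EXO_LIBP2P_NAMESPACE="lab-cluster-a"
echo "nodes will only join peers announcing namespace: ${EXO_LIBP2P_NAMESPACE}"
# exo   # start the node with the namespace applied (uncomment on a real node)
```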
Enabling RDMA over Thunderbolt 5
- RDMA architecture and the claim of 99 percent latency reduction
- Enabling RDMA in macOS Recovery mode using rdma_ctl
- Cable requirements and port topology constraints on Mac Studio
- Ensuring macOS versions match across all cluster nodes
- Troubleshooting RDMA discovery and DHCP configuration
Deploying Frontier Models
- Using the dashboard to load and shard DeepSeek v3.1, Qwen3-235B, and Llama family models
- Previewing instance placements via the /instance/previews API endpoint
- Creating model instances with pipeline or tensor-parallel sharding
- Configuring custom model cards from the HuggingFace hub
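Fetching a placement preview before creating an instance can be sketched as below. The endpoint path `/instance/previews` and the default port 52415 are taken from this outline; the response shape is not assumed, so the actual request is left commented for use against a live node.

```shell
# Build the placement-preview request for a cluster node. The endpoint path
# and default port come from the course outline; run against a live cluster.
EXO_HOST="localhost"
EXO_PORT=52415
PREVIEW_URL="http://${EXO_HOST}:${EXO_PORT}/instance/previews"
echo "GET ${PREVIEW_URL}"
# curl -s "$PREVIEW_URL" | jq .   # inspect proposed shardings on a live node
```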
Monitoring and Troubleshooting
- Reading EXO logs and understanding distributed tracing
- Interpreting cluster health in the dashboard cluster view
- Diagnosing worker node failures and reconnection behavior
- Using EXO_TRACING_ENABLED for performance bottleneck analysis
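Turning on tracing for a single run can be sketched as follows. The variable name `EXO_TRACING_ENABLED` comes from the outline; using `1` as the enabling value is an assumption, so check the EXO documentation for the accepted values.

```shell
# Enable tracing for one run to capture timing data for bottleneck analysis.
# The variable name is from the course outline; "1" as the enabling value is
# an assumption -- verify accepted values against the EXO docs.
export EXO_TRACING_ENABLED=1
echo "tracing enabled: ${EXO_TRACING_ENABLED}"
# exo   # run the node with tracing active (uncomment on a real node)
```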
Cluster Maintenance and Updates
- Updating EXO binaries and performing dashboard rebuild procedures
- Migrating model caches and managing pre-downloaded models over NFS
- Gracefully removing nodes and rebalancing workloads
Requirements
- Understanding of networking fundamentals (IP addressing, subnetting, firewalls)
- Experience with command-line administration on macOS or Linux
- Familiarity with Python package management (pip/uv) and Node.js tooling
Audience
- System administrators
- DevOps engineers
- AI infrastructure architects responsible for on-premise LLM deployment
Open Training Courses require 5+ participants.
Related Courses
Advanced LangGraph: Optimization, Debugging, and Monitoring Complex Graphs
35 Hours
LangGraph is a framework designed for constructing stateful, multi-agent LLM applications as composable graphs, featuring persistent state and precise execution control.
This instructor-led live training, available online or onsite, targets advanced AI platform engineers, DevOps professionals specializing in AI, and ML architects who aim to optimize, debug, monitor, and manage production-grade LangGraph systems.
By the conclusion of this training, participants will be able to:
- Design and optimize complex LangGraph topologies to enhance speed, reduce costs, and improve scalability.
- Ensure reliability through retries, timeouts, idempotency, and checkpoint-based recovery mechanisms.
- Debug and trace graph executions, inspect state changes, and systematically reproduce production issues.
- Instrument graphs with logs, metrics, and traces; deploy them to production; and monitor SLAs and costs.
Format of the Course
- Interactive lectures and discussions.
- Extensive exercises and practical application.
- Hands-on implementation in a live-lab environment.
Course Customization Options
- To request customized training for this course, please contact us to arrange it.
Building Coding Agents with Devstral: From Agent Design to Tooling
14 Hours
Devstral is an open-source framework designed for building and running coding agents that can interact with codebases, developer tools, and APIs to enhance engineering productivity.
This instructor-led, live training (online or onsite) is aimed at intermediate-level to advanced-level ML engineers, developer-tooling teams, and SREs who wish to design, implement, and optimize coding agents using Devstral.
By the end of this training, participants will be able to:
- Set up and configure Devstral for coding agent development.
- Design agentic workflows for codebase exploration and modification.
- Integrate coding agents with developer tools and APIs.
- Implement best practices for secure and efficient agent deployment.
Format of the Course
- Interactive lecture and discussion.
- Lots of exercises and practice.
- Hands-on implementation in a live-lab environment.
Course Customization Options
- To request a customized training for this course, please contact us to arrange it.
Open-Source Model Ops: Self-Hosting, Fine-Tuning and Governance with Devstral & Mistral Models
14 Hours
Devstral and Mistral are open-source AI technologies engineered for flexible deployment, fine-tuning, and scalable integration.
This instructor-led live training, available online or onsite, targets intermediate to advanced ML engineers, platform teams, and research engineers seeking to self-host, fine-tune, and govern Mistral and Devstral models within production environments.
Upon completion of this training, participants will be able to:
- Set up and configure self-hosted environments for Mistral and Devstral models.
- Apply fine-tuning techniques to enhance domain-specific performance.
- Implement versioning, monitoring, and lifecycle governance mechanisms.
- Ensure security, compliance, and responsible usage of open-source models.
Course Format
- Interactive lectures and discussions.
- Hands-on exercises focused on self-hosting and fine-tuning.
- Live-lab implementation of governance and monitoring pipelines.
Customization Options
- To request customized training for this course, please contact us to make arrangements.
Fiji: Image Processing for Biotechnology and Toxicology
14 Hours
This instructor-led, live training in Bulgaria (online or onsite) is aimed at beginner-level to intermediate-level researchers and laboratory professionals who wish to process and analyze images related to histological tissues, blood cells, algae, and other biological samples.
By the end of this training, participants will be able to:
- Navigate the Fiji interface and utilize ImageJ’s core functions.
- Preprocess and enhance scientific images for better analysis.
- Analyze images quantitatively, including cell counting and area measurement.
- Automate repetitive tasks using macros and plugins.
- Customize workflows for specific image analysis needs in biological research.
LangGraph Applications in Finance
35 Hours
LangGraph serves as a framework for constructing stateful, multi-agent LLM applications that operate as composable graphs with persistent state and precise control over execution.
This instructor-led, live training (available online or onsite) targets intermediate to advanced professionals aiming to design, implement, and manage LangGraph-based finance solutions that adhere to proper governance, observability, and compliance standards.
Upon completion of this training, participants will be capable of:
- Designing finance-specific LangGraph workflows that align with regulatory and audit requirements.
- Integrating financial data standards and ontologies into graph states and tooling.
- Implementing reliability, safety, and human-in-the-loop controls for critical processes.
- Deploying, monitoring, and optimizing LangGraph systems to ensure performance, cost efficiency, and SLA compliance.
Course Format
- Interactive lectures and discussions.
- Extensive exercises and practical sessions.
- Hands-on implementation within a live-lab environment.
Course Customization Options
- To request customized training for this course, please contact us to arrange details.
LangGraph Foundations: Graph-Based LLM Prompting and Chaining
14 Hours
LangGraph is a specialized framework designed for constructing LLM applications with a graph-based architecture. It facilitates advanced capabilities such as planning, branching logic, tool utilization, memory management, and controllable execution flows.
This instructor-led live training, available both online and onsite, is tailored for beginner-level developers, prompt engineers, and data practitioners aiming to design and implement reliable, multi-step LLM workflows using LangGraph.
Upon completion of this training, participants will be equipped to:
- Articulate core LangGraph concepts, including nodes, edges, and state, and understand appropriate use cases.
- Construct prompt chains that support branching, tool invocation, and persistent memory.
- Integrate retrieval mechanisms and external APIs into graph-based workflows.
- Test, debug, and evaluate LangGraph applications to ensure reliability and safety.
Course Format
- Interactive lectures coupled with facilitated discussions.
- Guided laboratory exercises and code walkthroughs within a sandbox environment.
- Scenario-based exercises focusing on design, testing, and evaluation strategies.
Customization Options
- For requests regarding customized training for this course, please contact us to make arrangements.
LangGraph in Healthcare: Workflow Orchestration for Regulated Environments
35 Hours
LangGraph facilitates stateful, multi-actor workflows driven by Large Language Models (LLMs), offering precise control over execution paths and state persistence. In the healthcare sector, these capabilities are essential for ensuring compliance, enabling interoperability, and developing decision-support systems that seamlessly align with clinical workflows.
This instructor-led live training (available online or onsite) targets intermediate to advanced professionals aiming to design, implement, and manage LangGraph-based healthcare solutions while navigating regulatory, ethical, and operational challenges.
Upon completion of this training, participants will be able to:
- Design healthcare-specific LangGraph workflows with compliance and auditability as key considerations.
- Integrate LangGraph applications with medical ontologies and standards, including FHIR, SNOMED CT, and ICD.
- Apply best practices for reliability, traceability, and explainability in sensitive environments.
- Deploy, monitor, and validate LangGraph applications within healthcare production settings.
Course Format
- Interactive lectures and discussions.
- Hands-on exercises featuring real-world case studies.
- Implementation practice in a live-lab environment.
Course Customization Options
- To request customized training for this course, please contact us to make arrangements.
LangGraph for Legal Applications
35 Hours
LangGraph serves as a framework designed to build stateful, multi-agent LLM applications through composable graphs that maintain persistent state and offer precise control over execution.
This instructor-led training, available either online or on-site, targets intermediate to advanced professionals aiming to design, implement, and manage LangGraph-based legal solutions while ensuring necessary compliance, traceability, and governance controls.
Upon completion of this training, participants will be equipped to:
- Create legal-specific LangGraph workflows that maintain auditability and compliance.
- Incorporate legal ontologies and document standards into graph states and processing.
- Implement guardrails, human-in-the-loop approvals, and traceable decision pathways.
- Deploy, monitor, and maintain LangGraph services in production environments with effective observability and cost management.
Course Format
- Interactive lectures and discussions.
- Extensive exercises and practical applications.
- Hands-on implementation within a live laboratory environment.
Customization Options
- For tailored training on this subject, please contact us to arrange your specific requirements.
Building Dynamic Workflows with LangGraph and LLM Agents
14 Hours
LangGraph is a framework designed for composing graph-structured workflows involving LLMs, enabling features such as branching, tool usage, memory management, and controllable execution.
This instructor-led, live training (available online or onsite) targets intermediate-level engineers and product teams looking to integrate LangGraph’s graph logic with LLM agent loops to create dynamic, context-aware applications. Examples include customer support agents, decision trees, and information retrieval systems.
Upon completing this training, participants will be capable of:
- Designing graph-based workflows that effectively coordinate LLM agents, tools, and memory.
- Implementing conditional routing, retries, and fallback mechanisms to ensure robust execution.
- Integrating retrieval mechanisms, APIs, and structured outputs into agent loops.
- Evaluating, monitoring, and hardening agent behavior to enhance reliability and safety.
Course Format
- Interactive lectures and facilitated discussions.
- Guided labs and code walkthroughs within a sandbox environment.
- Scenario-based design exercises and peer reviews.
Course Customization Options
- To request a customized training session for this course, please contact us to arrange it.
LangGraph for Marketing Automation
14 Hours
LangGraph is a graph-based orchestration framework designed to facilitate conditional, multi-step workflows involving Large Language Models (LLMs) and various tools. It is particularly well-suited for automating and personalizing content pipelines.
This instructor-led live training, available both online and onsite, targets intermediate-level marketers, content strategists, and automation developers looking to build dynamic, branching email campaigns and content generation pipelines using LangGraph.
Upon completion of this training, participants will be equipped to:
- Create graph-structured workflows for emails and content that incorporate conditional logic.
- Connect LLMs, APIs, and data sources to enable automated personalization.
- Manage state, memory, and context throughout multi-step campaigns.
- Assess, monitor, and optimize the performance and delivery results of workflows.
Course Format
- Interactive lectures accompanied by group discussions.
- Practical labs focused on implementing email workflows and content pipelines.
- Scenario-based exercises covering personalization, segmentation, and branching logic.
Course Customization Options
- To arrange a customized version of this training, please contact us.
Le Chat Enterprise: Private ChatOps, Integrations & Admin Controls
14 Hours
Le Chat Enterprise is a private ChatOps solution that provides secure, customizable, and governed conversational AI capabilities for organizations, with support for RBAC, SSO, connectors, and enterprise app integrations.
This instructor-led, live training (online or onsite) is aimed at intermediate-level product managers, IT leads, solution engineers, and security/compliance teams who wish to deploy, configure, and govern Le Chat Enterprise in enterprise environments.
By the end of this training, participants will be able to:
- Set up and configure Le Chat Enterprise for secure deployments.
- Enable RBAC, SSO, and compliance-driven controls.
- Integrate Le Chat with enterprise applications and data stores.
- Design and implement governance and admin playbooks for ChatOps.
Format of the Course
- Interactive lecture and discussion.
- Lots of exercises and practice.
- Hands-on implementation in a live-lab environment.
Course Customization Options
- To request a customized training for this course, please contact us to arrange it.
Cost-Effective LLM Architectures: Mistral at Scale (Performance / Cost Engineering)
14 Hours
Mistral is a high-performance family of large language models optimized for cost-effective production deployment at scale.
This instructor-led, live training (online or onsite) is aimed at advanced-level infrastructure engineers, cloud architects, and MLOps leads who wish to design, deploy, and optimize Mistral-based architectures for maximum throughput and minimum cost.
By the end of this training, participants will be able to:
- Implement scalable deployment patterns for Mistral Medium 3.
- Apply batching, quantization, and efficient serving strategies.
- Optimize inference costs while maintaining performance.
- Design production-ready serving topologies for enterprise workloads.
Format of the Course
- Interactive lecture and discussion.
- Lots of exercises and practice.
- Hands-on implementation in a live-lab environment.
Course Customization Options
- To request a customized training for this course, please contact us to arrange it.
Productizing Conversational Assistants with Mistral Connectors & Integrations
14 Hours
Mistral AI is an open AI platform that empowers teams to construct and embed conversational assistants within enterprise and customer-facing workflows.
This instructor-led live training, available both online and onsite, is designed for beginner to intermediate-level product managers, full-stack developers, and integration engineers aiming to design, integrate, and productize conversational assistants leveraging Mistral connectors and integrations.
Upon completing this training, participants will be equipped to:
- Integrate Mistral conversational models with enterprise and SaaS connectors.
- Implement retrieval-augmented generation (RAG) to ensure grounded responses.
- Design user experience patterns for both internal and external chat assistants.
- Deploy assistants into product workflows to address real-world use cases.
Course Format
- Interactive lectures and discussions.
- Hands-on integration exercises.
- Live-lab development of conversational assistants.
Course Customization Options
- To request customized training for this course, please contact us to arrange it.
Enterprise-Grade Deployments with Mistral Medium 3
14 Hours
Mistral Medium 3 is a powerful, multimodal large language model built for production-ready deployment within enterprise settings.
This instructor-led live training, available online or on-site, is designed for intermediate to advanced AI/ML engineers, platform architects, and MLOps teams looking to deploy, optimize, and secure Mistral Medium 3 for business applications.
Upon completion, participants will be able to:
- Deploy Mistral Medium 3 via API or self-hosted solutions.
- Enhance inference performance while managing costs.
- Develop multimodal applications using Mistral Medium 3.
- Apply industry best practices for security and compliance in enterprise environments.
Course Format
- Engaging lectures and discussions.
- Extensive exercises and practical work.
- Live-lab implementation experience.
Customization Options
- For tailored training on this course, please get in touch with us.
Mistral for Responsible AI: Privacy, Data Residency & Enterprise Controls
14 Hours
Mistral AI serves as an open and enterprise-ready AI platform, offering capabilities designed for the secure, compliant, and responsible deployment of artificial intelligence.
This instructor-led training, available either online or onsite, is tailored for compliance leads, security architects, and legal or operations stakeholders at an intermediate proficiency level. The course focuses on implementing responsible AI practices using Mistral by leveraging specific mechanisms for privacy, data residency, and enterprise controls.
Upon completion of this training, participants will be capable of:
- Implementing privacy-preserving techniques within Mistral deployments.
- Applying data residency strategies to satisfy regulatory requirements.
- Establishing enterprise-grade controls, including RBAC, SSO, and audit logging.
- Evaluating vendor and deployment options to ensure alignment with compliance standards.
Format of the Course
- Interactive lectures and discussions.
- Case studies and exercises focused on compliance.
- Hands-on implementation of enterprise AI controls.
Course Customization Options
- For organizations requesting a customized version of this training, please contact us to make arrangements.