
Course Outline

Foundations of Cloud Operations on AWS

  • Defining operational roles and responsibilities in the cloud.
  • Understanding AWS account structures, AWS Organizations, and multi-account strategies.
  • Exploring core operational services: CloudWatch, CloudTrail, and AWS Config.

Infrastructure as Code and Provisioning

  • Key principles of IaC and immutable infrastructure.
  • Provisioning infrastructure using Terraform and AWS CloudFormation.
  • Managing state files, modules, and environment promotion.

CI/CD and Deployment Strategies

  • Designing CI/CD pipelines tailored for cloud-native applications.
  • Implementing blue/green, canary, and rolling deployment strategies.
  • Automating rollback procedures, health checks, and release validation.

Monitoring, Observability, and Alerting

  • Handling metrics, logs, and traces: shipping, storing, and analyzing data.
  • Utilizing CloudWatch, X-Ray, and third-party observability tools.
  • Defining Service Level Indicators (SLIs) and Service Level Objectives (SLOs), alerting policies, and on-call protocols.

Security Operations and Identity Management

  • IAM best practices, enforcing least privilege, and managing cross-account access.
  • Managing secrets, using KMS, and securing parameter stores.
  • Operational security: patching strategies, vulnerability scanning, and maintaining audit trails.

Resilience, Backup, and Disaster Recovery

  • Designing architectures for fault tolerance and high availability.
  • Establishing backup strategies, automating snapshots, and defining restore procedures.
  • Developing disaster recovery plans and creating operational runbooks.

Cost Optimization and Governance

  • Enhancing cost visibility through billing analysis, tagging, and cost allocation.
  • Rightsizing resources, using Reserved Instances and Savings Plans, and implementing budget controls.
  • Governance: establishing policies, guardrails, and automation for compliance.

Containers, Serverless, and Runtime Operations

  • Operational considerations for ECS, EKS, and Lambda.
  • Managing service discovery, autoscaling, and resource limits.
  • Logging, tracing, and debugging containerized workloads.

Incident Response, Playbooks, and Chaos Engineering

  • Runbook-driven incident response and postmortem analysis practices.
  • Automating remediation steps and implementing self-healing patterns.
  • Introduction to chaos experiments for validating system resilience.

Hands-on Workshop: Operate a Sample Workload

  • Deploying a sample application using IaC and a CI/CD pipeline.
  • Implementing monitoring, alerts, and automated remediation scripts.
  • Simulating incidents and practicing runbook-based response.

Summary and Next Steps

Requirements

  • Foundational understanding of cloud computing concepts and networking.
  • Familiarity with the Linux command line and shell scripting.
  • Practical experience with version control systems (Git) and basic CI/CD concepts.

Target Audience

  • Cloud operations engineers.
  • Site Reliability Engineers (SREs) and platform engineers.
  • DevOps engineers and technical team leads.

Duration: 21 hours
