Duration: 3 days – 21 hrs
Overview
Site Reliability Engineering (SRE) is the application of software engineering practices to the management of IT infrastructure and operations. This SRE training course is aimed at technical professionals who want to apply software engineering tools and techniques to manage IT systems more efficiently.
Objectives
- Understand what Site Reliability Engineering is
- Gain a deeper understanding of the principles and practices of SRE
- Understand what DevOps is
- Differentiate between DevOps and SRE
- Understand various tools used in Automation
- Understand various tools used in software build and release
- Gain hands-on experience with Jenkins, Docker, Terraform, Kubernetes, and Ansible
Audience
- Business Managers and Stakeholders
- Change Agents and Consultants
- DevOps Practitioners
- IT Directors, Managers and Team Leaders
- Product Owners
- Scrum Masters
- Software Engineers
- Site Reliability Engineers
- System Integrators
- Tool Providers
- Developers
- System Administrators
- Software Architects
- DevOps Engineers
- IT Managers
Prerequisites
- A general understanding of IT infrastructure.
- A general idea of the software development process.
- Programming or scripting experience in any language.
Course Content
Module 1: SRE – Big Picture
- History of Site Reliability Engineering
- Introduction to SRE
- Define Site Reliability Engineering (SRE)
- DevOps and SRE differences
Module 2.A: Principles of SRE
- Embracing Risk
- Service Level Objectives
- Eliminating Toil
- Monitoring Distributed Systems
- The Evolution of Automation at Google
- Release Engineering
- Simplicity
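To illustrate the "Embracing Risk" and "Service Level Objectives" topics above, here is a minimal Python sketch of error-budget arithmetic. It assumes an availability SLO measured over a rolling window; the function names, the 30-day window, and the sample numbers are illustrative assumptions and are not part of the course material.

```python
def error_budget_minutes(slo: float, window_days: int = 30) -> float:
    """Return the downtime (in minutes) an availability SLO allows over a window."""
    if not 0.0 < slo <= 1.0:
        raise ValueError("slo must be a fraction in (0, 1], e.g. 0.999")
    total_minutes = window_days * 24 * 60
    return (1.0 - slo) * total_minutes


def budget_remaining(slo: float, downtime_minutes: float, window_days: int = 30) -> float:
    """Return the unspent error budget; a negative value means the SLO was missed."""
    return error_budget_minutes(slo, window_days) - downtime_minutes


# Hypothetical example: a 99.9% SLO over 30 days allows roughly 43.2 minutes of downtime.
budget = error_budget_minutes(0.999)
remaining = budget_remaining(0.999, downtime_minutes=10.0)
print(round(budget, 1), round(remaining, 1))
```

A team might use a figure like `remaining` to decide whether to continue shipping changes or to pause releases and focus on reliability work.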
Module 2.B: Hands-on Lab – Pre-DevOps Scenario
- Create a repository on Bitbucket
- Clone the repository with Git and install Maven
- Package the application manually
- Deploy the application
Module 3: Practices in SRE – Part 1
- Practical Alerting
- Being On-Call
- Effective Troubleshooting
- Emergency Response
- Managing Incidents
- Postmortem Culture: Learning from Failure
- Tracking Outages
- Testing for Reliability
- Software Engineering in SRE
Module 4: Practices in SRE – Part 2
- Load Balancing at the Frontend
- Load Balancing in the Datacenter
- Handling Overload
- Addressing Cascading Failures
- Managing Critical State: Distributed Consensus for Reliability
- Distributed Periodic Scheduling with Cron
- Data Processing Pipelines
- Data Integrity: What You Read Is What You Wrote
- Reliable Product Launches at Scale
Module 5.A: Containerization and Microservices
- Monolithic application overview
- Microservice overview and benefits
- What is virtualization?
- What are containers?
- Differences between virtualization and containers
- Kubernetes overview – orchestration of containers
- Kubernetes architecture and Components
Module 5.B: Hands-on Lab
- Install Docker
- Create, log in to, stop, and delete a container
- Create an image using a Dockerfile
- Push the image to Docker Hub
- Deploy a Kubernetes cluster on Google Cloud
- Deploy your own Docker image on Kubernetes
- Expose the application behind a load balancer
Module 6: DevOps Big Picture
- Define Waterfall model and its challenges
- Define Agile and its advantages
- Define DevOps
- Differences between Agile and DevOps
- Continuous Integration and Continuous Deployment
- Application development and delivery before DevOps
- Application development and delivery after DevOps
Module 7: SRE and DevOps differences
- Common myths, including the idea that SRE and DevOps are the same
- Key differences between SRE and DevOps
Module 8.A: SRE Developer Toolchain
- Source code management tools
- GitHub, Bitbucket, and SVN
- Static code analysis
- SonarQube, Fortify, Nexus IQ
- Build tools
- Maven, Ant, and Gradle
- Repository tools
- Nexus, Artifactory, cloud storage
- Orchestration tools
- Jenkins, Bamboo, Travis CI
- Release management tools
- Jira release management, UrbanCode Release, BMC RLM
Module 8.B: Hands-on Lab
- Create a CI/CD pipeline in Jenkins that automates the following tasks
- Git clone
- mvn install
- Code analysis with SonarQube
- mvn compile and mvn package
- Upload the application package to Nexus
- Deploy the application on the same machine
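The lab tasks above run in a fixed order, and a CI/CD pipeline typically stops as soon as one stage fails. The following Python sketch models only that ordering and fail-fast behaviour. It is not Jenkins pipeline syntax, and the stage functions are hypothetical stand-ins for the real `git`, `mvn`, SonarQube, and Nexus steps.

```python
from typing import Callable, List, Tuple

# A stage is a display name plus an action that returns True on success.
Stage = Tuple[str, Callable[[], bool]]


def run_pipeline(stages: List[Stage]) -> List[str]:
    """Run stages in order and stop at the first failure; return one log line per stage run."""
    log: List[str] = []
    for name, action in stages:
        ok = action()
        log.append(f"{name}: {'ok' if ok else 'FAILED'}")
        if not ok:
            break  # later stages (package, upload, deploy) are skipped
    return log


# Hypothetical stand-ins; a real pipeline would invoke the actual tools here.
stages: List[Stage] = [
    ("git clone", lambda: True),
    ("mvn install", lambda: True),
    ("code analysis", lambda: False),  # simulate a failed quality check
    ("mvn package", lambda: True),
    ("upload to Nexus", lambda: True),
    ("deploy", lambda: True),
]

pipeline_log = run_pipeline(stages)
for line in pipeline_log:
    print(line)
```

In this simulated run the analysis stage fails, so nothing is packaged, uploaded, or deployed. Stopping early like this is the main safety property an automated pipeline adds over the manual steps in the earlier lab.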
Module 9.A: SRE Operations Toolchain
- Infrastructure-as-code tools – Terraform
- Declarative infrastructure and deployment tools
- AWS CloudFormation
- Google Cloud Deployment Manager
- Azure Resource Manager
- OpenStack Heat
- Ops Automation tools
- Ansible – overview, architecture and components
- Chef – overview, architecture and components
- Puppet – overview, architecture and components
- SaltStack – overview, architecture and components
- Monitoring and ticketing tools
- Application monitoring and tracing tools
- New Relic
- AppDynamics
- Datadog
- AWS X-Ray
- Infrastructure Monitoring Tools
- Nagios
- ELK and EFK
- Ticketing Tools –
- Cloud-native monitoring tools
- Amazon CloudWatch
- Google Stackdriver
- Azure Monitor
Module 9.B: Hands-on Lab
- Install Terraform
- Deploy a Kubernetes cluster using Terraform
- Write Ansible playbooks and apply them to nodes
- AWS X-Ray – application monitoring and tracing