Site Reliability Engineering

Overview

Site Reliability Engineering (SRE) is the application of software engineering practices to the management of IT infrastructure and operations. This SRE training course is aimed at technical professionals who want to apply software engineering tools and techniques to manage IT systems more efficiently.

 

Objectives

  • Understand what Site Reliability Engineering is
  • Gain a deeper understanding of the practices and principles of SRE
  • Understand what DevOps is
  • Differentiate between DevOps and SRE
  • Understand the various tools used in automation
  • Understand the various tools used in software build and release
  • Gain hands-on experience with Jenkins, Docker, Terraform, Kubernetes, and Ansible

 

Audience

  • Business Managers and Stakeholders
  • Change Agents and Consultants
  • DevOps Practitioners
  • IT Directors, Managers and Team Leaders
  • Product Owners
  • Scrum Masters
  • Software Engineers
  • Site Reliability Engineers
  • System Integrators
  • Tool Providers
  • Developers
  • System Administrators
  • Software Architects
  • DevOps Engineers
  • IT Managers

 

Prerequisites

  • A general understanding of IT infrastructure.
  • A general idea of the software development process.
  • Programming or scripting experience in any language.

 

Duration: 3 days – 21 hrs

 

Course Content

Module 1: SRE – Big Picture

  • History of Site Reliability Engineering
  • Introduction to SRE
  • Define Site Reliability Engineering (SRE)
  • DevOps and SRE differences

 

Module 2.A: Principles of SRE

  • Embracing Risk
  • Service Level Objectives
  • Eliminating Toil
  • Monitoring Distributed Systems
  • The Evolution of Automation at Google
  • Release Engineering
  • Simplicity
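
As an illustration of how the "Embracing Risk" and "Service Level Objectives" topics relate, an availability SLO can be turned into an error budget, meaning the amount of downtime the target tolerates over a time window. The following is a minimal sketch, not course material: the function names, the 30-day window, and the 99.9% target are assumptions chosen for the example.

```python
def error_budget_minutes(slo: float, window_days: int = 30) -> float:
    """Return the downtime (in minutes) that an availability SLO allows.

    slo is a fraction strictly between 0 and 1, e.g. 0.999 for "three nines".
    window_days is the length of the measurement window (assumed: 30 days).
    """
    if not 0.0 < slo < 1.0:
        raise ValueError("slo must be a fraction strictly between 0 and 1")
    if window_days <= 0:
        raise ValueError("window_days must be positive")
    total_minutes = window_days * 24 * 60
    # The error budget is the share of the window that may be unavailable.
    return (1.0 - slo) * total_minutes


def budget_remaining(slo: float, window_days: int, downtime_minutes: float) -> float:
    """Return the unused error budget; a negative value means the SLO was missed."""
    return error_budget_minutes(slo, window_days) - downtime_minutes


if __name__ == "__main__":
    # Hypothetical service: 99.9% target, 10 minutes of downtime so far.
    print(f"budget:    {error_budget_minutes(0.999):.1f} min")
    print(f"remaining: {budget_remaining(0.999, 30, 10.0):.1f} min")
```

With these assumed numbers the budget comes to roughly 43 minutes per 30 days. A team that has used most of its budget would typically slow down risky releases until the window rolls over.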

 

Module 2.B: Hands-on Lab – Before DevOps Scenario

  • Create a repository on Bitbucket
  • Clone the repository with Git and install Maven
  • Package the application manually
  • Deploy the application

 

Module 3: Practices in SRE – Part 1

  • Practical Alerting
  • Being On-Call
  • Effective Troubleshooting
  • Emergency Response
  • Managing Incidents
  • Postmortem Culture: Learning from Failure
  • Tracking Outages
  • Testing for Reliability
  • Software Engineering in SRE

 

Module 4: Practices in SRE – Part 2

  • Load Balancing at the Frontend
  • Load Balancing in the Datacenter
  • Handling Overload
  • Addressing Cascading Failures
  • Managing Critical State: Distributed Consensus for Reliability
  • Distributed Periodic Scheduling with Cron
  • Data Processing Pipelines
  • Data Integrity: What You Read Is What You Wrote
  • Reliable Product Launches at Scale

 

Module 5: Containerization and Microservices 

  • Monolithic application overview
  • Microservice overview and benefits
  • What is virtualization
  • What are containers
  • Differences between virtualization and containers
  • Kubernetes overview – orchestration of containers
  • Kubernetes architecture and components

 

Module 5.B: Hands-on lab

  • Install Docker
  • Create, log in to, stop, and delete a container
  • Create an image using a Dockerfile
  • Push the image to Docker Hub
  • Deploy a Kubernetes cluster on Google Cloud
  • Deploy your own Docker image on Kubernetes
  • Expose the application behind a load balancer

 

Module 6: DevOps Big Picture

  • Define the Waterfall model and its challenges
  • Define Agile and its advantages
  • Define DevOps
  • Differences between Agile and DevOps
  • Continuous integration and continuous deployment
  • Application development and delivery before DevOps
  • Application development and delivery after DevOps

 

Module 7: SRE and DevOps differences

  • Common myths, including that SRE and DevOps are the same
  • Key differences between SRE and DevOps

 

Module 8.A: SRE Developer Toolchain

  • Source code management tools
    • GitHub, Bitbucket and SVN
  • Static code analysis
    • SonarQube, Fortify, Nexus IQ
  • Build tools
    • Maven, Ant and Gradle
  • Repository tools
    • Nexus, Artifactory, cloud storage
  • Orchestration tools
    • Jenkins, Bamboo CI, Travis
  • Release management tools
    • Jira Release Management, UrbanCode Release, BMC RLM

 

Module 8.B: Hands-on lab

  • Create a CI/CD pipeline on Jenkins that automates the following tasks
    • git clone
    • mvn install
    • Code analysis with SonarQube
    • mvn compile and mvn package
    • Upload the application package to Nexus
    • Deploy the application on the same machine

 

Module 9.A: SRE Operations Toolchain

  • Infrastructure-as-code tools – Terraform
  • Declarative infrastructure and deployment tools
    • AWS CloudFormation
    • Google Cloud Deployment Manager
    • Azure Resource Manager
    • OpenStack Heat
  • Ops automation tools
    • Ansible – overview, architecture and components
    • Chef – overview, architecture and components
    • Puppet – overview, architecture and components
    • SaltStack – overview, architecture and components
  • Monitoring and ticketing tools
  • Application monitoring and tracing tools
    • New Relic
    • AppDynamics
    • Datadog
    • AWS X-Ray
  • Infrastructure monitoring tools
    • Nagios
    • ELK and EFK
  • Ticketing tools
  • Cloud-native monitoring tools
    • AWS CloudWatch
    • Google Stackdriver
    • Azure Monitor

 

Module 9.B: Hands-on lab

  • Install Terraform
  • Deploy a Kubernetes cluster using Terraform
  • Write Ansible playbooks and apply them to nodes
  • AWS X-Ray – application monitoring and tracing
