Observability Platform Engineering for Platform & Infrastructure Systems Support

Course Description

Duration 5 days – 35 hrs

Overview

This course provides a comprehensive understanding of building and supporting observability platforms, which are essential for maintaining system reliability, uptime, and performance in modern IT environments. Participants will explore the three pillars of observability (metrics, logs, and traces) while learning to deploy and operate popular open-source and enterprise-grade tools such as Prometheus, Grafana, Loki, Tempo, OpenTelemetry, and the ELK Stack. The course emphasizes hands-on implementation, platform integration, dashboarding, alerting, and real-time troubleshooting.

Objectives

Understand the principles of observability and its role in infrastructure support.
Deploy and configure metrics collection systems (e.g., Prometheus, Node Exporter).
Implement logging pipelines using ELK (Elasticsearch, Logstash, Kibana) and Loki.
Set up distributed tracing using OpenTelemetry and Tempo/Jaeger.
Visualize infrastructure health using Grafana dashboards.
Configure alerts and incident response pipelines.
Integrate observability tools into cloud and Kubernetes environments.
Perform root cause analysis (RCA) and capacity planning using observability data.

Audience

Platform Engineers
Infrastructure Support Specialists
DevOps and SRE Teams
System and Cloud Administrators
Network Operations Center (NOC) Teams
Monitoring and Alerting Engineers

Pre-requisites

Basic Linux system administration
Understanding of infrastructure components (CPU, memory, disk, network)
Familiarity with containers, VMs, or cloud infrastructure
Basic experience with YAML and shell scripting (recommended)

Content

Module 1: Introduction to Observability

Observability vs. Monitoring
The Three Pillars: Metrics, Logs, Traces
Why Observability Matters in Platform Support
Tool Landscape Overview (Prometheus, Grafana, ELK, Loki, Tempo)

Module 2: Metrics Collection and Analysis

Introduction to Prometheus
Node Exporter and Application Exporters
Service Discovery, Pull vs. Push Models
Grafana Integration for Metrics
Hands-on: Deploying Prometheus + Grafana for Server Monitoring

Module 3: Centralized Logging Systems

Architecture of ELK Stack and Loki
Ingesting Logs from Linux, Docker, Kubernetes
Structuring Logs for Query and Analysis
Visualizing Logs in Kibana or in Grafana (with Loki)
Hands-on: Deploying Loki or ELK Stack and Searching Logs

Module 4: Distributed Tracing and OpenTelemetry

Introduction to Tracing Concepts
Traces, Spans, and Context Propagation
OpenTelemetry: Unified Collection Framework
Tempo vs. Jaeger for Trace Storage and Visualization
Hands-on: Tracing a Sample App and Viewing in Grafana Tempo

Module 5: Dashboards, Alerting, and Notifications

Building Grafana Dashboards for Infra & App Health
Alertmanager and Notification Channels (Email, Slack, etc.)
Threshold-based and Behavior-based Alerting
Hands-on: Configuring Dashboards and Alert Rules

Module 6: Observability in Kubernetes and Cloud

Monitoring Pods and Nodes with Prometheus Operator
Logging and Tracing with Fluent Bit, Loki, and OpenTelemetry Collector
Observability in AWS CloudWatch, Azure Monitor, and Google Cloud Monitoring (formerly Stackdriver)
Hands-on: Deploying Observability Stack in a K8s Cluster

Module 7: Root Cause Analysis and Performance Insights

Investigating Incidents with Metrics and Logs
Tracing User Requests Across Services
Using Observability Data for Capacity Planning
Hands-on: Performing RCA with a Simulated Outage

Module 8: Scaling and Maintaining Observability Platforms

Scaling Prometheus and Long-Term Storage Options
Centralized Logging Optimization (Indices, Retention)
Securing Observability Data (RBAC, HTTPS, Token Access)
Best Practices for Maintenance and Upgrades
