Kubernetes Monitor Using Prometheus and Thanos (1): Architecture

TarrantRo
5 min read · Nov 25, 2021

Introduction

The need for Prometheus High Availability

Monitoring is essential for any infrastructure. Prometheus is widely considered an excellent choice for monitoring both containerized and non-containerized workloads. However, the monitoring components must be highly available and scalable in order to meet the needs of a growing infrastructure, especially on Kubernetes.

A single Prometheus instance provides no resilience against node failure. Running multiple Prometheus replicas leads to data inconsistency, since each Prometheus pod holds its own copy of the scraped data. Therefore, we need an appropriate way to archive the metrics for long-term reference.

Also, Prometheus relies heavily on memory because of TSDB block handling. Scaling Prometheus up inside your application cluster burns CPU and memory that your workloads need. On top of that, a query against replicated Prometheus instances has to fan out to every Prometheus pod.

Solution

In our case, we will use Azure object storage to archive the data (you may choose a different object store at your own discretion) and Thanos to query the stored data.

Also, we will use Grafana for metric visualization.

To better manage the components above in Kubernetes, and to simplify deployment, I will use Helm to deploy the software above (Prometheus, Grafana, Thanos) along with their components. You can use the `helm template` command to export the YAML files and modify them to suit your needs.
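As a sketch, the installation boils down to two community charts. The chart names below are the public kube-prometheus-stack and Bitnami Thanos charts; the release names, the `monitoring` namespace, and the referenced values files are my own choices, not requirements:

```shell
# Sketch: write an install script using the community Helm charts.
# Release names, namespace, and values files are assumptions to adapt.
cat > install-monitoring.sh <<'EOF'
#!/bin/sh
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo add bitnami https://charts.bitnami.com/bitnami
helm repo update

# Prometheus + Grafana + Alertmanager (Thanos sidecar enabled via values)
helm upgrade --install monitor prometheus-community/kube-prometheus-stack \
  --namespace monitoring --create-namespace \
  -f prometheus-values.yaml   # your values file

# Thanos Query / Store Gateway / Compactor in the management cluster
helm upgrade --install thanos bitnami/thanos \
  --namespace monitoring \
  -f thanos-values.yaml       # your values file

# To inspect or customize the rendered manifests instead of installing:
# helm template monitor prometheus-community/kube-prometheus-stack -f prometheus-values.yaml
EOF
chmod +x install-monitoring.sh
```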

Detailed deployment steps are shared in this article: https://alphatango086.medium.com/kubernetes-monitor-using-prometheus-and-thanos-2-deployment-c848d03176ba

What is Thanos?

Not the Marvel comic character!

Thanos is an open-source project that integrates with Prometheus; simply put, it is a “highly available Prometheus setup with long-term storage capabilities”. It ships Prometheus metric blocks to object storage for persistence, and it provides a Prometheus-compatible query API that takes load off Prometheus itself, so your Prometheus instances can scale to any level seamlessly.

Why Thanos?

  1. Thanos can sync Prometheus metric blocks from local disk to object storage for persistence.
  2. The persisted blocks allow historical and global queries through the Thanos Query component.
  3. Thanos can compact and downsample data, supporting multiple clusters and large data volumes.
  4. Queries are faster, and query pressure shifts from Prometheus to Thanos, reducing what non-product services consume in the application cluster.
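For point 1, the sidecar (and later the Store Gateway and Compactor) reads the bucket location from a Thanos object storage config. A minimal sketch for Azure Blob Storage follows; the account name, key, and container are placeholders, not real credentials:

```shell
# Sketch of the Thanos object storage config for Azure Blob Storage.
# storage_account / storage_account_key / container are placeholders.
cat > objstore.yml <<'EOF'
type: AZURE
config:
  storage_account: "mystorageaccount"
  storage_account_key: "<storage-account-key>"
  container: "thanos-metrics"
EOF

# Load it into the cluster as a secret so the sidecar, store gateway,
# and compactor can all reference it (shown for illustration):
# kubectl -n monitoring create secret generic thanos-objstore --from-file=objstore.yml
```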

There are two ways to deploy Thanos:

Deploy Thanos with Sidecar
Deploy Thanos with Receive

I will use Thanos with Sidecar here.

Architecture:

By design, Prometheus in each cluster uploads any metric blocks older than 1 day, or flushes blocks once local storage exceeds 50 GB, to the storage account via the Thanos sidecar. The Thanos Store Gateway syncs blocks from the blob storage for Thanos Query. The Thanos Compactor compacts the data in blob storage. Grafana pulls all Prometheus metrics from the Thanos Query Frontend and is exposed to the internet via Istio.
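The 1-day / 50 GB behavior maps to Prometheus retention settings plus the sidecar's object storage config. A sketch of the relevant kube-prometheus-stack values follows; the field names are per the chart (verify them against your chart version), and the secret name is an assumption:

```shell
# Sketch of kube-prometheus-stack values for the design above: keep
# 1 day / 50 GB locally, and let the Thanos sidecar upload blocks to
# the storage account. The secret name "thanos-objstore" is assumed.
cat > prometheus-values.yaml <<'EOF'
prometheus:
  prometheusSpec:
    retention: 1d          # local block retention window
    retentionSize: 50GB    # flush once local TSDB exceeds this size
    thanos:
      objectStorageConfig:
        key: objstore.yml
        name: thanos-objstore   # secret holding the Azure objstore config
EOF
```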

I’m not using Thanos Ruler or Thanos Receive. For the former, I use Alertmanager instead: it comes integrated in the kube-prometheus-stack Helm chart, and we only need real-time alerts (I suppose no one needs an alert telling them their house caught fire three days ago), so there is no need to query through Thanos.

In this architecture, most of the load (Thanos Query, Thanos Store Gateway, Thanos Compactor) lives in the management cluster and does not affect the application clusters’ performance. Prometheus can scale to any level to scrape metrics, and Grafana can also configure each Prometheus as a data source.
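Because Thanos Query exposes the Prometheus-compatible HTTP API, existing PromQL tooling works against it unchanged. A sketch follows; the in-cluster service hostname is a hypothetical name for my setup, and the request would be run from a pod inside the mesh:

```shell
# Thanos Query Frontend speaks the standard Prometheus HTTP API, so a
# normal PromQL request works as-is. The hostname below is hypothetical.
QUERY_FRONTEND="http://thanos-query-frontend.monitoring.svc.cluster.local:9090"
PROMQL='sum(rate(container_cpu_usage_seconds_total[5m])) by (namespace)'

# Build the request to run from inside the cluster:
REQUEST="curl -sG '${QUERY_FRONTEND}/api/v1/query' --data-urlencode 'query=${PROMQL}'"
echo "$REQUEST"
```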

Infra Diagram:

I built my Kubernetes clusters in Azure and use Istio as the service mesh controller. With an Istio multi-cluster deployment, I can expose Prometheus as an internal service instead of using an ingress, which makes the deployment less painful and more flexible.

For this infrastructure deployment, you may refer to my earlier article: https://alphatango086.medium.com/istio-service-mesh-with-multiple-cluster-8637bf6b9ea8

Monitor Services Diagram:

monitor service diagram
thanos diagram

Components:

Prometheus: Metrics collector; scrapes metrics from various exporters. Retains metrics for a set number of days (1 day in my case) or up to a set size (50 GB in my case).

Grafana: Dashboard for visualization. Data sources are Prometheus (recent metrics) and Thanos (downsampled and historical metrics).

Alertmanager: I will not implement Thanos Ruler for firing alerts because Prometheus Alertmanager is enough, and I don’t think alerts on historical metrics are needed.

Thanos Query: Query metrics from underlying StoreAPIs.

Thanos Query Frontend: A service that can be put in front of Thanos Queriers to improve the read path.

Thanos Store Gateway: The store component of Thanos implements the Store API on top of historical data in an object storage bucket. This component consumes considerable CPU and memory, and its usage grows with your monitoring scale.

Thanos Compactor: The Thanos component that applies the compaction procedure to block data stored in object storage.

Thanos Bucket Web (Optional): The bucket component of Thanos is a set of commands to inspect data in object storage buckets.

Minio: A high-performance object store, API-compatible with the Amazon S3 cloud storage service. I used MinIO as an object storage gateway in front of an Azure file share. You can use MinIO to build your own object storage, or configure another object store instead.

Azure File Share: The storage backend, exposed as object storage through the MinIO gateway.
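On the management-cluster side, the components above map onto toggles in the Bitnami Thanos chart. A sketch of the values follows; the field names are per the chart (verify against your chart version), the secret name is an assumption, and the compactor retention windows are example choices, not requirements:

```shell
# Sketch of Bitnami Thanos chart values enabling the components above.
# Secret name is assumed; retention windows are example choices.
cat > thanos-values.yaml <<'EOF'
existingObjstoreSecret: thanos-objstore   # the Azure objstore config secret
query:
  enabled: true
queryFrontend:
  enabled: true
storegateway:
  enabled: true
compactor:
  enabled: true
  retentionResolutionRaw: 30d   # raw samples
  retentionResolution5m: 90d    # 5-minute downsampled blocks
  retentionResolution1h: 1y     # 1-hour downsampled blocks
bucketweb:
  enabled: true                 # optional bucket inspection UI
EOF
```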

In the next chapter, I will show how to deploy the monitoring services. Please clap and share this article if you like it. Thanks!


An IT guy who loves movies and Japanese manga. Has some experience in Linux systems, containers/k8s, DevOps, cloud, etc.