Administration Guides

Grafana Service Status

Run the following command to get the service status:

kubectl -n kosmos-monitoring get pod -l app.kubernetes.io/name=grafana

The Grafana pod should be in the Running state:

NAME                                        READY   STATUS    RESTARTS   AGE
monitoring-stack-grafana-84cc5689b7-mkzxv   3/3     Running   0          49d

If the pod is missing or is not in the Running state, go to "Check the pod status".
If the pod is Running but not Ready, it has started but is not functioning correctly. In that case, go to "Check Grafana Logs".
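The decision above can be sketched as a small shell helper that reads the output of the kubectl get pod command and prints which check to perform next. This is only an illustration: the function name classify_grafana_pods and the wording of its messages are assumptions, not part of the product.

```shell
# Classify each pod row of `kubectl get pod` output read from stdin.
# Prints "<pod>: <verdict>" per row, or a hint if no pod row is found.
# The function name and messages are illustrative assumptions.
classify_grafana_pods() {
  awk '
    NR == 1 && $1 == "NAME" { next }      # skip the header row
    NF >= 3 {
      found = 1
      split($2, ready, "/")               # READY column, for example "3/3"
      if ($3 != "Running") {
        print $1 ": not running, check the pod status"
      } else if (ready[1] != ready[2]) {
        print $1 ": running but not ready, check the Grafana logs"
      } else {
        print $1 ": ok"
      }
    }
    END { if (!found) print "no pod found, check the deployment events" }
  '
}

# Usage (requires access to the cluster):
#   kubectl -n kosmos-monitoring get pod -l app.kubernetes.io/name=grafana | classify_grafana_pods
```

The helper only reads text on standard input, so it can be tried against a saved copy of the command output before running it on a live cluster.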

Check the pod status

If you encounter any problems, view the pod status with this command, replacing monitoring-stack-grafana-xxx with the pod name:

kubectl -n kosmos-monitoring describe pod monitoring-stack-grafana-xxx

If the pod is not present, check the deployment events:

kubectl -n kosmos-monitoring describe deploy monitoring-stack-grafana

[...]
Conditions:
  Type           Status  Reason
  ----           ------  ------
  Available      True    MinimumReplicasAvailable
  Progressing    True    NewReplicaSetAvailable
OldReplicaSets:  monitoring-stack-grafana-6fdcbbf54d (0/0 replicas created)
NewReplicaSet:   monitoring-stack-grafana-84cc5689b7 (1/1 replicas created)
Events:          <none>
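As a scripted alternative to reading the Conditions table by eye, the following sketch checks whether the deployment reports Available=True. The helper name deployment_available is an assumption. The jsonpath expression in the usage comment prints one Type=Status pair per line.

```shell
# Succeeds (exit code 0) only if the "Available" condition is True.
# Expects one "Type=Status" pair per line on stdin.
# The function name is an illustrative assumption.
deployment_available() {
  grep -qx 'Available=True'
}

# Usage (requires access to the cluster):
#   kubectl -n kosmos-monitoring get deploy monitoring-stack-grafana \
#     -o jsonpath='{range .status.conditions[*]}{.type}={.status}{"\n"}{end}' \
#     | deployment_available && echo "deployment available" || echo "deployment NOT available"
```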

Check Grafana Logs

If you encounter any problems, view the Grafana logs with this command, replacing monitoring-stack-grafana-xxx with the pod name:

kubectl -n kosmos-monitoring logs monitoring-stack-grafana-xxx

logger=provisioning.dashboard t=2025-04-18T12:48:21.794426177Z level=info msg="starting to provision dashboards"
logger=plugin.angulardetectorsprovider.dynamic t=2025-04-18T12:48:22.105002743Z level=info msg="Patterns update finished" duration=10.130783177s
logger=provisioning.dashboard t=2025-04-18T12:48:22.31150119Z level=info msg="finished to provision dashboards"
logger=provisioning.dashboard t=2025-04-18T12:48:22.340735638Z level=info msg="starting to provision dashboards"
logger=provisioning.dashboard t=2025-04-18T12:48:22.523781174Z level=info msg="finished to provision dashboards"
logger=provisioning.dashboard t=2025-04-18T12:48:22.551891447Z level=info msg="starting to provision dashboards"
logger=provisioning.dashboard t=2025-04-18T12:48:22.739913466Z level=info msg="finished to provision dashboards"
logger=provisioning.dashboard t=2025-04-18T12:48:22.768174451Z level=info msg="starting to provision dashboards"
logger=provisioning.dashboard t=2025-04-18T12:48:23.130013568Z level=info msg="finished to provision dashboards"
logger=context userId=0 orgId=0 uname= t=2025-04-18T12:48:33.717172457Z level=info msg="Request Completed" method=GET path=/api/live/ws status=401 remote_addr=10.2.0.140 time_ms=1 duration=1.161297ms size=40 referer= handler=/api/live/ws status_source=server
logger=context userId=0 orgId=0 uname= t=2025-04-18T12:48:40.6325298Z level=info msg="Request Completed" method=GET path=/api/live/ws status=401 remote_addr=10.2.0.140 time_ms=1 duration=1.121472ms size=40 referer= handler=/api/live/ws status_source=server

Search for the "HTTP Server Listen" message, which confirms that the Grafana web server has started:

kubectl -n kosmos-monitoring logs monitoring-stack-grafana-84cc5689b7-crt9z | grep "HTTP"

logger=http.server t=2025-04-18T12:48:10.802578361Z level=info msg="HTTP Server Listen" address=[::]:3000 protocol=http subUrl= socket=
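When the logs are long, it can help to filter them rather than read them in full. The sketch below relies on the level= field visible in the log lines above. It keeps only warning and error entries, and separately tests for the startup message. Both function names are assumptions made for this example.

```shell
# Keep only warning and error entries from Grafana logs read on stdin.
# Returns 0 even when nothing matches, so an empty result is not an error.
grafana_log_problems() {
  grep -E 'level=(warn|error)' || true
}

# Succeeds if the logs contain the "HTTP Server Listen" startup message.
grafana_http_listening() {
  grep -q 'msg="HTTP Server Listen"'
}

# Usage (requires access to the cluster, replace the pod name):
#   kubectl -n kosmos-monitoring logs monitoring-stack-grafana-xxx | grafana_log_problems
#   kubectl -n kosmos-monitoring logs monitoring-stack-grafana-xxx | grafana_http_listening \
#     && echo "web server started" || echo "startup message not found"
```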

Check the Grafana GUI

Verify that you can log in to Grafana via the administration portal.

Role Association for SSO users

SSO users can log in to Grafana. IDP groups are mapped to application roles as follows:

adminsysteme : admin
adminsecurite : admin
admininfra : admin
dataing : editor

For more information on how to create a new SSO user and add it to a group, see here.
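For quick reference in scripts, the group-to-role mapping above can be written as a lookup. This is a sketch only: the function name grafana_role_for_group is hypothetical, and the "unmapped" result for any other group is an assumption, since this guide does not state which role, if any, such users receive.

```shell
# Return the Grafana role associated with an IDP group, per the mapping above.
# "unmapped" is an assumption for groups this guide does not list.
grafana_role_for_group() {
  case "$1" in
    adminsysteme|adminsecurite|admininfra) echo "admin" ;;
    dataing) echo "editor" ;;
    *) echo "unmapped" ;;
  esac
}

# Usage:
#   grafana_role_for_group dataing
```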

Technical Description

Location

The services deployed in Kubernetes are:

namespace: kosmos-monitoring
pod: alertmanager-monitoring-stack-kube-prom-alertmanager-0
pod: monitoring-stack-grafana-*
pod: monitoring-stack-kube-prom-operator-*
pod: monitoring-stack-kube-state-metrics-*
pod: prometheus-monitoring-stack-kube-prom-prometheus-0
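To confirm that every pod listed above is present, a helper along these lines can compare the namespace contents with the expected names. The function name missing_monitoring_pods is an assumption. The expected names are taken from the list above, with the trailing * treated as a name prefix.

```shell
# Report which expected pod names (or name prefixes) are absent from a list
# of pod names read on stdin, one per line. Prints nothing if all are present.
# The function name is an illustrative assumption.
missing_monitoring_pods() {
  pods=$(cat)
  for prefix in \
    alertmanager-monitoring-stack-kube-prom-alertmanager-0 \
    monitoring-stack-grafana- \
    monitoring-stack-kube-prom-operator- \
    monitoring-stack-kube-state-metrics- \
    prometheus-monitoring-stack-kube-prom-prometheus-0
  do
    printf '%s\n' "$pods" | grep -q "^$prefix" || echo "missing: $prefix"
  done
}

# Usage (requires access to the cluster):
#   kubectl -n kosmos-monitoring get pods -o name | sed 's|^pod/||' | missing_monitoring_pods
```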

Prometheus

Prometheus requires significant RAM to process all the metrics it collects. Adjust the allocated memory to suit your system.

Specific Configuration

For Prometheus, a readiness probe is configured that calls the URL http://localhost:9090/prometheus/-/ready. If this endpoint does not return an HTTP 200 status code, the pod is marked as not ready and the user interface is inaccessible.
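The same endpoint can be queried by hand to reproduce what the probe sees. The sketch below assumes a port-forward from the workstation to the Prometheus pod on port 9090 and uses curl to capture the status code. The helper name readiness_verdict is hypothetical.

```shell
# Translate the HTTP status code returned by a readiness endpoint into a verdict.
# The function name and message format are illustrative assumptions.
readiness_verdict() {
  if [ "$1" = "200" ]; then
    echo "ready"
  else
    echo "not ready (HTTP $1)"
  fi
}

# Usage (requires access to the cluster, run the port-forward in another terminal):
#   kubectl -n kosmos-monitoring port-forward pod/prometheus-monitoring-stack-kube-prom-prometheus-0 9090:9090
#   code=$(curl -s -o /dev/null -w '%{http_code}' http://localhost:9090/prometheus/-/ready)
#   readiness_verdict "$code"
```

If curl cannot connect at all, it reports the code 000, which the helper also treats as not ready.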

Key Files

Configuration Files

Name	Path	Brief Description
prometheus.yml	/etc/prometheus/	Prometheus configuration file

Certificates

Name	Path	Brief Description
prometheus-tls	secret	TLS secret for the prometheus-api.supervision.artemis ingress

Binaries

Name	Brief Description
quay.io/prometheus/prometheus:v3.1.0	Standard Prometheus image

Grafana

For Grafana, a readiness probe is configured that calls the URL http://localhost:3000/api/health. If this endpoint does not return an HTTP 200 status code, the pod is marked as not ready and the user interface is inaccessible.
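The health endpoint can also be inspected manually. Grafana's /api/health response is a small JSON document that includes a "database" field. The sketch below checks that this field reports "ok". The helper name grafana_db_ok and the port-forward in the usage comment are assumptions for illustration.

```shell
# Succeeds if a Grafana /api/health JSON body read on stdin reports "database": "ok".
# Whitespace is stripped first so that pretty-printed and compact JSON both match.
# The function name is an illustrative assumption.
grafana_db_ok() {
  tr -d ' \n' | grep -q '"database":"ok"'
}

# Usage (requires access to the cluster, run the port-forward in another terminal):
#   kubectl -n kosmos-monitoring port-forward deploy/monitoring-stack-grafana 3000:3000
#   curl -s http://localhost:3000/api/health | grafana_db_ok \
#     && echo "Grafana healthy" || echo "Grafana health check failed"
```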

Key Files

Configuration Files

Name	Path	Brief Description
grafana.ini	/etc/grafana/grafana.ini	Grafana configuration file
/etc/grafana/dashboards	/etc/grafana/dashboards	Grafana dashboards
/etc/grafana/provisioning/datasources	/etc/grafana/provisioning/datasources	datasources

Binaries

Name	Brief Description
registry.k8s.io/kube-state-metrics/kube-state-metrics:v2.14.0	Standard kube-state-metrics image
hosted-registry.corp.athea/grafana/grafana:11.4.0-athea	Grafana image customized for ARTEMIS

Combination of Failures

The service depends on the Kubernetes cluster's ability to provision pods and expose them through services. If the Kubernetes cluster fails, the service may be unavailable.
The metrics service depends on internal cluster reporting through kube-state-metrics. If kube-state-metrics fails, the service remains available but lacks up-to-date metrics, and queries run against outdated data (for example, a pod whose state has changed without being reported).
Grafana dashboards depend on their data sources (ClickHouse, Prometheus). If these data sources fail, the dashboards will not function correctly.
