Administration Guides

Grafana Service Status

Run the following command to get the service status:

kubectl -n kosmos-monitoring get pod -l app.kubernetes.io/name=grafana

The Grafana pod should be in the Running state:

NAME                                        READY   STATUS    RESTARTS   AGE
monitoring-stack-grafana-84cc5689b7-mkzxv   3/3     Running   0          49d
  • If the pod is missing or is not in the Running state, go to step 2 (Check the pod status).
  • If the pod is in the Running state but is not Ready (for example, READY shows 2/3), it has started but is not functioning correctly: go to step 3 (Check Grafana Logs).
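The triage above can be sketched as a small shell helper. This is a minimal sketch rather than part of the platform tooling: the function name classify_pod is hypothetical, and it assumes it receives the output of kubectl get pod with the --no-headers flag on standard input.

```shell
# classify_pod: hypothetical helper (not part of the platform tooling).
# Reads `kubectl get pod --no-headers` output on stdin and reports, per pod,
# which of the cases described above applies.
classify_pod() {
  awk '
    NF == 0 { next }                       # skip empty lines
    {
      found = 1
      split($2, ready, "/")                # READY column, e.g. 3/3
      if ($3 != "Running")
        print $1 ": not running (" $3 ")"
      else if (ready[1] != ready[2])
        print $1 ": running but not ready (" $2 ")"
      else
        print $1 ": healthy"
    }
    END { if (!found) print "no Grafana pod found" }
  '
}

# Example usage against the cluster (not run here):
# kubectl -n kosmos-monitoring get pod -l app.kubernetes.io/name=grafana --no-headers | classify_pod
```

A "not running" or "no Grafana pod found" result corresponds to the pod status check, and a "running but not ready" result corresponds to the log check.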

Check the pod status

If you encounter any problems, you can view the pod status with this command, replacing monitoring-stack-grafana-xxx with the pod name:

kubectl -n kosmos-monitoring describe pod monitoring-stack-grafana-xxx

If the pod is not present, check the deployment events:

kubectl -n kosmos-monitoring describe deploy monitoring-stack-grafana

[...]
Conditions:
  Type           Status  Reason
  ----           ------  ------
  Available      True    MinimumReplicasAvailable
  Progressing    True    NewReplicaSetAvailable
OldReplicaSets:  monitoring-stack-grafana-6fdcbbf54d (0/0 replicas created)
NewReplicaSet:   monitoring-stack-grafana-84cc5689b7 (1/1 replicas created)
Events:          <none>
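When this check needs to be scripted, the Available condition can be read from the describe output. The helper below is a minimal sketch under that assumption: deployment_available is a hypothetical name, and it only looks for the Available line of the Conditions table shown above.

```shell
# deployment_available: hypothetical helper (not part of the platform tooling).
# Reads `kubectl describe deploy` output on stdin and succeeds only if the
# Available condition is reported as True.
deployment_available() {
  awk '
    $1 == "Available" { status = $2 }
    END { exit (status == "True" ? 0 : 1) }
  '
}

# Example usage against the cluster (not run here):
# kubectl -n kosmos-monitoring describe deploy monitoring-stack-grafana | deployment_available \
#   && echo "deployment available" || echo "deployment NOT available"
```

If the condition is not True, the Events section of the same output is usually the next place to look.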

Check Grafana Logs

If you encounter any problems, you can view the Grafana logs with this command, replacing monitoring-stack-grafana-xxx with the pod name:

kubectl -n kosmos-monitoring logs monitoring-stack-grafana-xxx
logger=provisioning.dashboard t=2025-04-18T12:48:21.794426177Z level=info msg="starting to provision dashboards"
logger=plugin.angulardetectorsprovider.dynamic t=2025-04-18T12:48:22.105002743Z level=info msg="Patterns update finished" duration=10.130783177s
logger=provisioning.dashboard t=2025-04-18T12:48:22.31150119Z level=info msg="finished to provision dashboards"
logger=provisioning.dashboard t=2025-04-18T12:48:22.340735638Z level=info msg="starting to provision dashboards"
logger=provisioning.dashboard t=2025-04-18T12:48:22.523781174Z level=info msg="finished to provision dashboards"
logger=provisioning.dashboard t=2025-04-18T12:48:22.551891447Z level=info msg="starting to provision dashboards"
logger=provisioning.dashboard t=2025-04-18T12:48:22.739913466Z level=info msg="finished to provision dashboards"
logger=provisioning.dashboard t=2025-04-18T12:48:22.768174451Z level=info msg="starting to provision dashboards"
logger=provisioning.dashboard t=2025-04-18T12:48:23.130013568Z level=info msg="finished to provision dashboards"
logger=context userId=0 orgId=0 uname= t=2025-04-18T12:48:33.717172457Z level=info msg="Request Completed" method=GET path=/api/live/ws status=401 remote_addr=10.2.0.140 time_ms=1 duration=1.161297ms size=40 referer= handler=/api/live/ws status_source=server
logger=context userId=0 orgId=0 uname= t=2025-04-18T12:48:40.6325298Z level=info msg="Request Completed" method=GET path=/api/live/ws status=401 remote_addr=10.2.0.140 time_ms=1 duration=1.121472ms size=40 referer= handler=/api/live/ws status_source=server

Search the logs for the "HTTP Server Listen" message, which confirms that the Grafana web server has started:

kubectl -n kosmos-monitoring logs monitoring-stack-grafana-84cc5689b7-crt9z | grep "HTTP"
logger=http.server t=2025-04-18T12:48:10.802578361Z level=info msg="HTTP Server Listen" address=[::]:3000 protocol=http subUrl= socket=
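The two log checks above can be combined into one pass over the logs. This is a minimal sketch, not an official tool: summarize_grafana_logs is a hypothetical helper, and it assumes that problems appear as lines containing level=error, in the same key=value format as the level=info lines shown above.

```shell
# summarize_grafana_logs: hypothetical helper (not part of the platform tooling).
# Reads Grafana logs on stdin and reports whether the HTTP server start message
# is present and how many error-level lines were logged.
summarize_grafana_logs() {
  awk '
    /msg="HTTP Server Listen"/ { listening = 1 }
    /level=error/              { errors++ }
    END {
      print "http_server_listening=" (listening ? "yes" : "no")
      print "error_lines=" (errors + 0)
    }
  '
}

# Example usage against the cluster (not run here), replacing the pod name:
# kubectl -n kosmos-monitoring logs monitoring-stack-grafana-xxx | summarize_grafana_logs
```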

Check the Grafana GUI

Verify that you can log in to Grafana via the administration portal.

Role Association for SSO users

SSO users can log in to Grafana. IdP groups are mapped to application roles as follows:

  • adminsysteme: admin
  • adminsecurite: admin
  • admininfra: admin
  • dataing: editor

For more information on how to create a new SSO user and add the user to a group, see here.
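For scripts that need to predict which role a user will receive, the group-to-role mapping listed above can be written as a simple lookup. This is only an illustrative sketch: grafana_role_for_group is a hypothetical function, and the "unmapped" result for any other group is an assumption, since this guide does not state which role such users receive.

```shell
# grafana_role_for_group: hypothetical lookup mirroring the mapping listed above.
# Prints the Grafana role for an IdP group; returns 1 for groups not in the list.
grafana_role_for_group() {
  case "$1" in
    adminsysteme|adminsecurite|admininfra) echo "admin" ;;
    dataing)                               echo "editor" ;;
    *)                                     echo "unmapped"; return 1 ;;  # assumption: behavior not documented here
  esac
}

# Example usage:
# grafana_role_for_group dataing
```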

Technical Description

Location

The services deployed in Kubernetes are:

  • namespace: kosmos-monitoring
  • pod: alertmanager-monitoring-stack-kube-prom-alertmanager-0
  • pod: monitoring-stack-grafana-*
  • pod: monitoring-stack-kube-prom-operator-*
  • pod: monitoring-stack-kube-state-metrics-*
  • pod: prometheus-monitoring-stack-kube-prom-prometheus-0

Prometheus

Prometheus, the metrics collection system, requires a significant amount of RAM to process all the metrics it collects. Adjust the allocated memory to suit your system.

Specific Configuration

For Prometheus, a readiness probe is configured that calls the URL http://localhost:9090/prometheus/-/ready. If this endpoint does not return an HTTP 200 status code, the pod is marked as not ready and the user interface is inaccessible.
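To reproduce by hand what the probe checks, one option is to forward the Prometheus port locally and query the readiness URL. The following is a minimal sketch, not the platform's own procedure: probe_status is a hypothetical helper, and the commented commands assume that kubectl port-forward and curl are available on the administration workstation.

```shell
# probe_status: hypothetical helper (not part of the platform tooling).
# Interprets the HTTP status code returned by the readiness endpoint.
probe_status() {
  case "$1" in
    200) echo "ready" ;;
    000) echo "unreachable" ;;          # curl reports 000 when it gets no HTTP response
    *)   echo "not ready (HTTP $1)" ;;
  esac
}

# Example usage (requires cluster access; not run here):
# kubectl -n kosmos-monitoring port-forward pod/prometheus-monitoring-stack-kube-prom-prometheus-0 9090:9090 &
# code=$(curl -s -o /dev/null -w '%{http_code}' http://localhost:9090/prometheus/-/ready)
# probe_status "$code"
```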

Key Files

Configuration Files

Name            Path              Brief Description
prometheus.yml  /etc/prometheus/  Prometheus configuration file

Certificates

Name            Path    Brief Description
prometheus-tls  secret  secret ingress prometheus-api.supervision.artemis

Binaries

Name                                  Brief Description
quay.io/prometheus/prometheus:v3.1.0  standard prometheus image

Grafana

For Grafana, a readiness probe is configured that calls the URL http://localhost:3000/api/health. If this endpoint does not return an HTTP 200 status code, the pod is marked as not ready and the user interface is inaccessible.
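The health endpoint can also be queried manually. Grafana's /api/health response is a small JSON document that normally includes a database field set to "ok"; the sketch below checks for that field. grafana_health_ok is a hypothetical helper, and the port-forward target in the commented usage is an assumption based on the deployment name used elsewhere in this guide.

```shell
# grafana_health_ok: hypothetical helper (not part of the platform tooling).
# Reads the /api/health response body on stdin and succeeds only if it
# reports "database": "ok".
grafana_health_ok() {
  grep -Eq '"database"[[:space:]]*:[[:space:]]*"ok"'
}

# Example usage (requires cluster access; not run here):
# kubectl -n kosmos-monitoring port-forward deploy/monitoring-stack-grafana 3000:3000 &
# curl -s http://localhost:3000/api/health | grafana_health_ok \
#   && echo "Grafana healthy" || echo "Grafana unhealthy"
```

Note that the readiness probe itself only depends on the HTTP status code; the body check is an additional manual verification.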

Key Files

Configuration Files

Name                                   Path                                   Brief Description
grafana.ini                            /etc/grafana/grafana.ini               Grafana configuration file
/etc/grafana/dashboards                /etc/grafana/dashboards                Grafana dashboards
/etc/grafana/provisioning/datasources  /etc/grafana/provisioning/datasources  datasources

Binaries

Name                                                           Brief Description
registry.k8s.io/kube-state-metrics/kube-state-metrics:v2.14.0  Kube-state-metrics standard image
hosted-registry.corp.athea/grafana/grafana:11.4.0-athea        standard image grafana custom ARTEMIS

Combination of Failures

  • The proper functioning of the service relies on the Kubernetes cluster's ability to provision pods and provide access to them via services. If the Kubernetes cluster fails, the service may be unavailable.
  • The proper functioning of the metrics service relies on internal cluster reporting via kube-state-metrics. If kube-state-metrics fails, the metrics service remains available but lacks up-to-date metrics, and queries run against outdated data (for example, a pod whose state has changed but has not been reported).
  • The proper functioning of Grafana dashboards relies on various data sources (ClickHouse, Prometheus). If these data sources fail, the dashboards will not function correctly.