Administration Guides
Grafana Service Status
Run the following command to get the service status:
kubectl -n kosmos-monitoring get pod -l app.kubernetes.io/name=grafana
The Grafana pod should be in the Running state:
NAME READY STATUS RESTARTS AGE
monitoring-stack-grafana-84cc5689b7-mkzxv 3/3 Running 0 49d
- If the pod is not in the Running state or is missing, go to step 2 (Check the pod status).
- If the pod is in the Running state but not Ready, it has started but is not functioning correctly. Go to step 3 (Check Grafana Logs).
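The decision above can be sketched as a small shell helper. This is only an illustration: the function name grafana_next_step is hypothetical, and it assumes the default column layout of kubectl get pod (NAME, READY, STATUS, ...) shown above.

```shell
# Hypothetical helper: reads `kubectl get pod` output on stdin and
# prints which step of this guide to follow next.
grafana_next_step() {
  awk '
    NR == 1 { next }                           # skip the header line
    {
      found = 1
      split($2, r, "/")                        # READY column, e.g. "3/3"
      if ($3 != "Running")     print "step 2"  # pod is not Running
      else if (r[1] != r[2])   print "step 3"  # Running but not Ready
      else                     print "healthy"
    }
    END { if (!found) print "step 2" }         # no pod listed at all
  '
}
```

It could be used as follows:
kubectl -n kosmos-monitoring get pod -l app.kubernetes.io/name=grafana | grafana_next_step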
Check the pod status
If you encounter any problems, view the pod status with the following command, replacing monitoring-stack-grafana-xxx with the pod name:
kubectl -n kosmos-monitoring describe pod monitoring-stack-grafana-xxx
If the pod is not present, check the deployment events:
kubectl -n kosmos-monitoring describe deploy monitoring-stack-grafana
[...]
Conditions:
Type Status Reason
---- ------ ------
Available True MinimumReplicasAvailable
Progressing True NewReplicaSetAvailable
OldReplicaSets: monitoring-stack-grafana-6fdcbbf54d (0/0 replicas created)
NewReplicaSet: monitoring-stack-grafana-84cc5689b7 (1/1 replicas created)
Events: <none>
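To check the Available condition without reading the whole output, a filter along these lines may help. It is a sketch: the function name deployment_available is hypothetical, and it assumes the Conditions layout shown above (type in the first column, status in the second).

```shell
# Hypothetical helper: reads `kubectl describe deploy` output on stdin and
# succeeds (exit code 0) only if the Available condition is True.
deployment_available() {
  awk '
    $1 == "Available" { status = $2 }          # row of the Conditions table
    END { exit (status == "True" ? 0 : 1) }
  '
}
```

It could be used as follows:
kubectl -n kosmos-monitoring describe deploy monitoring-stack-grafana | deployment_available && echo "deployment available"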
Check Grafana Logs
If you encounter any problems, view the Grafana logs with the following command, replacing monitoring-stack-grafana-xxx with the pod name:
kubectl -n kosmos-monitoring logs monitoring-stack-grafana-xxx
logger=provisioning.dashboard t=2025-04-18T12:48:21.794426177Z level=info msg="starting to provision dashboards"
logger=plugin.angulardetectorsprovider.dynamic t=2025-04-18T12:48:22.105002743Z level=info msg="Patterns update finished" duration=10.130783177s
logger=provisioning.dashboard t=2025-04-18T12:48:22.31150119Z level=info msg="finished to provision dashboards"
logger=provisioning.dashboard t=2025-04-18T12:48:22.340735638Z level=info msg="starting to provision dashboards"
logger=provisioning.dashboard t=2025-04-18T12:48:22.523781174Z level=info msg="finished to provision dashboards"
logger=provisioning.dashboard t=2025-04-18T12:48:22.551891447Z level=info msg="starting to provision dashboards"
logger=provisioning.dashboard t=2025-04-18T12:48:22.739913466Z level=info msg="finished to provision dashboards"
logger=provisioning.dashboard t=2025-04-18T12:48:22.768174451Z level=info msg="starting to provision dashboards"
logger=provisioning.dashboard t=2025-04-18T12:48:23.130013568Z level=info msg="finished to provision dashboards"
logger=context userId=0 orgId=0 uname= t=2025-04-18T12:48:33.717172457Z level=info msg="Request Completed" method=GET path=/api/live/ws status=401 remote_addr=10.2.0.140 time_ms=1 duration=1.161297ms size=40 referer= handler=/api/live/ws status_source=server
logger=context userId=0 orgId=0 uname= t=2025-04-18T12:48:40.6325298Z level=info msg="Request Completed" method=GET path=/api/live/ws status=401 remote_addr=10.2.0.140 time_ms=1 duration=1.121472ms size=40 referer= handler=/api/live/ws status_source=server
Search for the HTTP Server Listen pattern:
kubectl -n kosmos-monitoring logs monitoring-stack-grafana-84cc5689b7-crt9z | grep "HTTP"
logger=http.server t=2025-04-18T12:48:10.802578361Z level=info msg="HTTP Server Listen" address=[::]:3000 protocol=http subUrl= socket=
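The log checks above can be combined into one summary. The following is a minimal sketch: the function name grafana_log_summary is hypothetical, and it assumes the key=value log format shown above, with warn and error as the level names of interest.

```shell
# Hypothetical helper: reads Grafana logs on stdin and reports whether the
# HTTP server start message is present and how many warn/error lines exist.
grafana_log_summary() {
  awk '
    /msg="HTTP Server Listen"/ { listening = 1 }
    /level=(warn|error)/       { problems++ }
    END {
      print "listening=" (listening ? "yes" : "no")
      print "problems=" (problems + 0)         # force 0 when nothing matched
    }
  '
}
```

It could be used as follows, replacing monitoring-stack-grafana-xxx with the pod name:
kubectl -n kosmos-monitoring logs monitoring-stack-grafana-xxx | grafana_log_summary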
Check the Grafana GUI
Verify that you can log in to Grafana via the administration portal.
Role Association for SSO users
SSO users may connect to Grafana. The following mapping is applied between the IdP groups and the application roles:
| IdP group | Grafana role |
|---|---|
| adminsysteme | admin |
| adminsecurite | admin |
| admininfra | admin |
| dataing | editor |
For more information on how to create a new SSO user and add it to a group, see here.
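For scripting purposes, the group-to-role mapping can be expressed as a lookup. This is purely illustrative: the function name grafana_role_for_group is hypothetical, and it does not reflect how the SSO integration is actually configured in Grafana.

```shell
# Hypothetical lookup of the documented IdP group to Grafana role mapping.
# Returns a non-zero exit code for groups that are not in the mapping.
grafana_role_for_group() {
  case "$1" in
    adminsysteme|adminsecurite|admininfra) echo "admin" ;;
    dataing)                               echo "editor" ;;
    *)                                     return 1 ;;
  esac
}
```

For example, grafana_role_for_group dataing prints editor.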
Technical Description
Location
The monitoring services are deployed in Kubernetes as follows:
- namespace: kosmos-monitoring
- pod: alertmanager-monitoring-stack-kube-prom-alertmanager-0
- pod: monitoring-stack-grafana-*
- pod: monitoring-stack-kube-prom-operator-*
- pod: monitoring-stack-kube-state-metrics-*
- pod: prometheus-monitoring-stack-kube-prom-prometheus-0
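As a quick presence check, the list above can be compared with the pods actually running in the namespace. The sketch below uses a hypothetical function name (missing_components) and matches pods by the name prefixes listed above, treating each * as "any suffix".

```shell
# Hypothetical helper: reads `kubectl get pods` output on stdin and prints
# every expected pod name (or name prefix) that has no matching pod.
missing_components() {
  pods=$(awk 'NR > 1 { print $1 }')            # pod names, header skipped
  for prefix in \
    alertmanager-monitoring-stack-kube-prom-alertmanager-0 \
    monitoring-stack-grafana- \
    monitoring-stack-kube-prom-operator- \
    monitoring-stack-kube-state-metrics- \
    prometheus-monitoring-stack-kube-prom-prometheus-0
  do
    found=no
    for pod in $pods; do
      case "$pod" in "$prefix"*) found=yes ;; esac
    done
    [ "$found" = yes ] || echo "$prefix"
  done
}
```

It could be used as follows. An empty result means every listed component has at least one pod; it says nothing about whether those pods are Ready.
kubectl -n kosmos-monitoring get pods | missing_components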
Prometheus
Prometheus needs a significant amount of RAM to process all the metrics it collects. Adjust the memory allocated to it to suit your system.
Specific Configuration
For Prometheus, a readiness probe is configured that calls the URL http://localhost:9090/prometheus/-/ready. If this endpoint does not return an HTTP 200 status code, Kubernetes marks the pod as not Ready and the user interface is inaccessible.
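The same check can be reproduced by hand. The following is a minimal sketch with hypothetical function names (is_ready_code, probe_ready). It assumes curl is available where the check runs and that the endpoint is reachable from there, for example through a port-forward.

```shell
# Hypothetical helpers, not part of the product.
# is_ready_code: succeeds only for the HTTP 200 status described above.
is_ready_code() {
  [ "$1" = "200" ]
}

# probe_ready: fetches a URL with curl and checks the returned status code.
# curl prints 000 when no connection could be made.
probe_ready() {
  is_ready_code "$(curl -s -o /dev/null -w '%{http_code}' "$1")"
}
```

For example, after forwarding the port in another terminal:
kubectl -n kosmos-monitoring port-forward pod/prometheus-monitoring-stack-kube-prom-prometheus-0 9090:9090
probe_ready http://localhost:9090/prometheus/-/ready && echo "Prometheus is ready"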
Key Files
Configuration Files
| Name | Path | Brief Description |
|---|---|---|
| prometheus.yml | /etc/prometheus/ | Prometheus configuration file |
Certificates
| Name | Path | Brief Description |
|---|---|---|
| prometheus-tls | Kubernetes secret | TLS secret for the prometheus-api.supervision.artemis ingress |
Binaries
| Name | Brief Description |
|---|---|
| quay.io/prometheus/prometheus:v3.1.0 | Standard Prometheus image |
Grafana
For Grafana, a readiness probe is configured that calls the URL http://localhost:3000/api/health. If this endpoint does not return an HTTP 200 status code, Kubernetes marks the pod as not Ready and the user interface is inaccessible.
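Beyond the status code, the body of the health response can also be inspected. The sketch below assumes the response is a JSON object containing a "database" field set to "ok" when Grafana can reach its database. Verify this against the Grafana version in use. The function name grafana_db_ok is hypothetical.

```shell
# Hypothetical helper: reads the /api/health response body on stdin and
# succeeds only if it reports "database": "ok".
grafana_db_ok() {
  tr -d ' \n' | grep -q '"database":"ok"'      # strip whitespace, then match
}
```

It could be used as follows, from a location where port 3000 of the Grafana pod is reachable:
curl -s http://localhost:3000/api/health | grafana_db_ok && echo "Grafana database OK"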
Key Files
Configuration Files
| Name | Path | Brief Description |
|---|---|---|
| grafana.ini | /etc/grafana/grafana.ini | Grafana configuration file |
| /etc/grafana/dashboards | /etc/grafana/dashboards | Grafana dashboards |
| /etc/grafana/provisioning/datasources | /etc/grafana/provisioning/datasources | Grafana data sources |
Binaries
| Name | Brief Description |
|---|---|
| registry.k8s.io/kube-state-metrics/kube-state-metrics:v2.14.0 | Kube-state-metrics standard image |
| hosted-registry.corp.athea/grafana/grafana:11.4.0-athea | Standard Grafana image customized for ARTEMIS |
Combination of Failures
- The service relies on the Kubernetes cluster's ability to provision pods and provide access to them via services. If the Kubernetes cluster fails, the service may be unavailable.
- The metrics service relies on internal cluster reporting via kube-state-metrics. If this reporting fails, the service remains available but lacks up-to-date metrics, so queries run against outdated data (for example, a pod whose state has changed but has not been reported).
- Grafana dashboards rely on several data sources (ClickHouse, Prometheus). If these data sources fail, the dashboards will not function correctly.