As in, when I watch YouTube tutorials, I often see YouTubers with a small widget on their desktop giving them an overview of their RAM usage, security level, etc. What apps do you all use to track this?

  • Dogeek@alien.topB
    2 years ago

    Oh lord, I have so much info to give! For the setup, it’s running on Kubernetes 1.28.2, so YMMV. My monitoring stack is:

    • Grafana – Dashboards
    • Alertmanager – Alerting
    • Prometheus – Time series Database
    • Loki – Logs database
    • Promtail – Log collector
    • Mimir – Long-term metrics storage
    • Tempo – Distributed tracing, like Datadog APM but with Grafana; lets you track requests through a network of services, invaluable for linking your reverse proxy to your apps, to your SSO, to your database…
    • SMTP Relay – A homemade SMTP relay that eases setting up mail alerts, allows me to push mail through mailjet using my domain
    • Node-exporter – exports metrics for the server
    • Exportarr – exports metrics for Sonarr/Radarr etc.
    • pihole-exporter – exports Pi-hole metrics for Prometheus scraping
    • smart-exporter – exports S.M.A.R.T metrics (for HDD health)
    • ntfy – for notifications to my phone (other than mail)

    The rest is pretty much the same: if the service exports Prometheus metrics by default, I use that and write a ServiceMonitor and a Service manifest for it. It usually looks like this:

    apiVersion: monitoring.coreos.com/v1
    kind: ServiceMonitor
    metadata:
      name: traefik
      labels:
        app.kubernetes.io/component: traefik
        app.kubernetes.io/instance: traefik
        app.kubernetes.io/managed-by: kustomize
        app.kubernetes.io/name: traefik
        app.kubernetes.io/part-of: traefik
    spec:
      selector:
        matchLabels:
          app.kubernetes.io/name: traefik-metrics
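      endpoints:
      - port: metrics  # must match the port name on the Service below
        interval: 30s
        path: /metrics
        scheme: http
        # tlsConfig only matters when scheme is https; it has no effect over plain http
        tlsConfig:
          insecureSkipVerify: true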
      namespaceSelector:
        matchNames:
        - traefik
    ---
    apiVersion: v1
    kind: Service
    metadata:
      name: traefik-metrics
      namespace: traefik
      labels:
        app.kubernetes.io/name: traefik-metrics
    spec:
      type: ClusterIP
      ports:
        - protocol: TCP
          name: metrics
          port: 8082
      selector:
        app.kubernetes.io/name: traefik
    

    If the app doesn’t include a Prometheus endpoint, I just find an existing exporter for it. Most popular apps have one, along with ready-made Grafana dashboards.

    For alerting, I create a PrometheusRule object with the Prometheus query and the alert message. Depending on the severity, it’s either a mail (low to medium severity incidents) or a phone notification (high severity). I try to keep mails and notifications to a minimum: only load, CPU, RAM, and potential SMART errors trigger alerts.
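
    A minimal sketch of what such a PrometheusRule could look like, assuming the standard node-exporter metrics. The object name, namespace, thresholds, durations, and the `severity` values are illustrative assumptions, not the actual rules from this setup; mapping `warning` to mail and `critical` to a phone notification would be done separately in the Alertmanager routing config.

    ```yaml
    apiVersion: monitoring.coreos.com/v1
    kind: PrometheusRule
    metadata:
      name: node-alerts        # hypothetical name
      namespace: monitoring    # assumed namespace
    spec:
      groups:
        - name: node.rules
          rules:
            # RAM: fires when less than 10% of memory has been available for 10 minutes
            - alert: HighMemoryUsage
              expr: (1 - node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes) > 0.9
              for: 10m
              labels:
                severity: warning    # assumed to be routed to mail
              annotations:
                summary: "Memory usage above 90% on {{ $labels.instance }}"
            # CPU: fires when average non-idle CPU time stays above 95% for 15 minutes
            - alert: HighCpuUsage
              expr: (1 - avg by (instance) (rate(node_cpu_seconds_total{mode="idle"}[5m]))) > 0.95
              for: 15m
              labels:
                severity: critical   # assumed to be routed to a phone notification
              annotations:
                summary: "CPU usage above 95% on {{ $labels.instance }}"
    ```

    Depending on how the Prometheus operator is installed, the rule may also need extra labels in `metadata` (for example a `release` label) so that the Prometheus instance's rule selector picks it up.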