pve-manager

proxmox-mirrors/pve-manager

mirror of https://git.proxmox.com/git/pve-manager synced 2025-08-13 02:25:32 +00:00

Commit Graph

Author

SHA1

Message

Date

Lukas Wagner

4d0f2624de

pull metric: fix node iowait metric

The hash from which we query cpu metrics contains 'iowait' as well as
'wait'. The first one is the total amount of time that was spent
waiting on IO, the second one is the percentage of time spent waiting
on IO in a certain time frame.

For the metrics returned by the /cluster/metrics/export endpoint we want
the second one.
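
To illustrate the difference between the two values, here is a minimal
sketch. It assumes that a cumulative IO-wait counter and a cumulative
total-time counter are sampled twice and that the relative value is the
ratio of their deltas. The function and parameter names are hypothetical
and are not taken from pvestatd.

```python
def wait_fraction(prev_iowait, cur_iowait, prev_total, cur_total):
    """Fraction of CPU time spent waiting on IO between two samples.

    All four arguments are cumulative counters (assumed to share one unit).
    """
    delta_total = cur_total - prev_total
    if delta_total <= 0:
        # No time elapsed (or a counter reset): report no IO wait.
        return 0.0
    return (cur_iowait - prev_iowait) / delta_total


# The cumulative counter only ever grows, while the fraction describes
# what happened within the sampled time frame.
print(wait_fraction(100, 150, 1000, 2000))
```

A collector that wants a gauge-like value needs the second kind of
number, since the raw counter says little without a previous sample.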

Reported-by: Dominik Csapak <d.csapak@proxmox.com>
Signed-off-by: Lukas Wagner <l.wagner@proxmox.com>

2025-02-10 10:43:55 +01:00

Lukas Wagner

073b53ae71

metrics: add /cluster/metrics/export endpoint

This new endpoint returns node, storage and guest metrics in JSON
format. The endpoint supports history/max-age parameters, allowing
the caller to query the recent metric history as recorded by the
PVE::PullMetric module.

The returned data format is quite simple, being an array of
metric records, including a value, a metric name, an id to identify
the object (e.g. qemu/100, node/foo), a timestamp and a type
('gauge', 'derive', ...). The latter property makes the format
self-describing and aids the metric collector in choosing a
representation for storing the metric data.

    [
        ...
        {
            "metric": "cpu_avg1",
            "value": 0.12,
            "timestamp": 170053205,
            "id": "node/foo",
            "type": "gauge"
        },
        ...
    ]
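
As a rough sketch of how a collector might consume such records, the
following groups them by object id. It assumes the payload is the bare
JSON array shown above; a real API response may wrap it in additional
structure. The helper name group_by_id is hypothetical.

```python
import json
from collections import defaultdict

# Sample payload built from the record shown above.
SAMPLE = """
[
    {
        "metric": "cpu_avg1",
        "value": 0.12,
        "timestamp": 170053205,
        "id": "node/foo",
        "type": "gauge"
    }
]
"""


def group_by_id(payload):
    """Group flat metric records by the object they belong to."""
    grouped = defaultdict(list)
    for rec in json.loads(payload):
        grouped[rec["id"]].append(
            (rec["metric"], rec["timestamp"], rec["value"], rec["type"])
        )
    return dict(grouped)


for obj_id, samples in group_by_id(SAMPLE).items():
    print(obj_id, samples)
```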

Some experiments were made with regard to making the format
more 'efficient', e.g. by grouping based on timestamps/ids, resulting
in a much more nested/complicated data format. While that
certainly reduces the size of the raw JSON response by quite a bit,
after GZIP compression the differences are negligible (the
simple, flat data format as described above compresses by a factor
of 25 for large clusters!). Also, the slightly increased CPU load
of compressing the larger amount of data when e.g. polling once a
minute is so small that it's indistinguishable from noise in relation
to a usual hypervisor workload. Thus, the simpler format was
chosen. One benefit of this format is that it is more or less already
the exact same format as the one Prometheus uses, but in JSON format -
so adding a Prometheus metric scraping endpoint should not be much
work at all.
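
To show how close the flat record is to a Prometheus sample, here is a
minimal sketch that renders one record as a line in the Prometheus text
exposition format. The "pve_" name prefix, the use of "id" as the only
label, and the assumption that the timestamp is in seconds are all
illustrative choices, not part of this commit. Label-value escaping is
omitted.

```python
def to_prometheus_line(rec, prefix="pve_"):
    """Render one flat metric record as a Prometheus text-format sample."""
    name = prefix + rec["metric"]
    # The text exposition format expects timestamps in milliseconds.
    ts_ms = rec["timestamp"] * 1000
    return f'{name}{{id="{rec["id"]}"}} {rec["value"]} {ts_ms}'


record = {
    "metric": "cpu_avg1",
    "value": 0.12,
    "timestamp": 170053205,
    "id": "node/foo",
    "type": "gauge",
}
print(to_prometheus_line(record))
```

The "type" property would map onto a "# TYPE" comment line. How
non-Prometheus types such as 'derive' should be translated is left open
here.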

The API endpoint collects metrics for the whole cluster by calling
the same endpoint for all cluster nodes. To avoid endless request
recursion, the 'local-only' request parameter is provided. If this
parameter is set, the endpoint implementation will only return metrics
for the local node, avoiding a loop.
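
The fan-out with its recursion guard can be sketched as follows. This
is a simplified model, not the actual implementation: the cluster is a
plain dict mapping node names to their local records, and the "remote
call" is an ordinary function call. All names are hypothetical.

```python
def export_metrics(node, cluster, local_only=False):
    """Return metrics as seen from `node`.

    cluster: dict mapping node name -> list of that node's local records.
    """
    records = list(cluster[node])
    if local_only:
        # Peers asked with local_only=True never fan out again.
        return records
    for other in cluster:
        if other != node:
            # Stand-in for the remote API call, always with local_only set.
            records.extend(export_metrics(other, cluster, local_only=True))
    return records


cluster = {"a": ["a-cpu"], "b": ["b-cpu"], "c": ["c-cpu"]}
print(export_metrics("a", cluster))
print(export_metrics("b", cluster, local_only=True))
```

Without the flag, every node would query every other node in turn and
the requests would never terminate.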

Signed-off-by: Lukas Wagner <l.wagner@proxmox.com>
[WB: remove unused $start_time leftover from benchmarks]
Signed-off-by: Wolfgang Bumiller <w.bumiller@proxmox.com>

2024-08-14 14:18:48 +02:00

Lukas Wagner

5732ad6584

pvestatd: store subsystem status data in a shared cache

This commit adds a new module PVE::PullMetric. This module allows
us to store the status data of various subsystems, including status
data for the most recent pvestatd update loops. Right now, we
store 6 old generations; together with the most recent values, that
gives 70 seconds of stat history (based on a 10-second pvestatd update
loop interval).
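
The arithmetic behind the 70 seconds can be sketched with a small
bounded buffer: 6 old generations plus the most recent one make 7
entries, each covering one 10 second update interval. The class below
is a hypothetical in-process stand-in and does not reflect the actual
proxmox-shared-cache bindings.

```python
from collections import deque

UPDATE_INTERVAL = 10  # seconds between pvestatd update loops
OLD_GENERATIONS = 6   # old generations kept besides the newest one


class GenerationCache:
    """Keep the newest status data plus a fixed number of old generations."""

    def __init__(self):
        # A full deque drops its oldest entry when a new one is appended.
        self._gens = deque(maxlen=OLD_GENERATIONS + 1)

    def push(self, timestamp, data):
        self._gens.append((timestamp, data))

    def history(self, max_age, now):
        """Return the generations that are at most max_age seconds old."""
        return [(ts, d) for ts, d in self._gens if now - ts <= max_age]

    def covered_seconds(self):
        return len(self._gens) * UPDATE_INTERVAL


cache = GenerationCache()
for i in range(10):
    cache.push(i * UPDATE_INTERVAL, {"loop": i})
print(cache.covered_seconds())
print(cache.history(max_age=20, now=90))
```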

This cache allows us to add support for pull-style metric collection
systems, be it Prometheus/OpenMetrics or some custom, JSON based
metric format.

This patch raises the required lib{proxmox,pve}-perl-rs version
requirements, since we need the new bindings for proxmox-shared-cache.

Signed-off-by: Lukas Wagner <l.wagner@proxmox.com>
[WB: actually bump *runtime* deps in d/control]
Signed-off-by: Wolfgang Bumiller <w.bumiller@proxmox.com>

2024-08-14 14:18:34 +02:00

3 Commits