Commit Graph

43 Commits

Author SHA1 Message Date
Dominik Csapak
ce251651a4 pvestatd: fix container cpuset scheduling
Since pve-container commit

c48a25452dccca37b3915e49b7618f6880aeafb1

the code to get the cpuset controller path lives in pve-common's PVE::CGroup.
Use that and improve the logging in case some error happens in the future.
Such an error will only be logged once per pvestatd run,
so it does not spam the log.

Signed-off-by: Dominik Csapak <d.csapak@proxmox.com>
2020-12-03 16:33:50 +01:00
Wolfgang Bumiller
eacb5482e5 pvestatd: cgroupv2 support
This uses the newly introduced PVE::LXC::CGroup's
cpuset_controller_path() method to find the controller path,
so we need to depend on the newer pve-container package.

Signed-off-by: Wolfgang Bumiller <w.bumiller@proxmox.com>
2020-04-04 20:19:02 +02:00
Dominik Csapak
0496138e44 ceph: factor out get/broadcast ceph versions to ceph::services
which also removes some dead code
(the my $local_last_version variable was never used)

Signed-off-by: Dominik Csapak <d.csapak@proxmox.com>
2020-04-01 17:15:41 +02:00
Stefan Reiter
00b58c8c35 Broadcast supported CPU flags
pvestatd will check if the KVM version has changed using
kvm_user_version (which automatically clears its cache if QEMU/KVM
updates), and if it has, query supported CPU flags and broadcast them as
key-value pairs to the cluster.

If detection fails, we clear the kv-store and set up a delay (120s), to not
try again too quickly.

Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>
2020-01-14 11:59:48 +01:00
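The retry behaviour described in the "Broadcast supported CPU flags" message above can be sketched roughly as follows. This is a minimal illustration, not the pvestatd code: `make_flag_updater` and its three callbacks are hypothetical names, and only the version check, the clearing of the kv-store on failure, and the 120s delay come from the commit message.

```perl
use strict;
use warnings;

my $RETRY_DELAY = 120; # seconds, as stated in the commit message

# Hypothetical helper: returns a closure that is called once per cycle with
# the current KVM version string and the current time in seconds.
sub make_flag_updater {
    my ($query, $broadcast, $clear) = @_;
    my ($last_version, $retry_after) = (undef, 0);

    return sub {
        my ($version, $now) = @_;

        # still inside the back-off window after a failed detection
        return 'skipped' if $now < $retry_after;
        # version did not change, nothing to do
        return 'unchanged' if defined($last_version) && $version eq $last_version;

        my $flags = eval { $query->() };
        if (!defined($flags)) {
            $clear->();                         # clear the kv-store entry
            $retry_after = $now + $RETRY_DELAY; # do not retry too quickly
            return 'failed';
        }

        $broadcast->($flags);
        $last_version = $version;
        return 'updated';
    };
}
```

Passing the time in as an argument keeps the sketch easy to exercise without waiting for real time to pass.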
Alexandre Derumier
7405805780 pvestatd: fix require PVE::Network::SDN
Signed-off-by: Alexandre Derumier <aderumier@odiso.com>
2019-11-26 17:08:23 +01:00
Stefan Reiter
e2509f4e37 Fix #2476: Fix auto-ballooning QMP command
Commit 0dd73a7fec (statd: refactor update_node_status) changed $target
in pvestatd's auto_balloning sub into a variable:

    my $target = int($res->{$vmid});

but then uses it in a string as a parameter to the $log function:

    $log->("BALLOON $vmid to $target (%d)\n", $target - $current);

This surprisingly causes the variable to be incorrectly converted into a
JSON string by perl's to_json (called in QMPClient after mon_cmd):

    {"value":"1234"}

instead of

    {"value":1234}

which causes QEMU to report the parameter as invalid:

    "Invalid parameter type for 'value', expected: integer"

This behaviour is made even trickier, since $target internally is still
considered more of an 'int' (although that's a weak claim in perl
anyway), showing up without quotes in Dumper et al. - but the perldoc
for to_json sheds some light:

    simple scalars
        Simple Perl scalars (any scalar that is not a reference) are the
        most difficult objects to encode: this module will encode undefined
        scalars as JSON "null" values, scalars that have last been used in a
        string context before encoding as JSON strings, and anything else as
        number value

So coerce to_json to treat $target as an integer by using it as one and
everything is fine again.

Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>
2019-11-21 14:23:58 +01:00
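The encoding rule quoted in the Fix #2476 message above can be reproduced in isolation. The sketch below uses the core JSON::PP module rather than the QMPClient code path, and starts from a value that is unambiguously a string, since whether interpolating an integer into a string changes its encoding may depend on the Perl and JSON module versions in use.

```perl
use strict;
use warnings;
use JSON::PP;

my $json = JSON::PP->new->canonical;

# A scalar that holds a string is encoded as a JSON string.
my $from_string = "1234";
my $as_string = $json->encode({ value => $from_string });

# Using it as an integer produces a fresh numeric value,
# which is encoded as a JSON number.
my $as_number = $json->encode({ value => int($from_string) });

print "$as_string\n"; # {"value":"1234"}
print "$as_number\n"; # {"value":1234}
```

Coercing at the point where the value is handed to the encoder, rather than where it is first assigned, avoids depending on how the variable was used in between.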
Stefan Reiter
7a108020b3 refactor: vm_mon_cmd is now Monitor::mon_cmd
Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>
2019-11-20 18:25:49 +01:00
Thomas Lamprecht
2112d31092 statd: increase RSS difference required for restart
it seems that we have a reference leak or the like somewhere in the
(graphite?) status plugin, while the recent transaction based update
mechanism made it slightly better, it's still bad with a lot of VMs..

Until we can track that down, or abandon perl for good, avoid too
frequent restarts by allowing statd to grow 15 MB of memory usage
after initial calibration (its memory usage at the 10th cycle)

Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2019-11-18 19:07:24 +01:00
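The restart rule from the "increase RSS difference" message above could look roughly like this. It is a sketch under assumptions: `make_restart_check` is a hypothetical name, the RSS is assumed to be passed in KB, and only the 10th-cycle calibration and the 15 MB allowance are taken from the commit message.

```perl
use strict;
use warnings;

my $CALIBRATION_CYCLE = 10;        # cycle at which the baseline is taken
my $MAX_GROWTH_KB     = 15 * 1024; # allowed growth after calibration

# Hypothetical helper: returns a closure that is fed the current RSS (in KB)
# once per cycle and returns 1 when a restart would be due, else 0.
sub make_restart_check {
    my ($cycle, $baseline) = (0, undef);

    return sub {
        my ($rss_kb) = @_;
        $cycle++;
        $baseline = $rss_kb if $cycle == $CALIBRATION_CYCLE;

        # never restart before the baseline has been recorded
        return 0 if !defined($baseline);
        return ($rss_kb - $baseline) > $MAX_GROWTH_KB ? 1 : 0;
    };
}
```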
Thomas Lamprecht
cc3d280b98 statd: report memory usage in KB
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2019-11-18 19:04:29 +01:00
Thomas Lamprecht
87be2c19e3 ext. metric: move to a transaction model
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2019-11-18 19:04:29 +01:00
Thomas Lamprecht
f1f4bfefc7 move common metric server management part to own module
For now it only handles the plugin registration and the two recently
integrated helpers.
But, this is a preparation to move the external metrics server
update mechanic from a stateless always-newly-connect-send-disconnect
to a stateful transaction based mechanism; see later patches

keep the PVE::Status::Plugin use in pvestatd, as we read the cfs
hosted status.cfg there, and the parser is defined by the common
status plugin base module.

Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2019-11-16 16:19:42 +01:00
Thomas Lamprecht
1aaca6fde7 api: ceph/metadata: add structured node versions
include the version as string and as parts, as we do the split
already. Also include the build commit, so if we re-release a ceph
version, we can differentiate here too.

Use node as key, to make the new entry a bit more general, could be
easily expanded with other infos, if required.

Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2019-11-15 18:36:37 +01:00
Thomas Lamprecht
2a8e514947 statd: adapt ceph update error message
"getting ceph services" sounds a bit vague, like the download of those
failed, or the like..

Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2019-11-15 11:33:47 +01:00
Thomas Lamprecht
a6dff455f6 statd: refactor out updating ceph metadata
makes no sense to do half in line and half in an extra update_method

Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2019-11-15 11:30:14 +01:00
Thomas Lamprecht
5e82aaac89 status plugins: add update_all and foreach_plug helper
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2019-11-14 19:24:24 +01:00
Thomas Lamprecht
b25f645957 remove some useless empty lines
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2019-11-13 17:05:44 +01:00
Thomas Lamprecht
0dd73a7fec statd: refactor update_node_status
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2019-11-13 08:42:00 +01:00
Thomas Lamprecht
7887310045 statd: cleanup update_node_status
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2019-11-13 08:40:13 +01:00
Alexandre Derumier
a36565ba37 pvestatd : broadcast sdn transportzone status
Signed-off-by: Alexandre Derumier <aderumier@odiso.com>
2019-09-03 10:28:55 +02:00
Dominik Csapak
4e76dbd7b3 ceph: refactor broadcast_ceph_services and get_cluster_service
and use the broadcast when a service is added/removed
we will use 'get_cluster_service' in the future when we generate a list
of services of a specific type

Signed-off-by: Dominik Csapak <d.csapak@proxmox.com>
2019-06-04 14:56:24 +02:00
Thomas Lamprecht
a78fd21f7f followup code cleanup for: broadcast ceph service data to cluster
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2019-05-27 15:52:12 +02:00
Dominik Csapak
fea391967a broadcast ceph service data to cluster
so that we have a list of all existing ceph services in the cluster

Signed-off-by: Dominik Csapak <d.csapak@proxmox.com>
2019-05-27 15:52:12 +02:00
Fabian Grünbichler
5ea29d1398 pvestatd: rotate auth keys if necessary
as a fallback to ensure rotation even if no logins happen on a given
cluster.

Signed-off-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
2019-03-18 12:23:53 +01:00
Fabian Grünbichler
0fcced161f use physical NIC regexp
because in >= Stretch, most systems don't have ethX devices any more.

Signed-off-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
2017-06-08 15:00:59 +02:00
Wolfgang Bumiller
127470f417 statd: rebalance: don't use CpuSet::max_cpuids
We're already limiting CPUs to lxc/cpuset.effective_cpus,
so let's use the highest cpuid from that set as a maximum to
initialize the container count array.
2017-04-20 12:18:55 +02:00
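The idea in the "don't use CpuSet::max_cpuids" message above, sizing the container count array from the highest CPU id in the effective set, can be sketched as follows. `parse_cpuset` is a hypothetical stand-in for the real cpuset parsing, and the "0-3,8" string is an invented example value.

```perl
use strict;
use warnings;
use List::Util qw(max);

# Hypothetical helper: expand a cpuset list such as "0-3,8" into CPU ids.
sub parse_cpuset {
    my ($str) = @_;
    my @ids;
    for my $part (split /,/, $str) {
        if ($part =~ /^(\d+)-(\d+)$/) {
            push @ids, int($1) .. int($2);
        } elsif ($part =~ /^(\d+)$/) {
            push @ids, int($1);
        } else {
            die "invalid cpuset member '$part'\n";
        }
    }
    return @ids;
}

# Example content of an effective cpuset (assumed value).
my @effective = parse_cpuset("0-3,8");

# The highest id in the set determines the size of the per-CPU counter array.
my $max_cpuid = max(@effective);
my @ct_count  = (0) x ($max_cpuid + 1);
```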
Dietmar Maurer
507869563a use 'U' to encode undefined values for RRD graphs
rrdtool 1.5 and newer seem to require this.
2017-03-17 11:27:18 +01:00
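Encoding undefined values as 'U' amounts to a small mapping step when the colon-separated update string is assembled. The helper name `rrd_data_string` and the sample values below are assumptions for illustration only.

```perl
use strict;
use warnings;

# Hypothetical helper: build a colon-separated RRD update string,
# writing 'U' (unknown) for every undefined value.
sub rrd_data_string {
    my ($ctime, @values) = @_;
    return join(':', $ctime, map { defined($_) ? $_ : 'U' } @values);
}

my $data = rrd_data_string(1489746438, 0.25, undef, 1024);
print "$data\n"; # 1489746438:0.25:U:1024
```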
Thomas Lamprecht
09f19204be InfluxDB plugins: send nodename when updating CT/VM status
This allows filtering by node in InfluxDB queries, so the statistics
of all virtual guests on a specific node can be queried.

While for InfluxDB this is only a tag which does not change where the
data is stored, Graphite - our other status plugin - has no such
mechanics available. If we added it to the object hierarchy,
e.g.: "qemu.$vmid.$nodename", a migration of a VM would result in two
different datasets.
So avoid breaking setups and omit it for Graphite for now.

Suggested-by: Daniel1108 <danielgallegosanchez@gmail.com>
CC: Daniel1108 <danielgallegosanchez@gmail.com>

Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2017-02-28 11:28:10 +01:00
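The difference between the two backends described in the InfluxDB message above can be illustrated with a simplified sketch. The measurement, tag, and path layouts below are assumptions chosen for the example and are not the formats the plugins actually emit.

```perl
use strict;
use warnings;

# Hypothetical formatter for an InfluxDB line-protocol style entry:
# the node name travels as a tag next to the measurement.
sub influx_line {
    my ($measurement, $tags, $fields, $ts) = @_;
    my $tagstr   = join(',', map { "$_=$tags->{$_}" } sort keys %$tags);
    my $fieldstr = join(',', map { "$_=$fields->{$_}" } sort keys %$fields);
    return "$measurement,$tagstr $fieldstr $ts";
}

# Hypothetical Graphite path: the hierarchy is the identity of the dataset,
# so the node name is left out to keep one dataset across migrations.
sub graphite_path {
    my ($vmid, $metric) = @_;
    return "qemu.$vmid.$metric";
}

my $before = influx_line('cpu', { vmid => 100, nodename => 'node1' }, { value => 0.5 }, 1488277690);
my $after  = influx_line('cpu', { vmid => 100, nodename => 'node2' }, { value => 0.5 }, 1488277750);
my $path   = graphite_path(100, 'cpu');
```

After a migration only the `nodename` tag differs between the two InfluxDB entries, while the Graphite path stays the same.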
Dietmar Maurer
8a9bf7771e pvestatd.pm: correctly use new RPCEnvironment
Call $rpcenv->active_workers()
2017-01-18 17:28:59 +01:00
Dietmar Maurer
33dc998183 remove obsolete inline documentation
2017-01-11 10:54:47 +01:00
Dietmar Maurer
cbce367ddb rebalance_lxc_containers: make it work with old style lxc setups
2016-12-21 12:20:40 +01:00
Dietmar Maurer
b3f1adb200 rebalance_lxc_containers: avoid repeated warnings if rebalance fails
Only warn once.
2016-12-21 11:39:46 +01:00
Dietmar Maurer
0b959507c1 rebalance_lxc_containers: fix hotplug
factor out code to modify cpusets into $modify_cpuset->()
2016-12-21 11:13:16 +01:00
Dietmar Maurer
193146f8b0 rebalance_lxc_containers: make it work with new lxc/<ID>/ns subgroup
2016-12-21 11:04:33 +01:00
Dietmar Maurer
ccfff9204e use new CpuSet::max_cpuid() helper
2016-10-28 17:51:59 +02:00
Dietmar Maurer
8b750abc3e rebalance_lxc_containers: nicer logs, improve hotplug
We also need to handle the case when someone removes the 'cores'
setting from a container.
2016-10-28 07:09:08 +02:00
Dietmar Maurer
2499255bb9 rebalance_lxc_containers: improve algorithm
This one avoids unnecessary cpuset changes (for example
when a guest is stopped).
2016-10-27 12:08:11 +02:00
Dietmar Maurer
09fee7559b rebalance_lxc_containers: use cores instead of cpulimit
2016-10-27 09:08:38 +02:00
Dietmar Maurer
e0dc09ad0f rebalance_lxc_containers: do not use vmstatus, call from update_status
Simply use PVE::LXC::config_list() and test if there is a cgroup.
2016-10-26 15:47:08 +02:00
Dietmar Maurer
07f9595f80 rebalance_lxc_containers: use persistent container ordering
2016-10-26 12:55:58 +02:00
Dietmar Maurer
41db757b13 pvestatd: add simple container cpuset balancing
2016-10-26 12:00:13 +02:00
Fabian Grünbichler
bbcfdc08cc use PVE::Storage::config(), not cfs_read_file()
2016-03-30 10:35:58 +02:00
Alexandre Derumier
58541b9463 add influxdb stats plugin V2
/etc/pve/status.cfg
-------------------
influxdb:
      server influxdb3.odiso.net
      port 8089

This requires InfluxDB >= 0.9 with UDP enabled

influxdb.conf
-------------

[[udp]]
  enabled = true
  bind-address = "0.0.0.0:8089"
  database = "proxmox"
  batch-size = 1000
  batch-timeout = "1s"

Signed-off-by: Alexandre Derumier <aderumier@odiso.com>
2015-09-11 07:59:34 +02:00
Dietmar Maurer
efd04666df add missing file
2015-09-04 15:03:31 +02:00