
@free-pmx
Last active April 11, 2025 13:22

How to disable HA for maintenance

TL;DR Avoid unexpected reboots of seemingly unaffected nodes during maintenance on any High Availability cluster. There is no need to wait out any grace period for HA to become inactive by itself, and no uncertainty about whether it has.


ORIGINAL POST How to disable HA for maintenance


If you are going to perform any kind of maintenance works that could disrupt quorum cluster-wide (e.g. on network equipment, or on small clusters), you may already have learnt that this risks seemingly random reboots of cluster nodes, and not only of those with active HA services.

TIP The rationale for this snippet is covered in a separate post on the High Availability-related watchdog that Proxmox employs on every single node at all times.

To disable HA safely, without additional waiting and while avoiding HA stack bugs, perform the following:

Before the works

Once (on any node):

# park the HA resources config so that HA manages no services
mv /etc/pve/ha/{resources.cfg,resources.cfg.bak}
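If you script this step, it may be worth guarding the rename so that it never overwrites an earlier backup. The following is a minimal sketch; the function name `ha_park_resources` and its optional directory argument are hypothetical (not part of Proxmox), and the argument exists only so the function can be tried against a scratch directory instead of `/etc/pve/ha`.

```shell
# Hypothetical helper (not a Proxmox tool): park the HA resources config.
# The optional argument is an assumption for testing; it defaults to /etc/pve/ha.
ha_park_resources() {
  local dir="${1:-/etc/pve/ha}"
  if [ ! -e "$dir/resources.cfg" ]; then
    echo "nothing to park: $dir/resources.cfg not found" >&2
    return 1
  fi
  if [ -e "$dir/resources.cfg.bak" ]; then
    # never clobber a backup left over from an earlier maintenance window
    echo "refusing: $dir/resources.cfg.bak already exists" >&2
    return 1
  fi
  mv "$dir/resources.cfg" "$dir/resources.cfg.bak"
}
```

Run `ha_park_resources` once, on any node, in place of the plain `mv`.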

Then on every node:

systemctl stop pve-ha-crm pve-ha-lrm
# check all went well: both units should report "inactive"
systemctl is-active pve-ha-crm pve-ha-lrm
# confirm you are ok to proceed without risking a reboot ("ok" = safe, "nook" = not ok, do not proceed)
test -d /run/watchdog-mux.active/ && echo nook || echo ok
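The per-node commands above can also be folded into a single check that fails loudly. This is a sketch that assumes, as the snippet does, that the existence of the `/run/watchdog-mux.active/` directory means it is not yet safe to proceed; the function name `ha_node_ready_for_maintenance` and its optional marker-path argument are hypothetical. It relies on `systemctl is-active --quiet`, which exits with status 0 only when the unit is active.

```shell
# Hypothetical helper (not a Proxmox tool): run on each node after stopping
# pve-ha-crm and pve-ha-lrm. Prints "ok" and returns 0 only if it looks safe.
# The optional argument (marker path) is an assumption for testing.
ha_node_ready_for_maintenance() {
  local marker="${1:-/run/watchdog-mux.active}"
  local svc
  for svc in pve-ha-crm pve-ha-lrm; do
    # exit status 0 means the unit is still active
    if systemctl is-active --quiet "$svc" 2>/dev/null; then
      echo "$svc is still active" >&2
      return 1
    fi
  done
  # same test as the snippet above: if the marker directory exists, do not proceed
  if [ -d "$marker" ]; then
    echo "nook: $marker exists" >&2
    return 1
  fi
  echo ok
}
```

Because it returns non-zero on failure, it can gate the rest of a maintenance script, e.g. `ha_node_ready_for_maintenance || exit 1`.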

After you are done

Reverse the above steps. First, on every node:

systemctl start pve-ha-crm pve-ha-lrm

Then, once all nodes are ready, reactivate HA from any single node:

mv /etc/pve/ha/{resources.cfg.bak,resources.cfg}
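The restore step can be guarded in the same spirit, so that it never overwrites a `resources.cfg` that has appeared in the meantime. Again a minimal sketch: `ha_restore_resources` and its optional directory argument are hypothetical, and it is meant to be run only once `systemctl is-active pve-ha-crm pve-ha-lrm` reports `active` on every node.

```shell
# Hypothetical helper (not a Proxmox tool): undo the parking rename done
# before maintenance. The optional argument is an assumption for testing;
# it defaults to /etc/pve/ha.
ha_restore_resources() {
  local dir="${1:-/etc/pve/ha}"
  if [ ! -e "$dir/resources.cfg.bak" ]; then
    echo "nothing to restore: $dir/resources.cfg.bak not found" >&2
    return 1
  fi
  if [ -e "$dir/resources.cfg" ]; then
    # do not overwrite a config that appeared during maintenance
    echo "refusing: $dir/resources.cfg already exists" >&2
    return 1
  fi
  mv "$dir/resources.cfg.bak" "$dir/resources.cfg"
}
```

Run it once, on any node, in place of the plain `mv`.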