I had a Proxmox node fail on me recently. It was a Dell Optiplex 7010 with 32GB RAM and a 240GB SSD, part of a Proxmox cluster with two other nodes. I'm not entirely sure what happened, but it looks like the power supply failed. Unfortunately, at the moment I can't afford to replace either the power supply or the node itself, so I needed to remove the node from the cluster and reconfigure the cluster to run on just two nodes. This is a pretty simple process, but I thought it would be worth documenting in case anyone else finds themselves in a similar situation.

Fixing the cluster

First check the status of the cluster:

root@pve1:~# pvecm nodes

Membership information
----------------------
Nodeid Votes Name
3 1 pve3
4 1 pve1 (local)
  • I expected the dead node to be listed here as well. In my case it was not, but the node was still listed in the cluster configuration in the web GUI.

  • In addition, I expected the number of quorum votes for the cluster to be 2, but pvecm status still said 3:

Votequorum information
----------------------
Expected votes: 3
Highest expected: 3
Total votes: 2
Quorum: 2
Flags: Quorate 

To fix this up I needed to run through the following steps on one of the surviving nodes in the cluster:

  • Remove the dead node's entry from /etc/pve/corosync.conf on any node in the cluster.
  • systemctl restart pve-cluster
  • systemctl restart corosync
  • That removed the node from the cluster, and pvecm status then reported the correct number of expected votes (2).
  • Finally, to remove the dead node from the web GUI, delete its whole directory under /etc/pve/nodes/nodename.
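Putting those steps together, here's a rough sketch of the whole procedure as shell commands. This is illustrative rather than something to paste blindly: I'm assuming the dead node is called pve2 (substitute your own node name), and that you run everything as root on one of the surviving nodes.

```shell
# Assumption: the dead node is named "pve2" -- replace with your node's name.

# 1. Delete the dead node's entry from the corosync config.
#    Look for and remove the whole block in the nodelist section that
#    looks roughly like:
#      node {
#        name: pve2
#        nodeid: 2
#        quorum_votes: 1
#        ...
#      }
nano /etc/pve/corosync.conf

# 2. Restart the cluster filesystem and corosync so the change takes effect.
systemctl restart pve-cluster
systemctl restart corosync

# 3. Confirm the cluster now expects the right number of votes (2).
pvecm status

# 4. Remove the dead node's directory so it disappears from the web GUI.
rm -r /etc/pve/nodes/pve2
```

Note that /etc/pve is the cluster-wide pmxcfs filesystem, so the corosync.conf edit and the directory removal only need to be done once; they propagate to the other surviving node automatically.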

References:

  • https://forum.proxmox.com/threads/node-failure.165179/#post-765580