thing-1 and thing-2 are Dell PowerEdge R240 servers configured as a high-availability Pacemaker/Corosync cluster for hosting web services and applications.
/mnt/ha-shared
/mnt/senuti (media library)
All application resources are managed as systemd services wrapped by Pacemaker:
ha-web-containers - Docker Compose stack containing:
ha-mongodb - MongoDB instance (port 27018)
ha-navidrome - Music streaming service (port 4533)
ha-mailcow - Mail server stack with docker-compose.override.yml for clustering config
ha-wiki - Wiki.js documentation (port 3000, wiki.customstack.nyc)
By default, all resources prefer to run on thing-1. If thing-1 becomes unavailable, resources automatically fail over to thing-2.
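A node preference like this is typically expressed in Pacemaker as location constraints. The sketch below only prints the `pcs` commands that would create such constraints, so they can be reviewed before anything is run. The score of 100 and the assumption that each resource is constrained individually (rather than through a group) are illustrative and not taken from this cluster's configuration; `sudo pcs constraint location config` shows what is actually in place.

```shell
#!/bin/sh
# Sketch: print location constraints that make resources prefer thing-1.
# ASSUMPTION: the score (100) and per-resource constraints are illustrative;
# check the real setup with `sudo pcs constraint location config`.
PREFERRED_NODE="thing-1"
SCORE=100
RESOURCES="ha-web-containers ha-mongodb ha-navidrome ha-mailcow ha-wiki"

# Print one `pcs constraint location ... prefers` command per resource.
print_prefer_commands() {
    for res in $RESOURCES; do
        echo "sudo pcs constraint location $res prefers $PREFERRED_NODE=$SCORE"
    done
}

print_prefer_commands
```

With a finite positive score, Pacemaker still places the resource on thing-2 when thing-1 is down, which matches the failover behaviour described above.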
Let's Encrypt certificates are stored in /mnt/ha-shared/letsencrypt/ and shared across both nodes.
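# Show overall cluster, node, and resource status
sudo pcs status

# Put a node in standby (its resources move to the other node)
sudo pcs node standby thing-1
# or
sudo pcs node standby thing-2

# Return a node to service
sudo pcs node unstandby thing-1
# or
sudo pcs node unstandby thing-2

# Move a resource to thing-2, then clear the constraint created by the move
sudo pcs resource move <resource-name> thing-2
sudo pcs resource clear <resource-name>

# Restart a resource
sudo pcs resource restart <resource-name>

# Show a resource's configuration
sudo pcs resource config <resource-name>

# List location and colocation constraints
sudo pcs constraint location config
sudo pcs constraint colocation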
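The standby and unstandby commands above are usually used as a pair around a maintenance task. The following is a minimal sketch of that pattern, not part of the existing tooling: `run` and `with_node_in_standby` are hypothetical helper names, and the script defaults to a dry run that only prints what it would execute.

```shell
#!/bin/sh
# Sketch: wrap a maintenance command in standby/unstandby.
# DRY_RUN=1 (the default here) only prints commands; set DRY_RUN=0 to execute.
DRY_RUN="${DRY_RUN:-1}"

# Hypothetical helper: print the command in dry-run mode, otherwise run it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Hypothetical helper: put a node in standby, run the given command,
# then bring the node back. Usage: with_node_in_standby <node> <command...>
with_node_in_standby() {
    node="$1"
    shift
    run sudo pcs node standby "$node"
    run "$@"
    run sudo pcs node unstandby "$node"
}

# Illustrative call; the maintenance command is only an example.
with_node_in_standby thing-2 sudo systemctl daemon-reload
```

In this sketch the node is returned to service even if the maintenance command fails; add an exit-status check after `run "$@"` if the node should stay in standby on failure.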
# Build directly from the service's source directory
cd /mnt/ha-shared/web-containers/accent-dev
git pull
docker build -t jgmelon:5002/accent-dev:latest .
docker push jgmelon:5002/accent-dev:latest
sudo pcs resource restart ha-web-containers
# On development machine or thing-1
cd /mnt/ha-shared/web-containers
docker-compose build accent-dev
docker tag web-containers-accent-dev:latest jgmelon:5002/accent-dev:latest
docker push jgmelon:5002/accent-dev:latest
# On cluster
sudo pcs resource restart ha-web-containers
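The build, tag, push, and restart steps above follow the same pattern for any service in the Compose stack, so they can be parameterised by service name. This sketch prints the command sequence for a given service instead of running it. `deploy_plan` is a hypothetical helper, and the `web-containers` image prefix is assumed from the `docker tag` line above (it depends on the Compose project name and Compose version).

```shell
#!/bin/sh
# Sketch: print the deploy steps for one service of the Compose stack.
REGISTRY="jgmelon:5002"                      # registry used in the steps above
COMPOSE_DIR="/mnt/ha-shared/web-containers"  # Compose stack directory
PROJECT="web-containers"                     # ASSUMED image-name prefix

# Hypothetical helper: emit the commands for <service>, one per line.
deploy_plan() {
    service="$1"
    cat <<EOF
cd $COMPOSE_DIR
docker-compose build $service
docker tag $PROJECT-$service:latest $REGISTRY/$service:latest
docker push $REGISTRY/$service:latest
sudo pcs resource restart ha-web-containers
EOF
}

deploy_plan accent-dev
```

After reviewing the output, the plan can be executed on a cluster node with `deploy_plan accent-dev | sh -e`, where `-e` stops at the first failing step. If the image is built on a separate development machine, the final restart still has to be run on the cluster.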
/mnt/ha-shared/
/mnt/ha-shared/web-containers/
/etc/systemd/system/
Run systemctl daemon-reload on both nodes after service file changes.
In case of total cluster failure:
/etc/systemd/system/ha-*.service
/mnt/ha-shared/letsencrypt/