Improve Swarm support (#333)

* Query for labeled services as well

* Try scaling down services

* Scale services back up

* Use progress tool from Docker CLI

* In test, label both services

* Clean up error and log messages

* Document scale-up/down approach in docs

* Downgrade Docker CLI to match client

* Document services stats

* Do not rely on PreviousSpec for storing desired replica count

* Log warnings from Docker when updating services

* Check whether container and service labels collide

* Document script behavior on label collision

* Add additional check if all containers have been removed

* Scale services concurrently

* Move docker interaction code into own file

* Factor out code for service updating

* Time out after five minutes of not reaching desired container count

* Inline handling of in-swarm container level restart

* Timer is more suitable for timeout race

* Timeout when scaling down services should be configurable

* Choose better filename

* Reflect changes in naming

* Rename and deprecate BACKUP_STOP_CONTAINER_LABEL

* Improve logging

* Further simplify logging
This commit is contained in:
Frederik Ring
2024-01-31 12:17:41 +01:00
committed by GitHub
parent 2065fb2815
commit c3daeacecb
18 changed files with 640 additions and 145 deletions

View File

@@ -0,0 +1,19 @@
---
title: Replace deprecated BACKUP_STOP_CONTAINER_LABEL setting
layout: default
parent: How Tos
nav_order: 19
---
# Replace deprecated `BACKUP_STOP_CONTAINER_LABEL` setting
Version `v2.36.0` deprecated the `BACKUP_STOP_CONTAINER_LABEL` setting and renamed it `BACKUP_STOP_DURING_BACKUP_LABEL` which is supposed to signal that this will stop both containers _and_ services.
Migrating is done by renaming the key for your custom value:
```diff
env:
- BACKUP_STOP_CONTAINER_LABEL: database
+ BACKUP_STOP_DURING_BACKUP_LABEL: database
```
The old key will stay supported until the next major version, but logs a warning each time a backup is taken.

View File

@@ -76,7 +76,7 @@ Configuration, data about the backup run and helper functions will be passed to
Here is a list of all data passed to the template:
* `Config`: this object holds the configuration that has been passed to the script. The field names are the name of the recognized environment variables converted in PascalCase. (e.g. `BACKUP_STOP_CONTAINER_LABEL` becomes `BackupStopContainerLabel`)
* `Config`: this object holds the configuration that has been passed to the script. The field names are the name of the recognized environment variables converted in PascalCase. (e.g. `BACKUP_STOP_DURING_BACKUP_LABEL` becomes `BackupStopDuringBackupLabel`)
* `Error`: the error that made the backup fail. Only available in the `title_failure` and `body_failure` templates
* `Stats`: objects that holds stats regarding script execution. In case of an unsuccessful run, some information may not be available.
* `StartTime`: time when the script started execution
@@ -89,6 +89,11 @@ Here is a list of all data passed to the template:
* `ToStop`: number of containers matched by the stop rule
* `Stopped`: number of containers successfully stopped
* `StopErrors`: number of containers that were unable to be stopped (equal to `ToStop - Stopped`)
* `Services`: object containing stats about the docker services (only populated when Docker is running in Swarm mode)
* `All`: total number of services
* `ToScaleDown`: number of containers matched by the scale down rule
* `ScaledDwon`: number of containers successfully scaled down
* `ScaleDownErrors`: number of containers that were unable to be stopped (equal to `ToScaleDown - ScaledDowm`)
* `BackupFile`: object containing information about the backup file
* `Name`: name of the backup file (e.g. `backup-2022-02-11T01-00-00.tar.gz`)
* `FullPath`: full path of the backup file (e.g. `/archive/backup-2022-02-11T01-00-00.tar.gz`)

View File

@@ -7,11 +7,14 @@ nav_order: 1
# Stop containers during backup
{: .note }
In case you are running Docker in Swarm mode, [dedicated documentation](./use-with-docker-swarm.html) on service and container restart applies.
In many cases, it will be desirable to stop the services that are consuming the volume you want to backup in order to ensure data integrity.
This image can automatically stop and restart containers and services.
By default, any container that is labeled `docker-volume-backup.stop-during-backup=true` will be stopped before the backup is being taken and restarted once it has finished.
In case you need more fine grained control about which containers should be stopped (e.g. when backing up multiple volumes on different schedules), you can set the `BACKUP_STOP_CONTAINER_LABEL` environment variable and then use the same value for labeling:
In case you need more fine grained control about which containers should be stopped (e.g. when backing up multiple volumes on different schedules), you can set the `BACKUP_STOP_DURING_BACKUP_LABEL` environment variable and then use the same value for labeling:
```yml
version: '3'
@@ -25,7 +28,7 @@ services:
backup:
image: offen/docker-volume-backup:v2
environment:
BACKUP_STOP_CONTAINER_LABEL: service1
BACKUP_STOP_DURING_BACKUP_LABEL: service1
volumes:
- data:/backup/my-app-backup:ro
- /var/run/docker.sock:/var/run/docker.sock:ro

View File

@@ -7,12 +7,66 @@ nav_order: 13
# Use with Docker Swarm
By default, Docker Swarm will restart stopped containers automatically, even when manually stopped.
If you plan to have your containers / services stopped during backup, this means you need to apply the `on-failure` restart policy to your service's definitions.
A restart policy of `always` is not compatible with this tool.
{: .note }
The mechanisms described in this page __do only apply when Docker is running in [Swarm mode][swarm]__.
[swarm]: https://docs.docker.com/engine/swarm/
## Stopping containers during backup
Stopping and restarting containers during backup creation when running Docker in Swarm mode is supported in two ways.
{: .important }
Make sure you label your services and containers using only one of the describe approaches.
In case the script encounters a container that is labeled and has a parent service that is also labeled, it will exit early.
### Scaling services down to zero before scaling back up
When labeling a service in the `deploy` section, the following strategy for stopping and restarting will be used:
- The service is scaled down to zero replicas
- The backup is created
- The service is scaled back up to the previous number of replicas
{: .note }
This approach will only work for services that are deployed in __replicated mode__.
Such a service definition could look like:
```yml
services:
app:
image: myorg/myimage:latest
deploy:
labels:
- docker-volume-backup.stop-during-backup=true
replicas: 2
```
### Stopping the containers
This approach bypasses the services and stops containers directly, creates the backup and restarts the containers again.
As Docker Swarm would usually try to instantly restart containers that are manually stopped, this approach only works when using the `on-failure` restart policy.
A restart policy of `always` is not compatible with this approach.
Such a service definition could look like:
```yml
services:
app:
image: myapp/myimage:latest
labels:
- docker-volume-backup.stop-during-backup=true
deploy:
replicas: 2
restart_policy:
condition: on-failure
```
---
## Memory limit considerations
When running in Swarm mode, it's also advised to set a hard memory limit on your service (~25MB should be enough in most cases, but if you backup large files above half a gigabyte or similar, you might have to raise this in case the backup exits with `Killed`):
```yml

View File

@@ -352,7 +352,7 @@ services:
AWS_ACCESS_KEY_ID: AKIAIOSFODNN7EXAMPLE
AWS_SECRET_ACCESS_KEY: wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY
# Label the container using the `data_1` volume as `docker-volume-backup.stop-during-backup=service1`
BACKUP_STOP_CONTAINER_LABEL: service1
BACKUP_STOP_DURING_BACKUP_LABEL: service1
volumes:
- data_1:/backup/data-1-backup:ro
- /var/run/docker.sock:/var/run/docker.sock:ro
@@ -362,7 +362,7 @@ services:
<<: *backup_environment
# Label the container using the `data_2` volume as `docker-volume-backup.stop-during-backup=service2`
BACKUP_CRON_EXPRESSION: "0 3 * * *"
BACKUP_STOP_CONTAINER_LABEL: service2
BACKUP_STOP_DURING_BACKUP_LABEL: service2
volumes:
- data_2:/backup/data-2-backup:ro
- /var/run/docker.sock:/var/run/docker.sock:ro

View File

@@ -316,15 +316,22 @@ You can populate below template according to your requirements and use it as you
# GPG_PASSPHRASE="<xxx>"
########### STOPPING CONTAINERS DURING BACKUP
########### STOPPING CONTAINERS AND SERVICES DURING BACKUP
# Containers can be stopped by applying a
# `docker-volume-backup.stop-during-backup` label. By default, all containers
# that are labeled with `true` will be stopped. If you need more fine grained
# control (e.g. when running multiple containers based on this image), you can
# override this default by specifying a different value here.
# Containers or services can be stopped by applying a
# `docker-volume-backup.stop-during-backup` label. By default, all containers and
# services that are labeled with `true` will be stopped. If you need more fine
# grained control (e.g. when running multiple containers based on this image),
# you can override this default by specifying a different value here.
# BACKUP_STOP_DURING_BACKUP_LABEL="service1"
# BACKUP_STOP_CONTAINER_LABEL="service1"
# When trying to scale down Docker Swarm services, give up after
# the specified amount of time in case the service has not converged yet.
# In case you need to adjust this timeout, supply a duration
# value as per https://pkg.go.dev/time#ParseDuration to `BACKUP_STOP_SERVICE_TIMEOUT`.
# Defaults to 5 minutes.
# BACKUP_STOP_SERVICE_TIMEOUT="5m"
########### EXECUTING COMMANDS IN CONTAINERS PRE/POST BACKUP