Major performance and reliability improvements for large database backups
- Replace Docker API with direct shell execution using Open3.popen3
- Implement 4KB chunk streaming to minimize memory usage
- Add aggressive garbage collection every 10MB
- Eliminate stderr accumulation in memory
- Reduce memory usage from 1.9GB to <100MB for large backups
- Add PostgreSQL timeout protection via BT_PG_LOCK_TIMEOUT (default: 60s)
- Add PostgreSQL statement timeout via BT_PG_STATEMENT_TIMEOUT (default: 0)
- Add backup process timeout via BT_BACKUP_TIMEOUT (default: 30min)
- Add stall detection (5 minutes no output)
- Add --lock-wait-timeout parameter for pg_dump commands
- Capture actual exit codes from backup commands
- Implement comprehensive timeout and stall detection
- Add progress logging every 100MB
- Improve error reporting with detailed debugging
- Add complete development environment with sample databases
- Add dry-run mode for configuration validation
- Add development setup scripts and docker-compose.dev.yml
- Add DEVELOPMENT.md and TROUBLESHOOTING.md guides
- Add Docker CLI to container for shell execution
- Update all PostgreSQL tests for new timeout parameters
- Maintain backward compatibility with existing configurations
- Add comprehensive test coverage (629 unit tests, 0 failures)
Fixes #high-memory-usage-large-databases
Fixes Failing integration tests
| Name |
|---|
| .forgejo |
| .github/workflows |
| app |
| dev |
| .dockerignore |
| .gitignore |
| .tool-versions |
| API_DOCUMENTATION.md |
| CLAUDE.md |
| DEVELOPMENT.md |
| docker-compose.dev.yml |
| docker-compose.yml |
| Dockerfile |
| entrypoint.sh |
| LICENSE |
| README.md |
| SECURITY.md |
| TODO.md |
| TROUBLESHOOTING.md |
baktainer
Easily back up databases running in Docker containers.
Features
- Database Support: MySQL, PostgreSQL, MongoDB, and SQLite databases
- Scheduled Backups: Run on a schedule using cron expressions
- Container Discovery: Define which databases to back up using Docker labels
- Health Monitoring: Web dashboard and REST API for monitoring backup status
- Notifications: Multi-channel notifications (Slack, Discord, Teams, Email, Webhooks)
- Backup Rotation: Automatic cleanup based on age, count, or disk space
- Encryption: AES-256-GCM encryption for backup files
- Compression: Gzip compression to reduce backup file sizes
- Performance Monitoring: Real-time metrics and performance alerts
- Label Validation: Schema-based validation with helpful error messages
- SSL/TLS Support: Secure Docker API connections
- High Performance: Multi-threaded backups with dynamic scaling
Installation
⚠️ Security Notice: Baktainer requires Docker socket access which grants significant privileges. Please review SECURITY.md for important security considerations and recommended mitigations.
services:
  baktainer:
    image: jamez001/baktainer:latest
    container_name: baktainer
    restart: unless-stopped
    ports:
      - "8080:8080" # Health check dashboard
    volumes:
      - ./backups:/backups
      - /var/run/docker.sock:/var/run/docker.sock
    environment:
      - BT_CRON="0 0 * * *" # Backup every day at midnight
      - BT_DOCKER_URL=unix:///var/run/docker.sock
      - BT_THREADS=4
      - BT_LOG_LEVEL=info
      - BT_HEALTH_SERVER_ENABLED=true # Enable health check server
      - BT_COMPRESS=true # Enable compression
      - BT_ROTATION_ENABLED=true # Enable backup rotation
      - BT_RETENTION_DAYS=30 # Keep backups for 30 days
      # SSL Configuration (if using remote Docker API)
      #- BT_SSL=true
      #- BT_CA=/path/to/ca.pem
      #- BT_CERT=/path/to/cert.pem
      #- BT_KEY=/path/to/key.pem
      # Notification Configuration
      #- BT_NOTIFICATION_CHANNELS=slack,log
      #- BT_SLACK_WEBHOOK_URL=https://hooks.slack.com/...
      #- BT_NOTIFY_FAILURES=true
      # Encryption Configuration
      #- BT_ENCRYPTION_ENABLED=true
      #- BT_ENCRYPTION_KEY=your-hex-key
For enhanced security, consider using a Docker socket proxy. See SECURITY.md for detailed security recommendations.
Environment Variables
Core Configuration
| Variable | Description | Default |
|---|---|---|
| BT_CRON | Cron expression for scheduling backups | 0 0 * * * |
| BT_THREADS | Number of threads to use for backups | 4 |
| BT_LOG_LEVEL | Log level (debug, info, warn, error) | info |
| BT_DOCKER_URL | Docker API URL | unix:///var/run/docker.sock |
| BT_BACKUP_DIR | Directory to store backups | /backups |
Backup Options
| Variable | Description | Default |
|---|---|---|
| BT_COMPRESS | Enable gzip compression for backups | true |
| BT_ROTATION_ENABLED | Enable automatic backup rotation | true |
| BT_RETENTION_DAYS | Days to keep backups (0 = unlimited) | 30 |
| BT_RETENTION_COUNT | Max backups per container (0 = unlimited) | 0 |
| BT_MIN_FREE_SPACE_GB | Minimum free space in GB | 10 |
| BT_BACKUP_TIMEOUT | Overall backup timeout in seconds | 1800 |
PostgreSQL-Specific Options
| Variable | Description | Default |
|---|---|---|
| BT_PG_LOCK_TIMEOUT | Lock wait timeout for pg_dump (e.g., '60s') | 60s |
| BT_PG_STATEMENT_TIMEOUT | Statement timeout for pg_dump (0 = disabled) | 0 |
| BT_PG_VERBOSE | Enable verbose output for debugging | false |
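As a rough illustration, the variables above could be mapped onto a pg_dump invocation as sketched below. `--lock-wait-timeout` and `--verbose` are real pg_dump flags. Passing the statement timeout through libpq's `PGOPTIONS` environment variable is an assumption, and `pg_dump_command` is a hypothetical helper name; Baktainer's actual implementation may differ.

```ruby
# Hypothetical helper: maps the BT_PG_* variables onto a pg_dump command line.
# Returns [extra_env, argv], a shape that Open3.popen3(extra_env, *argv) accepts.
def pg_dump_command(db:, user:, env: ENV)
  lock_timeout      = env.fetch("BT_PG_LOCK_TIMEOUT", "60s")
  statement_timeout = env.fetch("BT_PG_STATEMENT_TIMEOUT", "0")
  verbose           = env.fetch("BT_PG_VERBOSE", "false") == "true"

  argv = ["pg_dump", "--username=#{user}", "--lock-wait-timeout=#{lock_timeout}"]
  argv << "--verbose" if verbose
  argv << db

  extra_env = {}
  # Assumption: "0" means disabled, so PGOPTIONS is only set for other values.
  unless statement_timeout == "0"
    extra_env["PGOPTIONS"] = "-c statement_timeout=#{statement_timeout}"
  end

  [extra_env, argv]
end
```

With no variables set, this yields `pg_dump --username=<user> --lock-wait-timeout=60s <db>` and an empty extra environment.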
Health Monitoring
| Variable | Description | Default |
|---|---|---|
| BT_HEALTH_SERVER_ENABLED | Enable health check server | false |
| BT_HEALTH_PORT | Port for health check server | 8080 |
| BT_HEALTH_BIND | Bind address for health server | 0.0.0.0 |
SSL/TLS Configuration
| Variable | Description | Default |
|---|---|---|
| BT_SSL | Enable SSL for Docker connection | false |
| BT_CA | Path to CA certificate or certificate data | none |
| BT_CERT | Path to client certificate or certificate data | none |
| BT_KEY | Path to client key or key data | none |
Encryption
| Variable | Description | Default |
|---|---|---|
| BT_ENCRYPTION_ENABLED | Enable AES-256-GCM encryption | false |
| BT_ENCRYPTION_KEY | Encryption key (hex or base64) | none |
| BT_ENCRYPTION_KEY_FILE | Path to encryption key file | none |
| BT_ENCRYPTION_PASSPHRASE | Passphrase for key derivation | none |
Notifications
| Variable | Description | Default |
|---|---|---|
| BT_NOTIFICATION_CHANNELS | Comma-separated list of channels | log |
| BT_NOTIFY_SUCCESS | Notify on successful backups | false |
| BT_NOTIFY_FAILURES | Notify on backup failures | true |
| BT_NOTIFY_WARNINGS | Notify on warnings | true |
| BT_NOTIFY_HEALTH | Notify on health issues | true |
| BT_SLACK_WEBHOOK_URL | Slack webhook URL | none |
| BT_DISCORD_WEBHOOK_URL | Discord webhook URL | none |
| BT_TEAMS_WEBHOOK_URL | Microsoft Teams webhook URL | none |
| BT_WEBHOOK_URL | Generic webhook URL | none |
Usage
Basic Backup Operation
Add labels to your Docker containers to specify which databases to back up.
services:
  db:
    image: postgres:17
    container_name: my-db
    restart: unless-stopped
    volumes:
      - db:/var/lib/postgresql/data
    environment:
      POSTGRES_DB: "${DB_BASE:-database}"
      POSTGRES_USER: "${DB_USER:-user}"
      POSTGRES_PASSWORD: "${DB_PASSWORD:-StrongPassword}"
    labels:
      - baktainer.backup=true
      - baktainer.db.engine=postgres
      - baktainer.db.name=my-db
      - baktainer.db.user=user
      - baktainer.db.password=StrongPassword
      - baktainer.name="MyApp"
Dry-Run Mode (Configuration Validation)
Before running actual backups, use dry-run mode to validate your configuration:
# Run dry-run mode to validate configuration
docker exec baktainer ruby app.rb --dry-run
# Or when starting a new container
docker run --rm \
  -v /var/run/docker.sock:/var/run/docker.sock \
  jamez001/baktainer:latest \
  ruby app.rb --dry-run
Dry-run mode will:
- Discover all containers with backup labels
- Validate label configuration for each container
- Check if containers are running
- Verify backup command availability (mysqldump, pg_dump, etc.)
- Simulate backup paths without creating files
- Report any configuration issues found
Example output:
============================================================
DRY-RUN MODE REPORT
============================================================
Discovered 3 containers for backup:
✓ mysql-prod: mysql backup
→ Would backup 'production_db' to: /backups/2025-01-12/mysql-prod-1736652000.sql
✓ postgres-app: postgres backup
→ Would backup 'app_db' to: /backups/2025-01-12/postgres-app-1736652000.sql
✗ redis-cache: redis backup
→ Issues found:
• Missing database user (baktainer.db.user label)
• Backup command 'redis-cli' not found in container
------------------------------------------------------------
Configuration validation: 2 passed, 1 failed
============================================================
Command-Line Options
# Show help
docker exec baktainer ruby app.rb --help
# Run backup immediately (bypass cron schedule)
docker exec baktainer ruby app.rb --now
# Validate configuration without performing backups
docker exec baktainer ruby app.rb --dry-run
Troubleshooting
Matrix/Synapse Database Backups Hanging
Large Matrix/Synapse databases may hang during backup due to:
- Long-running transactions blocking pg_dump
- Large database size exceeding timeout limits
- Active connections preventing table locks
Solutions:
- Increase the backup timeout: BT_BACKUP_TIMEOUT=3600 (1 hour)
- Enable verbose logging: BT_PG_VERBOSE=true and BT_LOG_LEVEL=debug
- Adjust lock timeout: BT_PG_LOCK_TIMEOUT=120s (wait longer for locks)
- Run backup during low-activity periods
- Monitor progress in logs - backups log every 100MB written
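To make the timeout, stall, and progress behavior concrete, here is a minimal sketch of how a backup command could be streamed in small chunks with `Open3.popen3`. It is not Baktainer's actual code: `stream_command`, `BackupAborted`, and the keyword names are hypothetical, and the defaults simply mirror the values documented above (30 minutes overall, 5 minutes without output, progress every 100MB).

```ruby
require "open3"

class BackupAborted < StandardError; end

CHUNK_SIZE = 4096 # bytes read per iteration

# Hypothetical helper: runs argv, copies its stdout to sink chunk by chunk,
# and aborts when the run exceeds total_timeout or produces no output for
# stall_timeout seconds. The total timeout is only checked between reads,
# so it can overshoot by up to stall_timeout.
def stream_command(argv, sink, total_timeout: 1800, stall_timeout: 300,
                   progress_every: 100 * 1024 * 1024)
  started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  bytes = 0

  Open3.popen3(*argv) do |stdin, stdout, stderr, wait_thr|
    stdin.close
    # Drain stderr so the child never blocks on a full pipe; keep only the last chunk.
    stderr_tail = nil
    drain = Thread.new do
      while (chunk = stderr.read(CHUNK_SIZE))
        stderr_tail = chunk
      end
    end

    begin
      loop do
        elapsed = Process.clock_gettime(Process::CLOCK_MONOTONIC) - started
        raise BackupAborted, "timed out after #{total_timeout}s" if elapsed > total_timeout
        unless IO.select([stdout], nil, nil, stall_timeout)
          raise BackupAborted, "no output for #{stall_timeout}s"
        end

        chunk = stdout.readpartial(CHUNK_SIZE)
        sink.write(chunk)
        before = bytes
        bytes += chunk.bytesize
        # Report progress each time another progress_every boundary is crossed.
        yield bytes if block_given? && bytes / progress_every > before / progress_every
      end
    rescue EOFError
      # stdout closed: the command finished writing
    rescue BackupAborted
      begin
        Process.kill("KILL", wait_thr.pid)
      rescue Errno::ESRCH
        # process already exited
      end
      raise
    ensure
      drain.join
    end

    { exit_code: wait_thr.value.exitstatus, bytes: bytes, stderr_tail: stderr_tail }
  end
end
```

A caller would pass an open file as `sink` and a block that logs the byte count. Because only one chunk of stdout and one chunk of stderr are held at a time, memory use stays flat regardless of dump size.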
General Backup Issues
- Check container logs: docker logs baktainer
- Verify Docker socket permissions
- Test with dry-run: docker exec baktainer ruby app.rb --dry-run
- Check disk space in backup directory
- Ensure database credentials are correct
Docker Labels for Container Configuration
Required Labels
| Label | Description | Required For |
|---|---|---|
| baktainer.backup | Set to true to enable backup for this container | All |
| baktainer.db.engine | Database engine: mysql, postgres, mongodb, sqlite | All |
| baktainer.db.name | Name of the database to backup | All |
| baktainer.db.user | Username for the database | MySQL, PostgreSQL, MongoDB |
| baktainer.db.password | Password for the database | MySQL, PostgreSQL, MongoDB |
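The requirements in this table can be expressed as a small validation routine. The sketch below is illustrative only: `label_issues` and the constant names are hypothetical, the message wording is invented, and Baktainer's own schema-based validation may check more than this.

```ruby
REQUIRED_LABELS   = %w[baktainer.backup baktainer.db.engine baktainer.db.name].freeze
CREDENTIAL_LABELS = %w[baktainer.db.user baktainer.db.password].freeze
SUPPORTED_ENGINES = %w[mysql postgres mongodb sqlite].freeze

# Hypothetical helper: returns a list of human-readable problems for one
# container's labels (a Hash of label name => value). Empty list means valid.
def label_issues(labels)
  issues = (REQUIRED_LABELS - labels.keys).map { |label| "Missing #{label} label" }
  engine = labels["baktainer.db.engine"]
  return issues if engine.nil?

  unless SUPPORTED_ENGINES.include?(engine)
    return issues << "Unsupported engine '#{engine}'"
  end

  # SQLite is file-based, so only the other engines need credentials.
  unless engine == "sqlite"
    (CREDENTIAL_LABELS - labels.keys).each { |label| issues << "Missing #{label} label" }
  end
  issues
end
```

For example, a MySQL container without `baktainer.db.user` would produce a single "Missing baktainer.db.user label" entry, while a SQLite container with only the three common labels passes.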
Optional Labels
| Label | Description | Default |
|---|---|---|
| baktainer.name | Application name for backup files | Database name |
| baktainer.compress | Enable compression for this container (true/false) | BT_COMPRESS value |
| baktainer.encrypt | Enable encryption for this container (true/false) | BT_ENCRYPTION_ENABLED value |
| baktainer.backup.priority | Backup priority: low, normal, high, critical | normal |
| baktainer.backup.retention.days | Days to keep backups for this container | BT_RETENTION_DAYS value |
| baktainer.backup.retention.count | Max backups to keep for this container | BT_RETENTION_COUNT value |
Backup Files
The backup files will be stored in the directory specified by the BT_BACKUP_DIR environment variable. The files will be named according to the following format:
/backups/<date>/<db_name>-<timestamp>.sql.gz
Where <date> is the date of the backup (YYYY-MM-DD format), <db_name> is the name provided by baktainer.name or, if that label is not set, the name of the database, and <timestamp> is the Unix timestamp of the backup.
By default, backups are compressed with gzip. To disable compression, set BT_COMPRESS=false or add baktainer.compress=false label to specific containers.
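The naming scheme can be reproduced with a few lines of Ruby. This is a sketch under assumptions: `backup_path` is a hypothetical helper, and deriving the date directory from UTC is a guess, since the time zone Baktainer uses is not stated here.

```ruby
# Hypothetical helper: builds <backup_dir>/<date>/<name>-<timestamp>.sql[.gz].
# Assumption: the date directory is derived from UTC.
def backup_path(name, time, backup_dir: "/backups", compress: true)
  date = time.getutc.strftime("%Y-%m-%d")
  extension = compress ? ".sql.gz" : ".sql"
  File.join(backup_dir, date, "#{name}-#{time.to_i}#{extension}")
end
```

Calling `backup_path("mysql-prod", Time.at(1736652000), compress: false)` gives `/backups/2025-01-12/mysql-prod-1736652000.sql`, matching the path shown in the dry-run report.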
Health Monitoring & Dashboard
Baktainer includes a comprehensive health monitoring system with a web dashboard and REST API.
Accessing the Dashboard
When BT_HEALTH_SERVER_ENABLED=true, visit http://localhost:8080 for:
- Real-time backup status and metrics
- Container discovery and configuration
- Performance monitoring with auto-refresh
- System health checks and alerts
Health Check Endpoints
| Endpoint | Description |
|---|---|
| GET / | Interactive monitoring dashboard |
| GET /health | Health check (200 = healthy, 503 = unhealthy) |
| GET /status | Detailed system status and metrics |
| GET /backups | Backup history and statistics |
| GET /containers | Discovered containers with backup labels |
| GET /config | Configuration (credentials sanitized) |
| GET /metrics | Prometheus-format metrics |
Prometheus Integration
Use the /metrics endpoint to integrate with monitoring systems:
# Prometheus scrape config
scrape_configs:
  - job_name: 'baktainer'
    static_configs:
      - targets: ['baktainer:8080']
    metrics_path: '/metrics'
    scrape_interval: 30s
Notifications
Configure multi-channel notifications for backup events:
Supported Channels
- Slack: Set BT_SLACK_WEBHOOK_URL
- Discord: Set BT_DISCORD_WEBHOOK_URL
- Microsoft Teams: Set BT_TEAMS_WEBHOOK_URL
- Generic Webhook: Set BT_WEBHOOK_URL
- Logs: Always available
Example Configuration
environment:
- BT_NOTIFICATION_CHANNELS=slack,log
- BT_SLACK_WEBHOOK_URL=https://hooks.slack.com/services/YOUR/WEBHOOK/URL
- BT_NOTIFY_FAILURES=true
- BT_NOTIFY_SUCCESS=false
- BT_NOTIFY_WARNINGS=true
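One way to read these settings is sketched below, using the defaults from the Notifications variable table. The helper names and the event symbols (`:success`, `:failure`, `:warning`, `:health`) are hypothetical and only illustrate how the flags combine.

```ruby
# Variable name and documented default for each (hypothetical) event type.
NOTIFY_DEFAULTS = {
  success: ["BT_NOTIFY_SUCCESS", "false"],
  failure: ["BT_NOTIFY_FAILURES", "true"],
  warning: ["BT_NOTIFY_WARNINGS", "true"],
  health:  ["BT_NOTIFY_HEALTH", "true"]
}.freeze

# Parses the comma-separated channel list; defaults to the log channel.
def notification_channels(env = ENV)
  env.fetch("BT_NOTIFICATION_CHANNELS", "log").split(",").map(&:strip).reject(&:empty?)
end

# True when the given event type should trigger a notification.
def notify?(event, env = ENV)
  variable, default = NOTIFY_DEFAULTS.fetch(event)
  env.fetch(variable, default) == "true"
end
```

With the example configuration above, `notification_channels` returns `["slack", "log"]`, failures and warnings are sent, and successes are skipped.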
Backup Encryption
Secure your backups with AES-256-GCM encryption:
environment:
- BT_ENCRYPTION_ENABLED=true
- BT_ENCRYPTION_KEY=your-256-bit-hex-key
# Or use a key file:
# - BT_ENCRYPTION_KEY_FILE=/path/to/keyfile
# Or derive from passphrase:
# - BT_ENCRYPTION_PASSPHRASE=your-secure-passphrase
Encrypted files use .enc extension and include authentication data for integrity verification.
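For readers unfamiliar with AES-256-GCM, the following sketch shows an authenticated encrypt/decrypt round trip with Ruby's standard OpenSSL bindings. The byte layout (12-byte IV, 16-byte tag, then ciphertext) and the helper names are assumptions for illustration; the actual .enc file format is not specified here and may differ.

```ruby
require "openssl"

IV_BYTES  = 12 # default GCM nonce size
TAG_BYTES = 16 # default GCM authentication tag size

# Hypothetical helper: encrypts data with a 256-bit key given as 64 hex characters.
# Assumed layout of the result: IV + auth tag + ciphertext.
def encrypt_backup(plaintext, hex_key)
  cipher = OpenSSL::Cipher.new("aes-256-gcm")
  cipher.encrypt
  cipher.key = [hex_key].pack("H*")
  iv = cipher.random_iv
  cipher.auth_data = ""
  ciphertext = cipher.update(plaintext) + cipher.final
  iv + cipher.auth_tag + ciphertext
end

# Reverses encrypt_backup; raises OpenSSL::Cipher::CipherError if the data
# or tag was modified, which is the integrity check GCM provides.
def decrypt_backup(blob, hex_key)
  iv         = blob.byteslice(0, IV_BYTES)
  tag        = blob.byteslice(IV_BYTES, TAG_BYTES)
  ciphertext = blob.byteslice(IV_BYTES + TAG_BYTES, blob.bytesize)

  cipher = OpenSSL::Cipher.new("aes-256-gcm")
  cipher.decrypt
  cipher.key = [hex_key].pack("H*")
  cipher.iv = iv
  cipher.auth_tag = tag
  cipher.auth_data = ""
  cipher.update(ciphertext) + cipher.final
end
```

A suitable hex key can be generated with `ruby -rsecurerandom -e 'puts SecureRandom.hex(32)'`. A real implementation for large backups would encrypt in chunks rather than hold the whole dump in memory.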
Backup Rotation & Cleanup
Automatic backup cleanup keeps your storage manageable:
Configuration Options
environment:
- BT_ROTATION_ENABLED=true # Enable automatic cleanup
- BT_RETENTION_DAYS=30 # Keep backups for 30 days
- BT_RETENTION_COUNT=0 # Max backups per container (0 = unlimited)
- BT_MIN_FREE_SPACE_GB=10 # Trigger cleanup when free space < 10GB
Per-Container Overrides
labels:
- baktainer.backup.retention.days=7 # Keep this container's backups for 7 days
- baktainer.backup.retention.count=5 # Keep max 5 backups for this container
Cleanup Behavior
- Time-based: Remove backups older than specified days
- Count-based: Keep only N most recent backups per container
- Space-based: Automatic cleanup when disk space is low
- Smart cleanup: Removes empty date directories after cleanup
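The time-based and count-based rules can be combined as in the sketch below. It is an illustration only: `BackupFile` and `backups_to_delete` are hypothetical names, the function handles one container's backups at a time, and space-based cleanup is left out.

```ruby
BackupFile = Struct.new(:path, :mtime)

# Hypothetical helper: returns the backups that age-based or count-based
# retention would remove. A value of 0 disables the corresponding rule,
# matching the "0 = unlimited" convention of the BT_RETENTION_* variables.
def backups_to_delete(backups, now: Time.now, retention_days: 30, retention_count: 0)
  newest_first = backups.sort_by(&:mtime).reverse
  cutoff = now - retention_days * 86_400

  expired = retention_days.positive? ? newest_first.select { |b| b.mtime < cutoff } : []
  surplus = retention_count.positive? ? newest_first.drop(retention_count) : []

  expired | surplus # union, so a backup matched by both rules appears once
end
```

With the defaults, only backups older than 30 days are selected. Setting `retention_count: 5` additionally selects everything beyond the five most recent files.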
Testing
The project includes comprehensive test coverage with both unit and integration tests.
Running Tests
# Run all tests
cd app && bundle exec rspec
# Run tests with coverage report
cd app && COVERAGE=true bundle exec rspec
# Run only unit tests
cd app && bundle exec rspec spec/unit/
# Run only integration tests (requires Docker)
cd app && bundle exec rspec spec/integration/
Test Coverage
- Line Coverage: 94.94% (150/158 lines)
- Branch Coverage: 71.11% (32/45 branches)
- Tests cover all database engines, container discovery, error handling, and backup workflows
- Integration tests validate full backup operations with real Docker containers
Test Commands
# Quick unit tests
bin/test
# All tests with coverage
bin/test --all --coverage
# Integration tests with setup/cleanup
bin/test --integration --setup --cleanup
Development Roadmap
✅ Completed Features
- Database Support: MySQL, PostgreSQL, MongoDB, and SQLite backups
- Scheduling: Cron-based backup scheduling
- Container Discovery: Docker label-based configuration
- Docker Integration: Support for socket, TCP, SSL/TLS connections
- Compression: Gzip compression for backup files
- Health Monitoring: Web dashboard and REST API monitoring
- Notifications: Multi-channel notifications (Slack, Discord, Teams, Webhooks)
- Backup Rotation: Automatic cleanup based on age, count, and disk space
- Encryption: AES-256-GCM encryption for backup files
- Performance Monitoring: Real-time metrics and alerts
- Label Validation: Schema-based validation with helpful error messages
- High Performance: Multi-threaded backups with dynamic scaling
- Comprehensive Testing: 94.94% line coverage with unit and integration tests
Development Environment
For developers and contributors, Baktainer includes a complete development environment with sample databases:
Quick Setup
# Set up development environment with sample databases
./dev/setup.sh
# Test dry-run mode with sample data
docker compose -f docker-compose.dev.yml exec baktainer-dev ruby app.rb --dry-run
# View health dashboard
open http://localhost:8080
What's Included
- Sample Databases: MySQL, PostgreSQL, and SQLite with realistic test data
- Health Dashboard: Real-time monitoring and metrics at http://localhost:8080
- Automated Testing: Containers configured for testing various scenarios
- Debug Logging: Full visibility into backup processes
- Error Testing: Intentionally misconfigured containers for testing error handling
Development Services
| Database | Port | Credentials | Purpose |
|---|---|---|---|
| MySQL | 3306 | dev_user/dev_password | Primary MySQL testing |
| PostgreSQL | 5432 | dev_user/dev_password | PostgreSQL backup testing |
| SQLite | - | File-based | SQLite backup testing |
| MySQL Secondary | 3307 | secondary_user/secondary_pass | Multi-container testing |
Quick Commands
# Test dry-run validation
docker compose -f docker-compose.dev.yml exec baktainer-dev ruby app.rb --dry-run
# Run immediate backup
docker compose -f docker-compose.dev.yml exec baktainer-dev ruby app.rb --now
# View backup files
ls -la dev-backups/
# Clean up environment
./dev/teardown.sh
For detailed development instructions, see DEVELOPMENT.md.
🔄 In Progress
- Backup streaming for large databases
- Advanced retry strategies with exponential backoff
📋 Future Enhancements
- Individual hooks for completed backups
- Hooks for fully completed backup cycles
- Configurable timeout limits for each backup
- Database-specific optimization settings
- Backup verification and integrity checking
- Multi-region backup replication
- Advanced alerting rules and thresholds