ADR-004: Containerization Approach
Status
Accepted
Context
We need to establish a containerization strategy for the GISE methodology platform that supports local development, testing, and production deployment across multiple environments while maintaining consistency, security, and operational efficiency.
Current Requirements
Development Environment:
- Consistent development setup across team members
- Quick onboarding for new developers
- Isolated development dependencies
- Hot reloading and debugging support
Testing Environment:
- Reproducible test environments
- Integration testing with external services
- Performance testing capabilities
- Automated CI/CD pipeline integration
Production Environment:
- High availability and scalability
- Security and compliance requirements
- Resource optimization and cost efficiency
- Multi-environment deployment (staging, production)
Technical Constraints:
- Mixed operating systems in development (macOS, Linux, Windows)
- Kubernetes deployment target for production
- Resource limitations in development environments
- Need for rapid iteration and deployment
Decision
We will adopt Docker as our primary containerization technology with Docker Compose for local development and Kubernetes for production orchestration.
Implementation Strategy
1. Container Images
Base Image Strategy:
# Use official Node.js LTS image with Alpine Linux for security and size
FROM node:18-alpine AS base
# Install security updates
RUN apk update && apk upgrade
# Create non-root user for security
RUN addgroup -g 1001 -S nodejs
RUN adduser -S nextjs -u 1001
WORKDIR /app
# Install dependencies in separate layer for better caching
COPY package*.json ./
RUN npm ci --omit=dev && npm cache clean --force
# Copy application code
COPY --chown=nextjs:nodejs . .
# Switch to non-root user
USER nextjs
EXPOSE 3000
CMD ["npm", "start"]
Multi-stage Build for Production:
# Multi-stage Dockerfile for API service
FROM node:18-alpine AS dependencies
WORKDIR /app
COPY package*.json ./
RUN npm ci --omit=dev
FROM node:18-alpine AS build
WORKDIR /app
COPY package*.json ./
RUN npm ci
COPY . .
RUN npm run build
RUN npm test
RUN npm run lint
FROM node:18-alpine AS production
RUN apk update && apk upgrade
RUN addgroup -g 1001 -S nodejs
RUN adduser -S apiuser -u 1001
WORKDIR /app
# Copy production dependencies
COPY --from=dependencies /app/node_modules ./node_modules
# Copy built application
COPY --from=build --chown=apiuser:nodejs /app/dist ./dist
COPY --from=build --chown=apiuser:nodejs /app/package*.json ./
# Health check (BusyBox wget is used because curl is not installed in this image)
HEALTHCHECK --interval=30s --timeout=3s --start-period=5s --retries=3 \
CMD wget -qO- http://localhost:8000/health || exit 1
USER apiuser
EXPOSE 8000
CMD ["node", "dist/server.js"]
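Build speed and image hygiene also depend on what enters the build context. A `.dockerignore` sketch for this layout (entries assume a typical Node.js repository and should be adjusted to the actual one):

```
# .dockerignore (sketch; adjust to the repository layout)
node_modules
dist
.next
.git
.env*
*.md
coverage
```

Without it, a `COPY . .` step would pull host `node_modules` and local build output into the image and invalidate layer caching on unrelated file changes.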
2. Local Development with Docker Compose
Development Configuration:
# docker-compose.dev.yml
version: '3.8'
services:
# Web Frontend
web:
build:
context: ./frontend
dockerfile: Dockerfile.dev
target: development
ports:
- "3000:3000"
volumes:
- ./frontend:/app
- /app/node_modules
- /app/.next
environment:
- NODE_ENV=development
- NEXT_PUBLIC_API_URL=http://localhost:8000
- WATCHPACK_POLLING=true # For file watching in containers
depends_on:
- api
networks:
- gise-network
# API Backend
api:
build:
context: ./backend
dockerfile: Dockerfile.dev
target: development
ports:
- "8000:8000"
- "9229:9229" # Node.js debugging port
volumes:
- ./backend:/app
- /app/node_modules
environment:
- NODE_ENV=development
- DATABASE_URL=postgresql://gise_user:gise_pass@db:5432/gise_dev
- REDIS_URL=redis://redis:6379
- JWT_SECRET=development-secret-change-in-production
depends_on:
db:
condition: service_healthy
redis:
condition: service_healthy
networks:
- gise-network
command: npm run dev:debug # Enable debugging in development
# PostgreSQL Database
db:
image: postgres:15-alpine
ports:
- "5432:5432"
environment:
- POSTGRES_DB=gise_dev
- POSTGRES_USER=gise_user
- POSTGRES_PASSWORD=gise_pass
volumes:
- postgres_dev_data:/var/lib/postgresql/data
- ./database/init:/docker-entrypoint-initdb.d
- ./database/seed:/docker-entrypoint-initdb.d/seed # note: the postgres entrypoint runs only top-level init scripts, so files here must be invoked from a top-level *.sh or *.sql script
healthcheck:
test: ["CMD-SHELL", "pg_isready -U gise_user -d gise_dev"]
interval: 10s
timeout: 5s
retries: 5
networks:
- gise-network
# Redis Cache
redis:
image: redis:7-alpine
ports:
- "6379:6379"
volumes:
- redis_dev_data:/data
healthcheck:
test: ["CMD", "redis-cli", "ping"]
interval: 10s
timeout: 3s
retries: 3
networks:
- gise-network
# Development Tools
mailhog:
image: mailhog/mailhog
ports:
- "1025:1025" # SMTP
- "8025:8025" # Web UI
networks:
- gise-network
# Database Admin Tool
pgadmin:
image: dpage/pgadmin4
ports:
- "5050:80"
environment:
- PGADMIN_DEFAULT_EMAIL=admin@gise.dev
- PGADMIN_DEFAULT_PASSWORD=admin
depends_on:
- db
networks:
- gise-network
volumes:
postgres_dev_data:
redis_dev_data:
networks:
gise-network:
driver: bridge
Development Dockerfile:
# Dockerfile.dev for backend
FROM node:18-alpine AS development
# Install development tools
RUN apk add --no-cache curl git
WORKDIR /app
# Copy package files
COPY package*.json ./
# Install all dependencies (including dev dependencies)
RUN npm ci
# Create non-root user
RUN addgroup -g 1001 -S nodejs
RUN adduser -S apiuser -u 1001
# Change ownership of app directory
RUN chown -R apiuser:nodejs /app
USER apiuser
# Expose application and debug ports
EXPOSE 8000 9229
# Development command with debugging
CMD ["npm", "run", "dev:debug"]
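The `dev:debug` script referenced above is not defined in this ADR. One plausible `package.json` fragment, assuming nodemon as the file watcher (an assumption, not a project standard):

```
{
  "scripts": {
    "dev": "nodemon src/server.js",
    "dev:debug": "nodemon --inspect=0.0.0.0:9229 src/server.js"
  }
}
```

Binding the inspector to `0.0.0.0` rather than `localhost` is what allows the host debugger to attach through the published 9229 port.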
3. Production Containerization
Production Docker Compose:
# docker-compose.prod.yml
version: '3.8'
services:
# Nginx Reverse Proxy
nginx:
image: nginx:alpine
ports:
- "80:80"
- "443:443"
volumes:
- ./nginx/nginx.conf:/etc/nginx/nginx.conf:ro
- ./nginx/ssl:/etc/nginx/ssl:ro
- web_static:/var/www/static:ro
depends_on:
- web
- api
restart: unless-stopped
networks:
- gise-network
# Production Web Application
web:
image: ${DOCKER_REGISTRY}/gise-web:${VERSION}
environment:
- NODE_ENV=production
- NEXT_PUBLIC_API_URL=https://api.gise.platform
volumes:
- web_static:/app/.next/static
restart: unless-stopped
deploy:
replicas: 2
resources:
limits:
memory: 512M
cpus: '0.5'
reservations:
memory: 256M
cpus: '0.25'
networks:
- gise-network
# Production API Service
api:
image: ${DOCKER_REGISTRY}/gise-api:${VERSION}
environment:
- NODE_ENV=production
- DATABASE_URL=${DATABASE_URL}
- REDIS_URL=${REDIS_URL}
- JWT_SECRET=${JWT_SECRET}
restart: unless-stopped
deploy:
replicas: 2
resources:
limits:
memory: 1G
cpus: '1.0'
reservations:
memory: 512M
cpus: '0.5'
healthcheck:
test: ["CMD", "wget", "-qO-", "http://localhost:8000/health"]
interval: 30s
timeout: 10s
retries: 3
start_period: 40s
networks:
- gise-network
# Production Database (External in real production)
db:
image: postgres:15-alpine
environment:
- POSTGRES_DB=${POSTGRES_DB}
- POSTGRES_USER=${POSTGRES_USER}
- POSTGRES_PASSWORD=${POSTGRES_PASSWORD}
volumes:
- postgres_prod_data:/var/lib/postgresql/data
- ./database/backups:/backups
restart: unless-stopped
deploy:
resources:
limits:
memory: 2G
cpus: '1.0'
reservations:
memory: 1G
cpus: '0.5'
networks:
- gise-network
# Production Redis
redis:
image: redis:7-alpine
command: redis-server --requirepass ${REDIS_PASSWORD}
volumes:
- redis_prod_data:/data
restart: unless-stopped
networks:
- gise-network
volumes:
postgres_prod_data:
redis_prod_data:
web_static:
networks:
gise-network:
driver: bridge
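The production compose file reads registry, version, and credential values from the environment. A sketch of the corresponding `.env` file; all values are placeholders, and real secrets should come from a secret manager rather than a committed file:

```
# .env (placeholder values only; never commit real credentials)
DOCKER_REGISTRY=ghcr.io/gise-platform
VERSION=1.2.3
POSTGRES_DB=gise
POSTGRES_USER=gise_user
POSTGRES_PASSWORD=<strong-password>
DATABASE_URL=postgresql://gise_user:<strong-password>@db:5432/gise
REDIS_PASSWORD=<strong-password>
REDIS_URL=redis://:<strong-password>@redis:6379
JWT_SECRET=<random-256-bit-value>
```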
4. Kubernetes Deployment
Kubernetes Manifests:
# k8s/namespace.yaml
apiVersion: v1
kind: Namespace
metadata:
name: gise-platform
labels:
name: gise-platform
---
# Resource quotas for the namespace
apiVersion: v1
kind: ResourceQuota
metadata:
name: gise-resource-quota
namespace: gise-platform
spec:
hard:
requests.cpu: "4"
requests.memory: 8Gi
limits.cpu: "8"
limits.memory: 16Gi
persistentvolumeclaims: "4"
# k8s/api-deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
name: gise-api
namespace: gise-platform
labels:
app: gise-api
version: v1
spec:
replicas: 3
selector:
matchLabels:
app: gise-api
version: v1
template:
metadata:
labels:
app: gise-api
version: v1
spec:
containers:
- name: api
image: ghcr.io/gise-platform/api:IMAGE_TAG # placeholder; the deploy job substitutes the commit SHA tag
ports:
- containerPort: 8000
name: http
env:
- name: NODE_ENV
value: "production"
- name: DATABASE_URL
valueFrom:
secretKeyRef:
name: gise-secrets
key: database-url
- name: REDIS_URL
valueFrom:
secretKeyRef:
name: gise-secrets
key: redis-url
- name: JWT_SECRET
valueFrom:
secretKeyRef:
name: gise-secrets
key: jwt-secret
resources:
limits:
memory: "1Gi"
cpu: "1000m"
requests:
memory: "512Mi"
cpu: "500m"
livenessProbe:
httpGet:
path: /health
port: 8000
initialDelaySeconds: 30
periodSeconds: 30
timeoutSeconds: 5
failureThreshold: 3
readinessProbe:
httpGet:
path: /ready
port: 8000
initialDelaySeconds: 5
periodSeconds: 10
timeoutSeconds: 3
failureThreshold: 3
securityContext:
allowPrivilegeEscalation: false
readOnlyRootFilesystem: true
runAsNonRoot: true
runAsUser: 1001
capabilities:
drop:
- ALL
securityContext:
fsGroup: 1001
---
apiVersion: v1
kind: Service
metadata:
name: gise-api-service
namespace: gise-platform
labels:
app: gise-api
spec:
selector:
app: gise-api
ports:
- name: http
port: 80
targetPort: 8000
protocol: TCP
type: ClusterIP
# k8s/ingress.yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
name: gise-ingress
namespace: gise-platform
annotations:
kubernetes.io/ingress.class: nginx
cert-manager.io/cluster-issuer: letsencrypt-prod
nginx.ingress.kubernetes.io/rate-limit: "100"
nginx.ingress.kubernetes.io/ssl-redirect: "true"
nginx.ingress.kubernetes.io/force-ssl-redirect: "true"
spec:
tls:
- hosts:
- api.gise.platform
- app.gise.platform
secretName: gise-tls
rules:
- host: api.gise.platform
http:
paths:
- path: /
pathType: Prefix
backend:
service:
name: gise-api-service
port:
number: 80
- host: app.gise.platform
http:
paths:
- path: /
pathType: Prefix
backend:
service:
name: gise-web-service
port:
number: 80
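The API Deployment pulls its credentials from a `gise-secrets` Secret that is not defined in the manifests above. A sketch with the expected key names follows; the values are placeholders, and in practice the Secret would be created out of band (or managed by a tool such as External Secrets) rather than committed:

```
# k8s/secrets.yaml (sketch; do not commit real values)
apiVersion: v1
kind: Secret
metadata:
  name: gise-secrets
  namespace: gise-platform
type: Opaque
stringData:
  database-url: postgresql://gise_user:<password>@<db-host>:5432/gise
  redis-url: redis://:<password>@<redis-host>:6379
  jwt-secret: <random-256-bit-value>
```

The `stringData` keys match the `secretKeyRef` names used by the Deployment, so `kubectl apply` of this sketch would satisfy those references.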
Security Considerations
Container Security
Image Security:
# Security-hardened Dockerfile
FROM node:18-alpine AS base
# Update packages and install security updates
RUN apk update && apk upgrade && apk add --no-cache dumb-init
# Create non-root user with specific UID/GID
RUN addgroup -g 1001 -S nodejs && \
adduser -S -u 1001 -G nodejs nodejs
# Set secure working directory
WORKDIR /app
# Copy and install dependencies as root
COPY package*.json ./
RUN npm ci --omit=dev && \
npm cache clean --force && \
rm -rf ~/.npm
# Copy application files
COPY --chown=nodejs:nodejs . .
# Remove write permissions from application code
RUN chmod -R 555 /app && \
chmod -R 755 /app/node_modules/.bin
# Switch to non-root user
USER nodejs
# Use dumb-init to handle signals properly
ENTRYPOINT ["dumb-init", "--"]
# Health check with timeout (BusyBox wget; curl is not installed in this image)
HEALTHCHECK --interval=30s --timeout=3s --start-period=5s --retries=3 \
CMD wget -qO- http://localhost:8000/health || exit 1
CMD ["node", "server.js"]
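dumb-init forwards SIGTERM to the Node.js process, but the application must still stop accepting connections and drain in-flight requests itself. A minimal sketch of that handler follows; the server setup, port handling, and timeout are illustrative, not the platform's actual entrypoint:

```typescript
import * as http from 'http';

// Illustrative server; in this sketch an unset PORT picks an ephemeral port,
// while the real service would listen on 8000.
const server = http.createServer((req, res) => {
  res.end('ok');
});
server.listen(Number(process.env.PORT ?? 0));

let shuttingDown = false;

// Returns true if this call initiated the shutdown, false if one was
// already in progress (repeated signals are ignored).
export function shutdown(): boolean {
  if (shuttingDown) return false;
  shuttingDown = true;
  server.close(() => process.exit(0)); // exit cleanly once connections drain
  setTimeout(() => process.exit(1), 10_000).unref(); // force exit as a fallback
  return true;
}

process.on('SIGTERM', shutdown);
process.on('SIGINT', shutdown);
```

Without a handler like this, the orchestrator's SIGTERM would be followed by a hard SIGKILL after the grace period, dropping any in-flight requests.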
Security Scanning Integration:
# .github/workflows/security-scan.yml
name: Container Security Scan
on:
push:
branches: [main]
pull_request:
branches: [main]
jobs:
security-scan:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v3
- name: Build Docker image
run: docker build -t gise-api:test ./backend
- name: Run Trivy vulnerability scanner
uses: aquasecurity/trivy-action@master
with:
image-ref: 'gise-api:test'
format: 'sarif'
output: 'trivy-results.sarif'
- name: Upload Trivy scan results
uses: github/codeql-action/upload-sarif@v2
if: always()
with:
sarif_file: 'trivy-results.sarif'
- name: Run Snyk to check for vulnerabilities
uses: snyk/actions/docker@master
env:
SNYK_TOKEN: ${{ secrets.SNYK_TOKEN }}
with:
image: gise-api:test
args: --severity-threshold=high
Runtime Security
Pod Security Standards:
# k8s/api-pod-security.yaml (PodSecurityPolicy was removed in Kubernetes 1.25; this Pod spec shows the equivalent Pod Security Standards settings)
apiVersion: v1
kind: Pod
metadata:
name: gise-api-pod
spec:
securityContext:
runAsNonRoot: true
runAsUser: 1001
runAsGroup: 1001
fsGroup: 1001
seccompProfile:
type: RuntimeDefault
containers:
- name: api
securityContext:
allowPrivilegeEscalation: false
readOnlyRootFilesystem: true
capabilities:
drop:
- ALL
volumeMounts:
- name: tmp-volume
mountPath: /tmp
- name: var-tmp-volume
mountPath: /var/tmp
volumes:
- name: tmp-volume
emptyDir: {}
- name: var-tmp-volume
emptyDir: {}
CI/CD Integration
Build Pipeline
# .github/workflows/build-and-deploy.yml
name: Build and Deploy
on:
push:
branches: [main]
pull_request:
branches: [main]
env:
REGISTRY: ghcr.io
IMAGE_NAME: gise-platform
jobs:
build:
runs-on: ubuntu-latest
strategy:
matrix:
service: [api, web]
steps:
- uses: actions/checkout@v3
- name: Set up Docker Buildx
uses: docker/setup-buildx-action@v2
- name: Log in to Container Registry
uses: docker/login-action@v2
with:
registry: ${{ env.REGISTRY }}
username: ${{ github.actor }}
password: ${{ secrets.GITHUB_TOKEN }}
- name: Extract metadata
id: meta
uses: docker/metadata-action@v4
with:
images: ${{ env.REGISTRY }}/${{ env.IMAGE_NAME }}-${{ matrix.service }}
tags: |
type=ref,event=branch
type=ref,event=pr
type=sha,prefix=sha-
type=raw,value=latest,enable={{is_default_branch}}
- name: Build and push Docker image
uses: docker/build-push-action@v4
with:
context: ./${{ matrix.service }}
push: true
tags: ${{ steps.meta.outputs.tags }}
labels: ${{ steps.meta.outputs.labels }}
cache-from: type=gha
cache-to: type=gha,mode=max
- name: Run security scan
uses: aquasecurity/trivy-action@master
with:
image-ref: ${{ env.REGISTRY }}/${{ env.IMAGE_NAME }}-${{ matrix.service }}:sha-${{ github.sha }}
format: 'table'
exit-code: '1'
severity: 'CRITICAL,HIGH'
deploy:
if: github.ref == 'refs/heads/main'
needs: build
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v3
- name: Configure kubectl
uses: azure/k8s-set-context@v1
with:
method: kubeconfig
kubeconfig: ${{ secrets.KUBE_CONFIG }}
- name: Deploy to Kubernetes
run: |
sed -i 's|IMAGE_TAG|sha-${{ github.sha }}|g' k8s/*.yaml
kubectl apply -f k8s/
kubectl rollout status deployment/gise-api -n gise-platform
kubectl rollout status deployment/gise-web -n gise-platform
Resource Management
Development Resource Limits
# docker-compose.override.yml (for resource-constrained environments)
version: '3.8'
services:
api:
deploy:
resources:
limits:
memory: 512M
cpus: '0.5'
reservations:
memory: 256M
cpus: '0.25'
web:
deploy:
resources:
limits:
memory: 256M
cpus: '0.25'
reservations:
memory: 128M
cpus: '0.1'
db:
deploy:
resources:
limits:
memory: 512M
cpus: '0.5'
reservations:
memory: 256M
cpus: '0.25'
command: postgres -c max_connections=20 -c shared_buffers=64MB
Production Resource Optimization
# k8s/resource-management.yaml
apiVersion: v1
kind: LimitRange
metadata:
name: gise-limit-range
namespace: gise-platform
spec:
limits:
- default:
memory: "1Gi"
cpu: "1000m"
defaultRequest:
memory: "512Mi"
cpu: "500m"
type: Container
---
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
name: gise-api-pdb
namespace: gise-platform
spec:
minAvailable: 2
selector:
matchLabels:
app: gise-api
---
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
name: gise-api-hpa
namespace: gise-platform
spec:
scaleTargetRef:
apiVersion: apps/v1
kind: Deployment
name: gise-api
minReplicas: 2
maxReplicas: 10
metrics:
- type: Resource
resource:
name: cpu
target:
type: Utilization
averageUtilization: 70
- type: Resource
resource:
name: memory
target:
type: Utilization
averageUtilization: 80
Monitoring and Observability
Container Health Monitoring
// Health check endpoint implementation
import express from 'express';
const app = express();
interface HealthCheck {
status: 'healthy' | 'unhealthy';
timestamp: string;
uptime: number;
checks: {
database: 'healthy' | 'unhealthy';
redis: 'healthy' | 'unhealthy';
memory: 'healthy' | 'unhealthy';
disk: 'healthy' | 'unhealthy';
};
}
app.get('/health', async (req, res) => {
const health: HealthCheck = {
status: 'healthy',
timestamp: new Date().toISOString(),
uptime: process.uptime(),
checks: {
database: await checkDatabase(),
redis: await checkRedis(),
memory: checkMemory(),
disk: checkDisk()
}
};
const isUnhealthy = Object.values(health.checks).some(check => check === 'unhealthy');
if (isUnhealthy) {
health.status = 'unhealthy';
return res.status(503).json(health);
}
res.json(health);
});
app.get('/ready', async (req, res) => {
// Readiness check - can handle traffic
const ready = await checkReadiness();
if (ready) {
res.json({ status: 'ready' });
} else {
res.status(503).json({ status: 'not ready' });
}
});
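The handlers above rely on helper checks (checkDatabase, checkRedis, checkMemory, checkDisk, checkReadiness) whose implementations are outside this ADR. As one hedged example, a heap-based checkMemory could look like this; the 90% threshold is an illustrative default, not a platform requirement:

```typescript
// Sketch of the checkMemory helper called by the /health endpoint.
// Tune the threshold from observed memory profiles of the real service.
export function checkMemory(thresholdRatio = 0.9): 'healthy' | 'unhealthy' {
  const { heapUsed, heapTotal } = process.memoryUsage();
  return heapUsed / heapTotal < thresholdRatio ? 'healthy' : 'unhealthy';
}
```

The database and Redis checks would follow the same shape, returning 'unhealthy' on a failed ping or a timed-out query rather than throwing into the handler.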
Consequences
Positive Consequences
Development Benefits:
- Consistent development environments across team members
- Simplified onboarding process for new developers
- Isolated development dependencies prevent conflicts
- Easy local testing of full application stack
Operational Benefits:
- Predictable deployment behavior across environments
- Simplified rollback and recovery procedures
- Resource isolation and security boundaries
- Horizontal scaling capabilities with Kubernetes
Security Benefits:
- Container isolation reduces attack surface
- Immutable infrastructure principles
- Automated security scanning in CI/CD pipeline
- Non-root user execution by default
Negative Consequences
Complexity:
- Additional learning curve for team members unfamiliar with containers
- More complex local development setup initially
- Debugging across container boundaries can be challenging
- Container orchestration adds operational overhead
Resource Usage:
- Additional memory and CPU overhead from containers
- Storage space for multiple image layers
- Network performance impact from container networking
- Development machines need more resources
Development Workflow Changes:
- Different debugging procedures for containerized applications
- File watching and hot reloading require special configuration
- Database migrations and seeding need container-aware scripts
- Log aggregation becomes more complex
Implementation Timeline
Phase 1: Local Development Setup (Weeks 1-2)
- Create development Dockerfiles for all services
- Set up Docker Compose for local development
- Configure hot reloading and debugging
- Create developer documentation and onboarding guides
- Test development workflow with team
Phase 2: CI/CD Integration (Weeks 3-4)
- Set up Docker build pipeline in GitHub Actions
- Implement security scanning with Trivy and Snyk
- Configure container registry (GitHub Container Registry)
- Create production-optimized Dockerfiles
- Implement automated testing in containers
Phase 3: Production Deployment (Weeks 5-7)
- Set up Kubernetes cluster configuration
- Create Kubernetes manifests for all services
- Implement health checks and monitoring
- Configure ingress and TLS certificates
- Set up resource limits and autoscaling
Phase 4: Optimization and Monitoring (Weeks 8-9)
- Implement comprehensive monitoring and logging
- Optimize image sizes and build times
- Set up alerting for container health issues
- Performance testing and resource tuning
- Documentation and runbook creation
Success Metrics
Development Metrics
- Developer onboarding time reduced to <30 minutes
- Zero "works on my machine" issues
- <5 minutes to start full development environment
- 90%+ developer satisfaction with containerized workflow
Operational Metrics
- Container startup time <30 seconds
- Image pull time <2 minutes
- Zero downtime deployments achieved
- <1% container failure rate in production
Security Metrics
- Zero critical vulnerabilities in production containers
- 100% of containers run as non-root users
- All images scanned and approved before deployment
- Container security policies enforced
Resource Metrics
- Memory usage optimized within 10% of non-containerized baseline
- CPU overhead <5% compared to bare metal
- Storage usage <2x compared to traditional deployments
- Network latency impact <10ms
Future Considerations
Planned Enhancements
- Service Mesh: Implement Istio for advanced traffic management
- GitOps: Adopt ArgoCD for declarative deployment management
- Multi-arch Builds: Support ARM64 for better performance and cost
- Admission Controllers: Implement OPA/Gatekeeper for policy enforcement
Technology Evolution
- Monitor Docker alternatives (Podman, containerd)
- Evaluate serverless container platforms (AWS Fargate, Google Cloud Run)
- Consider WebAssembly for lightweight containerization
- Explore container-native development tools
Decision Date: December 19, 2024
Participants: DevOps Team, Development Teams, Security Team
Next Review: March 19, 2025