ADR-004: Containerization Approach

Status

Accepted

Context

We need to establish a containerization strategy for the GISE methodology platform that supports local development, testing, and production deployment across multiple environments while maintaining consistency, security, and operational efficiency.

Current Requirements

Development Environment:

  • Consistent development setup across team members
  • Quick onboarding for new developers
  • Isolated development dependencies
  • Hot reloading and debugging support

Testing Environment:

  • Reproducible test environments
  • Integration testing with external services
  • Performance testing capabilities
  • Automated CI/CD pipeline integration

Production Environment:

  • High availability and scalability
  • Security and compliance requirements
  • Resource optimization and cost efficiency
  • Multi-environment deployment (staging, production)

Technical Constraints:

  • Mixed operating systems in development (macOS, Linux, Windows)
  • Kubernetes deployment target for production
  • Resource limitations in development environments
  • Need for rapid iteration and deployment

Decision

We will adopt Docker as our primary containerization technology with Docker Compose for local development and Kubernetes for production orchestration.

Architecture Overview

Implementation Strategy

1. Container Images

Base Image Strategy:

# Use official Node.js LTS image with Alpine Linux for security and size
FROM node:18-alpine AS base

# Install security updates
RUN apk update && apk upgrade

# Create non-root user for security
RUN addgroup -g 1001 -S nodejs
RUN adduser -S nextjs -u 1001

WORKDIR /app

# Install dependencies in separate layer for better caching
COPY package*.json ./
RUN npm ci --only=production && npm cache clean --force

# Copy application code
COPY --chown=nextjs:nodejs . .

# Switch to non-root user
USER nextjs

EXPOSE 3000

CMD ["npm", "start"]

Multi-stage Build for Production:

# Multi-stage Dockerfile for API service
FROM node:18-alpine AS dependencies
WORKDIR /app
COPY package*.json ./
RUN npm ci --only=production

FROM node:18-alpine AS build
WORKDIR /app
COPY package*.json ./
RUN npm ci
COPY . .
RUN npm run build
RUN npm test
RUN npm run lint

FROM node:18-alpine AS production
# Install security updates and curl (used by the HEALTHCHECK below; Alpine does not ship curl)
RUN apk update && apk upgrade && apk add --no-cache curl
RUN addgroup -g 1001 -S nodejs
RUN adduser -S apiuser -u 1001

WORKDIR /app

# Copy production dependencies
COPY --from=dependencies /app/node_modules ./node_modules

# Copy built application
COPY --from=build --chown=apiuser:nodejs /app/dist ./dist
COPY --from=build --chown=apiuser:nodejs /app/package*.json ./

# Health check
HEALTHCHECK --interval=30s --timeout=3s --start-period=5s --retries=3 \
CMD curl -f http://localhost:3000/health || exit 1

USER apiuser
EXPOSE 3000

CMD ["node", "dist/server.js"]

2. Local Development with Docker Compose

Development Configuration:

# docker-compose.dev.yml
version: '3.8'

services:
  # Web Frontend
  web:
    build:
      context: ./frontend
      dockerfile: Dockerfile.dev
      target: development
    ports:
      - "3000:3000"
    volumes:
      - ./frontend:/app
      - /app/node_modules
      - /app/.next
    environment:
      - NODE_ENV=development
      - NEXT_PUBLIC_API_URL=http://localhost:8000
      - WATCHPACK_POLLING=true # For file watching in containers
    depends_on:
      - api
    networks:
      - gise-network

  # API Backend
  api:
    build:
      context: ./backend
      dockerfile: Dockerfile.dev
      target: development
    ports:
      - "8000:8000"
      - "9229:9229" # Node.js debugging port
    volumes:
      - ./backend:/app
      - /app/node_modules
    environment:
      - NODE_ENV=development
      - DATABASE_URL=postgresql://gise_user:gise_pass@db:5432/gise_dev
      - REDIS_URL=redis://redis:6379
      - JWT_SECRET=development-secret-change-in-production
    depends_on:
      db:
        condition: service_healthy
      redis:
        condition: service_healthy
    networks:
      - gise-network
    command: npm run dev:debug # Enable debugging in development

  # PostgreSQL Database
  db:
    image: postgres:15-alpine
    ports:
      - "5432:5432"
    environment:
      - POSTGRES_DB=gise_dev
      - POSTGRES_USER=gise_user
      - POSTGRES_PASSWORD=gise_pass
    volumes:
      - postgres_dev_data:/var/lib/postgresql/data
      - ./database/init:/docker-entrypoint-initdb.d
      - ./database/seed:/docker-entrypoint-initdb.d/seed
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U gise_user -d gise_dev"]
      interval: 10s
      timeout: 5s
      retries: 5
    networks:
      - gise-network

  # Redis Cache
  redis:
    image: redis:7-alpine
    ports:
      - "6379:6379"
    volumes:
      - redis_dev_data:/data
    healthcheck:
      test: ["CMD", "redis-cli", "ping"]
      interval: 10s
      timeout: 3s
      retries: 3
    networks:
      - gise-network

  # Development Tools
  mailhog:
    image: mailhog/mailhog
    ports:
      - "1025:1025" # SMTP
      - "8025:8025" # Web UI
    networks:
      - gise-network

  # Database Admin Tool
  pgadmin:
    image: dpage/pgadmin4
    ports:
      - "5050:80"
    environment:
      - PGADMIN_DEFAULT_EMAIL=admin@gise.dev
      - PGADMIN_DEFAULT_PASSWORD=admin
    depends_on:
      - db
    networks:
      - gise-network

volumes:
  postgres_dev_data:
  redis_dev_data:

networks:
  gise-network:
    driver: bridge

Development Dockerfile:

# Dockerfile.dev for backend
FROM node:18-alpine AS development

# Install development tools
RUN apk add --no-cache curl git

WORKDIR /app

# Copy package files
COPY package*.json ./

# Install all dependencies (including dev dependencies)
RUN npm ci

# Create non-root user
RUN addgroup -g 1001 -S nodejs
RUN adduser -S apiuser -u 1001

# Change ownership of app directory
RUN chown -R apiuser:nodejs /app

USER apiuser

# Expose application and debug ports
EXPOSE 8000 9229

# Development command with debugging
CMD ["npm", "run", "dev:debug"]

3. Production Containerization

Production Docker Compose:

# docker-compose.prod.yml
version: '3.8'

services:
  # Nginx Reverse Proxy
  nginx:
    image: nginx:alpine
    ports:
      - "80:80"
      - "443:443"
    volumes:
      - ./nginx/nginx.conf:/etc/nginx/nginx.conf:ro
      - ./nginx/ssl:/etc/nginx/ssl:ro
      - web_static:/var/www/static:ro
    depends_on:
      - web
      - api
    restart: unless-stopped
    networks:
      - gise-network

  # Production Web Application
  web:
    image: ${DOCKER_REGISTRY}/gise-web:${VERSION}
    environment:
      - NODE_ENV=production
      - NEXT_PUBLIC_API_URL=https://api.gise.platform
    volumes:
      - web_static:/app/.next/static
    restart: unless-stopped
    deploy:
      replicas: 2
      resources:
        limits:
          memory: 512M
          cpus: '0.5'
        reservations:
          memory: 256M
          cpus: '0.25'
    networks:
      - gise-network

  # Production API Service
  api:
    image: ${DOCKER_REGISTRY}/gise-api:${VERSION}
    environment:
      - NODE_ENV=production
      - DATABASE_URL=${DATABASE_URL}
      - REDIS_URL=${REDIS_URL}
      - JWT_SECRET=${JWT_SECRET}
    restart: unless-stopped
    deploy:
      replicas: 2
      resources:
        limits:
          memory: 1G
          cpus: '1.0'
        reservations:
          memory: 512M
          cpus: '0.5'
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:8000/health"]
      interval: 30s
      timeout: 10s
      retries: 3
      start_period: 40s
    networks:
      - gise-network

  # Production Database (external in real production)
  db:
    image: postgres:15-alpine
    environment:
      - POSTGRES_DB=${POSTGRES_DB}
      - POSTGRES_USER=${POSTGRES_USER}
      - POSTGRES_PASSWORD=${POSTGRES_PASSWORD}
    volumes:
      - postgres_prod_data:/var/lib/postgresql/data
      - ./database/backups:/backups
    restart: unless-stopped
    deploy:
      resources:
        limits:
          memory: 2G
          cpus: '1.0'
        reservations:
          memory: 1G
          cpus: '0.5'
    networks:
      - gise-network

  # Production Redis
  redis:
    image: redis:7-alpine
    command: redis-server --requirepass ${REDIS_PASSWORD}
    volumes:
      - redis_prod_data:/data
    restart: unless-stopped
    networks:
      - gise-network

volumes:
  postgres_prod_data:
  redis_prod_data:
  web_static:

networks:
  gise-network:
    driver: bridge

4. Kubernetes Deployment

Kubernetes Manifests:

# k8s/namespace.yaml
apiVersion: v1
kind: Namespace
metadata:
  name: gise-platform
  labels:
    name: gise-platform
---
# Resource quotas for the namespace
apiVersion: v1
kind: ResourceQuota
metadata:
  name: gise-resource-quota
  namespace: gise-platform
spec:
  hard:
    requests.cpu: "4"
    requests.memory: 8Gi
    limits.cpu: "8"
    limits.memory: 16Gi
    persistentvolumeclaims: "4"
---
# k8s/api-deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: gise-api
  namespace: gise-platform
  labels:
    app: gise-api
    version: v1
spec:
  replicas: 3
  selector:
    matchLabels:
      app: gise-api
      version: v1
  template:
    metadata:
      labels:
        app: gise-api
        version: v1
    spec:
      securityContext:
        fsGroup: 1001
      containers:
        - name: api
          image: ghcr.io/gise-platform/api:IMAGE_TAG # IMAGE_TAG is substituted by the deploy pipeline
          ports:
            - containerPort: 8000
              name: http
          env:
            - name: NODE_ENV
              value: "production"
            - name: DATABASE_URL
              valueFrom:
                secretKeyRef:
                  name: gise-secrets
                  key: database-url
            - name: REDIS_URL
              valueFrom:
                secretKeyRef:
                  name: gise-secrets
                  key: redis-url
            - name: JWT_SECRET
              valueFrom:
                secretKeyRef:
                  name: gise-secrets
                  key: jwt-secret
          resources:
            limits:
              memory: "1Gi"
              cpu: "1000m"
            requests:
              memory: "512Mi"
              cpu: "500m"
          livenessProbe:
            httpGet:
              path: /health
              port: 8000
            initialDelaySeconds: 30
            periodSeconds: 30
            timeoutSeconds: 5
            failureThreshold: 3
          readinessProbe:
            httpGet:
              path: /ready
              port: 8000
            initialDelaySeconds: 5
            periodSeconds: 10
            timeoutSeconds: 3
            failureThreshold: 3
          securityContext:
            allowPrivilegeEscalation: false
            readOnlyRootFilesystem: true
            runAsNonRoot: true
            runAsUser: 1001
            capabilities:
              drop:
                - ALL
---
apiVersion: v1
kind: Service
metadata:
  name: gise-api-service
  namespace: gise-platform
  labels:
    app: gise-api
spec:
  selector:
    app: gise-api
  ports:
    - name: http
      port: 80
      targetPort: 8000
      protocol: TCP
  type: ClusterIP
# k8s/ingress.yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: gise-ingress
  namespace: gise-platform
  annotations:
    kubernetes.io/ingress.class: nginx
    cert-manager.io/cluster-issuer: letsencrypt-prod
    nginx.ingress.kubernetes.io/limit-rps: "100"
    nginx.ingress.kubernetes.io/ssl-redirect: "true"
    nginx.ingress.kubernetes.io/force-ssl-redirect: "true"
spec:
  tls:
    - hosts:
        - api.gise.platform
        - app.gise.platform
      secretName: gise-tls
  rules:
    - host: api.gise.platform
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: gise-api-service
                port:
                  number: 80
    - host: app.gise.platform
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: gise-web-service
                port:
                  number: 80

Security Considerations

Container Security

Image Security:

# Security-hardened Dockerfile
FROM node:18-alpine AS base

# Install security updates, dumb-init for signal handling, and curl for the health check
RUN apk update && apk upgrade && apk add --no-cache dumb-init curl

# Create non-root user with specific UID/GID
RUN addgroup -g 1001 -S nodejs && \
adduser -S -u 1001 -G nodejs nodejs

# Set secure working directory
WORKDIR /app

# Copy and install dependencies as root
COPY package*.json ./
RUN npm ci --only=production && \
npm cache clean --force && \
rm -rf ~/.npm

# Copy application files
COPY --chown=nodejs:nodejs . .

# Remove write permissions from application code
RUN chmod -R 555 /app && \
chmod -R 755 /app/node_modules/.bin

# Switch to non-root user
USER nodejs

# Use dumb-init to handle signals properly
ENTRYPOINT ["dumb-init", "--"]

# Health check with timeout
HEALTHCHECK --interval=30s --timeout=3s --start-period=5s --retries=3 \
CMD curl -f http://localhost:8000/health || exit 1

CMD ["node", "server.js"]
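dumb-init only forwards signals into the container; the Node process itself still has to act on SIGTERM for a graceful shutdown. A minimal sketch of that handler (the port and response body are illustrative assumptions, not part of the ADR):

```typescript
import { createServer } from 'http';

// Placeholder server standing in for the real API process.
const server = createServer((_req, res) => res.end('ok'));
server.listen(0); // port 0 (ephemeral) for illustration; the service uses 8000

// dumb-init forwards SIGTERM from the container runtime to this process.
// Stop accepting connections, let in-flight requests finish, then exit.
process.on('SIGTERM', () => {
  server.close(() => process.exit(0));
});
```

Without this, `docker stop` (or a Kubernetes pod eviction) waits out the termination grace period and then kills the process with SIGKILL, dropping in-flight requests.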

Security Scanning Integration:

# .github/workflows/security-scan.yml
name: Container Security Scan

on:
  push:
    branches: [main]
  pull_request:
    branches: [main]

jobs:
  security-scan:
    runs-on: ubuntu-latest

    steps:
      - uses: actions/checkout@v3

      - name: Build Docker image
        run: docker build -t gise-api:test ./backend

      - name: Run Trivy vulnerability scanner
        uses: aquasecurity/trivy-action@master
        with:
          image-ref: 'gise-api:test'
          format: 'sarif'
          output: 'trivy-results.sarif'

      - name: Upload Trivy scan results
        uses: github/codeql-action/upload-sarif@v2
        if: always()
        with:
          sarif_file: 'trivy-results.sarif'

      - name: Run Snyk to check for vulnerabilities
        uses: snyk/actions/docker@master
        env:
          SNYK_TOKEN: ${{ secrets.SNYK_TOKEN }}
        with:
          image: gise-api:test
          args: --severity-threshold=high

Runtime Security

Pod Security Standards:

# k8s/pod-security-policy.yaml
apiVersion: v1
kind: Pod
metadata:
  name: gise-api-pod
spec:
  securityContext:
    runAsNonRoot: true
    runAsUser: 1001
    runAsGroup: 1001
    fsGroup: 1001
    seccompProfile:
      type: RuntimeDefault
  containers:
    - name: api
      image: ghcr.io/gise-platform/api:latest
      securityContext:
        allowPrivilegeEscalation: false
        readOnlyRootFilesystem: true
        capabilities:
          drop:
            - ALL
      volumeMounts:
        - name: tmp-volume
          mountPath: /tmp
        - name: var-tmp-volume
          mountPath: /var/tmp
  volumes:
    - name: tmp-volume
      emptyDir: {}
    - name: var-tmp-volume
      emptyDir: {}
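With `readOnlyRootFilesystem: true`, the emptyDir mounts at `/tmp` and `/var/tmp` are the only writable paths, so any scratch files the service creates must land there. A minimal sketch (the directory prefix and file contents are illustrative assumptions):

```typescript
import { mkdtempSync, writeFileSync, readFileSync } from 'fs';
import { tmpdir } from 'os';
import { join } from 'path';

// Under a read-only root filesystem, os.tmpdir() resolves to /tmp, which is
// backed by the writable emptyDir volume mounted above.
const dir = mkdtempSync(join(tmpdir(), 'gise-'));
const scratch = join(dir, 'upload.part');
writeFileSync(scratch, 'partial data');
```

Code that writes anywhere else (for example, a cache directory under the application root) will fail at runtime with EROFS, which is worth catching in integration tests before the security context is enforced in production.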

CI/CD Integration

Build Pipeline

# .github/workflows/build-and-deploy.yml
name: Build and Deploy

on:
  push:
    branches: [main]
  pull_request:
    branches: [main]

env:
  REGISTRY: ghcr.io
  IMAGE_NAME: gise-platform

jobs:
  build:
    runs-on: ubuntu-latest

    strategy:
      matrix:
        service: [api, web]

    steps:
      - uses: actions/checkout@v3

      - name: Set up Docker Buildx
        uses: docker/setup-buildx-action@v2

      - name: Log in to Container Registry
        uses: docker/login-action@v2
        with:
          registry: ${{ env.REGISTRY }}
          username: ${{ github.actor }}
          password: ${{ secrets.GITHUB_TOKEN }}

      - name: Extract metadata
        id: meta
        uses: docker/metadata-action@v4
        with:
          images: ${{ env.REGISTRY }}/${{ env.IMAGE_NAME }}-${{ matrix.service }}
          tags: |
            type=ref,event=branch
            type=ref,event=pr
            type=sha,format=long,prefix=sha-
            type=raw,value=latest,enable={{is_default_branch}}

      - name: Build and push Docker image
        uses: docker/build-push-action@v4
        with:
          context: ./${{ matrix.service }}
          push: true
          tags: ${{ steps.meta.outputs.tags }}
          labels: ${{ steps.meta.outputs.labels }}
          cache-from: type=gha
          cache-to: type=gha,mode=max

      - name: Run security scan
        uses: aquasecurity/trivy-action@master
        with:
          image-ref: ${{ env.REGISTRY }}/${{ env.IMAGE_NAME }}-${{ matrix.service }}:sha-${{ github.sha }}
          format: 'table'
          exit-code: '1'
          severity: 'CRITICAL,HIGH'

  deploy:
    if: github.ref == 'refs/heads/main'
    needs: build
    runs-on: ubuntu-latest

    steps:
      - uses: actions/checkout@v3

      - name: Configure kubectl
        uses: azure/k8s-set-context@v1
        with:
          method: kubeconfig
          kubeconfig: ${{ secrets.KUBE_CONFIG }}

      - name: Deploy to Kubernetes
        run: |
          sed -i 's|IMAGE_TAG|sha-${{ github.sha }}|g' k8s/*.yaml
          kubectl apply -f k8s/
          kubectl rollout status deployment/gise-api -n gise-platform
          kubectl rollout status deployment/gise-web -n gise-platform

Resource Management

Development Resource Limits

# docker-compose.override.yml (for resource-constrained environments)
version: '3.8'

services:
  api:
    deploy:
      resources:
        limits:
          memory: 512M
          cpus: '0.5'
        reservations:
          memory: 256M
          cpus: '0.25'

  web:
    deploy:
      resources:
        limits:
          memory: 256M
          cpus: '0.25'
        reservations:
          memory: 128M
          cpus: '0.1'

  db:
    deploy:
      resources:
        limits:
          memory: 512M
          cpus: '0.5'
        reservations:
          memory: 256M
          cpus: '0.25'
    command: postgres -c max_connections=20 -c shared_buffers=64MB

Production Resource Optimization

# k8s/resource-management.yaml
apiVersion: v1
kind: LimitRange
metadata:
  name: gise-limit-range
  namespace: gise-platform
spec:
  limits:
    - default:
        memory: "1Gi"
        cpu: "1000m"
      defaultRequest:
        memory: "512Mi"
        cpu: "500m"
      type: Container
---
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: gise-api-pdb
  namespace: gise-platform
spec:
  minAvailable: 2
  selector:
    matchLabels:
      app: gise-api
---
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: gise-api-hpa
  namespace: gise-platform
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: gise-api
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
    - type: Resource
      resource:
        name: memory
        target:
          type: Utilization
          averageUtilization: 80
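The HPA above scales on the standard Kubernetes formula, desired = ceil(currentReplicas × currentMetricValue ÷ targetValue), clamped to the configured bounds. A sketch of that arithmetic (the function name is ours, not a Kubernetes API):

```typescript
// Kubernetes HPA scaling formula: scale replicas proportionally to how far
// the observed metric (e.g. CPU utilization) is from its target, then clamp
// to the HPA's minReplicas/maxReplicas (2 and 10 in the manifest above).
function desiredReplicas(
  current: number,
  currentUtilization: number,
  targetUtilization: number,
  min = 2,
  max = 10,
): number {
  const desired = Math.ceil(current * (currentUtilization / targetUtilization));
  return Math.min(max, Math.max(min, desired));
}
```

For example, 3 replicas averaging 140% CPU against a 70% target scale to 6; sustained load beyond roughly 230% across 8 replicas pins the deployment at the 10-replica ceiling.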

Monitoring and Observability

Container Health Monitoring

// Health check endpoint implementation
import express from 'express';

const app = express();

interface HealthCheck {
  status: 'healthy' | 'unhealthy';
  timestamp: string;
  uptime: number;
  checks: {
    database: 'healthy' | 'unhealthy';
    redis: 'healthy' | 'unhealthy';
    memory: 'healthy' | 'unhealthy';
    disk: 'healthy' | 'unhealthy';
  };
}

// checkDatabase, checkRedis, checkMemory, checkDisk, and checkReadiness are
// application-specific helpers defined elsewhere in the service.
app.get('/health', async (req, res) => {
  const health: HealthCheck = {
    status: 'healthy',
    timestamp: new Date().toISOString(),
    uptime: process.uptime(),
    checks: {
      database: await checkDatabase(),
      redis: await checkRedis(),
      memory: checkMemory(),
      disk: checkDisk()
    }
  };

  const isUnhealthy = Object.values(health.checks).some(check => check === 'unhealthy');

  if (isUnhealthy) {
    health.status = 'unhealthy';
    return res.status(503).json(health);
  }

  res.json(health);
});

app.get('/ready', async (req, res) => {
  // Readiness check - can the service handle traffic?
  const ready = await checkReadiness();

  if (ready) {
    res.json({ status: 'ready' });
  } else {
    res.status(503).json({ status: 'not ready' });
  }
});
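The check helpers called by the endpoints above are application-specific. A minimal sketch of one of them, assuming heap usage is the memory signal (the 90% threshold is an assumption, not part of the ADR):

```typescript
// Hypothetical memory check: report 'unhealthy' once heap usage crosses a
// configurable fraction of the heap currently allocated to the process.
function checkMemory(thresholdRatio = 0.9): 'healthy' | 'unhealthy' {
  const { heapUsed, heapTotal } = process.memoryUsage();
  return heapUsed / heapTotal < thresholdRatio ? 'healthy' : 'unhealthy';
}
```

The database and Redis checks would follow the same shape: run a cheap probe (`SELECT 1`, `PING`) with a short timeout and map success or failure onto the same `'healthy' | 'unhealthy'` union.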

Consequences

Positive Consequences

Development Benefits:

  • Consistent development environments across team members
  • Simplified onboarding process for new developers
  • Isolated development dependencies prevent conflicts
  • Easy local testing of full application stack

Operational Benefits:

  • Predictable deployment behavior across environments
  • Simplified rollback and recovery procedures
  • Resource isolation and security boundaries
  • Horizontal scaling capabilities with Kubernetes

Security Benefits:

  • Container isolation reduces attack surface
  • Immutable infrastructure principles
  • Automated security scanning in CI/CD pipeline
  • Non-root user execution by default

Negative Consequences

Complexity:

  • Additional learning curve for team members unfamiliar with containers
  • More complex local development setup initially
  • Debugging across container boundaries can be challenging
  • Container orchestration adds operational overhead

Resource Usage:

  • Additional memory and CPU overhead from containers
  • Storage space for multiple image layers
  • Network performance impact from container networking
  • Development machines need more resources

Development Workflow Changes:

  • Different debugging procedures for containerized applications
  • File watching and hot reloading require special configuration
  • Database migrations and seeding need container-aware scripts
  • Log aggregation becomes more complex

Implementation Timeline

Phase 1: Local Development Setup (Weeks 1-2)

  • Create development Dockerfiles for all services
  • Set up Docker Compose for local development
  • Configure hot reloading and debugging
  • Create developer documentation and onboarding guides
  • Test development workflow with team

Phase 2: CI/CD Integration (Weeks 3-4)

  • Set up Docker build pipeline in GitHub Actions
  • Implement security scanning with Trivy and Snyk
  • Configure container registry (GitHub Container Registry)
  • Create production-optimized Dockerfiles
  • Implement automated testing in containers

Phase 3: Production Deployment (Weeks 5-7)

  • Set up Kubernetes cluster configuration
  • Create Kubernetes manifests for all services
  • Implement health checks and monitoring
  • Configure ingress and TLS certificates
  • Set up resource limits and autoscaling

Phase 4: Optimization and Monitoring (Weeks 8-9)

  • Implement comprehensive monitoring and logging
  • Optimize image sizes and build times
  • Set up alerting for container health issues
  • Performance testing and resource tuning
  • Documentation and runbook creation

Success Metrics

Development Metrics

  • Developer onboarding time reduced to <30 minutes
  • Zero "works on my machine" issues
  • <5 minutes to start full development environment
  • 90% developer satisfaction with containerized workflow

Operational Metrics

  • Container startup time <30 seconds
  • Image pull time <2 minutes
  • Zero downtime deployments achieved
  • <1% container failure rate in production

Security Metrics

  • Zero critical vulnerabilities in production containers
  • 100% of containers run as non-root users
  • All images scanned and approved before deployment
  • Container security policies enforced

Resource Metrics

  • Memory usage optimized within 10% of non-containerized baseline
  • CPU overhead <5% compared to bare metal
  • Storage usage <2x compared to traditional deployments
  • Network latency impact <10ms

Future Considerations

Planned Enhancements

  • Service Mesh: Implement Istio for advanced traffic management
  • GitOps: Adopt ArgoCD for declarative deployment management
  • Multi-arch Builds: Support ARM64 for better performance and cost
  • Admission Controllers: Implement OPA/Gatekeeper for policy enforcement

Technology Evolution

  • Monitor Docker alternatives (Podman, containerd)
  • Evaluate serverless container platforms (AWS Fargate, Google Cloud Run)
  • Consider WebAssembly for lightweight containerization
  • Explore container-native development tools

Decision Date: December 19, 2024
Participants: DevOps Team, Development Teams, Security Team
Next Review: March 19, 2025