Database Design Patterns

Comprehensive guide to database design patterns and best practices for GISE methodology projects. Focus on relational databases with PostgreSQL examples, but principles apply to other SQL databases.

Design Principles

1. Normalization Guidelines

First Normal Form (1NF)

-- Bad: Repeating groups
CREATE TABLE users (
    id UUID PRIMARY KEY,
    name VARCHAR(100),
    email1 VARCHAR(255),
    email2 VARCHAR(255),
    email3 VARCHAR(255)
);

-- Good: Separate table for multiple values
CREATE TABLE users (
    id UUID PRIMARY KEY,
    name VARCHAR(100) NOT NULL
);

CREATE TABLE user_emails (
    id UUID PRIMARY KEY,
    user_id UUID NOT NULL REFERENCES users(id),
    email VARCHAR(255) NOT NULL,
    is_primary BOOLEAN DEFAULT FALSE,
    created_at TIMESTAMP WITH TIME ZONE DEFAULT NOW()
);

Second Normal Form (2NF)

-- Bad: Partial dependencies
CREATE TABLE order_items (
    order_id UUID,
    product_id UUID,
    product_name VARCHAR(255), -- Depends only on product_id
    product_price DECIMAL(10,2), -- Depends only on product_id
    quantity INTEGER,
    PRIMARY KEY (order_id, product_id)
);

-- Good: Remove partial dependencies
CREATE TABLE order_items (
    order_id UUID,
    product_id UUID,
    quantity INTEGER,
    unit_price DECIMAL(10,2), -- Price at time of order
    PRIMARY KEY (order_id, product_id),
    FOREIGN KEY (order_id) REFERENCES orders(id),
    FOREIGN KEY (product_id) REFERENCES products(id)
);

Third Normal Form (3NF)

-- Bad: Transitive dependencies
CREATE TABLE employees (
    id UUID PRIMARY KEY,
    name VARCHAR(100),
    department_id UUID,
    department_name VARCHAR(100), -- Transitively dependent
    department_budget DECIMAL(12,2) -- Transitively dependent
);

-- Good: Separate department data
CREATE TABLE departments (
    id UUID PRIMARY KEY,
    name VARCHAR(100) NOT NULL,
    budget DECIMAL(12,2) NOT NULL
);

CREATE TABLE employees (
    id UUID PRIMARY KEY,
    name VARCHAR(100) NOT NULL,
    department_id UUID REFERENCES departments(id)
);

2. Primary Key Strategies

UUID Primary Keys (Recommended)

-- Advantages: Globally unique, no collision risk, better for distributed systems
CREATE TABLE users (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    email VARCHAR(255) UNIQUE NOT NULL,
    created_at TIMESTAMP WITH TIME ZONE DEFAULT NOW()
);

Auto-incrementing Integers

-- Use when: Single database, sequential IDs needed, performance critical
CREATE TABLE log_entries (
    id BIGSERIAL PRIMARY KEY,
    message TEXT NOT NULL,
    level VARCHAR(20) NOT NULL,
    created_at TIMESTAMP WITH TIME ZONE DEFAULT NOW()
);

Common Design Patterns

1. User Management Pattern

-- Core user table
CREATE TABLE users (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    email VARCHAR(255) UNIQUE NOT NULL,
    password_hash VARCHAR(255) NOT NULL,
    first_name VARCHAR(100),
    last_name VARCHAR(100),
    is_active BOOLEAN DEFAULT TRUE,
    email_verified BOOLEAN DEFAULT FALSE,
    created_at TIMESTAMP WITH TIME ZONE DEFAULT NOW(),
    updated_at TIMESTAMP WITH TIME ZONE DEFAULT NOW()
);

-- User roles (many-to-many)
CREATE TABLE roles (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    name VARCHAR(50) UNIQUE NOT NULL,
    description TEXT,
    permissions JSONB DEFAULT '[]'::jsonb
);

CREATE TABLE user_roles (
    user_id UUID REFERENCES users(id) ON DELETE CASCADE,
    role_id UUID REFERENCES roles(id) ON DELETE CASCADE,
    granted_at TIMESTAMP WITH TIME ZONE DEFAULT NOW(),
    granted_by UUID REFERENCES users(id),
    PRIMARY KEY (user_id, role_id)
);

-- User sessions/tokens
CREATE TABLE user_sessions (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    user_id UUID NOT NULL REFERENCES users(id) ON DELETE CASCADE,
    token_hash VARCHAR(255) NOT NULL,
    device_info JSONB,
    ip_address INET,
    expires_at TIMESTAMP WITH TIME ZONE NOT NULL,
    created_at TIMESTAMP WITH TIME ZONE DEFAULT NOW()
);

2. Hierarchical Data Pattern

-- Categories with unlimited nesting (Adjacency List)
CREATE TABLE categories (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    parent_id UUID REFERENCES categories(id),
    name VARCHAR(100) NOT NULL,
    slug VARCHAR(100) NOT NULL,
    level INTEGER NOT NULL DEFAULT 0,
    path TEXT NOT NULL, -- "/electronics/computers/laptops"
    created_at TIMESTAMP WITH TIME ZONE DEFAULT NOW()
);

-- Ensure slug uniqueness within same parent
CREATE UNIQUE INDEX idx_categories_parent_slug 
ON categories(COALESCE(parent_id, '00000000-0000-0000-0000-000000000000'::uuid), slug);

-- Function to update path and level
CREATE OR REPLACE FUNCTION update_category_path() RETURNS TRIGGER AS $$
BEGIN
    IF NEW.parent_id IS NULL THEN
        NEW.level := 0;
        NEW.path := '/' || NEW.slug;
    ELSE
        SELECT level + 1, path || '/' || NEW.slug 
        INTO NEW.level, NEW.path
        FROM categories WHERE id = NEW.parent_id;
    END IF;
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER category_path_trigger
    BEFORE INSERT OR UPDATE ON categories
    FOR EACH ROW EXECUTE FUNCTION update_category_path();

3. Audit Trail Pattern

-- Base table with audit fields
CREATE TABLE projects (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    name VARCHAR(255) NOT NULL,
    description TEXT,
    status VARCHAR(50) NOT NULL DEFAULT 'active',
    owner_id UUID NOT NULL REFERENCES users(id),
    
    -- Audit fields
    created_at TIMESTAMP WITH TIME ZONE DEFAULT NOW(),
    created_by UUID NOT NULL REFERENCES users(id),
    updated_at TIMESTAMP WITH TIME ZONE DEFAULT NOW(),
    updated_by UUID NOT NULL REFERENCES users(id),
    version INTEGER NOT NULL DEFAULT 1
);

-- Audit log for detailed change tracking
CREATE TABLE audit_logs (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    table_name VARCHAR(50) NOT NULL,
    record_id UUID NOT NULL,
    action VARCHAR(10) NOT NULL, -- INSERT, UPDATE, DELETE
    old_values JSONB,
    new_values JSONB,
    changed_fields TEXT[],
    user_id UUID REFERENCES users(id),
    ip_address INET,
    user_agent TEXT,
    created_at TIMESTAMP WITH TIME ZONE DEFAULT NOW()
);

-- Trigger function for automatic audit logging
CREATE OR REPLACE FUNCTION audit_trigger() RETURNS TRIGGER AS $$
BEGIN
    INSERT INTO audit_logs (
        table_name, record_id, action, old_values, new_values, user_id
    ) VALUES (
        TG_TABLE_NAME,
        CASE TG_OP
            WHEN 'DELETE' THEN OLD.id
            ELSE NEW.id
        END,
        TG_OP,
        CASE TG_OP 
            WHEN 'INSERT' THEN NULL
            ELSE to_jsonb(OLD)
        END,
        CASE TG_OP
            WHEN 'DELETE' THEN NULL
            ELSE to_jsonb(NEW)
        END,
        CASE TG_OP
            WHEN 'DELETE' THEN OLD.updated_by
            ELSE NEW.updated_by
        END
    );
    RETURN COALESCE(NEW, OLD);
END;
$$ LANGUAGE plpgsql;

4. Soft Delete Pattern

CREATE TABLE posts (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    title VARCHAR(255) NOT NULL,
    content TEXT,
    author_id UUID NOT NULL REFERENCES users(id),
    
    -- Soft delete fields
    deleted_at TIMESTAMP WITH TIME ZONE,
    deleted_by UUID REFERENCES users(id),
    
    created_at TIMESTAMP WITH TIME ZONE DEFAULT NOW(),
    updated_at TIMESTAMP WITH TIME ZONE DEFAULT NOW()
);

-- View for active records only
CREATE VIEW active_posts AS
SELECT * FROM posts WHERE deleted_at IS NULL;

-- Soft delete function
CREATE OR REPLACE FUNCTION soft_delete_post(post_id UUID, deleting_user UUID)
RETURNS BOOLEAN AS $$
BEGIN
    UPDATE posts 
    SET deleted_at = NOW(), deleted_by = deleting_user
    WHERE id = post_id AND deleted_at IS NULL;
    
    RETURN FOUND;
END;
$$ LANGUAGE plpgsql;

Performance Optimization

1. Strategic Indexing

-- Single column indexes
CREATE INDEX idx_users_email ON users(email);
CREATE INDEX idx_users_created_at ON users(created_at);

-- Composite indexes (order matters!)
CREATE INDEX idx_orders_customer_date ON orders(customer_id, created_at DESC);
CREATE INDEX idx_products_category_price ON products(category_id, price);

-- Partial indexes (with conditions)
CREATE INDEX idx_users_active_email ON users(email) WHERE is_active = TRUE;
CREATE INDEX idx_orders_pending ON orders(created_at) WHERE status = 'pending';

-- Expression indexes
CREATE INDEX idx_users_email_lower ON users(LOWER(email));
CREATE INDEX idx_full_name ON users((first_name || ' ' || last_name));

-- JSON/JSONB indexes
CREATE INDEX idx_user_preferences ON users USING GIN (preferences);
CREATE INDEX idx_metadata_tags ON products USING GIN ((metadata -> 'tags'));

2. Query Optimization Patterns

-- Use EXPLAIN ANALYZE to understand query performance
EXPLAIN ANALYZE
SELECT u.name, COUNT(o.id) as order_count
FROM users u
LEFT JOIN orders o ON u.id = o.customer_id
WHERE u.created_at >= '2024-01-01'
GROUP BY u.id, u.name
HAVING COUNT(o.id) > 5;

-- Optimize with proper indexes
CREATE INDEX idx_users_created_at ON users(created_at);
CREATE INDEX idx_orders_customer_id ON orders(customer_id);

-- Use window functions for complex analytics
SELECT 
    product_name,
    price,
    category_id,
    AVG(price) OVER (PARTITION BY category_id) as avg_category_price,
    RANK() OVER (PARTITION BY category_id ORDER BY price DESC) as price_rank
FROM products;

3. Partitioning for Large Tables

-- Time-based partitioning for log tables
CREATE TABLE event_logs (
    id BIGSERIAL,
    user_id UUID,
    event_type VARCHAR(50),
    event_data JSONB,
    created_at TIMESTAMP WITH TIME ZONE DEFAULT NOW()
) PARTITION BY RANGE (created_at);

-- Create monthly partitions
CREATE TABLE event_logs_2024_01 PARTITION OF event_logs
FOR VALUES FROM ('2024-01-01') TO ('2024-02-01');

CREATE TABLE event_logs_2024_02 PARTITION OF event_logs
FOR VALUES FROM ('2024-02-01') TO ('2024-03-01');

-- Automatic partition creation function
CREATE OR REPLACE FUNCTION create_monthly_partition(table_name TEXT, start_date DATE)
RETURNS VOID AS $$
DECLARE
    partition_name TEXT;
    end_date DATE;
BEGIN
    end_date := start_date + INTERVAL '1 month';
    partition_name := table_name || '_' || TO_CHAR(start_date, 'YYYY_MM');
    
    EXECUTE FORMAT('CREATE TABLE %I PARTITION OF %I FOR VALUES FROM (%L) TO (%L)',
                   partition_name, table_name, start_date, end_date);
END;
$$ LANGUAGE plpgsql;

Data Types and Constraints

1. Appropriate Data Type Selection

CREATE TABLE products (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    
    -- Text fields with appropriate limits
    name VARCHAR(255) NOT NULL,
    description TEXT,
    sku VARCHAR(50) UNIQUE NOT NULL,
    
    -- Numeric types
    price DECIMAL(10,2) NOT NULL CHECK (price > 0),
    weight_kg DECIMAL(8,3) CHECK (weight_kg > 0),
    stock_quantity INTEGER NOT NULL DEFAULT 0 CHECK (stock_quantity >= 0),
    
    -- Enums for constrained values
    status VARCHAR(20) NOT NULL DEFAULT 'active' 
        CHECK (status IN ('active', 'inactive', 'discontinued')),
    
    -- JSON for flexible metadata
    metadata JSONB DEFAULT '{}'::jsonb,
    
    -- Arrays for tags/categories
    tags TEXT[] DEFAULT '{}',
    
    -- Timestamps with time zones
    created_at TIMESTAMP WITH TIME ZONE DEFAULT NOW(),
    updated_at TIMESTAMP WITH TIME ZONE DEFAULT NOW()
);

2. Custom Data Types

-- Create custom types for better data modeling
CREATE TYPE contact_type AS ENUM ('email', 'phone', 'address');
CREATE TYPE priority_level AS ENUM ('low', 'medium', 'high', 'critical');

-- Use in table definitions
CREATE TABLE contacts (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    user_id UUID NOT NULL REFERENCES users(id),
    type contact_type NOT NULL,
    value TEXT NOT NULL,
    is_primary BOOLEAN DEFAULT FALSE
);

CREATE TABLE tasks (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    title VARCHAR(255) NOT NULL,
    priority priority_level NOT NULL DEFAULT 'medium',
    due_date DATE,
    created_at TIMESTAMP WITH TIME ZONE DEFAULT NOW()
);

Migration Strategies

1. Schema Versioning

-- Track schema versions
CREATE TABLE schema_migrations (
    version VARCHAR(50) PRIMARY KEY,
    applied_at TIMESTAMP WITH TIME ZONE DEFAULT NOW(),
    description TEXT
);

-- Example migration script: 001_create_users_table.sql
BEGIN;

CREATE TABLE users (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    email VARCHAR(255) UNIQUE NOT NULL,
    password_hash VARCHAR(255) NOT NULL,
    created_at TIMESTAMP WITH TIME ZONE DEFAULT NOW()
);

INSERT INTO schema_migrations (version, description)
VALUES ('001', 'Create users table');

COMMIT;

2. Safe Schema Changes

-- Add columns with defaults (safe)
ALTER TABLE users ADD COLUMN phone VARCHAR(20);

-- Add NOT NULL columns safely
ALTER TABLE users ADD COLUMN status VARCHAR(20) DEFAULT 'active';
UPDATE users SET status = 'active' WHERE status IS NULL;
ALTER TABLE users ALTER COLUMN status SET NOT NULL;

-- Rename columns (requires application coordination)
-- Step 1: Add new column
ALTER TABLE users ADD COLUMN full_name VARCHAR(200);

-- Step 2: Update application to write to both columns
-- Step 3: Backfill data
UPDATE users SET full_name = first_name || ' ' || last_name 
WHERE full_name IS NULL;

-- Step 4: Update application to read from new column
-- Step 5: Drop old columns
ALTER TABLE users DROP COLUMN first_name, DROP COLUMN last_name;

Security Best Practices

1. Access Control

-- Create role-based access
CREATE ROLE app_read;
CREATE ROLE app_write;
CREATE ROLE app_admin;

-- Grant appropriate permissions
GRANT SELECT ON ALL TABLES IN SCHEMA public TO app_read;
GRANT SELECT, INSERT, UPDATE ON ALL TABLES IN SCHEMA public TO app_write;
GRANT ALL PRIVILEGES ON ALL TABLES IN SCHEMA public TO app_admin;

-- Row-level security example
CREATE TABLE tenant_data (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    tenant_id UUID NOT NULL,
    data JSONB NOT NULL
);

-- Enable RLS
ALTER TABLE tenant_data ENABLE ROW LEVEL SECURITY;

-- Create policy
CREATE POLICY tenant_isolation ON tenant_data
FOR ALL TO app_user
USING (tenant_id = current_setting('app.current_tenant')::UUID);

2. Data Validation

-- Email validation
ALTER TABLE users ADD CONSTRAINT valid_email 
CHECK (email ~* '^[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Z|a-z]{2,}$');

-- Phone validation
ALTER TABLE users ADD CONSTRAINT valid_phone
CHECK (phone IS NULL OR phone ~ '^\+?[1-9]\d{1,14}$');

-- JSON schema validation
CREATE OR REPLACE FUNCTION validate_json_schema(data JSONB, schema JSONB)
RETURNS BOOLEAN AS $$
-- Implementation would use a JSON schema validator
$$ LANGUAGE plpgsql;

ALTER TABLE products ADD CONSTRAINT valid_metadata
CHECK (validate_json_schema(metadata, '{"type": "object", "properties": {...}}'::jsonb));

Integration with GISE Phases

Design Phase

Create complete ERD diagrams
Document all relationships and constraints
Plan for scalability and performance
Design migration strategy

Development Phase

Implement schema with proper indexing
Create database access layers
Add comprehensive constraints and validation
Implement audit logging

Testing Phase

Test with realistic data volumes
Validate performance with EXPLAIN ANALYZE
Test constraint enforcement
Verify backup and recovery procedures

Deploy Phase

Plan zero-downtime migrations
Monitor query performance
Set up automated backups
Implement monitoring and alerting

Common Anti-Patterns to Avoid

1. God Tables

-- Bad: Everything in one table
CREATE TABLE mega_table (
    id UUID PRIMARY KEY,
    user_email VARCHAR(255),
    user_name VARCHAR(255),
    order_date DATE,
    product_name VARCHAR(255),
    product_price DECIMAL(10,2),
    shipping_address TEXT,
    -- ... 50+ more columns
);

-- Good: Properly normalized structure
-- users, orders, order_items, products, addresses tables

2. Missing Foreign Key Constraints

-- Bad: No referential integrity
CREATE TABLE orders (
    id UUID PRIMARY KEY,
    customer_id UUID, -- No foreign key constraint!
    product_id UUID   -- No foreign key constraint!
);

-- Good: Proper relationships
CREATE TABLE orders (
    id UUID PRIMARY KEY,
    customer_id UUID NOT NULL REFERENCES users(id),
    product_id UUID NOT NULL REFERENCES products(id)
);

3. Generic Column Names

-- Bad: Ambiguous column names
CREATE TABLE user_data (
    id UUID,
    data TEXT,    -- What kind of data?
    value TEXT,   -- What value?
    type INTEGER  -- Type of what?
);

-- Good: Descriptive column names
CREATE TABLE user_preferences (
    id UUID,
    preference_key VARCHAR(50),
    preference_value TEXT,
    data_type VARCHAR(20)
);

These patterns provide a solid foundation for scalable, maintainable database design within the GISE methodology framework.

Design Principles​

1. Normalization Guidelines​

First Normal Form (1NF)​

Second Normal Form (2NF)​

Third Normal Form (3NF)​

2. Primary Key Strategies​

UUID Primary Keys (Recommended)​

Auto-incrementing Integers​

Common Design Patterns​

1. User Management Pattern​

2. Hierarchical Data Pattern​

3. Audit Trail Pattern​

4. Soft Delete Pattern​

Performance Optimization​

1. Strategic Indexing​

2. Query Optimization Patterns​

3. Partitioning for Large Tables​

Data Types and Constraints​

1. Appropriate Data Type Selection​

2. Custom Data Types​

Migration Strategies​

1. Schema Versioning​

2. Safe Schema Changes​

Security Best Practices​

1. Access Control​

2. Data Validation​

Integration with GISE Phases​

Design Phase​

Development Phase​

Testing Phase​

Deploy Phase​

Common Anti-Patterns to Avoid​

1. God Tables​

2. Missing Foreign Key Constraints​

3. Generic Column Names​