
Clean Code Python: Configuration, Feature Flags, and Secrets That Scale

Hardcoded config breaks the moment your 200 tenants need different rate limits, feature access, and API keys. Here is a hierarchical configuration system with per-tenant overrides, percentage-based feature rollouts, and encrypted secrets — all hot-reloadable without restarts.

Tin Dang

Your ShelfWise platform has 200 tenants. Tenant A needs a rate limit of 1,000 requests per minute. Tenant B, your largest customer, negotiated 10,000. Tenant C is beta-testing your new AI recommendations feature. Tenant D must never see it — their contract explicitly excludes experimental features.

If your configuration lives in environment variables and your feature flags are if statements, you are about to learn why that does not scale. Every change requires a redeploy. Every tenant-specific behavior is a hardcoded branch. Every secret is a plaintext string that one docker inspect exposes.

This post builds a configuration system that solves all three problems: hierarchical config with per-tenant overrides, a feature flag system with percentage-based rollouts, and encrypted secrets storage — all hot-reloadable without restarting the application.

The Configuration Hierarchy

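Configuration in a multi-tenant system is not a flat key-value store. It is a hierarchy with clear precedence rules, from most to least specific: per-tenant overrides (stored in the database and cached in Redis), environment variables, and application defaults.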

A request for “rate_limit” resolves like this: check the tenant override first. If the tenant has a custom rate limit, use it. If not, check the environment variable. If that is not set either, fall back to the application default. The highest-specificity value wins.
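
The precedence rule is small enough to sketch without Redis or a database. The following is a simplified illustration, not the ShelfWise resolver: it assumes each layer is a plain dict keyed by the same dot-notation names, and `resolve_layered` and the sample values are hypothetical.

```python
from typing import Any


def resolve_layered(key: str, *layers: dict[str, Any]) -> Any:
    """Return the value from the first (most specific) layer that defines key.

    Layers are passed from most to least specific: tenant overrides,
    then environment values, then application defaults.
    """
    for layer in layers:
        if key in layer:
            return layer[key]
    raise KeyError(key)


# Hypothetical sample data for illustration only.
defaults = {"rate_limit.requests_per_minute": 100, "ai.enabled": False}
environment = {"rate_limit.requests_per_minute": 500}
tenant_b_overrides = {"rate_limit.requests_per_minute": 10_000}
tenant_a_overrides: dict[str, Any] = {}

print(resolve_layered("rate_limit.requests_per_minute", tenant_b_overrides, environment, defaults))  # 10000
print(resolve_layered("rate_limit.requests_per_minute", tenant_a_overrides, environment, defaults))  # 500
print(resolve_layered("ai.enabled", tenant_a_overrides, environment, defaults))  # False
```

Tenant B's negotiated limit wins, while tenant A, with no override, inherits the environment value.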

Base Settings with Pydantic

The application defaults and environment variable layer use BaseSettings from the pydantic-settings package (in Pydantic v2 it moved out of the core library). This gives you type-safe configuration with validation, nested models, and automatic environment variable parsing, with no os.getenv() calls scattered across the codebase.

src/core/config.py
from pydantic import Field, SecretStr
from pydantic_settings import BaseSettings, SettingsConfigDict
class DatabaseSettings(BaseSettings):
    model_config = SettingsConfigDict(env_prefix="DB_")
    url: SecretStr = Field(
        default=SecretStr("postgresql+asyncpg://localhost:5432/shelfwise"),
    )
    pool_size: int = Field(default=20, ge=1, le=100)
    max_overflow: int = Field(default=30, ge=0, le=200)
    pool_timeout: int = Field(default=30, ge=5)
    echo: bool = False
class RateLimitSettings(BaseSettings):
    model_config = SettingsConfigDict(env_prefix="RATE_LIMIT_")
    requests_per_minute: int = Field(default=100, ge=1)
    burst_size: int = Field(default=20, ge=1)
class AISettings(BaseSettings):
    # protected_namespaces=() silences Pydantic's warning about the "model_" prefix
    model_config = SettingsConfigDict(env_prefix="AI_", protected_namespaces=())
    model_name: str = "gpt-4o-mini"
    max_tokens: int = Field(default=2048, ge=1, le=16384)
    temperature: float = Field(default=0.7, ge=0.0, le=2.0)
    enabled: bool = False
class AppSettings(BaseSettings):
    """Root settings — assembled from nested models.
    Environment variables use double-underscore for nesting, keyed by the
    field name on this class:
    DATABASE__POOL_SIZE=30, RATE_LIMIT__REQUESTS_PER_MINUTE=500
    """
    model_config = SettingsConfigDict(
        env_file=".env",
        env_file_encoding="utf-8",
        env_nested_delimiter="__",
        case_sensitive=False,
    )
    app_name: str = "ShelfWise"
    debug: bool = False
    # default_factory builds each nested model when AppSettings is created,
    # not once at class-definition time
    database: DatabaseSettings = Field(default_factory=DatabaseSettings)
    rate_limit: RateLimitSettings = Field(default_factory=RateLimitSettings)
    ai: AISettings = Field(default_factory=AISettings)
# Singleton — loaded once at startup
settings = AppSettings()

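The env_nested_delimiter="__" is the key to avoiding a flat namespace of dozens of SHELFWISE_DB_POOL_SIZE, SHELFWISE_RATE_LIMIT_RPM variables. Instead, the nested model structure maps directly to environment variables: the variable name is the AppSettings field name, a double underscore, then the nested field. DATABASE__POOL_SIZE=30 sets settings.database.pool_size, and RATE_LIMIT__REQUESTS_PER_MINUTE=500 sets settings.rate_limit.requests_per_minute.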
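
To make the naming rule concrete, here is a toy, stdlib-only model of how a double-underscore variable name becomes a path through nested fields. It is not pydantic-settings' implementation, which also applies prefixes, parses JSON values, and validates types; `explode_env` is a hypothetical helper for illustration.

```python
def explode_env(environ: dict[str, str], delimiter: str = "__") -> dict:
    """Turn FIELD__SUBFIELD=value pairs into a nested dict (case-insensitive).

    Names without the delimiter become top-level keys.
    """
    nested: dict = {}
    for name, value in environ.items():
        *parents, leaf = name.lower().split(delimiter)
        target = nested
        for part in parents:
            target = target.setdefault(part, {})
        target[leaf] = value
    return nested


env = {
    "DATABASE__POOL_SIZE": "30",
    "RATE_LIMIT__REQUESTS_PER_MINUTE": "500",
    "DEBUG": "true",
}
print(explode_env(env))
# {'database': {'pool_size': '30'}, 'rate_limit': {'requests_per_minute': '500'}, 'debug': 'true'}
```

Only the double underscore acts as a separator, so the single underscore in RATE_LIMIT stays part of the field name. By the same rule, DB__POOL_SIZE would produce a `db` key, which matches no field on AppSettings.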

Per-Tenant Configuration Overrides

The base settings handle application-wide defaults. Per-tenant overrides live in the database, cached in Redis, and resolved at request time.

src/models/tenant_config.py
from uuid import UUID
from sqlalchemy import UniqueConstraint
from sqlalchemy.orm import Mapped, mapped_column
from src.db.base import Base
class TenantConfig(Base):
    """Per-tenant configuration overrides.
    Keys follow dot notation: "rate_limit.requests_per_minute", "ai.enabled".
    Values are stored as strings and coerced to the target type at resolution time.
    """
    __tablename__ = "tenant_configs"
    __table_args__ = (
        UniqueConstraint("tenant_id", "key", name="uq_tenant_config"),
    )
    id: Mapped[int] = mapped_column(primary_key=True)
    tenant_id: Mapped[UUID] = mapped_column(nullable=False, index=True)
    key: Mapped[str] = mapped_column(nullable=False)
    value: Mapped[str] = mapped_column(nullable=False)

The resolver walks the hierarchy:

src/core/config_resolver.py
from typing import TypeVar
from uuid import UUID
import redis.asyncio as redis
from sqlalchemy import select
from sqlalchemy.ext.asyncio import AsyncSession
from src.core.config import settings
from src.models.tenant_config import TenantConfig
T = TypeVar("T", str, int, float, bool)
# Redis client — initialized at startup
_redis: redis.Redis | None = None
def init_redis(client: redis.Redis) -> None:
    global _redis
    _redis = client
def _cache_key(tenant_id: UUID, key: str) -> str:
    return f"tenant_config:{tenant_id}:{key}"
async def get_tenant_config(
    tenant_id: UUID,
    key: str,
    target_type: type[T],
    session: AsyncSession | None = None,
) -> T:
    """Resolve a config value through the hierarchy.
    1. Check Redis cache for tenant override
    2. If cache miss, check database
    3. If no tenant override, fall back to app settings
    Pass a session: without one, a cache miss skips the database and
    falls straight through to the app settings.
    """
    # Layer 1: Redis cache
    if _redis is not None:
        cached = await _redis.get(_cache_key(tenant_id, key))
        if cached is not None:
            return _coerce(cached.decode(), target_type)
    # Layer 2: Database lookup
    if session is not None:
        result = await session.execute(
            select(TenantConfig.value)
            .where(TenantConfig.tenant_id == tenant_id)
            .where(TenantConfig.key == key)
        )
        row = result.scalar_one_or_none()
        if row is not None:
            # Populate cache for next time (TTL: 5 minutes)
            if _redis is not None:
                await _redis.set(
                    _cache_key(tenant_id, key), row, ex=300
                )
            return _coerce(row, target_type)
    # Layer 3: Application defaults (AttributeError if the key has no default)
    return _resolve_default(key, target_type)
def _coerce(value: str, target_type: type[T]) -> T:
    """Coerce a string value to the target type."""
    if target_type is bool:
        return target_type(value.lower() in ("true", "1", "yes"))
    return target_type(value)
def _resolve_default(key: str, target_type: type[T]) -> T:
    """Walk dot-notation key through the settings object."""
    obj = settings
    for part in key.split("."):
        obj = getattr(obj, part)
    return target_type(obj)
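
The Layer 1 and Layer 2 steps above form a classic read-through cache. Below is an in-memory sketch of the same flow, assuming one dict stands in for the tenant_configs table, another for Redis, and a clock function for the ex=300 expiry. `ReadThroughConfig` is a hypothetical name, not part of the ShelfWise code.

```python
import time
from collections.abc import Callable


class ReadThroughConfig:
    """In-memory stand-in for the Redis (Layer 1) and database (Layer 2) layers."""

    def __init__(
        self,
        db: dict[tuple[str, str], str],
        ttl: float = 300.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.db = db  # plays the tenant_configs table
        self.ttl = ttl  # mirrors ex=300 on the Redis SET
        self.clock = clock
        self.cache: dict[tuple[str, str], tuple[str, float]] = {}
        self.db_reads = 0  # how often a lookup fell through to the "database"

    def get(self, tenant: str, key: str, default: str) -> str:
        entry = self.cache.get((tenant, key))
        if entry is not None and entry[1] > self.clock():
            return entry[0]  # Layer 1: fresh cache hit
        self.db_reads += 1
        value = self.db.get((tenant, key))  # Layer 2: database lookup
        if value is None:
            return default  # Layer 3: default; like the resolver, misses are not cached
        self.cache[(tenant, key)] = (value, self.clock() + self.ttl)
        return value

    def update(self, tenant: str, key: str, value: str) -> None:
        self.db[(tenant, key)] = value  # write the new value first...
        self.cache.pop((tenant, key), None)  # ...then invalidate, as update_tenant_config does


cfg = ReadThroughConfig({("tenant-b", "rate_limit.requests_per_minute"): "10000"})
print(cfg.get("tenant-b", "rate_limit.requests_per_minute", "100"))  # 10000 (database read)
print(cfg.get("tenant-b", "rate_limit.requests_per_minute", "100"))  # 10000 (cache hit)
cfg.update("tenant-b", "rate_limit.requests_per_minute", "12000")
print(cfg.get("tenant-b", "rate_limit.requests_per_minute", "100"))  # 12000 (no stale value)
print(cfg.db_reads)  # 2
```

The injectable clock lets a test check the TTL without sleeping for five minutes.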

Usage in a service:

src/services/catalog_service.py
from sqlalchemy.ext.asyncio import AsyncSession
from src.core.config_resolver import get_tenant_config
from src.core.context import get_current_tenant_id
async def get_rate_limit(session: AsyncSession) -> int:
    """Get the rate limit for the current tenant."""
    return await get_tenant_config(
        tenant_id=get_current_tenant_id(),
        key="rate_limit.requests_per_minute",
        target_type=int,
        session=session,  # lets a cache miss fall through to the database
    )

Cache Invalidation

When an admin updates a tenant’s configuration, the cache must be invalidated immediately. Not on the next TTL expiry — immediately. A tenant paying for a higher rate limit should not wait 5 minutes for it to take effect.

src/services/admin_config_service.py
from uuid import UUID
from sqlalchemy.dialects.postgresql import insert
from sqlalchemy.ext.asyncio import AsyncSession
# Import the module, not the names: `from ... import _redis` would copy the
# value at import time (None), before init_redis() runs at startup.
from src.core import config_resolver
from src.models.tenant_config import TenantConfig
async def update_tenant_config(
    session: AsyncSession,
    tenant_id: UUID,
    key: str,
    value: str,
) -> None:
    """Update a tenant config override and invalidate the cache."""
    stmt = insert(TenantConfig).values(
        tenant_id=tenant_id, key=key, value=value
    ).on_conflict_do_update(
        constraint="uq_tenant_config",
        set_={"value": value},
    )
    await session.execute(stmt)
    await session.commit()
    # Invalidate cache immediately
    if config_resolver._redis is not None:
        await config_resolver._redis.delete(
            config_resolver._cache_key(tenant_id, key)
        )

Feature Flags with Percentage Rollouts

Feature flags are configuration with behavior. A flag is either on or off for a given tenant, and the rollout percentage controls how many tenants see it.

ShelfWise is rolling out “AI book recommendations.” The rollout plan: 5% of tenants this week, 25% next week, 100% after validation. No deploys required.

src/core/feature_flags.py
from enum import StrEnum
from uuid import UUID
import hashlib
from sqlalchemy.ext.asyncio import AsyncSession
from src.core.config_resolver import get_tenant_config
class FeatureFlag(StrEnum):
    """Feature flags for ShelfWise.
    Add new flags here. Remove dead flags aggressively —
    a flag that is 100% rolled out is not a flag, it is a feature.
    """
    AI_RECOMMENDATIONS = "ai_recommendations"
    BULK_IMPORT_V2 = "bulk_import_v2"
    ADVANCED_ANALYTICS = "advanced_analytics"
    PUBLISHER_PORTAL = "publisher_portal"
async def is_enabled(
    flag: FeatureFlag, tenant_id: UUID, session: AsyncSession | None = None
) -> bool:
    """Check if a feature flag is enabled for a specific tenant.
    Resolution order:
    1. Explicit tenant override ("enabled" or "disabled") — always wins
    2. Percentage-based rollout — deterministic hash of tenant_id + flag
    Pass a session so overrides and percentages that are not in Redis
    yet are read from the database.
    """
    # Check for explicit override
    override = await _get_flag_override(flag, tenant_id, session)
    if override is not None:
        return override
    # Percentage-based rollout
    rollout_pct = await _get_rollout_percentage(flag, session)
    if rollout_pct <= 0:
        return False
    if rollout_pct >= 100:
        return True
    # Deterministic: a tenant's bucket never changes, so raising the
    # percentage only adds tenants and never removes one
    return _hash_into_bucket(tenant_id, flag) < rollout_pct
def _hash_into_bucket(tenant_id: UUID, flag: FeatureFlag) -> int:
    """Hash tenant_id + flag name into a 0-99 bucket.
    Deterministic: the same tenant always lands in the same bucket.
    Uniform: tenants are spread approximately evenly across buckets.
    """
    raw = f"{tenant_id}:{flag.value}".encode()
    digest = hashlib.sha256(raw).hexdigest()
    return int(digest[:8], 16) % 100
async def _get_flag_override(
    flag: FeatureFlag, tenant_id: UUID, session: AsyncSession | None
) -> bool | None:
    """Check if tenant has an explicit flag override."""
    try:
        value = await get_tenant_config(
            tenant_id=tenant_id,
            key=f"feature.{flag.value}.override",
            target_type=str,
            session=session,
        )
        if value == "enabled":
            return True
        if value == "disabled":
            return False
    except (AttributeError, KeyError):
        # No stored override and no matching app setting
        pass
    return None
async def _get_rollout_percentage(
    flag: FeatureFlag, session: AsyncSession | None
) -> int:
    """Get the current rollout percentage for a flag (0-100)."""
    try:
        return await get_tenant_config(
            tenant_id=UUID(int=0),  # global config, not tenant-specific
            key=f"feature.{flag.value}.rollout_pct",
            target_type=int,
            session=session,
        )
    except (AttributeError, KeyError):
        return 0  # Not rolled out by default

Usage in a route handler:

src/api/v1/recommendations.py
from fastapi import APIRouter, Depends, HTTPException
from sqlalchemy.ext.asyncio import AsyncSession
from src.core.feature_flags import FeatureFlag, is_enabled
from src.core.context import get_current_tenant_id
from src.db.session import get_session  # hypothetical: your AsyncSession dependency
router = APIRouter()
@router.get("/books/{book_id}/recommendations")
async def get_recommendations(
    book_id: int, session: AsyncSession = Depends(get_session)
):
    tenant_id = get_current_tenant_id()
    if not await is_enabled(FeatureFlag.AI_RECOMMENDATIONS, tenant_id, session):
        raise HTTPException(
            status_code=404,
            detail="This feature is not available for your plan.",
        )
    # generate_recommendations: the recommendation service (not shown here)
    return await generate_recommendations(book_id, tenant_id)
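
The bucketing claims (deterministic, and monotonic as the percentage rises) can be checked offline. This sketch copies the `_hash_into_bucket` formula into a standalone function that takes the flag as a plain string, and uses 200 hypothetical tenant IDs generated with `uuid5` so every run is reproducible; `hash_into_bucket` and `cohort` are illustrative names.

```python
import hashlib
import uuid


def hash_into_bucket(tenant_id: uuid.UUID, flag: str) -> int:
    """Same formula as _hash_into_bucket: SHA-256 of "tenant:flag", first 8 hex digits, mod 100."""
    digest = hashlib.sha256(f"{tenant_id}:{flag}".encode()).hexdigest()
    return int(digest[:8], 16) % 100


def cohort(tenants: list[uuid.UUID], flag: str, rollout_pct: int) -> set[uuid.UUID]:
    """Tenants that see the flag at a given percentage (ignoring overrides)."""
    return {t for t in tenants if hash_into_bucket(t, flag) < rollout_pct}


# 200 hypothetical tenants with stable, reproducible IDs.
tenants = [uuid.uuid5(uuid.NAMESPACE_DNS, f"tenant-{i}.example") for i in range(200)]

week1 = cohort(tenants, "ai_recommendations", 5)
week2 = cohort(tenants, "ai_recommendations", 25)
week3 = cohort(tenants, "ai_recommendations", 100)

# Ramping up only ever adds tenants: everyone in week 1 is still in week 2.
print(week1 <= week2 <= week3)  # True
print(len(week1), len(week2), len(week3))
```

Expect the 5% and 25% cohorts to land near 10 and 50 tenants rather than exactly on those numbers, since each tenant's bucket comes from its hash; the 100% cohort always contains all 200. Because the flag name is part of the hash input, a different flag would pick a different 5%.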

The rollout lifecycle for AI recommendations:

# Week 1: Enable for 5% of tenants
await update_tenant_config(session, UUID(int=0), "feature.ai_recommendations.rollout_pct", "5")
# Week 2: Ramp to 25%
await update_tenant_config(session, UUID(int=0), "feature.ai_recommendations.rollout_pct", "25")
# Week 3: Full rollout
await update_tenant_config(session, UUID(int=0), "feature.ai_recommendations.rollout_pct", "100")
# Force-enable for a specific beta tenant regardless of percentage
await update_tenant_config(session, beta_tenant_id, "feature.ai_recommendations.override", "enabled")
# Force-disable for a tenant whose contract excludes experimental features
await update_tenant_config(session, restricted_tenant_id, "feature.ai_recommendations.override", "disabled")

Encrypted Secrets Storage

Some tenants bring their own API keys — a publisher who wants ShelfWise to push inventory updates to their existing ERP system. These secrets cannot be stored in plaintext. A database dump, a log leak, or a support engineer with read access should never expose a tenant’s API key.

src/core/secrets.py
from cryptography.fernet import Fernet
from pydantic import SecretStr
# The master key stays in an environment variable. Every tenant secret is
# encrypted with it and stored in the database.
_fernet: Fernet | None = None
def init_encryption(key: str) -> None:
    """Initialize encryption with the master key from environment.
    The key must be a URL-safe base64-encoded 32-byte key, such as the
    output of Fernet.generate_key().
    """
    global _fernet
    _fernet = Fernet(key.encode())
def encrypt_secret(plaintext: str) -> str:
    """Encrypt a secret for database storage."""
    if _fernet is None:
        raise RuntimeError("Encryption not initialized. Call init_encryption() at startup.")
    return _fernet.encrypt(plaintext.encode()).decode()
def decrypt_secret(ciphertext: str) -> SecretStr:
    """Decrypt a secret from database storage.
    Returns SecretStr to prevent accidental logging of the plaintext.
    """
    if _fernet is None:
        raise RuntimeError("Encryption not initialized. Call init_encryption() at startup.")
    plaintext = _fernet.decrypt(ciphertext.encode()).decode()
    return SecretStr(plaintext)
| Criterion | Env Vars Only | DB + Fernet Encryption | External Vault (HashiCorp/AWS) |
|---|---|---|---|
| Per-tenant secrets | No (one value per key) | Yes (stored per tenant_id) | Yes (path per tenant) |
| Rotation | Redeploy required | Update DB row, invalidate cache | Vault handles rotation |
| Access control | Anyone with shell access | App-level (decrypt requires master key) | Policy-based (IAM, ACL) |
| Audit trail | None | DB audit log | Full audit log |
| Operational cost | Zero | Low (one master key to manage) | High (Vault cluster) |
| Best for | Single-tenant, < 10 secrets | Multi-tenant, 10-500 secrets | Regulated, 500+ secrets |
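
Why does decrypt_secret return SecretStr instead of str? Pydantic's SecretStr masks its value in str() and repr(), so an accidental log line or traceback shows asterisks instead of the key. The stdlib-only class below is a hypothetical stand-in (`MaskedSecret`) that shows the mechanism; it is not pydantic's source.

```python
class MaskedSecret:
    """Minimal stand-in for pydantic's SecretStr: masked unless explicitly unwrapped."""

    __slots__ = ("_value",)

    def __init__(self, value: str) -> None:
        self._value = value

    def get_secret_value(self) -> str:
        # The plaintext is only available when asked for by name.
        return self._value

    def __repr__(self) -> str:
        return "MaskedSecret('**********')"

    def __str__(self) -> str:
        return "**********"


api_key = MaskedSecret("sk-live-example")  # hypothetical key
print(f"Calling ERP with key {api_key}")  # Calling ERP with key **********
print([api_key])  # [MaskedSecret('**********')]
```

The point of the design is that leaking the plaintext requires a deliberate get_secret_value() call, which is easy to grep for in code review.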

ShelfWise uses Fernet encryption for tenant secrets. The master encryption key lives in an environment variable, alongside infrastructure settings such as the database URL; no tenant secret ever does. When a tenant provides their ERP API key, it is encrypted before it hits the database:

src/services/integration_service.py
from uuid import UUID
from pydantic import SecretStr
from sqlalchemy import select
from sqlalchemy.dialects.postgresql import insert
from sqlalchemy.ext.asyncio import AsyncSession
from src.core.secrets import encrypt_secret, decrypt_secret
from src.models.tenant_config import TenantConfig
async def store_tenant_api_key(
    session: AsyncSession,
    tenant_id: UUID,
    service_name: str,
    api_key: str,
) -> None:
    """Store (or rotate) an encrypted API key for a tenant's external integration."""
    encrypted = encrypt_secret(api_key)
    # Upsert so a rotated key replaces the old row instead of violating uq_tenant_config
    stmt = insert(TenantConfig).values(
        tenant_id=tenant_id,
        key=f"secret.{service_name}.api_key",
        value=encrypted,
    ).on_conflict_do_update(
        constraint="uq_tenant_config",
        set_={"value": encrypted},
    )
    await session.execute(stmt)
    await session.commit()
async def get_tenant_api_key(
    session: AsyncSession,
    tenant_id: UUID,
    service_name: str,
) -> SecretStr:
    """Retrieve and decrypt a tenant's API key.
    Stays a SecretStr; call .get_secret_value() only where the key is sent.
    """
    result = await session.execute(
        select(TenantConfig.value)
        .where(TenantConfig.tenant_id == tenant_id)
        .where(TenantConfig.key == f"secret.{service_name}.api_key")
    )
    encrypted = result.scalar_one_or_none()
    if encrypted is None:
        raise ValueError(f"No API key configured for {service_name}")
    return decrypt_secret(encrypted)

Hot-Reloading Configuration Without Restarts

The Redis cache with TTL handles most hot-reload needs: update the database, invalidate the cache, and the next request picks up the new value. But values that each application instance caches in its own memory (for example, a default rate limit memoized in-process) never see that Redis delete, so you need a mechanism that reaches every instance without cycling it.

src/core/config_watcher.py
import logging
import redis.asyncio as redis
logger = logging.getLogger("shelfwise.config")
async def watch_config_changes(redis_client: redis.Redis) -> None:
    """Subscribe to config change events via Redis Pub/Sub.
    When an admin updates a config value, the admin service publishes
    a message to "config:changed". Every app instance receives it
    and invalidates its local cache.
    """
    pubsub = redis_client.pubsub()
    await pubsub.subscribe("config:changed")
    async for message in pubsub.listen():
        if message["type"] == "message":
            key = message["data"].decode()
            logger.info("Config changed: %s — clearing local cache", key)
            # Clear any in-process caches (e.g., functools.lru_cache)
            _clear_local_caches(key)
async def publish_config_change(
    redis_client: redis.Redis, key: str
) -> None:
    """Notify all app instances that a config key has changed."""
    await redis_client.publish("config:changed", key)
def _clear_local_caches(key: str) -> None:
    """Clear in-process caches for the changed key.
    In practice, you maintain a registry of cached config accessors
    and clear only the affected ones.
    """
    # Implementation depends on your caching strategy.
    # For functools.cache, call the .cache_clear() method on the decorated function.
    pass
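
The registry that _clear_local_caches alludes to can be as small as a dict from config key to clear callbacks. This is a sketch under assumptions: `register_local_cache`, `clear_local_caches`, and `default_burst_size` are hypothetical names, and messages are assumed to arrive in the "tenant_id:key" shape that update_tenant_config publishes. It deliberately puts functools.cache on a synchronous function; on an async function it would cache the coroutine object, which can only be awaited once.

```python
from collections.abc import Callable
from functools import cache

# Hypothetical registry: config key -> callbacks that clear an in-process cache.
_LOCAL_CACHE_CLEARERS: dict[str, list[Callable[[], None]]] = {}


def register_local_cache(config_key: str, clear: Callable[[], None]) -> None:
    """Record that `clear` must run whenever `config_key` changes."""
    _LOCAL_CACHE_CLEARERS.setdefault(config_key, []).append(clear)


def clear_local_caches(message: str) -> int:
    """Handle a "tenant_id:key" message and return how many caches were cleared."""
    _, _, config_key = message.partition(":")  # tenant UUIDs contain no colons
    clearers = _LOCAL_CACHE_CLEARERS.get(config_key, [])
    for clear in clearers:
        clear()
    return len(clearers)


@cache
def default_burst_size() -> int:
    # Placeholder for an expensive lookup worth memoizing per process.
    return 20


register_local_cache("rate_limit.burst_size", default_burst_size.cache_clear)

default_burst_size()
print(default_burst_size.cache_info().currsize)  # 1
clear_local_caches("00000000-0000-0000-0000-000000000000:rate_limit.burst_size")
print(default_burst_size.cache_info().currsize)  # 0
```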

The publish step is added to the admin config update:

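# Updated update_tenant_config
from src.core import config_resolver
from src.core.config_watcher import publish_config_change
async def update_tenant_config(
    session: AsyncSession,
    tenant_id: UUID,
    key: str,
    value: str,
) -> None:
    # ... insert/update logic ...
    await session.commit()
    redis_client = config_resolver._redis  # read at call time, after init_redis()
    if redis_client is not None:
        await redis_client.delete(config_resolver._cache_key(tenant_id, key))
        await publish_config_change(redis_client, f"{tenant_id}:{key}")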

Start the watcher as a background task at application startup:

src/main.py
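import asyncio
import contextlib
from collections.abc import AsyncIterator
from contextlib import asynccontextmanager
import redis.asyncio as redis
from fastapi import FastAPI
from src.core.config_resolver import init_redis
from src.core.config_watcher import watch_config_changes
REDIS_URL = "redis://localhost:6379/0"  # assumption: read this from your settings
@asynccontextmanager
async def lifespan(app: FastAPI) -> AsyncIterator[None]:
    redis_client = redis.from_url(REDIS_URL)
    init_redis(redis_client)  # share the client with the config resolver
    # Start config watcher in background
    watcher_task = asyncio.create_task(
        watch_config_changes(redis_client)
    )
    yield
    # Stop the watcher and let it finish unwinding before closing Redis
    watcher_task.cancel()
    with contextlib.suppress(asyncio.CancelledError):
        await watcher_task
    await redis_client.aclose()  # redis-py 5.0.1+; older versions use close()
app = FastAPI(lifespan=lifespan)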
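
The start-then-cancel pattern in lifespan can be exercised without Redis or FastAPI. In this sketch, an asyncio.Queue is a hypothetical stand-in for the config:changed channel and `watch_queue` plays the role of watch_config_changes; `channel.join()` waits until both messages are handled before shutdown begins.

```python
import asyncio
import contextlib


async def watch_queue(channel: asyncio.Queue, seen: list[str]) -> None:
    """Stand-in for watch_config_changes: handle messages until cancelled."""
    while True:
        message = await channel.get()
        seen.append(message)  # the real watcher calls _clear_local_caches(key) here
        channel.task_done()


async def main() -> list[str]:
    channel: asyncio.Queue = asyncio.Queue()  # stands in for the "config:changed" channel
    seen: list[str] = []
    watcher = asyncio.create_task(watch_queue(channel, seen))  # startup
    await channel.put("tenant-a:rate_limit.requests_per_minute")
    await channel.put("tenant-b:feature.ai_recommendations.override")
    await channel.join()  # wait until the watcher has handled both messages
    watcher.cancel()  # shutdown
    with contextlib.suppress(asyncio.CancelledError):
        await watcher  # let the task finish unwinding before moving on
    return seen


print(asyncio.run(main()))
```

Awaiting the cancelled task, rather than only calling cancel(), is what guarantees the watcher has stopped before anything it depends on (such as the Redis client) is closed.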

Putting It All Together: The ShelfWise Rollout

Here is the complete flow for rolling out AI recommendations to ShelfWise tenants:

  1. Deploy with AI__ENABLED=false — the feature code ships but is dormant.
  2. Set rollout percentage to 5% via admin API — feature.ai_recommendations.rollout_pct = "5". No deploy needed. Roughly 10 of your 200 tenants (the exact count depends on how their IDs hash) now see the recommendations endpoint.
  3. Force-enable for your beta partner — feature.ai_recommendations.override = "enabled" for that specific tenant. They see it regardless of the percentage.
  4. Monitor error rates and latency for the 5% cohort.
  5. Ramp to 25%, then 50%, then 100% — each step is a single config update, zero deploys.
  6. Remove the feature flag — once at 100% and stable, delete the flag check from the code. A flag that is always on is dead code.

The entire rollout happens through configuration changes. The application binary does not change. The deployment pipeline does not run. The risk surface is a single database row, not a full release cycle.

Key Takeaways

  • Pydantic BaseSettings with nested models gives you typed, validated configuration with environment variable parsing. No more os.getenv() with manual type coercion and missing-key crashes.
  • Three-layer resolution (tenant override, environment, application default) handles both global defaults and per-tenant customization without code branches.
  • Redis caching with explicit invalidation prevents database lookups on every request. TTL provides a safety net; explicit deletion provides immediacy.
  • Deterministic percentage rollouts use SHA-256 hashing so the same tenant always gets the same result. No flickering, no lost features during ramp-up.
  • Fernet encryption for tenant secrets keeps API keys encrypted at rest. The master key stays in an environment variable; every tenant secret is stored encrypted in the database.
  • Redis Pub/Sub for hot-reload notifies all application instances when configuration changes. No restarts, no deploy cycles.

Configuration, feature flags, and secrets are the control plane of a multi-tenant SaaS. Get them right and you can ship features to individual tenants, adjust behavior without deploys, and store third-party credentials without a security incident waiting to happen. Next: the database layer needs to handle the load that 200 tenants generate — connection pooling, read replicas, and circuit breakers.


Next in this series

Clean Code Python: Connection Pooling and Database Resilience Under Load
