Kubernetes Operators Deep Dive

Kubernetes Operators extend the platform to manage complex applications automatically. While Kubernetes excels at running stateless workloads, stateful applications like databases, message queues, and monitoring systems require domain-specific knowledge for proper operation. Operators encode this knowledge, transforming manual runbooks into automated controllers.

The Operator pattern builds on Kubernetes' fundamental architecture: declare desired state, let controllers reconcile actual state to match. Standard controllers handle built-in resources like Deployments and Services. Operators handle custom resources that represent your specific applications, bringing the same declarative model to complex software.

The Operator Pattern

An Operator consists of Custom Resource Definitions (CRDs) and a controller. CRDs extend the Kubernetes API with new resource types specific to your application. The controller watches these custom resources and takes action to make reality match the declared state.
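Controllers built with client-go or controller-runtime do not run their reconcile logic once per watch event. Events are turned into keys (namespace/name) on a work queue, so a burst of updates to one object collapses into a single reconcile, and the same key is never handled by two workers at once. Below is a minimal single-threaded sketch of that deduplication. The keyQueue type is hypothetical; the real client-go queue also handles locking, rate limiting, and shutdown.

```go
package main

import "fmt"

// keyQueue is a hypothetical, simplified model of the deduplicating work
// queue that sits between watch events and the reconcile function.
type keyQueue struct {
	order      []string        // keys waiting to be handed to a worker
	dirty      map[string]bool // keys that need (another) reconcile
	processing map[string]bool // keys currently being reconciled
}

func newKeyQueue() *keyQueue {
	return &keyQueue{dirty: map[string]bool{}, processing: map[string]bool{}}
}

// Add records that key needs reconciling. Repeated events for a key that is
// already waiting collapse into one entry.
func (q *keyQueue) Add(key string) {
	if q.dirty[key] {
		return
	}
	q.dirty[key] = true
	if q.processing[key] {
		return // a worker holds this key; Done will re-queue it
	}
	q.order = append(q.order, key)
}

// Get hands the next key to a worker, or reports false if the queue is empty.
func (q *keyQueue) Get() (string, bool) {
	if len(q.order) == 0 {
		return "", false
	}
	key := q.order[0]
	q.order = q.order[1:]
	q.processing[key] = true
	delete(q.dirty, key)
	return key, true
}

// Done marks key as finished. If new events arrived while it was being
// processed, the key is queued again exactly once.
func (q *keyQueue) Done(key string) {
	delete(q.processing, key)
	if q.dirty[key] {
		q.order = append(q.order, key)
	}
}

func (q *keyQueue) Len() int { return len(q.order) }

func main() {
	q := newKeyQueue()
	// Three watch events for the same object collapse into one queued key.
	q.Add("default/production-db")
	q.Add("default/production-db")
	q.Add("default/production-db")
	fmt.Println("queued:", q.Len()) // queued: 1

	key, _ := q.Get()
	// An update arrives mid-reconcile: it is remembered but not queued yet...
	q.Add(key)
	fmt.Println("queued during processing:", q.Len()) // queued during processing: 0

	// ...and is queued again once the worker finishes.
	q.Done(key)
	fmt.Println("queued after Done:", q.Len()) // queued after Done: 1
}
```

Because only keys are queued, the reconcile function always reads the latest state of the object rather than the contents of any single event.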

Consider a PostgreSQL Operator. Instead of manually creating StatefulSets, Services, ConfigMaps, and running backup scripts, you declare a PostgresCluster resource. The Operator handles everything: provisioning instances, configuring replication, managing failover, scheduling backups, and handling upgrades.

This YAML manifest shows what users interact with when using an Operator. You declare what you want, and the Operator figures out how to make it happen.

# Declare what you want
apiVersion: postgres.example.com/v1
kind: PostgresCluster
metadata:
  name: production-db
spec:
  replicas: 3
  version: "15"
  storage:
    size: 100Gi
    class: fast-ssd
  backup:
    schedule: "0 */6 * * *"
    retention: 7d
  resources:
    requests:
      cpu: 2
      memory: 8Gi

The Operator reads this declaration and creates the necessary Kubernetes resources: a StatefulSet for the PostgreSQL pods, Services for client connections and replication, ConfigMaps for PostgreSQL configuration, Secrets for credentials, CronJobs for backups, and monitoring configuration.
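A convenient way to structure this is to compute the full set of desired child objects as a pure function of the spec, then create or update whatever differs from the cluster. The sketch below covers the first half only. It uses plain structs instead of real Kubernetes objects, and the naming conventions (name-primary, name-replicas, name-backup, and so on) are hypothetical.

```go
package main

import "fmt"

// Hypothetical, trimmed-down spec mirroring the PostgresCluster manifest.
type BackupSpec struct {
	Schedule  string
	Retention string
}

type PostgresClusterSpec struct {
	Replicas int
	Version  string
	Backup   *BackupSpec
}

// childObject stands in for a real Kubernetes object (kind and name only).
type childObject struct {
	Kind string
	Name string
}

// desiredChildren lists the objects that should exist for a cluster. It
// depends only on its inputs, so calling it on every reconcile is safe.
func desiredChildren(name string, spec PostgresClusterSpec) []childObject {
	objs := []childObject{
		{Kind: "StatefulSet", Name: name},
		{Kind: "Service", Name: name + "-primary"},  // read-write endpoint
		{Kind: "Service", Name: name + "-replicas"}, // read-only endpoint
		{Kind: "ConfigMap", Name: name + "-config"},
		{Kind: "Secret", Name: name + "-credentials"},
	}
	// Only schedule backups when the user asked for them.
	if spec.Backup != nil && spec.Backup.Schedule != "" {
		objs = append(objs, childObject{Kind: "CronJob", Name: name + "-backup"})
	}
	return objs
}

func main() {
	spec := PostgresClusterSpec{
		Replicas: 3,
		Version:  "15",
		Backup:   &BackupSpec{Schedule: "0 */6 * * *", Retention: "7d"},
	}
	for _, obj := range desiredChildren("production-db", spec) {
		fmt.Printf("%s/%s\n", obj.Kind, obj.Name)
	}
}
```

In a controller-runtime Operator, each desired object would typically be applied with controllerutil.CreateOrUpdate and linked to the PostgresCluster with controllerutil.SetControllerReference, so Kubernetes garbage-collects the children when the cluster is deleted.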

More importantly, the Operator handles operational tasks that require PostgreSQL expertise. When a primary fails, it promotes a replica. When you change the version, it performs a rolling upgrade that respects replication lag. When storage nears capacity, it can raise alerts or expand the volume automatically, provided the StorageClass allows volume expansion.
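Choosing which replica to promote is a good example of that expertise: the Operator should prefer the healthy replica that has replayed the most WAL, so that the fewest committed transactions are lost. The sketch below shows that selection. It assumes each replica's last replayed WAL position has already been read (PostgreSQL exposes it through pg_last_wal_replay_lsn()) and converted to an integer; replicaState and chooseFailoverCandidate are hypothetical names.

```go
package main

import (
	"errors"
	"fmt"
)

// replicaState is a hypothetical snapshot the Operator gathers per replica.
type replicaState struct {
	Name      string
	Healthy   bool
	ReplayLSN uint64 // last replayed WAL position, converted to an integer
}

var errNoCandidate = errors.New("no healthy replica available for promotion")

// chooseFailoverCandidate returns the healthy replica with the highest
// replayed WAL position. Ties are broken by name so the choice is deterministic.
func chooseFailoverCandidate(replicas []replicaState) (replicaState, error) {
	var best replicaState
	found := false
	for _, r := range replicas {
		if !r.Healthy {
			continue
		}
		if !found || r.ReplayLSN > best.ReplayLSN ||
			(r.ReplayLSN == best.ReplayLSN && r.Name < best.Name) {
			best, found = r, true
		}
	}
	if !found {
		return replicaState{}, errNoCandidate
	}
	return best, nil
}

func main() {
	replicas := []replicaState{
		{Name: "production-db-1", Healthy: true, ReplayLSN: 0x3000060},
		{Name: "production-db-2", Healthy: false, ReplayLSN: 0x3000100},
		{Name: "production-db-3", Healthy: true, ReplayLSN: 0x3000040},
	}
	best, err := chooseFailoverCandidate(replicas)
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Println("promote", best.Name) // promote production-db-1
}
```

A real failover also has to fence the old primary before promotion, so that two writable primaries never exist at the same time.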

Building Operators

Operator development frameworks simplify building controllers. Kubebuilder and Operator SDK provide scaffolding, code generation, and libraries that handle Kubernetes API interactions. You focus on your application's domain logic rather than Kubernetes plumbing.

The reconciliation loop is the heart of any Operator. This Go example shows the typical structure where you fetch the custom resource, reconcile each sub-resource, check health, and update status.

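// Simplified reconciliation loop structure
import (
    "context"
    "time"

    "k8s.io/apimachinery/pkg/api/errors"
    ctrl "sigs.k8s.io/controller-runtime"

    // Hypothetical module path for the Operator's own API types
    postgresv1 "example.com/postgres-operator/api/v1"
)
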
func (r *PostgresClusterReconciler) Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Result, error) {
    log := r.Log.WithValues("postgrescluster", req.NamespacedName)

    // Fetch the PostgresCluster resource
    cluster := &postgresv1.PostgresCluster{}
    if err := r.Get(ctx, req.NamespacedName, cluster); err != nil {
        if errors.IsNotFound(err) {
            // Resource deleted, nothing to do
            return ctrl.Result{}, nil
        }
        return ctrl.Result{}, err
    }

    // Reconcile StatefulSet. When an error is returned, controller-runtime
    // ignores the Result and requeues with exponential backoff instead,
    // so there is no point setting RequeueAfter here.
    if err := r.reconcileStatefulSet(ctx, cluster); err != nil {
        log.Error(err, "Failed to reconcile StatefulSet")
        return ctrl.Result{}, err
    }

    // Reconcile Services
    if err := r.reconcileServices(ctx, cluster); err != nil {
        log.Error(err, "Failed to reconcile Services")
        return ctrl.Result{}, err
    }

    // Reconcile backups
    if err := r.reconcileBackups(ctx, cluster); err != nil {
        log.Error(err, "Failed to reconcile Backups")
        return ctrl.Result{}, err
    }

    // Check cluster health and update status
    health, err := r.checkClusterHealth(ctx, cluster)
    if err != nil {
        return ctrl.Result{}, err
    }

    cluster.Status.Health = health
    cluster.Status.ReadyReplicas = r.countReadyReplicas(ctx, cluster)
    if err := r.Status().Update(ctx, cluster); err != nil {
        return ctrl.Result{}, err
    }

    return ctrl.Result{RequeueAfter: 5 * time.Minute}, nil
}

Reconcile is called whenever the watched resource changes and whenever a requeue fires. It compares desired state (the custom resource's spec) with actual state (what exists in the cluster) and takes action to converge them. Errors are not fatal: returning a non-nil error makes controller-runtime requeue the request with exponential backoff, and a RequeueAfter value is honored only when the returned error is nil, as in the final five-minute requeue that periodically re-checks health.
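The key property is that reconciliation is level-triggered and idempotent: it looks at the whole current state rather than at the event that triggered it, so running it twice, or after a missed event, does no harm. Below is a toy model of that convergence. A map of pod names stands in for the cluster's actual state, and reconcileReplicas is a hypothetical function, not part of any framework.

```go
package main

import (
	"fmt"
	"sort"
	"strconv"
	"strings"
)

// reconcileReplicas converges a set of running pod names toward the desired
// replica count and returns the actions it took. Pods are named
// <cluster>-0 .. <cluster>-(n-1), similar to StatefulSet ordinals.
func reconcileReplicas(cluster string, desired int, running map[string]bool) []string {
	var actions []string
	prefix := cluster + "-"

	// Create any missing ordinals.
	for i := 0; i < desired; i++ {
		name := prefix + strconv.Itoa(i)
		if !running[name] {
			running[name] = true
			actions = append(actions, "create "+name)
		}
	}

	// Collect pods whose ordinal is beyond the desired count.
	var extra []string
	for name := range running {
		if !strings.HasPrefix(name, prefix) {
			continue // not owned by this cluster
		}
		ord, err := strconv.Atoi(strings.TrimPrefix(name, prefix))
		if err == nil && ord >= desired {
			extra = append(extra, name)
		}
	}
	sort.Strings(extra) // map order is random; keep the output stable
	for _, name := range extra {
		delete(running, name)
		actions = append(actions, "delete "+name)
	}
	return actions
}

func main() {
	running := map[string]bool{"production-db-0": true}

	fmt.Println(reconcileReplicas("production-db", 3, running)) // [create production-db-1 create production-db-2]
	fmt.Println(reconcileReplicas("production-db", 3, running)) // []
	fmt.Println(reconcileReplicas("production-db", 1, running)) // [delete production-db-1 delete production-db-2]
}
```

The second call produces no actions because nothing changed, which is what makes periodic requeues safe.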

Custom Resource Design

Well-designed CRDs provide clear, intuitive APIs for your application. The spec defines desired configuration. The status reports current state. Validation ensures users can't create invalid configurations.

This CRD definition shows how to structure a custom resource with proper validation, versioning, and status fields. The schema prevents users from requesting invalid configurations before the Operator ever sees them.

apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
metadata:
  name: postgresclusters.postgres.example.com
spec:
  group: postgres.example.com
  names:
    kind: PostgresCluster
    listKind: PostgresClusterList
    plural: postgresclusters
    singular: postgrescluster
    shortNames:
      - pg
  scope: Namespaced
  versions:
    - name: v1
      served: true
      storage: true
      # Enables the /status endpoint that r.Status().Update writes to
      subresources:
        status: {}
      schema:
        openAPIV3Schema:
          type: object
          required:
            - spec
          properties:
            spec:
              type: object
              required:
                - replicas
                - version
              properties:
                replicas:
                  type: integer
                  minimum: 1
                  maximum: 10
                version:
                  type: string
                  enum: ["13", "14", "15", "16"]
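                storage:
                  type: object
                  properties:
                    size:
                      type: string
                      pattern: '^[0-9]+Gi$'
                    class:
                      type: string
                # Fields missing from a structural schema are pruned by the
                # API server, so every field in the example manifest is declared
                backup:
                  type: object
                  properties:
                    schedule:
                      type: string
                    retention:
                      type: string
                resources:
                  type: object
                  x-kubernetes-preserve-unknown-fields: true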
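            status:
              type: object
              properties:
                health:
                  type: string
                  enum: ["Healthy", "Degraded", "Unhealthy"]
                readyReplicas:
                  type: integer
                # Fields read and written by the upgrade logic
                currentVersion:
                  type: string
                upgradeInProgress:
                  type: boolean
                upgradeStartedAt:
                  type: string
                  format: date-time
                conditions:
                  type: array
                  items:
                    type: object
                    required:
                      - type
                      - status
                    properties:
                      type:
                        type: string
                      status:
                        type: string
                        enum: ["True", "False", "Unknown"]
                      reason:
                        type: string
                      message:
                        type: string
                      lastTransitionTime:
                        type: string
                        format: date-time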
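Kubebuilder and Operator SDK projects usually generate a schema like this instead of writing it by hand: controller-gen reads Go types annotated with marker comments. The sketch below shows types that would yield validation close to the schema above. The package layout is assumed; metav1.TypeMeta and metav1.ObjectMeta are left out so it builds with the standard library alone, and the version enum is written as a pattern (^1[3-6]$) rather than an enum marker.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// StorageSpec mirrors spec.storage in the CRD schema.
type StorageSpec struct {
	// +kubebuilder:validation:Pattern=`^[0-9]+Gi$`
	Size string `json:"size,omitempty"`
	// +optional
	Class string `json:"class,omitempty"`
}

// PostgresClusterSpec mirrors spec in the CRD schema. Fields without
// omitempty or +optional become required in the generated schema.
type PostgresClusterSpec struct {
	// +kubebuilder:validation:Minimum=1
	// +kubebuilder:validation:Maximum=10
	Replicas int32 `json:"replicas"`

	// Pattern used here in place of the enum; accepts "13" through "16".
	// +kubebuilder:validation:Pattern=`^1[3-6]$`
	Version string `json:"version"`

	// +optional
	Storage *StorageSpec `json:"storage,omitempty"`
}

// PostgresClusterStatus mirrors status in the CRD schema.
type PostgresClusterStatus struct {
	// +kubebuilder:validation:Enum=Healthy;Degraded;Unhealthy
	Health        string `json:"health,omitempty"`
	ReadyReplicas int32  `json:"readyReplicas,omitempty"`
}

// PostgresCluster is the root type. A real project also embeds
// metav1.TypeMeta and metav1.ObjectMeta here.
// +kubebuilder:object:root=true
// +kubebuilder:subresource:status
// +kubebuilder:resource:shortName=pg
type PostgresCluster struct {
	Spec   PostgresClusterSpec   `json:"spec"`
	Status PostgresClusterStatus `json:"status,omitempty"`
}

// toJSON serializes a cluster using the json tags above.
func toJSON(c PostgresCluster) (string, error) {
	out, err := json.Marshal(c)
	return string(out), err
}

func main() {
	s, err := toJSON(PostgresCluster{Spec: PostgresClusterSpec{Replicas: 3, Version: "15"}})
	if err != nil {
		panic(err)
	}
	fmt.Println(s) // {"spec":{"replicas":3,"version":"15"},"status":{}}
}
```

Keeping the Go types as the source of truth means the controller code and the published schema cannot drift apart.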

Status conditions follow Kubernetes conventions, providing machine-readable state that tools and other controllers can consume. Conditions like Ready, Available, and Progressing communicate state in a standardized way.

Operational Tasks

Operators shine when handling day-two operations that require domain expertise. These tasks often involve careful sequencing, health checks, and rollback capabilities that would be error-prone to perform manually.

Database upgrades are a prime example. A manual upgrade might involve: check replication lag, stop writes, verify replica caught up, promote replica with new version, reconfigure old primary as replica with new version, restore writes. An Operator encodes this sequence and handles failures at each step.
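Before starting such a sequence, the Operator has to decide whether the requested change is allowed at all; the upgrade code below delegates this to isUpgradePathValid without showing it. The following is a minimal sketch of one possible policy. The policy itself is an assumption, not PostgreSQL's actual rules: both versions must be supported major versions (matching the CRD's enum), and downgrades are rejected.

```go
package main

import (
	"fmt"
	"strconv"
)

// supportedMajors is a hypothetical list matching the CRD's version enum.
var supportedMajors = map[int]bool{13: true, 14: true, 15: true, 16: true}

// isUpgradePathValid reports whether moving from current to desired is an
// allowed upgrade under this sketch's policy. Equal versions are not an
// upgrade; callers treat them as a no-op before getting here.
func isUpgradePathValid(current, desired string) bool {
	cur, err1 := strconv.Atoi(current)
	want, err2 := strconv.Atoi(desired)
	if err1 != nil || err2 != nil {
		return false // unparseable, including an empty current version
	}
	if !supportedMajors[cur] || !supportedMajors[want] {
		return false
	}
	return want > cur // downgrades are rejected
}

func main() {
	for _, p := range [][2]string{{"14", "15"}, {"15", "14"}, {"13", "16"}, {"15", "17"}} {
		fmt.Printf("%s -> %s: %v\n", p[0], p[1], isUpgradePathValid(p[0], p[1]))
	}
}
```

An empty current version, as on a freshly created cluster, is rejected, so initial provisioning has to be handled separately from upgrades.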

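The following upgrade logic demonstrates how an Operator orchestrates a rolling version upgrade with minimal downtime: replicas upgrade first, then a failover promotes an upgraded replica, and finally the old primary upgrades. For PostgreSQL, this replica-first pattern suits minor-version updates. Physical streaming replication requires the primary and its replicas to run the same major version, so major-version upgrades generally rely on pg_upgrade or logical replication instead.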

func (r *PostgresClusterReconciler) reconcileUpgrade(ctx context.Context, cluster *postgresv1.PostgresCluster) error {
    currentVersion := cluster.Status.CurrentVersion
    desiredVersion := cluster.Spec.Version

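    // A new cluster has no recorded version yet; initial provisioning is
    // not an upgrade, so skip it here.
    if currentVersion == "" || currentVersion == desiredVersion {
        return nil
    }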

    // Check if upgrade is in progress
    if cluster.Status.UpgradeInProgress {
        return r.continueUpgrade(ctx, cluster)
    }

    // Validate upgrade path
    if !r.isUpgradePathValid(currentVersion, desiredVersion) {
        r.recorder.Eventf(cluster, corev1.EventTypeWarning, "InvalidUpgrade",
            "Cannot upgrade from %s to %s", currentVersion, desiredVersion)
        return fmt.Errorf("invalid upgrade path from %s to %s", currentVersion, desiredVersion)
    }

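    // Start upgrade and persist the flag, so a restarted Operator knows an
    // upgrade is already underway
    cluster.Status.UpgradeInProgress = true
    cluster.Status.UpgradeStartedAt = metav1.Now()
    if err := r.Status().Update(ctx, cluster); err != nil {
        return err
    }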

    // Upgrade replicas first, then primary
    replicas := r.getReplicas(ctx, cluster)
    for _, replica := range replicas {
        if err := r.upgradeInstance(ctx, replica, desiredVersion); err != nil {
            return err
        }
        // Wait for replica to be healthy before continuing
        if err := r.waitForHealthy(ctx, replica); err != nil {
            return err
        }
    }

    // Failover to upgraded replica, then upgrade old primary
    if err := r.performFailover(ctx, cluster); err != nil {
        return err
    }

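    // Upgrade remaining instance (old primary)
    oldPrimary := r.getOldPrimary(ctx, cluster)
    if err := r.upgradeInstance(ctx, oldPrimary, desiredVersion); err != nil {
        return err
    }
    if err := r.waitForHealthy(ctx, oldPrimary); err != nil {
        return err
    }

    // Record completion through the status subresource
    cluster.Status.UpgradeInProgress = false
    cluster.Status.CurrentVersion = desiredVersion
    return r.Status().Update(ctx, cluster)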
}

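This sequential approach, with a health check after each instance, keeps the cluster serving throughout the upgrade apart from the brief interruption while the primary role moves. Resumption depends on the UpgradeInProgress flag being persisted through the status subresource: after a crash or restart, the Operator then takes the continueUpgrade path, which has to determine from the instances themselves which steps have already completed.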

Observability Integration

Operators should expose metrics about both themselves and the applications they manage. ServiceMonitor resources (a custom resource defined by the Prometheus Operator), created by the Operator, let Prometheus discover and scrape targets automatically. Custom metrics provide visibility into Operator-specific operations.

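Define metrics for reconciliation performance and managed resource health. controller-runtime already exports per-controller metrics such as controller_runtime_reconcile_total and controller_runtime_reconcile_time_seconds; custom metrics like the ones below add per-cluster labels and the health of each managed database.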

var (
    reconcileTotal = prometheus.NewCounterVec(
        prometheus.CounterOpts{
            Name: "operator_reconcile_total",
            Help: "Total number of reconciliations",
        },
        []string{"cluster", "result"},
    )

    reconcileDuration = prometheus.NewHistogramVec(
        prometheus.HistogramOpts{
            Name:    "operator_reconcile_duration_seconds",
            Help:    "Duration of reconciliation",
            Buckets: prometheus.ExponentialBuckets(0.1, 2, 10),
        },
        []string{"cluster"},
    )

    clusterHealth = prometheus.NewGaugeVec(
        prometheus.GaugeOpts{
            Name: "postgres_cluster_health",
            Help: "Health status of PostgreSQL cluster (1=healthy, 0=unhealthy)",
        },
        []string{"cluster", "namespace"},
    )
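)

// Register with controller-runtime's registry so the metrics are served on
// the manager's metrics endpoint (assumes an import of
// "sigs.k8s.io/controller-runtime/pkg/metrics").
func init() {
    metrics.Registry.MustRegister(reconcileTotal, reconcileDuration, clusterHealth)
}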

Alerting rules can reference these metrics, creating alerts for Operator failures, cluster health issues, or backup problems.

When to Build vs Use Existing Operators

The Operator ecosystem is rich. Before building a custom Operator, check whether one already exists for your application. PostgreSQL has several mature Operators (Zalando's postgres-operator, Crunchy Data's PGO, CloudNativePG). Redis, Kafka, Elasticsearch, and most popular stateful applications have community or vendor Operators.

Build a custom Operator when your application is unique to your organization, when existing Operators don't meet your requirements, or when you need tight integration with your specific infrastructure and processes.

Even when using existing Operators, understand their design. Review their CRDs, understand their operational model, and verify they handle your requirements. An Operator that doesn't handle your backup strategy or upgrade path may cause more problems than it solves.

Conclusion

Kubernetes Operators bring the declarative, self-healing model of Kubernetes to complex applications. By encoding operational knowledge in code, they transform manual runbooks into automated controllers. Database clusters, message queues, and monitoring systems that once required careful manual operation can be managed with simple YAML declarations.

Building Operators requires understanding both Kubernetes controller patterns and your application's operational requirements. The reconciliation loop, CRD design, and status reporting follow established patterns. The domain logic (how to upgrade safely, how to handle failures, when to alert) comes from operational experience with your application.