TL;DR: CI is the easy part — tests, builds, linting, done. CD is where things get interesting. I’ve used three approaches: pushing directly from CI with Helm, pulling with GitOps (ArgoCD/Flux), and progressive delivery with Argo Rollouts. Each has real trade-offs. Here’s what I learned running all three in production.


The Real Problem: Delivery, Not Integration

Every CI/CD conversation starts with the pipeline. Tests pass, Docker image gets built, linter’s happy — great. That part is basically solved. Whether you’re using GitHub Actions, GitLab CI, Jenkins, or whatever, the CI side looks roughly the same everywhere.

The CD part? That’s where the disagreements start. And honestly, where most of the production incidents come from.

I’ve run all three major patterns in production environments. None of them are universally “the right answer.” They all have sharp edges. The question is which set of trade-offs you can live with.

CI: The Constant

Before we get into the CD debate, let’s acknowledge what stays the same regardless of your delivery strategy:

flowchart LR
    Code["Code Push"] --> Lint["Lint &<br/>Format"]
    Lint --> Test["Unit &<br/>Integration Tests"]
    Test --> Build["Build Docker<br/>Image"]
    Build --> Push["Push to<br/>Registry"]

    style Code fill:#339af0,stroke:#1971c2,color:#fff
    style Push fill:#51cf66,stroke:#2f9e44

Your CI pipeline runs on every push or PR:

  1. Lint and format — catch style issues before they become arguments in code review
  2. Tests — unit tests, integration tests, whatever your codebase needs
  3. Build — produce a Docker image tagged with the commit SHA
  4. Push — shove it into your container registry (ECR, GCR, Docker Hub, wherever)
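Wired together in GitHub Actions, those four steps might look roughly like this (the job layout, `make` targets, and `ghcr.io/org` registry are illustrative — swap in your own):

```yaml
name: ci
on: [push, pull_request]

jobs:
  build:
    runs-on: ubuntu-latest
    env:
      REGISTRY: ghcr.io/org   # placeholder registry
    steps:
      - uses: actions/checkout@v4

      - name: Lint and format
        run: make lint

      - name: Tests
        run: make test

      - name: Build image
        run: docker build -t "$REGISTRY/myapp:$GITHUB_SHA" .

      - name: Push image
        run: |
          echo "${{ secrets.GITHUB_TOKEN }}" | docker login ghcr.io -u "$GITHUB_ACTOR" --password-stdin
          docker push "$REGISTRY/myapp:$GITHUB_SHA"
```

The only part that varies much between teams is the tag scheme — commit SHA, as here, keeps images traceable back to a specific commit.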

After this point, you have a tested, built image sitting in a registry. Now the question is: how does that image get into your cluster?


Option 1: Push-Based — Helm from CI

This is the “just deploy it” approach. Your CI pipeline finishes building the image and immediately pushes the new version into the cluster using Helm, kubectl, or whatever deployment tool you prefer.

How It Works

flowchart LR
    CI["CI Runner"] --> Auth["Authenticate<br/>to Cluster"]
    Auth --> Deploy["helm upgrade<br/>--install --atomic"]
    Deploy --> |Success| Done["Done"]
    Deploy --> |Failure| Rollback["Automatic<br/>Rollback"]

    style CI fill:#339af0,stroke:#1971c2,color:#fff
    style Deploy fill:#ff922b,stroke:#e8590c,color:#fff
    style Done fill:#51cf66,stroke:#2f9e44
    style Rollback fill:#ff6b6b,stroke:#c92a2a,color:#fff

Here’s what a typical push-based deploy script looks like. This one handles a common annoyance with private EKS clusters — the CI runner’s IP needs to be whitelisted before it can talk to the API server:

#!/bin/bash
set -euo pipefail

# Get the CI runner's public IP and whitelist it
RUNNER_IP=$(curl -s ifconfig.me)
RUNNER_CIDR="$RUNNER_IP/32"

# Fetch existing authorized CIDRs
EXISTING_CIDRS=$(aws eks describe-cluster \
  --name "$CLUSTER_NAME" \
  --region "$AWS_REGION" \
  --query "cluster.resourcesVpcConfig.publicAccessCidrs" \
  --output json)

# Add runner CIDR to the authorized list
UPDATED_CIDRS=$(echo "$EXISTING_CIDRS" | jq --arg cidr "$RUNNER_CIDR" '. + [$cidr] | unique')

aws eks update-cluster-config \
  --name "$CLUSTER_NAME" \
  --region "$AWS_REGION" \
  --resources-vpc-config "{\"publicAccessCidrs\": $UPDATED_CIDRS}"

# The CIDR update is asynchronous — wait for the cluster to finish
# updating before trying to talk to the API server
aws eks wait cluster-active \
  --name "$CLUSTER_NAME" \
  --region "$AWS_REGION"

# Configure kubectl
aws eks update-kubeconfig \
  --region "$AWS_REGION" \
  --name "$CLUSTER_NAME"

# Deploy with atomic — rolls back automatically on failure
helm upgrade --install --atomic \
  -n "$NAMESPACE" \
  --set image.repository="$REGISTRY/$IMAGE_NAME" \
  --set image.tag="$COMMIT_SHA" \
  "$RELEASE_NAME" ./charts

# Clean up — remove runner IP from whitelist
CLEANED_CIDRS=$(echo "$UPDATED_CIDRS" | jq --arg cidr "$RUNNER_CIDR" '. - [$cidr]')

aws eks update-cluster-config \
  --name "$CLUSTER_NAME" \
  --region "$AWS_REGION" \
  --resources-vpc-config "{\"publicAccessCidrs\": $CLEANED_CIDRS}"

The --atomic flag on helm upgrade is doing the heavy lifting here. If the deployment fails — pods crash, health checks don’t pass, whatever — Helm automatically rolls back to the previous release. No manual intervention needed.

The Good

  • Simple to understand: one script, one pipeline, one place to look when things break
  • Immediate feedback: you know within minutes if the deploy worked or not
  • Built-in rollback: --atomic handles failures automatically
  • No extra infrastructure: no GitOps operator to maintain, no sync loops to debug

The Bad

  • CI needs cluster credentials: your CI runner needs IAM roles, kubeconfig access, and network connectivity to the cluster. That’s a lot of surface area
  • IP whitelisting is fragile: CI runners often have dynamic IPs. The whitelist-deploy-cleanup dance works but adds failure modes. If the script dies mid-deploy, you’ve left a stale IP in your cluster’s access list
  • No audit trail beyond CI logs: once the deploy happens, the only record is in your CI pipeline logs. There’s no declarative state of “what should be running” versus “what is running”
  • Tight coupling: CI and CD are the same system. If your CI provider has an outage, you can’t deploy. If you need to roll back, you either re-run a pipeline or do it manually
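One way to close the stale-IP gap is to register the cleanup as a shell trap, so it runs on every exit path instead of only after a successful deploy. A minimal sketch — the echo lines stand in for the aws and helm calls from the script above:

```shell
#!/bin/bash

deploy_with_trap() (
  # EXIT trap fires when this subshell ends — success, failure, or signal
  trap 'echo "cleanup: removing runner CIDR from whitelist"' EXIT

  echo "whitelisting runner CIDR"   # stands in for: aws eks update-cluster-config ...
  echo "running helm upgrade"       # stands in for: helm upgrade --install --atomic ...
  return 1                          # simulate a failed deploy
)

deploy_with_trap || echo "deploy failed, but the whitelist was still cleaned up"
```

The cleanup line prints even though the "deploy" fails, which is exactly the property the inline cleanup in the script above lacks.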

Warning: Granting CI runners direct access to your Kubernetes API is a security trade-off. If your CI system is compromised, an attacker has a direct path to your cluster. Make sure the IAM role used by CI has the minimum permissions needed — ideally scoped to a single namespace.


Option 2: Pull-Based — GitOps with ArgoCD or Flux

Instead of pushing deployments from CI, you let the cluster pull its own desired state from a Git repository. CI’s only job is to build the image and update a reference somewhere. The GitOps operator running inside the cluster does the actual deployment.

There are two flavors of this:

2a: CI Updates the Git Repo

CI builds the image, then commits an updated image tag to a Git repo that ArgoCD or Flux watches:

flowchart LR
    CI["CI Pipeline"] --> Build["Build & Push<br/>Image"]
    Build --> Update["Update image tag<br/>in Git repo"]
    Update --> Git["Git Repo<br/>(values.yaml)"]
    Git --> Argo["ArgoCD / Flux"]
    Argo --> Cluster["Kubernetes<br/>Cluster"]

    style CI fill:#339af0,stroke:#1971c2,color:#fff
    style Git fill:#ff922b,stroke:#e8590c,color:#fff
    style Argo fill:#845ef7,stroke:#7048e8,color:#fff
    style Cluster fill:#51cf66,stroke:#2f9e44

The CI step that updates the repo typically looks something like:

# GitHub Actions example — assumes a secret (MANIFESTS_TOKEN, name illustrative)
# with write access to the manifests repo
- name: Update image tag
  run: |
    git clone "https://x-access-token:${{ secrets.MANIFESTS_TOKEN }}@github.com/org/k8s-manifests.git"
    cd k8s-manifests
    git config user.name "ci-bot"
    git config user.email "ci-bot@users.noreply.github.com"
    yq -i '.image.tag = "${{ github.sha }}"' apps/myapp/values.yaml
    git add .
    git commit -m "deploy: myapp ${{ github.sha }}"
    git push

ArgoCD detects the change, compares it to what’s running, and syncs the cluster to match.

2b: ArgoCD Image Updater

Instead of CI updating the repo, the ArgoCD Image Updater watches your container registry and automatically updates the image tag when a new one appears:

flowchart LR
    CI["CI Pipeline"] --> Build["Build & Push<br/>Image"]
    Build --> Registry["Container<br/>Registry"]
    Updater["ArgoCD Image<br/>Updater"] --> |"Watch for<br/>new tags"| Registry
    Updater --> |"Update tag"| Git["Git Repo"]
    Git --> Argo["ArgoCD"]
    Argo --> Cluster["Kubernetes<br/>Cluster"]

    style CI fill:#339af0,stroke:#1971c2,color:#fff
    style Registry fill:#ff922b,stroke:#e8590c,color:#fff
    style Updater fill:#845ef7,stroke:#7048e8,color:#fff
    style Cluster fill:#51cf66,stroke:#2f9e44

You configure it with annotations on your ArgoCD Application:

apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: myapp
  annotations:
    argocd-image-updater.argoproj.io/image-list: myapp=<registry>/<image>
    argocd-image-updater.argoproj.io/myapp.update-strategy: semver
    argocd-image-updater.argoproj.io/myapp.allow-tags: regexp:^v[0-9]+\.[0-9]+\.[0-9]+$

The Image Updater polls the registry, finds tags matching your regex, and updates the Application to use the latest matching tag. You can use different strategies — semver to follow semantic versioning, latest to always grab the newest, or digest to track a mutable tag by SHA.

The Good

  • Git is the source of truth: you can always git log to see exactly what changed and when. Want to know what’s running in production? Check the repo
  • No cluster credentials in CI: CI pushes an image and updates a file. It never touches the cluster
  • Self-healing: if someone manually changes something in the cluster, the GitOps operator reverts it to match the declared state
  • Natural audit trail: every deployment is a git commit with a timestamp, author, and diff
  • Rollback is git revert: no special tooling needed
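That rollback claim is easy to demonstrate locally. This sketch builds a throwaway manifests repo, commits two "deploys", and reverts the second — the file the GitOps operator would watch ends up back at the previous tag (repo contents and identity are illustrative):

```shell
set -euo pipefail
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email "ci-bot@example.com"   # throwaway identity for the demo
git config user.name "ci-bot"

echo "tag: v1" > values.yaml
git add values.yaml && git commit -qm "deploy: myapp v1"

echo "tag: v2" > values.yaml
git commit -qam "deploy: myapp v2"           # the bad deploy

git revert --no-edit HEAD                    # rollback is just another commit
cat values.yaml                              # back to: tag: v1
```

The revert is an ordinary commit, so the rollback itself shows up in the audit trail — ArgoCD syncs it like any other change.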

The Bad

  • More infrastructure to manage: you’re running ArgoCD or Flux inside your cluster. That’s another thing to monitor, upgrade, and debug
  • Eventual consistency: there’s a delay between the git commit and the actual deployment. Usually seconds to minutes, but it’s not instantaneous
  • Sync debugging can be painful: when ArgoCD says “OutOfSync” and you can’t figure out why, you’ll miss the simplicity of helm upgrade
  • Image Updater has quirks: it sometimes struggles with private registries, rate limiting, and complex tag patterns. And if it auto-deploys a bad image, you’re chasing it after the fact

Option 3: Progressive Delivery — Canary and Blue/Green

Both push and pull methods deploy the new version all at once. Progressive delivery takes a more cautious approach — gradually shifting traffic to the new version while monitoring for errors.

How It Works

flowchart TB
    subgraph Cluster["Kubernetes Cluster"]
        Ingress["Ingress / Service Mesh"]
        subgraph Stable["Stable (v1)"]
            S1["Pod"]
            S2["Pod"]
            S3["Pod"]
        end
        subgraph Canary["Canary (v2)"]
            C1["Pod"]
        end
        Ingress --> |"90% traffic"| Stable
        Ingress --> |"10% traffic"| Canary
    end

    style Stable fill:#51cf66,stroke:#2f9e44
    style Canary fill:#ff922b,stroke:#e8590c,color:#fff
    style Ingress fill:#339af0,stroke:#1971c2,color:#fff

Tools like Argo Rollouts replace the standard Kubernetes Deployment with a Rollout resource that supports:

  • Canary: shift a percentage of traffic to the new version, analyze metrics, increase the percentage, repeat
  • Blue/Green: spin up the new version alongside the old one, run smoke tests, then switch all traffic at once

A canary Rollout spec looks like this:

apiVersion: argoproj.io/v1alpha1
kind: Rollout
metadata:
  name: myapp
spec:
  strategy:
    canary:
      steps:
        - setWeight: 10
        - pause: { duration: 5m }
        - setWeight: 30
        - pause: { duration: 5m }
        - setWeight: 60
        - pause: { duration: 5m }
        - setWeight: 100
      canaryService: myapp-canary
      stableService: myapp-stable
      trafficRouting:
        nginx:
          stableIngress: myapp-ingress

This rolls out gradually: 10% traffic for 5 minutes, then 30%, then 60%, then full rollout. At any step, if metrics look bad, the rollout can automatically abort and shift all traffic back to stable.
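When you want to watch or intervene by hand, the kubectl-argo-rollouts plugin covers the common cases (this assumes the plugin is installed and you have cluster access):

```shell
# Watch the canary progress through its steps (weights, pauses, analysis)
kubectl argo rollouts get rollout myapp --watch

# Happy with the canary early? Skip the remaining pauses
kubectl argo rollouts promote myapp

# Something looks off — shift all traffic back to stable
kubectl argo rollouts abort myapp
```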

You can also wire up automated analysis with tools like Prometheus — the analysis block nests under the canary strategy, alongside the steps:

analysis:
  templates:
    - templateName: success-rate
  args:
    - name: service-name
      value: myapp
---
apiVersion: argoproj.io/v1alpha1
kind: AnalysisTemplate
metadata:
  name: success-rate
spec:
  metrics:
    - name: success-rate
      interval: 60s
      successCondition: result[0] >= 0.95
      provider:
        prometheus:
          address: http://prometheus:9090
          query: |
            sum(rate(http_requests_total{service="{{args.service-name}}",status=~"2.."}[5m]))
            /
            sum(rate(http_requests_total{service="{{args.service-name}}"}[5m]))

If the success rate drops below 95% during the canary phase, the rollout automatically aborts. No human intervention needed.

You don’t necessarily need Argo Rollouts for this — service meshes like Istio and Linkerd, and ingress controllers like NGINX and Traefik, can handle weighted traffic routing. Argo Rollouts just wraps it into a Kubernetes-native workflow with built-in analysis. You could also write your own controller, but these are battle-tested and it’s generally not worth reinventing.

The Good

  • Safest for production: a bad deploy only affects a fraction of traffic before being caught
  • Automated analysis: can automatically compare error rates, latency, and success rates between canary and stable
  • Gradual rollout: gives you time to catch subtle issues that wouldn’t show up in pre-deploy tests
  • Works with GitOps: Argo Rollouts integrates cleanly with ArgoCD. You get the benefits of both

The Bad

  • Most complex to set up: requires a traffic routing layer (ingress controller or service mesh), the Rollouts controller, and analysis infrastructure (Prometheus, Datadog, etc.)
  • Overkill for simple apps: if your app is a basic CRUD service with 10 RPM, a canary analysis watching for error rate changes is going to be noisy and slow
  • Debugging is harder: when a rollout is stuck at 30% and the analysis is inconclusive, figuring out what’s wrong requires understanding the Rollout resource, the analysis template, the metrics pipeline, and the traffic routing — all at once
  • Requires observable services: if your app doesn’t emit meaningful metrics, there’s nothing to analyze. Progressive delivery without good observability is just a slow deployment

The Comparison

|                           | Push (Helm from CI)          | Pull (GitOps)             | Progressive Delivery          |
|---------------------------|------------------------------|---------------------------|-------------------------------|
| Complexity                | Low                          | Medium                    | High                          |
| Rollback                  | --atomic / re-run pipeline   | git revert                | Automatic abort               |
| Audit trail               | CI logs                      | Git history               | Git history + rollout events  |
| Cluster credentials in CI | Yes                          | No                        | No (with GitOps)              |
| Deploy speed              | Immediate                    | Seconds to minutes        | Minutes to hours              |
| Blast radius on failure   | Full                         | Full                      | Partial (canary %)            |
| Extra infrastructure      | None                         | ArgoCD/Flux               | ArgoCD + Rollouts + metrics   |
| Best for                  | Small teams, simple apps     | Most production workloads | High-traffic, critical services |

What I Actually Use

In practice, I usually end up with a combination.

For most workloads: GitOps with ArgoCD (Option 2a). CI builds the image, updates the tag in a manifests repo, ArgoCD syncs. It’s the right balance of safety and simplicity for teams of any size. The audit trail alone makes it worth the overhead.

For early-stage or internal tools: push-based with helm upgrade --atomic. When you’re iterating fast and the blast radius is low, the simplicity wins. Just know what you’re trading off.

For critical production services: GitOps + Argo Rollouts. If the service handles real user traffic and a bad deploy means revenue loss or customer impact, the canary analysis is worth the setup complexity. But only if you have the observability to back it up — progressive delivery without good metrics is just a slow way to break things.

The one thing I’d always avoid: staying on push-based once your team or infrastructure grows past the “we all know what’s happening” stage. The lack of audit trail and the CI-to-cluster credential chain become real liabilities at scale.

Wrapping Up

There’s no single right answer for Kubernetes CD. The pattern that works depends on your team size, risk tolerance, and how much infrastructure you’re willing to maintain. Start simple, add complexity only when you feel the pain.

The progression I’ve seen work for most teams:

  1. Start with push: get something working, ship fast, iterate
  2. Move to GitOps: once you need audit trails, multi-environment management, or your team grows beyond “everyone knows the deploy script”
  3. Add progressive delivery: when the cost of a bad deploy outweighs the cost of the extra infrastructure

Each step adds complexity, but also adds safety nets. Just make sure you’re adding them because you need them, not because a conference talk made them look cool.