Flux vertical scaling

How to configure vertical scaling for Flux controllers

When Flux is managing hundreds of applications that are deployed multiple times per day, cluster admins can fine-tune the Flux controllers at bootstrap time to run at scale by:

- increasing the number of workers and resource limits
- enabling in-memory kustomize builds
- enabling Helm repositories caching
- providing persistent storage for internal artifacts
- running the controllers on dedicated nodes

Increase the number of workers and limits

If Flux is managing hundreds of applications, it is advised to increase the number of reconciliations that can be performed in parallel and to bump the resource limits:

apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  - gotk-components.yaml
  - gotk-sync.yaml
patches:
  - patch: |
      - op: add
        path: /spec/template/spec/containers/0/args/-
        value: --concurrent=10
      - op: add
        path: /spec/template/spec/containers/0/args/-
        value: --requeue-dependency=5s      
    target:
      kind: Deployment
      name: "(kustomize-controller|helm-controller|source-controller)"
  - patch: |
      apiVersion: apps/v1
      kind: Deployment
      metadata:
        name: all
      spec:
        template:
          spec:
            containers:
              - name: manager
                resources:
                  limits:
                    cpu: 2000m
                    memory: 2Gi      
    target:
      kind: Deployment
      name: "(kustomize-controller|helm-controller|source-controller)"

Enable in-memory kustomize builds

When increasing the number of concurrent reconciliations, it is advised to back the controller's /tmp filesystem with tmpfs to speed up the kustomize build operations:

apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  - gotk-components.yaml
  - gotk-sync.yaml
patches:
  - patch: |
      - op: add
        path: /spec/template/spec/containers/0/args/-
        value: --concurrent=20
      - op: replace
        path: /spec/template/spec/volumes/0
        value:
          name: temp
          emptyDir:
            medium: Memory      
    target:
      kind: Deployment
      name: kustomize-controller
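
Note that files written to a memory-backed emptyDir count against the container's memory limit, so make sure the limits set earlier leave enough headroom for the build artifacts. To cap the tmpfs size explicitly, the same patch can set the standard Kubernetes sizeLimit field (the 1Gi value below is an arbitrary example to size for your manifests):

      - op: replace
        path: /spec/template/spec/volumes/0
        value:
          name: temp
          emptyDir:
            medium: Memory
            sizeLimit: 1Gi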

Enable Helm repositories caching

If Flux connects to Helm repositories hosting hundreds of Helm charts, it is advised to enable caching to reduce the memory footprint of source-controller:

apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  - gotk-components.yaml
  - gotk-sync.yaml
patches:
  - patch: |
      - op: add
        path: /spec/template/spec/containers/0/args/-
        value: --helm-cache-max-size=10
      - op: add
        path: /spec/template/spec/containers/0/args/-
        value: --helm-cache-ttl=60m
      - op: add
        path: /spec/template/spec/containers/0/args/-
        value: --helm-cache-purge-interval=5m      
    target:
      kind: Deployment
      name: source-controller

When helm-cache-max-size is reached, an error is logged and the index is instead read from file. Cache hits are exposed via the gotk_cache_events_total Prometheus metric. Use this data to fine-tune the configuration flags.
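
For example, assuming the metric carries an event_type label that distinguishes cache hits from misses, a PromQL query along these lines shows the rate of each event type:

sum by (event_type) (rate(gotk_cache_events_total[5m]))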

Persistent storage for Flux internal artifacts

Flux maintains a local cache of artifacts acquired from external sources. By default, the cache is stored in an emptyDir volume, which means that after a restart, Flux has to rebuild the local cache by fetching the contents of all Git repositories, Buckets, Helm charts and OCI artifacts. To avoid losing the cached artifacts, you can configure source-controller with a persistent volume.

Create a Kubernetes PVC definition named gotk-pvc.yaml and place it in your flux-system directory:

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: gotk-pvc
  namespace: flux-system
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: standard
  resources:
    requests:
      storage: 10Gi

Add the PVC file to the kustomization.yaml resources and patch the source-controller volumes:

apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  - gotk-components.yaml
  - gotk-sync.yaml
  - gotk-pvc.yaml
patches:
  - patch: |
      - op: add
        path: '/spec/template/spec/volumes/-'
        value:
          name: persistent-data
          persistentVolumeClaim:
            claimName: gotk-pvc
      - op: replace
        path: '/spec/template/spec/containers/0/volumeMounts/0'
        value:
          name: persistent-data
          mountPath: /data      
    target:
      kind: Deployment
      name: source-controller
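
Once the changes are reconciled, the claim should be Bound and source-controller should be running with the persistent volume mounted at /data; you can verify with, for example:

kubectl -n flux-system get pvc gotk-pvc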

Node affinity and tolerations

Pin the Flux controller pods to specific nodes and allow the cluster autoscaler to evict them:

apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  - gotk-components.yaml
  - gotk-sync.yaml
patches:
  - patch: |
      apiVersion: apps/v1
      kind: Deployment
      metadata:
        name: all
      spec:
        template:
          metadata:
            annotations:
              cluster-autoscaler.kubernetes.io/safe-to-evict: "true"
          spec:
            affinity:
              nodeAffinity:
                requiredDuringSchedulingIgnoredDuringExecution:
                  nodeSelectorTerms:
                    - matchExpressions:
                        - key: role
                          operator: In
                          values:
                            - flux
            tolerations:
              - effect: NoSchedule
                key: role
                operator: Equal
                value: flux      
    target:
      kind: Deployment
      labelSelector: app.kubernetes.io/part-of=flux

The above configuration pins Flux to nodes tainted and labeled with:

kubectl taint nodes <my-node> role=flux:NoSchedule
kubectl label nodes <my-node> role=flux
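
To confirm that the controllers were scheduled on the dedicated nodes, list the pods along with their node assignments:

kubectl -n flux-system get pods -o wide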