When managing clusters and resources with FluxCD and GitOps, the first challenge any developer meets is how to structure the GitOps repository.
FluxCD provides docs and a repository with good practices, examples, and thoughts on how to organize your YAML files to set up a multi-tenant Kubernetes cluster. However, with more complex environments where you manage multiple clusters, it is not so clear how to structure and organize your files.
So, what do you do? You search the internet for other people that had the same issues as you. Luckily, you don’t need to search much, because in the previous example repo, you will find fellow developers still looking for the same answers you are looking for.
“Which issues?” you might ask.
The challenge with FluxCD and GitOps is that you can easily end up with lots of duplicated configuration, making you feel that your code is not DRY at all, especially when trying to keep the configuration as flexible as possible. The culprits are the constraints that Flux and Kustomize enforce on the file structure.
The number of files to maintain and groom scales with the number of apps and clusters you run, and this eventually affects the maintainability and operability of the repo, making your devs slower and less happy.
As a DevOps engineer, my mantra is to make my devs happy. A happy dev means fewer requests for help to the Platform team and, thus, less work for me (yes, I’m a pretty selfish person 🤓).
In this first post, I’ll propose and explain a repo structure that aims to be as extensible as possible, taking advantage of Kustomize Components to keep our configuration simple and maintainable, and our devs happy.
I’m currently using this setup in production in my homelab, and you can find a copy of it (without the personal configuration and credentials to hack my house) here:
All the code of this post can be found in this repo
First of all, let’s take a look at a classic GitOps repo that manages multiple clusters. In this post, we will focus on the platform and clusters structures.
We can easily find docs and samples provided by the FluxCD community, which can be summarized as follows:
├── apps
│ ├── base
│ ├── production
│ └── staging
├── platform
│ ├── base
│ ├── production
│ └── staging
└── clusters
├── production
└── staging
This example falls short when trying to handle real-world situations.
To keep it simple, let’s set the following objectives:
There are many reasons why we need to configure each cluster with a unique configuration:
One clear example is ExternalDNS, as I need to provide a unique configuration per cluster:
All this data could be fetched and known before even creating the cluster, but how can we automate providing this information? Not everything is configurable from a Helm chart.
Sometimes, you need to provide Secret and ConfigMap resources in the very namespace where the service will be installed and pass them as references. But some of these values might also be needed by other services (IAM credentials might be needed in cert-manager to access the DNS zone as well).
If we want to handle this at cluster initialization time, we will also start wondering: “In which namespace do these Secret and ConfigMap resources need to be created?” We do not know yet. Depending on the variant of the platform we are installing, some namespaces might exist and others might not!
Does this mean that I need a profile for each distinct unique configuration per cluster?
Here, we reach the compromise of standardization: we want to make all clusters as similar as possible, but still allow pre-configured customizations in the different clusters.
As an analogy, we want to provide a menu to our developers. They can pick from a set of dishes, and choose the side dish and garnish for each of them. This trade-off keeps our customers (developers) happy while preventing the platform team from going crazy with clusters that each have unique configurations.
This becomes a challenge once you realize how Kustomizations work, which we will tackle in the following question:
Kustomizations are mostly based on layering and creating different variants.
This means that for each unique combination, I need a Kustomization file that aggregates all the different services. You might already be imagining multiple folders with a single kustomization file aggregating that unique combination for a specific platform component.
If you have a base with 3 independent features, you will end up with 2³ = 8 potential unique combinations. So, the number of kustomizations you will end up with follows 2^n, where n is the number of features of the platform component.
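To make this concrete: a base with three independent features would need one overlay folder per combination, ending up with a layout like this (hypothetical feature names):

```
my-service/
├── base
├── overlay-a          # base + feature A
├── overlay-b          # base + feature B
├── overlay-c          # base + feature C
├── overlay-a-b        # base + features A and B
├── overlay-a-c
├── overlay-b-c
└── overlay-a-b-c      # base + all three features
```

Each of these folders exists only to hold a kustomization.yaml that aggregates that specific combination.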
You can find a full (and clearer) example of this problem here, written by the kubernetes-sigs community.
All this is better expressed in this KEP:
The problem is that modular applications cannot always be expressed in a tall hierarchy while preserving all combinations of available features. Doing so would require putting each feature in an overlay, and making overlays for independent features inherit from each other.
However, this is semantically incorrect, does not scale as the number of features grows, and soon results in duplicate manifests and kustomizations.
Instead, such applications are much better expressed as a collection of components, i.e., reusable pieces of configuration logic that are defined in a common place and that distinct overlays can then mix-and-match. This approach abides by the
DRY principle and increases ease of maintenance.
So, do we need to choose between flexibility and a simple structure? Not if we use Kustomize Components.
Kustomize components provide a more flexible way to enable/disable features and configurations for applications directly from the kustomization file. This results in more readable, concise and intuitive overlays.
Keeping the analogy of food menus, Kustomize Components allows us to extend a base (the main dish) with side-dishes and garnish at the will of the final customer.
How are they used?
# my-service/_base/kustomization.yaml
# Declaring the base
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  - deployment.yaml
  - configmap.yaml
---
# my-service/feature1/kustomization.yaml
# Declaring the Component
apiVersion: kustomize.config.k8s.io/v1alpha1
kind: Component # We just define a Kustomization as a Component
resources:
  - resource1.yaml
  - resource2.yaml
patchesStrategicMerge:
  - configmap.yaml
---
# my-service-instance/kustomization.yaml
# Usage of the base and the component together
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  - ../my-service/_base
components:
  - ../my-service/feature1
  - ../my-service/feature2
As you can see, it is as easy as declaring the base and the components, and then referencing them in the instance that uses them. However, we can do even better!
FluxCD Kustomization resources support Components as well, so we can do the instantiation directly from the Flux Kustomization resource:
apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization
metadata:
  name: podinfo
  namespace: flux-system
spec:
  path: "./my-service/_base"
  components:
    - ../feature1
    - ../feature2
It is important to note that the components’ paths must be local and relative to the path specified by .spec.path
, whereas in the Kustomize example, it is relative to the kustomization.yaml file’s location.
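Concretely, in the Flux example above, spec.path points at my-service/_base, so the component paths resolve against that directory (hypothetical layout):

```
my-service/
├── _base/      # .spec.path points here
├── feature1/   # referenced as ../feature1, relative to _base
└── feature2/   # referenced as ../feature2, relative to _base
```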
Even if we use Components, we will want to set specific values to the Flux Kustomizations. To do so, we can make use of the Post Build Variable Substitution that Flux Kustomize provides.
With it, we can define a resource in any Kustomization with variables to be replaced:
---
apiVersion: v1
kind: Namespace
metadata:
  name: apps
  labels:
    environment: ${cluster_env:=dev}
    region: "${cluster_region}"
---
apiVersion: v1
kind: Secret
metadata:
  name: secret
  namespace: apps
type: Opaque
stringData:
  token: ${token}
And then, replace these values with Flux either from static values or from ConfigMaps and Secrets:
apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization
metadata:
  name: apps
spec:
  interval: 5m
  path: "./apps/"
  postBuild:
    substitute:
      cluster_env: "prod"
      cluster_region: "eu-central-1"
    substituteFrom:
      - kind: ConfigMap
        name: cluster-vars
        # Use this ConfigMap if it exists, but proceed if it doesn't.
        optional: true
      - kind: Secret
        name: cluster-secret-vars
        # Fail if this Secret does not exist.
---
apiVersion: v1
kind: Secret
metadata:
  name: cluster-secret-vars
  namespace: flux-system
type: Opaque
stringData:
  token: SUPERSECRETTOKEN
This would then render into:
---
apiVersion: v1
kind: Namespace
metadata:
  name: apps
  labels:
    environment: prod
    region: "eu-central-1"
---
apiVersion: v1
kind: Secret
metadata:
  name: secret
  namespace: apps
type: Opaque
stringData:
  token: SUPERSECRETTOKEN
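The substitution semantics, including the `:=` default, can be sketched in a few lines of Python. This is only an illustration of the behavior, not Flux's actual implementation (kustomize-controller uses its own envsubst library):

```python
import re

def substitute(text, variables):
    """Mimic Flux post-build substitution: ${var} is replaced by its
    value, and ${var:=default} falls back to the default when unset."""
    def repl(match):
        name, default = match.group(1), match.group(2)
        if name in variables:
            return variables[name]
        # Keep the placeholder untouched if no value and no default exist
        return default if default is not None else match.group(0)
    return re.sub(r"\$\{(\w+)(?::=([^}]*))?\}", repl, text)

print(substitute("environment: ${cluster_env:=dev}", {}))
# environment: dev
print(substitute("region: ${cluster_region}", {"cluster_region": "eu-central-1"}))
# region: eu-central-1
```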
This simple feature enables us to inject cluster variables into the different Flux Kustomizations, avoiding patches and keeping everything more human-readable.
Note that this feature is provided by Flux and is not supported by Kustomize itself. There have been many requests to support it, but they have always been declined, the argument being that other tools do it better and that Kustomize aims to stay as declarative as possible.
Now, let’s try to bring all this together. This will be our project’s structure:
.
├── clusters
│ ├── _profiles # Store all the different profiles
│ │ ├── _base # Base for all cluster profiles (things installed in all variants)
│ │ ├── home
│ │ └── prod
│ ├── home-cluster-raspi # A cluster instance
│ │ ├── flux-system # Generated by flux bootstrap
│ │ └── platform
│ │ ├── kustomization.yaml # Maps to a profile and injects secrets/config in the cluster
│ │ ├── cluster-secrets.yaml
│ │ └── cluster-config.yaml
│ ├── azure-cluster-aks
│ └── ...
└── platform # Contains all the platform services
├── grafana-operator
│ └── _base
├── grafana-agent
├── cert-manager
├── datadog-operator
├── datadog-agent
├── ingress-nginx
│ ├── _base # Base implementation of this service
│ └── nodeport # Feature to expose nginx in a NodePort instead of in a LoadBalancer
├── local-path-provisioner
└── ...
Let’s implement, for example, the Datadog stack in a cluster. It will require the API and APP key secrets.
First, let’s declare two platform services, one that installs the operator and CRDs:
# platform/datadog-operator/_base/helm.yaml
apiVersion: source.toolkit.fluxcd.io/v1beta1
kind: HelmRepository
metadata:
  name: datadog
  namespace: flux-system
spec:
  interval: 4h
  url: https://helm.datadoghq.com
---
apiVersion: helm.toolkit.fluxcd.io/v2beta1
kind: HelmRelease
metadata:
  name: datadog-operator
  namespace: flux-system
spec:
  targetNamespace: ${namespace_name:=default}
  serviceAccountName: kustomize-controller
  chart:
    spec:
      chart: datadog-operator
      interval: 15m
      sourceRef:
        kind: HelmRepository
        name: datadog
      version: '1.2.1'
  interval: 15m
  values:
    apiKeyExistingSecret: datadog-secret
    appKeyExistingSecret: datadog-secret
    site: datadoghq.eu
---
# platform/datadog-operator/_base/secret.yaml
apiVersion: v1
kind: Secret
metadata:
  name: datadog-secret
  namespace: ${namespace_name:=default}
type: Opaque
stringData:
  api-key: ${datadog_api_key}
  app-key: ${datadog_app_key}
---
# platform/datadog-operator/_base/namespace.yaml
apiVersion: v1
kind: Namespace
metadata:
  name: ${namespace_name:=default}
  labels:
    owner: ${namespace_owner:=platform}
---
# platform/datadog-operator/_base/kustomization.yaml
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  - helm.yaml
  - secret.yaml
  - namespace.yaml
… and another that uses the CRs to declare agents and is installed in the same namespace. This one will have a feature that enables APM tracing only in certain cluster profiles, as it is disabled by default.
# platform/datadog-agent/_base/datadogagent.yaml
apiVersion: datadoghq.com/v2alpha1
kind: DatadogAgent
metadata:
  name: datadog
  namespace: ${namespace_name:=default}
spec:
  global:
    site: datadoghq.eu
    credentials:
      apiSecret:
        secretName: datadog-secret
        keyName: api-key
      appSecret:
        secretName: datadog-secret
        keyName: app-key
  features:
    apm:
      enabled: false
    clusterChecks:
      enabled: true
    kubeStateMetricsCore:
      enabled: true
    logCollection:
      containerCollectAll: false
      enabled: false
    liveContainerCollection:
      enabled: false
    liveProcessCollection:
      enabled: false
---
# platform/datadog-agent/_base/kustomization.yaml
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  - datadogagent.yaml
---
# platform/datadog-agent/apm/kustomization.yaml
apiVersion: kustomize.config.k8s.io/v1alpha1
kind: Component
patchesStrategicMerge:
  - datadogagent-patch.yaml
---
# platform/datadog-agent/apm/datadogagent-patch.yaml
apiVersion: datadoghq.com/v2alpha1
kind: DatadogAgent
metadata:
  name: datadog
  namespace: ${namespace_name:=default}
spec:
  features:
    apm:
      enabled: true
Now, we have two platform services that we need to configure in order to provide the secrets.
So, let’s create the prod profile that will use them. Note that the Flux Kustomizations loading the Datadog services expect secrets and configs from two resources, cluster-secrets and cluster-config. We will create them later on.
# clusters/_profiles/prod/datadog.yaml
apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization
metadata:
  name: datadog-operator
  namespace: flux-system
spec:
  interval: 15m
  sourceRef:
    kind: GitRepository
    name: flux-system
  serviceAccountName: kustomize-controller
  path: ./platform/datadog-operator/_base
  prune: true
  wait: true
  timeout: 5m
  postBuild:
    substitute:
      namespace_name: datadog
    substituteFrom:
      - kind: ConfigMap
        name: platform-namespace-vars
        optional: true # Use this ConfigMap if it exists, but proceed if it doesn't.
      - kind: Secret
        name: cluster-secrets
---
apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization
metadata:
  name: datadog-agent
  namespace: flux-system
spec:
  components:
    - ../apm # Here we set to use the APM feature for this instance
  dependsOn:
    - name: datadog-operator
  interval: 15m
  sourceRef:
    kind: GitRepository
    name: flux-system
  serviceAccountName: kustomize-controller
  path: ./platform/datadog-agent/_base
  prune: true
  postBuild:
    substitute:
      namespace_name: datadog
    substituteFrom:
      - kind: ConfigMap
        name: platform-namespace-vars
        optional: true # Use this ConfigMap if it exists, but proceed if it doesn't.
---
# clusters/_profiles/prod/kustomization.yaml
apiVersion: kustomize.config.k8s.io/v1alpha1
kind: Component
resources:
  # This is the base profile where we can set services to be installed everywhere, loaded as an overlay
  - ../_base
  - datadog.yaml
Now, we just need to make use of the prod profile in our example-prod cluster.
FluxCD bootstrapping will create the folder of the cluster and a flux-system folder inside of it, with all the definitions of Flux. We will create some extra configurations to load our platform services.
# clusters/example-prod/platform/kustomization.yaml
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
namespace: flux-system
components:
  - ../../_profiles/prod
resources:
  - cluster-secrets.yaml
  - cluster-config.yaml
---
# clusters/example-prod/platform/cluster-secrets.yaml
kind: Secret
apiVersion: v1
metadata:
  name: cluster-secrets
stringData:
  datadog_api_key: SUPERSECRET
  datadog_app_key: SUPERSECRET
---
# clusters/example-prod/platform/cluster-config.yaml
kind: ConfigMap
apiVersion: v1
metadata:
  name: cluster-config
data:
  foo2: bar2
This is just an example. Never keep your secrets unencrypted in a Git repository. Make sure to use SOPS or a vault to keep your secrets encrypted!
So now, with all this in place, by setting the right cluster secrets and configs and just defining the profile, we can bootstrap a cluster!
Inside the cluster’s platform folder, you can also apply patch overrides to the profile, allowing you to keep all the unique configuration of your cluster together in a single folder.
So, depending on how widespread you want this configuration to be, you can set it at different levels:
If we want a new cluster, we just bootstrap 3 more files (or more if you want to override something), and we are good to go!
But can we do better? We now have a standard way to define config, secrets, and profiles: a kind of contract for creating a cluster, which gives room for even more automation.
So, why not inject all this when creating the cluster with IaC, using Terraform or OpenTofu?
We can even create new API and APP keys for each individual cluster.
We can automate all of this, so a single terraform/tofu apply does everything.
The following are just some pieces of the code, but all the previous logic can be summarized in:
# File: module/flux-cluster/main.tf

# Bootstrap Flux
resource "flux_bootstrap_git" "this" {
  path = "clusters/${var.cluster_name}"

  # Depend on the secrets and configs so that at first boot Flux finds them in place.
  # Otherwise, it takes ~10 min to reconcile.
  depends_on = [
    github_repository_file.config,
    github_repository_file.kustomization,
    github_repository_file.secrets,
  ]
}

# Custom profile management
locals {
  flux_platform_path = "clusters/${var.cluster_name}/platform"
}

resource "github_repository_file" "kustomization" {
  repository          = var.github_repository
  branch              = "main"
  commit_message      = "[Flux] Configure Kustomization for ${var.cluster_name}"
  overwrite_on_create = false
  file                = "${local.flux_platform_path}/kustomization.yaml"
  content = templatefile(
    "${path.module}/templates/kustomization.sample.yaml",
    {
      profile_name = var.profile_name
    }
  )
}

resource "github_repository_file" "secrets" {
  repository          = var.github_repository
  branch              = "main"
  commit_message      = "[Flux] Configure cluster secrets for ${var.cluster_name}"
  overwrite_on_create = false
  file                = "${local.flux_platform_path}/cluster-secrets.yaml"
  content = templatefile(
    "${path.module}/templates/secrets.sample.yaml",
    {
      # TODO Secrets should be stored encrypted with a provided SOPS key before committing
      data = [for key, val in var.cluster_secrets : "${key}: encrypted(${val})"]
    }
  )
}

resource "github_repository_file" "config" {
  repository          = var.github_repository
  branch              = "main"
  commit_message      = "[Flux] Configure cluster config for ${var.cluster_name}"
  overwrite_on_create = false
  file                = "${local.flux_platform_path}/cluster-config.yaml"
  content = templatefile(
    "${path.module}/templates/config.sample.yaml",
    {
      data = [for key, val in var.cluster_config : "${key}: ${val}"]
    }
  )
}
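For completeness, the kustomization template referenced by templatefile above might look something like the following. The file name and contents are hypothetical, reconstructed from the cluster kustomization shown earlier:

```yaml
# module/flux-cluster/templates/kustomization.sample.yaml (hypothetical)
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
namespace: flux-system
components:
  - ../../_profiles/${profile_name}
resources:
  - cluster-secrets.yaml
  - cluster-config.yaml
```

Here `${profile_name}` is Terraform template interpolation, filled in by the `profile_name` variable passed to templatefile.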
# File: main.tf
locals {
  ## PROVIDER CONFIG
  # GitHub App credentials
  # We use a GitHub App to authenticate against GitHub, as a PAT is an anti-pattern
  github_app_id              = "403427"
  github_app_installation_id = "42574484"
  pem_file                   = file("${path.module}/private-key.pem")

  # Flux repo configuration
  github_org        = "Sturgelose"
  github_repository = "flux-platform"

  ## CLUSTER CONFIG
  cluster_name = "my-cluster"
}

resource "kind_cluster" "this" {
  name = local.cluster_name
}

# Deploy key generation (used by the Flux provider)
resource "tls_private_key" "flux" {
  algorithm   = "ECDSA"
  ecdsa_curve = "P256"
}

# Make use of the previous module
module "flux" {
  source = "../modules/flux-cluster"

  cluster_name       = kind_cluster.this.name
  public_key_openssh = tls_private_key.flux.public_key_openssh
  profile_name       = "home"
  cluster_secrets = {
    datadog_api_key = "bar"
    datadog_app_key = "foo!"
  }
  cluster_config = {
    foo2 = "bar2"
  }
}
Simply run a terraform/tofu apply, and everything will be created!
kind_cluster.this: Creating...
tls_private_key.flux: Creating...
tls_private_key.flux: Creation complete after 0s [id=bd2ddfaa118a8f8419edbde196c2c17349d161e5]
kind_cluster.this: Still creating... [10s elapsed]
kind_cluster.this: Creation complete after 16s [id=my-cluster-]
module.flux.github_repository_deploy_key.this: Creating...
module.flux.github_repository_file.config: Creating...
module.flux.github_repository_file.secrets: Creating...
module.flux.github_repository_file.kustomization: Creating...
module.flux.github_repository_deploy_key.this: Creation complete after 1s [id=flux-platform:90111711]
module.flux.github_repository_file.kustomization: Creation complete after 9s [id=flux-platform/clusters/my-cluster/platform/kustomization.yaml]
module.flux.github_repository_file.config: Creation complete after 9s [id=flux-platform/clusters/my-cluster/platform/cluster-config.yaml]
module.flux.github_repository_file.secrets: Still creating... [10s elapsed]
module.flux.github_repository_file.secrets: Creation complete after 10s [id=flux-platform/clusters/my-cluster/platform/cluster-secrets.yaml]
module.flux.flux_bootstrap_git.this: Creating...
module.flux.flux_bootstrap_git.this: Still creating... [10s elapsed]
module.flux.flux_bootstrap_git.this: Still creating... [20s elapsed]
module.flux.flux_bootstrap_git.this: Still creating... [30s elapsed]
module.flux.flux_bootstrap_git.this: Still creating... [40s elapsed]
module.flux.flux_bootstrap_git.this: Creation complete after 41s [id=flux-system]
You can find all the work (and more) of this post in this repo:
We have learned how to keep our structure simple while staying flexible, thanks to Kustomize Components and templating.
This setup is flexible enough to be extended into many production environments while keeping an opinionated and DRY structure. For example, logic could be added to create groups of clusters sharing the same base, still created through IaC.
Or, in the current example, we are passing the secrets manually when they could be created automatically using the Datadog Terraform provider.
This structure is just a base that can be as simple or as complex as each platform team needs, but it clarifies and fixes many common issues that you will find with the default structure and logic that the FluxCD docs suggest.
We haven’t yet tackled all the different issues, as some remain unsolved, especially on the multi-tenancy side, but I will try to address them in future posts.