This article is more than one year old. Older articles may contain outdated content. Check that the information in the page has not become incorrect since its publication.
Flux puts the Git into GitOps
Ever since the rewrite of Flux as a set of focused controllers, it has
become clearer what each of its functions and capabilities are. The
aptly named controllers carry in their name what they are responsible
for and which data or tooling they interact with, so that is, e.g.
If you wanted to string a proof-of-concept for a GitOps tool together, a
naïve solution could be to just shell out to various tools like
helm. While that might feel intuitive at first (since
it so closely resembles one’s manual workflow behind the keyboard) and
might be quick and fundamentally functional, it comes at a
big cost in the subsequent refinement stages: adequately catching
errors, providing detailed status information, security considerations,
mismatches between command line tools and infrastructure implementation,
API and CLI versions, etc.
In the course of the last five years since the start of the Flux project, we have seen all of the above and more. Because other projects made those mistakes, or because we did.
Let’s drill a bit deeper into why we put so much effort into integrating as tightly with given tooling APIs and SDKs as possible.
Why we are not using the Git CLI
Without Git there is no GitOps, so we obviously want to support all Git providers, all the edge cases, all the different ways things can be set up and all the Git operations we need. Obvious interactions with Git are when we perform clone and push operations on remote Git repositories, for example.
Using a CLI for any code-path should be a last resort - if at all. It is a design principle for the Flux controllers not to do this. We avoid an entire class of vulnerabilities: command injection.
Another big reason back when we started working on
for dropping the Git CLI was multi-tenancy. The Git CLI wants the SSH and
PGP keys on disk while we wanted them to be loaded from memory to isolate
the tenants secrets without having to write them on disk and risk being
vulnerable to directory traversal attacks.
All in all we did choose not having to rely on the Git binary being present, instead we link statically against a known-good and sufficiently well-tested version. More on that below.
Why do we have support for multiple Git implementations
We started out using
all Git operations, as it is an implementation of the Git protocol
written entirely in Go. When we wanted to support Azure DevOps and saw
that support for
multi_ack_detailed wasn’t included
go-git, we started making use of
git2go in addition. It is the Go
the libgit2 library and has greater
support for more complex capabilities in the git wire protocol, including
git protocol version 2.
git2go does not support shallow clones or Git submodules.
The newly added support for commit signing with SSH keys is also not
supported by our implementations at present.
While the above might sound like petty implementation details, we had to
learn ourselves that each Git implementation has its own set of
shortcomings. Things which “just work” in the Git CLI, any of the
implementations get subtly wrong, as they work on the
of Git. In the end we chose to make
Just to illustrate what happens when you try to do things just right, here’s a couple of pieces of work we needed to get done along the way:
- we had to add support for e.g. verifying
known_hostsfiles for SSH connections
- when connecting over SSH, we received SHA1 and MD5 fingerprints of
the returned public key from the server and not the key itself,
known_hostsverification a little harder
- changes on
libgit2would break the way that known hosts used to work
- Making various kinds of SSH key types work, e.g. support for
ECDSA*added as part of
libgit2 >= 1.2.0,
ED25519as part of
libgit2 >= 1.3.0.
- we needed to start verifying PGP signatures
Tracking upstream developments in Git
As Git has become ubiquitous and virtually all of the world’s software development depends on Git, it still is in active development. Constant improvements affect features, efficiency, configurability and better security. Of course we want to pass this all down to our users: more efficient download makes a huge difference, support for Git Submodules enables new use-cases, support for more GPG verification or new SSH key formats adds additional security and when new features are rolled out in Git providers, we need to support them in Flux too.
Integrating these changes into Flux unfortunately isn’t as easy as it sounds. One part of the dependency chain for git2go goes:
libssh2 (to enable the SSH transport) and
OpenSSL. As Linux vendors often take a very
conservative approach to bringing new software releases to stable
releases, we were unfortunately pushed to
building these dependencies ourselves.
In addition to that, these libraries have quite a few configuration
options that can only be set at build time and unfortunately the
libssh2 packages of different Linux distributions act in
This created a yet different problem: the versions we shipped on the
containers could behave in different ways when we were developing on our
Mac/Linux machines. This forced us to cross-compile statically built
libraries that we can simply download at both development time or
statically linking them into the final binary we create when releasing
We decided to build the libraries for the AMD64, ARM64 and ARMv7 architectures and link them statically, which was a prerequisite for us to enable fuzzing for all Flux controllers. Getting this all up and running for every upstream release and making sure it’s all covered nicely with tests is a challenge and an area of work we invested quite a lot of time in.
What’s next in Git things?
libgit2 does not expose concepts to allow users to set timeouts for
network operations, meaning that most git operations could hang
indefinitely in specific circumstances. This would result in specific
GitRepository objects getting stuck and stop updating until the
controllers get restarted - Users reported this for both the
image-automation and source controllers in the past 6 months.
We had a few challenges getting traction making changes upstream to fix similar issues, so to avoid delaying a fix or forking the dependency, we decided to add experimental support for Go managed transports, which means we can enforce that network operations won’t take more than a given amount of time to complete, but without requiring any changes upstream.
part of Flux 0.28
and can be enabled by adding an environment
This will give us more control over the transport with Go native
transport using the
libgit2 smart transport support. Read the
for more information.
If you want to enable this automatically, just add the following to
patches: - patch: | - op: add path: /spec/template/spec/containers/0/env/0 value: name: EXPERIMENTAL_GIT_TRANSPORT value: "true" target: kind: Deployment name: "(source-controller|image-automation-controller)"
Support for sha256 hashes: Git CLI supports it since 2.29, however all major Git service providers, such as GitLab and GitHub, are yet to make some progress on this front. Upstream, libgit2 is starting to pave the way for that support on v1.4.0, we will continue to watch this space so we can support Flux users as the industry moves on from SHA1.
After this strong focus on stability in the last months, we are now going to take a look at how we can optimise our git implementation to reduce resource consumption and network traffic across git reconciliations.
Flux doesn’t shell out to binaries like
because we deem it too error-prone and we would miss big opportunities
to bring you the best developer experience and the most accurate information
on every step along the way. This way we offer more control to you. As you
might have gathered from the amount of additional work this all means for
us, we take the “Git” in “GitOps” very seriously.
Talk to us
We love feedback, questions and ideas, so please let us know your personal use-cases today. Ask us if you have any questions and please
- join our upcoming dev meetings
- find us in the #flux channel on CNCF Slack
- add yourself as an adopter if you haven’t already
See you around!