This article is more than one year old. Older articles may contain outdated content. Check that the information in the page has not become incorrect since its publication.

Flux puts the Git into GitOps

Flux integrates very tightly with all relevant APIs and SDKs. For us to provide the best possible Git support to bring you GitOps, shelling out to Git is not an option. Find out why in this blog post.

By dholbach | 2022-03-25

Ever since the rewrite of Flux as a set of focused controllers, it has become clearer what each of its functions and capabilities are. The aptly named controllers carry in their name what they are responsible for and which data or tooling they interact with, so that is, e.g. source, kustomize, image-automation, notification, helm, etc.

Overview Flux controllers

If you wanted to string a proof-of-concept for a GitOps tool together, a naïve solution could be to just shell out to various tools like curl, git, kubectl and helm. While that might feel intuitive at first (since it so closely resembles one’s manual workflow behind the keyboard) and might be quick and fundamentally functional, it comes at a big cost in the subsequent refinement stages: adequately catching errors, providing detailed status information, security considerations, mismatches between command line tools and infrastructure implementation, API and CLI versions, etc.

In the course of the last five years since the start of the Flux project, we have seen all of the above and more. Because other projects made those mistakes, or because we did.

Let’s drill a bit deeper into why we put so much effort into integrating as tightly with given tooling APIs and SDKs as possible.

Why we are not using the Git CLI

Git logo

Without Git there is no GitOps, so we obviously want to support all Git providers, all the edge cases, all the different ways things can be set up and all the Git operations we need. Obvious interactions with Git are when we perform clone and push operations on remote Git repositories, for example.

Using a CLI for any code-path should be a last resort - if at all. It is a design principle for the Flux controllers not to do this. We avoid an entire class of vulnerabilities: command injection.

Another big reason back when we started working on source-controller for dropping the Git CLI was multi-tenancy. The Git CLI wants the SSH and PGP keys on disk while we wanted them to be loaded from memory to isolate the tenants secrets without having to write them on disk and risk being vulnerable to directory traversal attacks.

All in all we did choose not having to rely on the Git binary being present, instead we link statically against a known-good and sufficiently well-tested version. More on that below.

Why do we have support for multiple Git implementations

We started out using go-git for all Git operations, as it is an implementation of the Git protocol written entirely in Go. When we wanted to support Azure DevOps and saw that support for multi_ack and multi_ack_detailed wasn’t included in go-git, we started making use of git2go in addition. It is the Go bindings of the libgit2 library and has greater support for more complex capabilities in the git wire protocol, including git protocol version 2. Unfortunately git2go does not support shallow clones or Git submodules. The newly added support for commit signing with SSH keys is also not supported by our implementations at present.

While the above might sound like petty implementation details, we had to learn ourselves that each Git implementation has its own set of shortcomings. Things which “just work” in the Git CLI, any of the implementations get subtly wrong, as they work on the “plumbing” level of Git. In the end we chose to make gitImplementation a configurable setting.

Just to illustrate what happens when you try to do things just right, here’s a couple of pieces of work we needed to get done along the way:

we had to add support for e.g. verifying known_hosts files for SSH connections
when connecting over SSH, we received SHA1 and MD5 fingerprints of the returned public key from the server and not the key itself, which made known_hosts verification a little harder
changes on libgit2 would break the way that known hosts used to work
Making various kinds of SSH key types work, e.g. support for ECDSA* added as part of libgit2 >= 1.2.0, ED25519 as part of libgit2 >= 1.3.0.
we needed to start verifying PGP signatures

Tracking upstream developments in Git

As Git has become ubiquitous and virtually all of the world’s software development depends on Git, it still is in active development. Constant improvements affect features, efficiency, configurability and better security. Of course we want to pass this all down to our users: more efficient download makes a huge difference, support for Git Submodules enables new use-cases, support for more GPG verification or new SSH key formats adds additional security and when new features are rolled out in Git providers, we need to support them in Flux too.

Integrating these changes into Flux unfortunately isn’t as easy as it sounds. One part of the dependency chain for git2go goes:

flowchart LR libgit2 --> libssh2 --> OpenSSL

So that’s libgit2, libssh2 (to enable the SSH transport) and OpenSSL. As Linux vendors often take a very conservative approach to bringing new software releases to stable releases, we were unfortunately pushed to building these dependencies ourselves. In addition to that, these libraries have quite a few configuration options that can only be set at build time and unfortunately the openssl/libssh2 packages of different Linux distributions act in slightly different ways. This created a yet different problem: the versions we shipped on the containers could behave in different ways when we were developing on our Mac/Linux machines. This forced us to cross-compile statically built libraries that we can simply download at both development time or statically linking them into the final binary we create when releasing our controllers.

We decided to build the libraries for the AMD64, ARM64 and ARMv7 architectures and link them statically, which was a prerequisite for us to enable fuzzing for all Flux controllers. Getting this all up and running for every upstream release and making sure it’s all covered nicely with tests is a challenge and an area of work we invested quite a lot of time in.

What’s next in Git things?

libgit2 does not expose concepts to allow users to set timeouts for network operations, meaning that most git operations could hang indefinitely in specific circumstances. This would result in specific GitRepository objects getting stuck and stop updating until the controllers get restarted - Users reported this for both the image-automation and source controllers in the past 6 months.

We had a few challenges getting traction making changes upstream to fix similar issues, so to avoid delaying a fix or forking the dependency, we decided to add experimental support for Go managed transports, which means we can enforce that network operations won’t take more than a given amount of time to complete, but without requiring any changes upstream.

This is part of Flux 0.28 and can be enabled by adding an environment EXPERIMENTAL_GIT_TRANSPORT=true in both source and image-automation controllers.

This will give us more control over the transport with Go native transport using the libgit2 smart transport support. Read the source-controller changelog for more information.

If you want to enable this automatically, just add the following to your kustomization.yaml:

patches:
- patch: |
    - op: add
      path: /spec/template/spec/containers/0/env/0
      value:
        name: EXPERIMENTAL_GIT_TRANSPORT
        value: "true"    
  target:
    kind: Deployment
    name: "(source-controller|image-automation-controller)"

Support for sha256 hashes: Git CLI supports it since 2.29, however all major Git service providers, such as GitLab and GitHub, are yet to make some progress on this front. Upstream, libgit2 is starting to pave the way for that support on v1.4.0, we will continue to watch this space so we can support Flux users as the industry moves on from SHA1.

After this strong focus on stability in the last months, we are now going to take a look at how we can optimise our git implementation to reduce resource consumption and network traffic across git reconciliations.

Conclusion

Flux doesn’t shell out to binaries like git, helm or kubectl, because we deem it too error-prone and we would miss big opportunities to bring you the best developer experience and the most accurate information on every step along the way. This way we offer more control to you. As you might have gathered from the amount of additional work this all means for us, we take the “Git” in “GitOps” very seriously.

Talk to us

We love feedback, questions and ideas, so please let us know your personal use-cases today. Ask us if you have any questions and please

join our upcoming dev meetings
find us in the #flux channel on CNCF Slack
add yourself as an adopter if you haven’t already

See you around!