RFC: Continuous OS Deployment

Summary

Operating System Continuous Deployment (OSCD) refers to the process of seamlessly upgrading my NixOS machines to the latest and greatest release with as little overhead and manual fiddling as possible. This RFC proposes an OSCD overhaul that achieves a greater level of automation and reproducibility in the OS upgrade process via a few added GitHub CD hooks, the new CLI tool anix-upgrade, and some development process changes.

Motivation

NixOS upgrades on my machines are done with the nixos-rebuild command, which builds the system configuration from the file /etc/nixos/configuration.nix. That file is a symlink to a mutable root configuration file in ~/sources/anixpkgs/.

This model lends itself well to rapid prototyping and testing, as changes to the configuration can be tested immediately without having to commit any code. However, it leaves release deployment ad hoc and manual, and thus error-prone. For example, to tag and deploy a new release off of master, the following manual steps must be taken:

  1. The desired release commit is tagged, either locally or via the GitHub releases page.
  2. The dependencies.nix file for all NixOS configurations is modified to refer to the new tag, manifesting as a new commit on the head of master.
  3. ~/sources/anixpkgs/ is checked out to the commit from step 2 (not step 1, ironically). Any local dev changes need to be stashed.
  4. A nixos-rebuild command is run.

This RFC calls for a mature, stable OSCD pipeline that removes the need for the manual steps listed above while maintaining flexibility for rapid (and decoupled) on-machine development and testing.

Driving requirements

System-level requirements

  • [R1] OSCD shall be totally decoupled from development work; there shall be no possibility of one accidentally polluting the other.
  • [R2] OSCD shall be atomic such that an upgrade cannot be corrupted via a mismanagement or erroneous execution of steps.
  • [R3] OS release tagging shall be wholly executable within an anixpkgs pull request, and shall be entirely automated except for a manual specification of the level of release (e.g., major, minor, patch) that the pull request corresponds to.
  • [R4] An OS upgrade on-machine shall take no more than one step to complete.
  • [R5] The same upgrade mechanism from [R4] shall enable rapid prototyping of active development branches.

Software-level requirements

  • [R1.1] Development within anixpkgs shall happen in a separate location from the symlinked ~/sources/anixpkgs directory, which shall be reserved for OS upgrades.
  • [R1.2] ~/sources/anixpkgs shall be read-only.
  • [R2.1] The release tagging process shall execute all required steps automatically, prompted by a single initiatory step, within the remote CD pipeline.
  • [R2.2] All automated steps for [R2.1] shall kick off only after a pull request merge into master, which is push-protected.
  • [R2.3] The OS upgrade process shall consist of an atomic source preparation step followed by an atomic rebuild step, both chained together automatically.
  • [R3.1] The tagging process from [R2.1] shall be initiated via adding labels to a pull request. No further action should be required.
  • [R4.1] OS upgrades shall be offered by a CLI tool that, with no arguments specified, will upgrade the system to the most recent anixpkgs release off of master.
  • [R4.2] The OS upgrade CLI tool shall perform rebuild switch by default, but allow for rebuild boot to be manually specified instead.
  • [R5.1] The OS upgrade CLI tool shall allow (via provided arguments) for an upgrade to any particular tag, branch, or commit within the remote anixpkgs repository.
  • [R5.2] The OS upgrade CLI tool shall allow for builds with packages local to the specified build target, and not necessarily tied to that target's prescribed release version.

Detailed design

The OSCD requirements are addressed with three components: an updated deployment pipeline within the anixpkgs GitHub Actions workflows, an OS upgrade CLI tool, and a new policy for anixpkgs development and testing.

Deployment pipeline

Requirements [R2.1-2,3.1] are proposed to be fulfilled by adding three GitHub Actions jobs with write permissions on master to the deployment workflow. Each job will be responsible for tagging exactly one kind of release (major, minor, or patch) and will be triggered only by the corresponding label on a merged pull request.

Each of these jobs will consist of the following steps:

  1. Only execute if a merged pull request had the appropriate release label.
  2. Checkout anixpkgs master.
  3. Run a version increment script that increments the release version according to the release label.
  4. Commit the semantic version changes and tag that commit with the new semantic version string.
  5. Push the commit and corresponding tag to master.
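
As a rough illustration of steps 2, 4, and 5, the sketch below builds the sequence of Git commands a release job might run. It is not the actual workflow definition: the function names, the commit message, and the choice to use the bare version string as the tag name are assumptions made for this example, and step 1 (the label check) is assumed to be handled by the workflow's own trigger conditions.

```python
import subprocess


def release_commands(new_version: str, branch: str = "master") -> list[list[str]]:
    """Return the git commands for steps 2, 4, and 5 of a release job.

    Assumption: the tag name is the bare semantic version string.
    """
    tag = new_version
    return [
        # Step 2: check out the release branch.
        ["git", "checkout", branch],
        # Step 3 (the version increment script) is assumed to have
        # modified tracked files before the commit below runs.
        # Step 4: commit the version change and tag that commit.
        ["git", "commit", "--all", "--message", f"Release {tag}"],
        ["git", "tag", tag],
        # Step 5: push the branch and the new tag together.
        ["git", "push", "origin", branch, tag],
    ]


def run_release(new_version: str, dry_run: bool = True) -> None:
    """Print the commands (dry run) or execute them, stopping at the first failure."""
    for cmd in release_commands(new_version):
        if dry_run:
            print(" ".join(cmd))
        else:
            subprocess.run(cmd, check=True)
```

Calling run_release("1.3.0") with the default dry_run=True only prints the commands, which makes the sequence easy to review before wiring it into a job.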

The version increment script will simply take as an input the release increment type (major, minor, or patch) and increment accordingly:

  • (Major) x.y.z -> x+1.0.0
  • (Minor) x.y.z -> x.y+1.0
  • (Patch) x.y.z -> x.y.z+1
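
The increment rules above can be sketched in a few lines of Python. This is a minimal illustration rather than the actual script: the function name bump_version and the plain "x.y.z" string format (with no prefix or pre-release suffix) are assumptions.

```python
def bump_version(version: str, level: str) -> str:
    """Increment an 'x.y.z' version string by the given release level."""
    major, minor, patch = (int(part) for part in version.split("."))
    if level == "major":
        return f"{major + 1}.0.0"
    if level == "minor":
        return f"{major}.{minor + 1}.0"
    if level == "patch":
        return f"{major}.{minor}.{patch + 1}"
    raise ValueError(f"unknown release level: {level!r}")


if __name__ == "__main__":
    for level in ("major", "minor", "patch"):
        print(level, "1.2.3 ->", bump_version("1.2.3", level))
```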

If a pull request is merged with multiple labels, the release tag jobs will execute in the order major -> minor -> patch, so that each successive increment remains visible in the resulting semantic version. If the order were reversed, a patch increment would be obscured by the subsequent minor increment, which zeroes out the patch field.
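
To make the ordering argument concrete, the following sketch applies several labels to one version in a configurable order. The label names ("major", "minor", "patch") and the helper names are hypothetical; the real pull request labels may be spelled differently.

```python
ORDER = ("major", "minor", "patch")  # proposed execution order


def bump(version: str, level: str) -> str:
    """Increment one field of 'x.y.z' and zero out every field after it."""
    parts = [int(p) for p in version.split(".")]
    index = ORDER.index(level)
    parts[index] += 1
    parts[index + 1:] = [0] * (len(parts) - index - 1)
    return ".".join(str(p) for p in parts)


def apply_labels(version: str, labels: set[str], order=ORDER) -> str:
    """Apply every present label to the version, in the given order."""
    for level in order:
        if level in labels:
            version = bump(version, level)
    return version


if __name__ == "__main__":
    labels = {"minor", "patch"}
    # Proposed order: both increments remain visible.
    print(apply_labels("1.2.3", labels))
    # Reversed order: the minor bump zeroes out the patch bump.
    print(apply_labels("1.2.3", labels, order=tuple(reversed(ORDER))))
```

With the labels minor and patch on version 1.2.3, the proposed order yields 1.3.1, while the reversed order yields 1.3.0 and loses the patch increment.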

OS upgrade CLI tool: anix-upgrade

Once a new release has been properly and automatically tagged on the anixpkgs remote, a CLI tool called anix-upgrade is proposed to upgrade the system to the latest tag in an automated and incorruptible fashion (i.e., with neither the need nor the opportunity to manually modify code or configurations during the process), fulfilling [R1.2,2.3,4.1-2,5.1-2].

As implied by [R2.3], anix-upgrade will do two things:

  1. Prepare a read-only (and Git-less) version of anixpkgs in the ~/sources directory according to the exact version specified via arguments through the CLI.
  2. Rebuild the system configuration according to the updated source/configuration.

(1) may be naturally accomplished via a Nix derivation, which by definition must consist of a read-only (and reproducible) output with no tolerance for side effects from things like .git directories. The CLI tool will allow for (1) to be constructed from:

  • No argument at all, which will assume that the target source corresponds to the current head of anixpkgs master. This will be the way to perform standard, non-dev-prototyping upgrades.
  • An alternative release tag string or a development branch name (assumed to be at the head of that branch) or a commit hash.
  • An optional specification that the OS packages should be built from the local packages in that specific version of the anixpkgs source tree, and not necessarily from the prescribed release version of the packages.
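
One possible shape for this command-line interface is sketched below with Python's argparse. Every flag name here (--tag, --branch, --commit, --local-packages, --boot) is hypothetical and chosen only for illustration; the RFC does not fix the actual option names or the implementation language.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a hypothetical argument parser for anix-upgrade."""
    parser = argparse.ArgumentParser(
        prog="anix-upgrade",
        description="Upgrade the OS from a read-only copy of anixpkgs.",
    )
    # At most one explicit target; with none given, the head of master is used.
    target = parser.add_mutually_exclusive_group()
    target.add_argument("--tag", help="release tag to upgrade to")
    target.add_argument("--branch", help="branch whose head should be built")
    target.add_argument("--commit", help="full commit hash to build")
    # Build packages from the fetched source tree instead of the pinned release.
    parser.add_argument("--local-packages", action="store_true")
    # Apply the new configuration at next boot instead of switching now ([R4.2]).
    parser.add_argument("--boot", action="store_true")
    return parser


if __name__ == "__main__":
    print(build_parser().parse_args())
```

Making the three target options mutually exclusive keeps the "exact version" unambiguous, which fits the atomic source preparation step described in [R2.3].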

Because Nix derivations cannot clone Git repositories, the Nix built-in tools fetchGit and fetchTarball must be used to fetch the exact right version of the source. In my experience, fetchGit does not consistently fetch the expected "head-of" commit when only a branch name is specified, so in this implementation fetchTarball is to be used for any version of (1) that does not specify the full commit hash. This is possible because GitHub serves compressed source archives at URLs keyed by tag name or branch name (reliably delivering the head-of commit in the latter case).
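
The fetcher selection described above could look roughly like the following sketch, which generates the Nix expression for a given reference. The repository path OWNER/anixpkgs is a placeholder, the function names are hypothetical, and treating any 40-character hexadecimal string as a full commit hash is an assumption of this example.

```python
import re

REPO = "OWNER/anixpkgs"  # placeholder; substitute the real GitHub owner


def is_full_commit_hash(ref: str) -> bool:
    """Treat a 40-character lowercase hex string as a full commit hash."""
    return re.fullmatch(r"[0-9a-f]{40}", ref) is not None


def source_expression(ref: str = "master", repo: str = REPO) -> str:
    """Return a Nix expression that fetches anixpkgs at the given ref."""
    if is_full_commit_hash(ref):
        # Exact commit: fetchGit pins the revision directly. Depending on
        # the Nix version, a commit outside the default branch may also
        # need `ref` or `allRefs = true`.
        return (
            "builtins.fetchGit { "
            f'url = "https://github.com/{repo}.git"; rev = "{ref}"; '
            "}"
        )
    # Tag or branch name: fetch GitHub's source archive for that ref.
    return f'builtins.fetchTarball "https://github.com/{repo}/archive/{ref}.tar.gz"'


if __name__ == "__main__":
    print(source_expression())         # head of master
    print(source_expression("1.2.3"))  # a release tag
    print(source_expression("0123456789abcdef0123456789abcdef01234567"))
```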

(2) is accomplished by aliasing to the nixos-rebuild command (exposing an optional flag to require a reboot for changes to take effect as per [R4.2]) and only executing once (1) has successfully completed.
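
The chaining of (1) and (2) can be sketched as below. This is an illustration under assumptions: the prepare command is passed in as an opaque argument list, the function names are hypothetical, and privilege handling for nixos-rebuild (which normally needs root) is left out.

```python
import subprocess
from typing import Callable, Sequence


def rebuild_command(boot: bool = False) -> list[str]:
    """Return the nixos-rebuild invocation: 'switch' by default, 'boot' on request."""
    return ["nixos-rebuild", "boot" if boot else "switch"]


def upgrade(
    prepare_cmd: Sequence[str],
    boot: bool = False,
    run: Callable[..., object] = subprocess.run,
) -> None:
    """Run source preparation, then the rebuild, aborting on the first failure.

    With check=True, subprocess.run raises CalledProcessError on a non-zero
    exit status, so the rebuild never starts if preparation fails.
    """
    run(list(prepare_cmd), check=True)
    run(rebuild_command(boot), check=True)
```

The run parameter exists only so the chaining logic can be exercised with a stand-in runner instead of touching the real system.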

Development policy changes

All in all, the above changes serve to reduce the cognitive load associated with releasing and deploying a new version of anixpkgs once development is complete. They work best when [R1.1] is fulfilled as well, which moves anixpkgs development out of the ~/sources directory and into the ~/dev directory, consistent with development on individual repositories. While mostly a clerical change, there are a couple of implications:

  • The machines docs should be updated to modify instructions for setting up a new machine from scratch. The setup process may now be slightly more complicated on a one-time basis.
  • The ~/.devrc file on every development machine (see devshell) should be modified to set the pkgs_dir variable to the ~/dev/ location and not the ~/sources/ one. This is to preserve the ability to mutate attribute-based sources with the same level of flexibility as before.

Drawbacks

The principal drawback of this design is that OS prototype changes must now be committed and pushed to GitHub before they can be tested, since the OS upgrade tool builds only from a remote tag, branch, or commit.

Alternatives

To address the slight OS prototyping drawback, an additional option may be added to anix-upgrade to point to a local anixpkgs source tree. The tool would then symlink to that mutable source tree rather than build an immutable Nix derivation that pulls from GitHub.

Unresolved questions

  • Must OS upgrades always be manually triggered via anix-upgrade, or might it be fruitful to execute automatic upgrades in the future?
  • How may OSCD fruitfully apply to machine closures that import anixpkgs closures but specify their own configurations?