The problem

If, like me, you've managed Linux servers and workstations for a while, you'll know that things break.
Given enough time (and activity), things break, and so we fix them.
Over and over again we fix the little things that break in our systems over time: dependencies that need to be pinned to specific versions or updated, a piece of hardware that was swapped out, small alterations in obscure config files, a chmod here, a chgrp there. And life goes on.

"We had a problem, now it's fixed, let's move on"

But these small changes stack up. They come together over time, waiting in the shadows for the fateful day your server's hard drives fail, you drop your laptop, or a migration becomes necessary for any number of reasons*.
And once that moment comes you are left in a bit of a situation.

*In my case, often boredom and the need to re-create an already working stack in my homelab ;)

"No worries, my database and documents back up to 3 cross-region replicated buckets on different cloud providers on a bi-nanosecondly basis"

Yes, but what about the other state?

"The other state?"

The state we discussed at the beginning of this post: the small edits and changes you accrued over weeks, months, years.
Did you update your Ansible playbooks every time you made a change to your system, regardless of how small the change was?

"Actually, I just document my setup in Obsidian/Notion/etc..."

Well, good news first: that is more thorough than a fair number of production systems I've seen. Bad news second: you will now get to painfully find out, step by step, which changes were the important ones, and getting back up and running will take a fair bit longer than you probably expected.

Now what if I told you there is a solution for this?

The solution

I was first introduced to NixOS in my first semester of uni, when a friend I shared a lot of technical opinions with showed and recommended it to me.

"Imagine an entire system where every setting, every dial, every knob, as well as every piece of installed software and its configuration, is derived from one reproducibly buildable configuration."

The pitch sounded so good I wiped my laptop and installed NixOS the same day.
After using it for a few years now, I can say: I've had my struggles with NixOS now and then, and I have looked back at systems like Arch that let you do things "the easy way" whenever you want. But I don't think I will ever be able to switch back to a non-declarative Linux distribution.

But what exactly is NixOS?

Good question. NixOS is a Linux distribution built around the Nix package manager and the Nix language. It lets you configure your entire system through a so-called Nix configuration, made up of one or more .nix files. For example, setting up a Postgres database for your brand new app idea can look like this:

{
  services.postgresql = {
    enable = true; # Enable the Postgres server
    ensureDatabases = [ "myapp" ]; # Create a DB called myapp
    ensureUsers = [ # Users to create
      {
        name = "myapp";
        ensureDBOwnership = true; # Make this user the owner of the DB with the same name
      }
    ];
  };
}

A simple Postgres setup that ensures the existence of a DB named "myapp" and a matching user with access to said DB
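
To illustrate the "one or more .nix files" part, here is a minimal sketch of how a main file could pull in the Postgres snippet from its own file. The file name postgres.nix and the host name are made up for this example; hardware-configuration.nix is the file the NixOS installer usually generates for a machine.

```nix
# configuration.nix: entry point of the system config (sketch)
{ ... }:
{
  imports = [
    ./hardware-configuration.nix # usually generated by the installer for this machine
    ./postgres.nix               # hypothetical file containing the Postgres snippet above
  ];

  networking.hostName = "myapp-server"; # hypothetical host name
}
```

Every file listed under imports is merged into one system configuration, so you can split things up however suits you.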

The wonderful thing about NixOS is that changing enable = true to enable = false in the above code and deploying it to a server will disable Postgres.

This might seem obvious to the uninitiated, but anyone who has worked with Ansible knows what a nifty feature this is: there, deleting a task from a playbook does not undo what it already did on the server, so you end up writing the teardown yourself.
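
As a small sketch of that change, this is the same setup switched off. As far as I know, only the service goes away on the next deploy and the existing database files on disk are left alone, so treat that part as an assumption and check before relying on it.

```nix
{
  services.postgresql = {
    enable = false; # The Postgres service is no longer part of the system
    # The settings below stay in the file and take effect again once enable is true
    ensureDatabases = [ "myapp" ];
    ensureUsers = [
      {
        name = "myapp";
        ensureDBOwnership = true;
      }
    ];
  };
}
```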

Now let me show you why I love NixOS so much by going through the scenario from the beginning again, this time the way I've experienced it the last few times I reset one of my systems:

Note: the case I'm referencing was a server on Hetzner, but in my experience the process is identical for dedicated servers (both at home and in the cloud) and user devices like laptops and PCs.

  1. The system fails irrecoverably (I wiped the server because I felt like testing my setup)
  2. I auto-provision a new blank system (Debian is my favorite "blank-slate" for this stuff)
  3. I enter the relevant git repo, run nix run github:nix-community/nixos-anywhere, and wait for it to finish its work (everything from formatting the disks to installing and configuring the system)
  4. I ssh into the system that is now fully functional and once the database backups are loaded back in, everything is back up and running.
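
For a rough idea of what the git repo in step 3 can contain, here is a minimal flake sketch. It assumes a flake-based setup and uses disko for the disk layout, which is what nixos-anywhere relies on to format the disks. The host name myserver and the two local file names are placeholders, not my actual setup.

```nix
# flake.nix (sketch)
{
  inputs = {
    nixpkgs.url = "github:NixOS/nixpkgs/nixos-unstable";
    disko.url = "github:nix-community/disko";
    disko.inputs.nixpkgs.follows = "nixpkgs";
  };

  outputs = { nixpkgs, disko, ... }: {
    # "myserver" is a hypothetical name for the machine being installed
    nixosConfigurations.myserver = nixpkgs.lib.nixosSystem {
      system = "x86_64-linux";
      modules = [
        disko.nixosModules.disko # declarative partitioning and formatting
        ./disk-config.nix        # hypothetical file describing the disk layout
        ./configuration.nix      # the rest of the system
      ];
    };
  };
}

# Run from the repo, pointing at the freshly provisioned machine:
#   nix run github:nix-community/nixos-anywhere -- --flake .#myserver --target-host root@<server-ip>
```

The --flake and --target-host flags are taken from the nixos-anywhere quickstart as I remember it; check the current README before copying this.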

This is so fundamentally quick and easy that I have in the past:

  • Formatted and reset my PC and laptop just for the fun of it (both fully working and set up just to my liking after 20 min with no human interaction)
  • Created a cloud copy of one of my home servers when a power outage hit at home while I was on vacation

This ease of reproducibility is immensely important to me since it saves me a lot of time, pain and repetition.
It also makes managing a larger number of systems much easier, as you can apply a lot of the de-duplication and code sharing principles you would normally see in regular code bases.
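
One way this kind of sharing can look is sketched below: a common module plus a small helper function, reused by every host. The host names, file paths and shared settings are all invented for the example.

```nix
# flake.nix (sketch)
{
  inputs.nixpkgs.url = "github:NixOS/nixpkgs/nixos-unstable";

  outputs = { nixpkgs, ... }:
    let
      # Settings every machine should share (example values)
      common = { pkgs, ... }: {
        services.openssh.enable = true;
        time.timeZone = "UTC";
        environment.systemPackages = [ pkgs.git pkgs.htop ];
      };

      # Helper that builds one host from the shared module plus its own file
      mkHost = hostFile: nixpkgs.lib.nixosSystem {
        system = "x86_64-linux";
        modules = [ common hostFile ];
      };
    in
    {
      nixosConfigurations = {
        homeserver = mkHost ./hosts/homeserver.nix; # hypothetical host
        laptop = mkHost ./hosts/laptop.nix;         # hypothetical host
      };
    };
}
```

A change to common then reaches every machine on its next rebuild, much like editing a shared library in a regular code base.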

To my fellow SysAdmins I leave this note:

Being able to search through the exact codified state of a normally opaque Linux server, knowing that whatever your grep returns is exactly what's running on the system right now, is a game changer.
It makes ssh-ing into it and looking for running processes and open ports feel like staring at shadows on a cave wall.

Author's notes

Looking back, I don't regret jumping in head first with my main machine, as it forced me to get acquainted with Nix and NixOS in a "sink or swim" style. That said, you can probably save yourself a lot of headaches by playing around with the Nix package manager for a bit before jumping to NixOS.