Case study

What I learned building a local-first DevOps command center.

The hard part was not installing tools. The hard part was making the system readable.

A homelab can turn into a junk drawer fast. One service lives in a repo. Another has a compose file somewhere else. A useful command is buried in shell history. The reason a thing exists is in your head, which works until you are tired, busy, or debugging under pressure.

I started treating my infrastructure less like a collection of services and more like an operations system. That changed what I cared about first.

Docs before dashboards

Dashboards are useful, but they do not explain intent. If I do not know what a service does, who owns it, where it runs, how it fails, and what the first safe checks are, a graph only tells me faster that something is wrong.

So the first layer is documentation: service inventory, purpose, dependencies, access notes, backup notes, and basic troubleshooting steps. Nothing fancy. Just enough that future me is not starting from zero.
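As a rough illustration, one inventory entry could be modeled as a small record, with a helper that reports which parts are still undocumented. This is a minimal sketch, not the format I actually use: the field names, the `missing_fields` helper, and the example service are all assumptions made up for this example.

```python
from dataclasses import dataclass, field, fields


@dataclass
class ServiceRecord:
    """One entry in the service inventory. All field names are illustrative."""
    name: str
    purpose: str = ""
    owner: str = ""
    runs_on: str = ""
    dependencies: list[str] = field(default_factory=list)
    access_notes: str = ""
    backup_notes: str = ""
    first_checks: list[str] = field(default_factory=list)


def missing_fields(record: ServiceRecord) -> list[str]:
    """Return the names of fields that are still empty for this service.

    Note: an empty dependency list is also reported, so a service that
    truly has no dependencies would need an explicit marker such as "none".
    """
    return [f.name for f in fields(record) if not getattr(record, f.name)]


# Hypothetical example entry; the service and its details are made up.
wiki = ServiceRecord(
    name="wiki",
    purpose="Internal notes and runbooks",
    runs_on="docker host (compose)",
    dependencies=["postgres"],
    first_checks=["Is the container running?", "Is postgres reachable?"],
)

print(missing_fields(wiki))  # ['owner', 'access_notes', 'backup_notes']
```

The point of the helper is not enforcement. It just makes the gaps visible, so the missing notes get written before the day they are needed.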

Private source of truth, public proof

Some infrastructure work should stay private. Internal hostnames, IPs, credentials, screenshots, and operational notes do not belong on a public portfolio.

But the pattern can be public. The way I structure docs, think about monitoring, split public and private repos, and build runbooks can become proof of work without leaking the system itself.
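One way to keep that split honest is a small check that runs over anything headed for the public side. The sketch below only looks for private IPv4 addresses and is an assumption about how such a check could work, not a description of my actual tooling. The `find_private_ips` name and the sample text are hypothetical, and hostnames, credentials, and screenshots would still need their own review.

```python
import ipaddress
import re

# Loose pattern for dotted-quad candidates; real validation happens below.
IPV4_PATTERN = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")


def find_private_ips(text: str) -> list[str]:
    """Return IPv4 addresses in the text that fall in private or reserved ranges."""
    found = []
    for candidate in IPV4_PATTERN.findall(text):
        try:
            address = ipaddress.ip_address(candidate)
        except ValueError:
            # Looks like an address but is not one (for example 999.1.1.1).
            continue
        if address.is_private:
            found.append(candidate)
    return found


# Hypothetical draft text for a public write-up.
draft = "Grafana runs on 192.168.1.20 and resolves names through 8.8.8.8."
print(find_private_ips(draft))  # ['192.168.1.20']
```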

Monitoring needs a question

It is tempting to add monitoring tools because they feel professional. That is backwards. The useful question is: what problem should this tool help me answer?

Did the service crash?
Did the deploy fail?
Did users hit an exception?
Is the database reachable?
What changed before the incident?

Sentry, GlitchTip, logs, health checks, and metrics all answer different questions. Picking the tool before defining the question usually adds noise without adding answers.
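To make one of those questions concrete, here is a minimal sketch of a check for "Is the database reachable?". It only tests whether a TCP connection opens, which is a narrower claim than "the database is healthy". The function name, the default timeout, and the host and port in the usage lines are assumptions for illustration.

```python
import socket


def is_reachable(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port opens within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False


if __name__ == "__main__":
    # Hypothetical target: a PostgreSQL instance on its default port.
    status = "reachable" if is_reachable("127.0.0.1", 5432) else "unreachable"
    print(f"database is {status}")
```

A check this small answers exactly one question. That is the point: when it fails, I know which question got a "no", and the runbook can say what to look at next.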

What I would do for a small team

I would not start by selling them a giant platform. I would start with a short operational baseline:

Map the services.
Write the first runbooks.
Check how repos, secrets, and deployments are organized.
Define the first health checks.
Choose monitoring based on actual failure modes.

That is not glamorous. It is useful. And useful is what I want this portfolio to prove.