The Problem#

While talking with a friend who works in IT, he repeated something I’ve heard many times before: “It’s really difficult to guarantee a consistent developer experience.” It’s not that we don’t have the tools; they’re scattered, and the workflow for deploying something is not integrated. Besides, it can be extremely time-consuming and prone to mistakes.

But what if there were a “paved road” approach, where we enforce strict configurations and policies and establish guardrails, guaranteeing security, isolation, limits, and the ability to audit the system at any time? This is what I’m trying to build with k8s-mtp: a multi-tenant platform based on Kubernetes that abstracts away things like RBAC, NetworkPolicy, ResourceQuotas, and a bunch of other important but difficult things, to ensure that industry best practices are always in place.

The objective is to simplify and automate a bunch of processes that used to be manual and error-prone. Does a dev need an environment? It shouldn’t take more than 30s to provision one. And it should come by default with proper isolation and security, as well as pre-defined resource limits. This would also allow for cost tracking, as well as giving you a proper audit trail, with built-in logs and metrics coming directly from the platform.
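To make that concrete, the per-tenant defaults could look something like the following manifests. This is an illustrative sketch, not the project’s actual configuration: the namespace name and the quota values are placeholders.

```yaml
# Illustrative per-tenant quota; names and values are placeholders.
apiVersion: v1
kind: ResourceQuota
metadata:
  name: tenant-quota
  namespace: tenant-alice   # hypothetical tenant namespace
spec:
  hard:
    requests.cpu: "2"
    requests.memory: 4Gi
    limits.cpu: "4"
    limits.memory: 8Gi
    pods: "20"
---
# Default-deny ingress: traffic has to be explicitly allowed per tenant.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-ingress
  namespace: tenant-alice
spec:
  podSelector: {}
  policyTypes:
    - Ingress
```

The point of the platform is that a dev never writes these by hand: they get stamped out automatically when an environment is requested.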

In effect, I’m trying to build a self-hostable Namespace-as-a-Service offering. No cloud vendor lock-in, easily deployable as a single Golang binary, with as few external dependencies as possible, and highly adaptable to specific needs.

The Philosophy#

If you’ve been following the different projects I’ve created, there’s a common thread to them all: they promote self-hostable setups with minimal external dependencies. For all the problems and defects that Golang might bring to the table, it really does make single-binary deployments a reality, and by leveraging the power of its standard library, you only need a few select external dependencies. In a time when supply chain security is such a big problem, this matters a lot! It also makes for easier deployment, simpler maintenance, and improved security.

Besides Golang, what else are we bringing into the fold? For one, we’re using K3s instead of a full K8s distribution. It’s not the first time I’ve done this, but this time around I’ve learned my lesson: I’m not trying to reinvent the wheel, and will mostly stay inside the lane that K3s puts me in, instead of struggling to fit a round K3s into a square hole.

Other than that, PostgreSQL is going to be the database of choice. There’s little to justify here, since it fits the spirit of the project to a T, and hosting it in a container promotes separation of concerns and makes backups/snapshots so much easier. The last piece of the puzzle is Terraform, and it’s probably the least important one. But after not having touched it for a while, I thought it could be a good choice to simplify the deployment of the infrastructure that’s needed.
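As a taste of what that Terraform side could look like, the database credentials might be generated and wired up entirely at apply time, so nothing sensitive ever gets committed. This is a hedged sketch, not the project’s actual config: the resource names and the namespace are assumptions.

```hcl
# Generate the password at apply time instead of committing one.
resource "random_password" "postgres" {
  length  = 32
  special = false
}

# Store it as a Kubernetes Secret; names here are hypothetical.
resource "kubernetes_secret" "postgres" {
  metadata {
    name      = "postgres-credentials"
    namespace = "k8s-mtp-system"
  }

  data = {
    POSTGRES_PASSWORD = random_password.postgres.result
  }
}
```

Terraform marks `random_password` results as sensitive, so they stay out of plan output while still landing in the cluster where PostgreSQL can consume them.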

The Implementation#

At this point, I’m ready to start implementing this project. In fact, I already have: the basis of it is built, and the code is published on my personal git forge. There are some things of note already:

  • We have a zero-framework http server, by leveraging middleware composition patterns:
func (s *Server) NewMux() *http.ServeMux {
	mux := http.NewServeMux()
	mux.Handle("/health", s.loggingMiddleware(s.healthHandler()))

	return mux
}
  • We also have structured logging with slog, obviating the need for external dependencies for this:
func (s *Server) loggingMiddleware(next http.HandlerFunc) http.HandlerFunc {
	return func(w http.ResponseWriter, r *http.Request) {
		s.logger.Info("request",
			"method", r.Method,
			"path", r.URL.Path,
		)
		next(w, r)
	}
}
  • Non-root deployment and proper secrets management in Terraform
  • Network security is a first-class citizen here, with e.g. PostgreSQL living within the cluster with no exposed NodePort

What’s next?#

In the next part of this series, I’ll be working on fleshing out the K8s operator for tenant CRDs, as well as the reconciler pattern for controller-runtime and some other goodies. Stay tuned!