Kubernetes operator custom resource creation

Yesterday I had a nagging doubt about Kubernetes operator deployments. Briefly, I wanted to be sure when it is 100% safe to create a custom resource, and what race conditions should be taken into account between operator controller instances and their corresponding ‘watched’ custom resources.

I dug through Google, posts and specs, but this ChatGPT conversation pretty much summarises my conclusions. I followed the same questions I had asked myself while googling and then used the chat to validate them (and to play a bit with the bot). Hope it is helpful for you too:

me>> what happens if a kubernetes custom resource is created before the corresponding operator controller is running?

If a Kubernetes custom resource is created before the corresponding operator controller is running, the controller will not be able to process the resource until it is started. Depending on the implementation of the operator and the custom resource, the resource may not be fully functional or may not be monitored for updates until the controller is running. It is recommended to start the operator controller before creating any custom resources to ensure proper functionality.

me>> why is it recommended to start the operator controller before creating any custom resources ?

It is recommended to start the operator controller before creating any custom resources because the controller is responsible for managing the lifecycle of the custom resources. This includes creating, updating, and deleting resources based on the desired state specified in the custom resource.

If the controller is not running when a custom resource is created, the controller will not be able to process the resource and ensure that it is in the desired state. Additionally, the controller may not be able to monitor the resource for updates, which can lead to the resource becoming out of sync with the desired state.

By starting the operator controller before creating any custom resources, you ensure that the controller can immediately process and manage the resources as they are created, which helps to maintain a consistent, desired state of the resources, the main objective of using a Kubernetes operator.
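
In practice, you can make that ordering explicit before applying any custom resource. A minimal sketch, assuming a made-up CRD called widgets.example.com and an operator deployed as widget-operator in the widget-system namespace:

# wait until the CRD is registered and the operator is ready
kubectl wait --for=condition=established crd/widgets.example.com --timeout=60s
kubectl rollout status deployment/widget-operator -n widget-system --timeout=120s
# only then create the custom resource
kubectl apply -f my-widget.yaml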

me>> does kubernetes operator controller use webhooks?

Yes, Kubernetes operator controllers can use webhooks as a way to perform certain actions on custom resources.

Webhooks are a way to trigger an external service to perform an action when certain events occur in the Kubernetes cluster. For example, a webhook could be configured to validate a custom resource before it is created, or to mutate a custom resource before it is updated.

Operators can use webhooks to perform specific actions on custom resources, such as validating the resource’s configuration, mutating the resource’s specification, or performing a pre- or post-hook.

Webhooks are typically implemented using a Kubernetes admission controller and a separate service that handles the webhook requests. The admission controller intercepts the request to create or update a resource, and sends it to the webhook service, which performs the necessary action before allowing the request to continue.
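
You can see which operators in a cluster actually register admission webhooks by listing the webhook configurations:

kubectl get validatingwebhookconfigurations,mutatingwebhookconfigurations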

me>> which alternative to webhooks can be used in kubernetes operator controller?

An alternative to webhooks that can be used in a Kubernetes operator controller is to use a Kubernetes controller pattern.

A controller pattern is a way to implement a specific behavior in a Kubernetes operator. It is a specific type of controller that runs in the same process as the operator, and watches for changes to custom resources. It then takes the appropriate actions to bring the custom resource to the desired state.

Controller patterns are typically implemented using the Go client library for Kubernetes (client-go) and the controller-runtime library. This allows the operator to watch for changes to custom resources and take appropriate actions using the Kubernetes API.

The controller pattern is an alternative to webhooks, as it allows the operator to perform actions on custom resources directly rather than relying on an external service to handle the webhook request. This can be more efficient and easier to implement, but it might also require more knowledge of the Kubernetes API.
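
A quick way to watch the controller pattern at work, again with hypothetical names (a ‘widget’ custom resource and a widget-operator deployment), is to follow the operator logs while changing a watched resource:

# terminal 1: follow the controller logs
kubectl -n widget-system logs deploy/widget-operator -f
# terminal 2: change the watched custom resource
kubectl patch widget my-widget --type merge -p '{"spec":{"size":3}}'
# a reconcile log line for my-widget should appear right after the patch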

me>> if an operator controller uses controller pattern, can it process existing resources already present when operator starts running?

If an operator controller uses the controller pattern, it can process existing resources that are already present in the cluster when the operator starts running.

When the operator starts, it will typically register a watch on the custom resources it manages. The watch will trigger a callback function for each existing resource in the cluster, allowing the operator to process these resources.

Depending on the implementation of the operator, it may take different actions when processing existing resources. The operator could, for example, check the current state of the resource and ensure it matches the desired state, or it could update the resource to the desired state.

It is important to note that if the resources were created before the operator and not managed by it, they may not match the desired state that the operator is expecting and may require additional logic or steps to reconcile them.

Also, if the resources were created with some properties which are not recognized by the operator, it may not be able to reconcile them and may require manual intervention or updates to the operator’s code to handle those cases.
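
This is easy to verify with any controller-pattern operator: create the custom resource first, then start the operator, and watch it pick up the pre-existing object (names are hypothetical again):

# 1. create the custom resource while no controller is running
kubectl apply -f my-widget.yaml
# 2. now deploy the operator
kubectl apply -f widget-operator.yaml
# 3. the initial list/watch delivers the pre-existing CR, so a reconcile for it shows up
kubectl -n widget-system logs deploy/widget-operator -f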

me>> but what happens if webhook is used instead of controller pattern?

If a webhook is used instead of a controller pattern in an operator, the behavior when the operator starts and processes existing resources will be different.

When the operator starts, it will register a webhook with the Kubernetes API server. The webhook will be triggered for any new resources that are created or updated, but it will not be triggered for existing resources that were already present in the cluster when the operator starts.

This means that the operator will not be able to process existing resources that were already present in the cluster when it starts. The operator will only be able to process resources that are created or updated after it starts.

To reconcile the existing resources and bring them to the desired state, the operator would need additional logic that periodically scans the cluster for the resources it is supposed to manage and reconciles any it finds, since the admission webhook alone will never be triggered for them.

It is worth noting that using a webhook can be useful in situations where you want to perform a specific action on a resource before it is created or updated, such as validating its configuration or mutating its specification. But it is less efficient and might require more complexity to reconcile the existing resources, when compared to using a controller pattern.
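
A pragmatic workaround when only admission webhooks are in place is to ‘touch’ the pre-existing resources so they go through an UPDATE request and hit the webhook, for example (hypothetical resource again):

kubectl annotate widget my-widget reconcile-ping="$(date +%s)" --overwrite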

from https://chat.openai.com/chat

This explains why the results I got changed from one operator to another. Some of them can handle ‘existing’ custom resources created after the CRDs are deployed but before the controller is up and running, while others require checking that the controller is running before creating any custom resource.

istiod vs istio pilot

Istio’s control plane is, itself, a modern, cloud-native application. Thus, it was built from the start as a set of microservices. Individual Istio components like service discovery (Pilot), configuration (Galley), certificate generation (Citadel) and extensibility (Mixer) were all written and deployed as separate microservices. The need for these components to communicate securely and be observable, provided opportunities for Istio to eat its own dogfood (or “drink its own champagne”, to use a more French version of the metaphor!).

[…] in Istio 1.5, we’ve changed how Istio is packaged, consolidating the control plane functionality into a single binary called istiod.

https://istio.io/latest/blog/2020/istiod/

kubectl debug

Sometimes, it might be a good idea to copy a Pod before starting the debugging. Luckily, the kubectl debug command has a flag for that --copy-to <new-name>. The new Pod won’t be owned by the original workload, nor will it inherit the labels of the original Pod, so it won’t be targeted by a potential Service object in front of the workload. This should give you a quiet copy to investigate!

https://iximiuz.com/en/posts/kubernetes-ephemeral-containers/
kubectl debug -it -c debugger --image=busybox \
  --copy-to test-pod \
  --share-processes \
  ${POD_NAME}
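
Since the copy is not owned by any workload, it sticks around after you exit the debug session; when you are done, just delete it:

kubectl delete pod test-pod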

Rebuilding my desktop setup

In my day-to-day activity I need to run some containers locally for development and operations purposes. Until now, I have used Docker running on my desktop, a 2020 MacBook Pro with 16GB of RAM.

Lately, I have had to set tight Docker Desktop resource limits (4GB RAM, 2 CPUs) so that the responsiveness of other apps on my desktop would not be affected by locally running containers. Even so, the laptop fans make a lot of noise, things became a bit bumpy, and some container jobs took extra time to complete on only 2 CPUs…

Additionally, as you know, the x86 MacBook Pro is a dead product, but for some of us moving to ARM has implications for the way we both build and test containers for production x86 cloud infra, so I would prefer to stick to an x86 Docker platform. What should I do when I need to renew my laptop?

Moving to a different OS for the desktop is not an option for me at this moment. Going back to Windows only to end up running everything on WSL, or moving to a Linux desktop and using Office 365 in the browser, both seem too annoying to me.

So my requirements are:

  • Stay on macOS if possible.
  • Run Docker containers on x86; multi-platform containers are not a valid option for me.
  • Avoid resource starvation for desktop apps.

So I decided to split up my environment: stay on my current MacBook Pro but use an extra Intel NUC device as my development Docker runtime (without any VM layer). That lets all my laptop resources go to local apps such as VSCode, the browser, Office 365, etc., so they run smoothly.

For my tests, I used a NUC7CJYSAL device with 8GB RAM and a 512GB SATA SSD. It has a 2-core Celeron CPU with no HT support (only 2 system threads). I installed Ubuntu LTS on it and configured key-based SSH authentication so no password is needed.

Using the VSCode remote option, I can run the IDE on my laptop but with the ‘filesystem’ of the NUC. The VSCode terminal is a NUC terminal, so any docker command runs against the NUC Docker runtime. Of course, you need your repos cloned locally on the NUC, but with SSH agent forwarding you can ‘export’ your SSH keys over SSH, so your GitHub key is magically available for you on the NUC.
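
A quick check that agent forwarding is doing its job, assuming the NUC is reachable over SSH as a host called nuc:

ssh -A nuc 'ssh -T git@github.com'
# should answer with "Hi <your-github-user>! You've successfully authenticated..."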

But would this setup mean I need to carry both the laptop and the NUC everywhere I go? That is the magic part. I am already a Tailscale personal VPN user, so I added the NUC as a device on my Tailscale account. Now, from my laptop I can reach the NUC’s VPN IP anywhere with internet access, and I can keep the NUC running at home. An extra benefit is that you can keep things running on the NUC and disconnect your laptop (e.g. running build or deployment jobs during your daily commute).

If you add an /etc/hosts alias on your laptop, combined with key-based SSH authentication, you can easily connect from VSCode or iTerm. You can even share folders on the NUC and mount them from the desktop if needed.
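
For reference, this is roughly what that looks like on the laptop (the Tailscale IP, user and key are placeholders):

# /etc/hosts
100.x.y.z   nuc
# ~/.ssh/config, used by both iTerm and the VSCode remote SSH extension
Host nuc
    User dev
    IdentityFile ~/.ssh/id_ed25519
    ForwardAgent yes   # makes the local GitHub key available on the NUC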

With this configuration I can evolve the desktop and the NUC independently, so in the future I can switch to an ARM laptop or a different OS, and the laptop’s resources stay focused on UI-based apps (VSCode, browser, Office) without resource constraints from the development runtime on the NUC.

I’m currently using this setup and have decided to stay with it, though with a more powerful i7/16GB NUC device. Hope this post gave you some ideas!

Container port…

[…] containerPort does not set the port that will be exposed, it is purely informational and has no effect on Kubernetes networking. It is however good practice to document the port that the container is listening on for the next poor soul that is trying to figure out your system.

https://stackoverflow.com/questions/55741170/container-port-pods-vs-container-port-service

The EXPOSE instruction informs Docker that the container listens on the specified network ports at runtime. It does not make the ports of the container accessible to the host. To do that, you must use the -p flag to publish a range of ports.

https://stackoverflow.com/questions/35548843/does-ports-on-docker-compose-yml-have-the-same-effect-as-expose-on-dockerfile
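
A quick way to convince yourself, using the stock nginx image (whose Dockerfile EXPOSEs port 80):

docker run -d --name web1 nginx                # port 80 is EXPOSEd but not reachable from the host
docker run -d --name web2 -p 8080:80 nginx     # host port 8080 now forwards to the container's port 80
curl -s localhost:8080 >/dev/null && echo "web2 is reachable"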

DNS CAA

CAA is a security standard that was approved in 2017 and which allows domain owners to prevent Certificate Authorities (CAs; organizations that issue TLS certificates) to issue certificates for their domains.

Domain owners can add a “CAA field” to their domain’s DNS records, and only the CA listed in the CAA field can issue a TLS certificate for that domain.

All Certificate Authorities — like Let’s Encrypt — must follow the CAA specification by the letter of the law or face steep penalties from browser makers.

https://www.zdnet.com/article/lets-encrypt-to-revoke-3-million-certificates-on-march-4-due-to-bug/
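
Checking a domain’s CAA policy is a one-liner; for example:

dig +short CAA google.com
# typically returns something like: 0 issue "pki.goog"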