Kernel modules and initramfs

The only purpose of an initramfs is to mount the root filesystem. The initramfs is a complete set of directories that you would find on a normal root filesystem. It is bundled into a single cpio archive and compressed with one of several compression algorithms.

At boot time, the boot loader loads the kernel and the initramfs image into memory and starts the kernel. The kernel checks for the presence of the initramfs and, if found, mounts it as / and runs /init. The init program is typically a shell script. Note that the boot process takes longer, possibly significantly longer, if an initramfs is used.

https://www.linuxfromscratch.org/blfs/view/svn/postlfs/initramfs.html

In order to prevent kernel modules from loading during boot, the module name must be added to a configuration file for the “modprobe” utility. This file must reside in /etc/modprobe.d:

echo "blacklist module_name" >> /etc/modprobe.d/local-dontload.conf

Unload the module from the running system if it is loaded:

modprobe -r module_name

If the kernel module is part of the initramfs (use “lsinitrd /boot/initramfs-$(uname -r).img | grep module-name.ko” to verify), then you should rebuild your initial ramdisk image, omitting the module to be avoided:

# dracut --omit-drivers module_name -f

https://access.redhat.com/solutions/41278

Initramfs stands for Initial Random-Access Memory File System. On modern Linux systems, it is typically stored in a file under the /boot directory. The kernel version for which it was built will be included in the file name. A new initramfs is generated every time a new kernel is installed.

You can use the lsinitrd command to list the contents of your initramfs archive.

The dracut command can be used to modify the contents of your initramfs […] you can re-run the dracut command to regenerate the initramfs with only the drivers that are needed.

# dracut --force

https://fedoramagazine.org/initramfs-dracut-and-the-dracut-emergency-shell/

There might be multiple [modules] lists: one for kernel modules loaded within initramfs (i.e. modules necessary for basic I/O and accessing the root filesystem) and another list loaded once the root filesystem has been mounted.

For Debian and related Linux distributions like Ubuntu, there’s /etc/initramfs-tools/modules for modules to be loaded in the initramfs.

For any distribution using the dracut initramfs creator, you might want to look into /etc/dracut.conf and/or /etc/dracut.conf.d/*.conf files for add_drivers, force_drivers and/or filesystems lines: these will cause the specified modules to be added into initramfs, and in case of force_drivers, explicitly loaded regardless of hardware detection.

https://unix.stackexchange.com/questions/527168/kernel-modules-loaded-when-boot
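As a concrete sketch of the two approaches described above (module names here are purely illustrative):

# Debian/Ubuntu: /etc/initramfs-tools/modules
# one module name per line, optionally followed by module parameters
e1000e
raid1

# dracut-based distributions: /etc/dracut.conf.d/my-drivers.conf
# note the mandatory spaces inside the quotes
add_drivers+=" e1000e "
force_drivers+=" raid1 "
filesystems+=" xfs "

After editing either file, regenerate the initramfs (update-initramfs -u on Debian/Ubuntu, dracut -f on dracut-based systems) for the change to take effect.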

Make a backup of your existing initial ramdisk.

$ sudo cp /boot/initramfs-$(uname -r).img /boot/initramfs-$(uname -r).bak.$(date +%m-%d-%H%M%S).img

https://www.lightnetics.com/topic/25016/how-do-i-prevent-a-redhat-kernel-module-loading-at-boot-time-of-after

Blacklisting doesn’t prevent the modules from being added to the initramfs, it only prevents the modules from being loaded.

https://bbs.archlinux.org/viewtopic.php?id=157241

To blacklist a kernel module permanently via GRUB, open the /etc/default/grub file for editing, and add the modprobe.blacklist=MODULE_NAME option to the GRUB_CMDLINE_LINUX line. Then run the sudo grub2-mkconfig -o /boot/grub2/grub.cfg command to enable the changes.

https://documentation.suse.com/sles/12-SP4/html/SLES-all/cha-mod.html
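For example, blacklisting the nouveau module (illustrative; your existing kernel options will differ), the relevant line in /etc/default/grub ends up looking like:

GRUB_CMDLINE_LINUX="crashkernel=auto rhgb quiet modprobe.blacklist=nouveau"

# grub2-mkconfig -o /boot/grub2/grub.cfg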

[To] prevent a module being loaded if it is a required or optional dependency of another module […] this can be achieved by configuring the following setting in /etc/modprobe.d/local-blacklist.conf:

# vi /etc/modprobe.d/local-blacklist.conf

install [module name] /bin/false

The above install line simply causes /bin/false to be run instead of installing the module. The same can be achieved by using /bin/true.

https://www.thegeekdiary.com/centos-rhel-how-to-disable-and-blacklist-linux-kernel-module-to-prevent-it-from-loading-automatically/

Change BMC settings from host

When the BMC console is lost but the host is not…

# First install ipmitool
> sudo yum install OpenIPMI ipmitool

# Let's have a look at the users on channel 1
> sudo ipmitool user list 1

ID  Name	     Callin  Link Auth	IPMI Msg   Channel Priv Limit
1                    false   false      true       ADMINISTRATOR
2   admin            false   false      true       ADMINISTRATOR
3   ADMIN            false   false      true       ADMINISTRATOR
4                    true    false      false      NO ACCESS
5                    true    false      false      NO ACCESS
6                    true    false      false      NO ACCESS
7                    true    false      false      NO ACCESS
8                    true    false      false      NO ACCESS
9                    true    false      false      NO ACCESS
10                   true    false      false      NO ACCESS
11                   true    false      false      NO ACCESS
12                   true    false      false      NO ACCESS
13                   true    false      false      NO ACCESS
14                   true    false      false      NO ACCESS
15                   true    false      false      NO ACCESS
16                   true    false      false      NO ACCESS

# Cold or warm reset
> sudo ipmitool mc reset cold

# Check IP info, channel1
> sudo ipmitool lan print 1
Set in Progress         : Set Complete
Auth Type Support       : NONE MD2 MD5 PASSWORD OEM
Auth Type Enable        : Callback : MD5
                        : User     : MD5
                        : Operator : MD5
                        : Admin    : MD5
                        : OEM      : MD5
IP Address Source       : Static Address
IP Address              : 10.11.32.83
Subnet Mask             : 255.255.255.0
MAC Address             : e0:d5:5e:ca:ad:48
SNMP Community String   : AMI
IP Header               : TTL=0x40 Flags=0x40 Precedence=0x00 TOS=0x10
BMC ARP Control         : ARP Responses Enabled, Gratuitous ARP Disabled
Gratituous ARP Intrvl   : 0.0 seconds
Default Gateway IP      : 10.11.32.1
Default Gateway MAC     : 00:0c:29:b3:54:98
Backup Gateway IP       : 0.0.0.0
Backup Gateway MAC      : 00:00:00:00:00:00
802.1q VLAN ID          : Disabled
802.1q VLAN Priority    : 0
RMCP+ Cipher Suites     : 0,1,2,3,6,7,8,11,12,15,16,17
Cipher Suite Priv Max   : caaaaaaaaaaaXXX
                        :     X=Cipher Suite Unused
                        :     c=CALLBACK
                        :     u=USER
                        :     o=OPERATOR
                        :     a=ADMIN
                        :     O=OEM
Bad Password Threshold  : 0
Invalid password disable: no
Attempt Count Reset Int.: 0
User Lockout Interval   : 0

# Edit user 3 properties, channel1
> sudo ipmitool -I open channel setaccess 1 3 callin=on ipmi=on link=on privilege=0x4
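Other BMC settings can be changed from the host in the same way; for example (values are illustrative and must match your environment):

# Set a static IP configuration on channel 1
> sudo ipmitool lan set 1 ipsrc static
> sudo ipmitool lan set 1 ipaddr 10.11.32.83
> sudo ipmitool lan set 1 netmask 255.255.255.0
> sudo ipmitool lan set 1 defgw ipaddr 10.11.32.1

# Reset the password of user 3 and enable the user
> sudo ipmitool user set password 3 NewSecretPassword
> sudo ipmitool user enable 3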

Basic ICO procedure

Let’s say that Alice wants to allow the AliceICO contract to sell 50% of all the AliceCoin tokens to buyers like Bob and Charlie.

First, Alice launches the AliceCoin ERC20 contract, issuing all the AliceCoin to her own address.

Then, Alice launches the AliceICO contract that can sell tokens for ether. Next, Alice initiates the approve & transferFrom workflow. She sends a transaction to the AliceCoin contract, calling approve with the address of the AliceICO contract and 50% of the totalSupply as arguments. This will trigger the Approval event. Now, the AliceICO contract can sell AliceCoin.

When the AliceICO contract receives ether from Bob, it needs to send some AliceCoin to Bob in return. Within the AliceICO contract is an exchange rate between AliceCoin and ether. The exchange rate that Alice set when she created the AliceICO contract determines how many tokens Bob will receive for the amount of ether sent to the AliceICO contract.

Mastering Ethereum: Building Smart Contracts and DApps, by Andreas M. Antonopoulos and Gavin Wood, Ph.D.
https://amzn.eu/5JtOwKQ

Does SR-IOV CNI require using DPDK?

No. Either a kernel or a DPDK driver can be used, configured in the network attachment definition.

A Network Attachment Definition for SR-IOV CNI takes the form:

apiVersion: "k8s.cni.cncf.io/v1"
kind: NetworkAttachmentDefinition
metadata:
  name: sriov-net1
  annotations:
    k8s.v1.cni.cncf.io/resourceName: intel.com/intel_sriov_netdevice
spec:
  config: '{
    […]
  }'

This is the [extended] configuration for a working kernel driver interface using an SR-IOV Virtual Function. It applies an IP address using the host-local IPAM plugin in the range of the subnet provided:

{
  "cniVersion": "0.3.1",
  "name": "sriov-advanced",
  "type": "sriov",
  "vlan": 1000,
  "spoofchk": "off",
  "trust": "on",
  "ipam": {
    "type": "host-local",
    "subnet": "10.56.217.0/24",
    "routes": [{
      "dst": "0.0.0.0/0"
    }],
    "gateway": "10.56.217.1"
  }
}

The below config will configure a VF using a userspace driver (uio/vfio) for use in a container. If this plugin is used with a VF bound to a dpdk driver then the IPAM configuration will still be respected, but it will only allocate IP address(es) using the specified IPAM plugin, not apply the IP address(es) to container interface.

{
  "cniVersion": "0.3.1",
  "name": "sriov-dpdk",
  "type": "sriov",
  "vlan": 1000
}

Note: the DHCP IPAM plugin cannot be used for a VF bound to a dpdk driver (uio/vfio).

Note: when a VLAN is not specified in the Network-Attachment-Definition, or when it is given a value of 0, VFs connected to this network will have no VLAN tag.

https://github.com/openshift/sriov-cni

Ingress Controller, MetalLB

A Service definition [eg] collects all pods that have a selector label app=foo and routes traffic evenly among them. However, this service is accessible from inside the cluster only.

[…] Two mechanisms were integrated directly into the Service specification to deal with it. […] You can include a field named type, which takes a value of either NodePort or LoadBalancer.

The NodePort type assigns a random TCP port and exposes it outside the cluster. A client can target any node in the cluster using that port and their messages will be relayed to the right place. The downside is that the port’s value must fall between 30000 and 32767.

The LoadBalancer [type] only works if you are operating in a cloud-hosted environment like Google’s GKE or Amazon’s EKS; a hosted load balancer is spun up for every service with this type, along with a new public IP address, which has additional costs.

The Kubernetes API introduced a new type of manifest, called an Ingress. The manifest doesn’t actually do anything on its own; you must deploy an Ingress Controller into your cluster to watch for these declarations.

Ingress controllers are pods, just like any other application, so they’re part of the cluster and can see other pods. They’re built using reverse proxies. Ingress Controllers are susceptible to the same walled-in jail as other Kubernetes pods. You need to expose them to the outside via a Service with a type of either NodePort or LoadBalancer […] one service connected to one Ingress Controller, which, in turn, is connected to many internal pods.

You can install the HAProxy Ingress Controller using Helm. The HAProxy Ingress Controller runs inside a pod in your cluster and uses a Service resource of type NodePort to publish access to external clients.

https://thenewstack.io/kubernetes-ingress-for-beginners/
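As a minimal sketch of the NodePort option described above, reusing the app=foo selector from the earlier example (names and ports are illustrative):

apiVersion: v1
kind: Service
metadata:
  name: foo-nodeport
spec:
  type: NodePort
  selector:
    app: foo
  ports:
  - port: 80
    targetPort: 8080
    # optional; if omitted, Kubernetes picks a port in the 30000-32767 range
    nodePort: 30080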

If you’re not running on a supported IaaS platform (GCP, AWS, Azure…), LoadBalancers will remain in the “pending” state indefinitely when created.

Bare-metal cluster operators are left with two lesser tools to bring user traffic into their clusters, “NodePort” and “externalIPs” services.

https://metallb.universe.tf

MetalLB hooks into your Kubernetes cluster, and provides a network load-balancer implementation, in clusters that don’t run on a cloud provider.

In layer 2 mode, one machine in the cluster takes ownership of the service, and uses standard address discovery protocols (ARP for IPv4, NDP for IPv6) to make those IPs reachable on the local network.

In BGP mode, all machines in the cluster establish BGP peering sessions with nearby routers that you control, and tell those routers how to forward traffic to the service IPs.

https://metallb.universe.tf/concepts/
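For comparison with the layer 2 setup used below, a BGP-mode configuration in the same (pre-0.13, ConfigMap-based) format would look roughly like this (ASNs and addresses are illustrative):

apiVersion: v1
kind: ConfigMap
metadata:
  namespace: metallb-system
  name: config
data:
  config: |
    peers:
    - peer-address: 10.0.0.1
      peer-asn: 64501
      my-asn: 64500
    address-pools:
    - name: default
      protocol: bgp
      addresses:
      - 192.168.10.0/24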

ingress-nginx is an Ingress controller for Kubernetes using NGINX as a reverse proxy and load balancer.

https://github.com/kubernetes/ingress-nginx

Ingress does not support TCP or UDP services. For this reason [nginx] Ingress controller uses the flags --tcp-services-configmap and --udp-services-configmap to point to an existing config map where the key is the external port to use and the value indicates the service to expose.

https://kubernetes.github.io/ingress-nginx/user-guide/exposing-tcp-udp-services/

To expose UDP service via NGINX, you need four things:

1. Add port definition to DaemonSet (by default it only exposes TCP/80 and TCP/443)

2. Run your app

3. Create a service exposing your app

4. Add the service definition to the ConfigMap udp-services in the ingress-nginx namespace (see the sketch below).
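A minimal sketch of that ConfigMap, assuming a hypothetical my-namespace/my-udp-app Service listening on UDP port 5005 (the key is the external port, the value is namespace/service:port):

apiVersion: v1
kind: ConfigMap
metadata:
  name: udp-services
  namespace: ingress-nginx
data:
  "5005": "my-namespace/my-udp-app:5005"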

https://gist.github.com/superseb/ba6becd1a5e9c74ca17996aa59bcc67e

Regardless of your ingress strategy, you probably will need to start with an external load balancer. This load balancer will then route traffic to a Kubernetes service (or ingress) on your cluster that will perform service-specific routing. In this set up, your load balancer provides a stable endpoint (IP address) for external traffic to access.

https://www.getambassador.io/learn/kubernetes-ingress/kubernetes-ingress-nodeport-load-balancers-and-ingress-controllers/

[…] use Nginx as an Ingress Controller for our cluster combined with MetalLB which will act as a network load-balancer for all incoming communications.

https://blog.dbi-services.com/setup-an-nginx-ingress-controller-on-kubernetes/

To install MetalLB on bare metal, you can either apply the YAML manifests or use Helm. In this case we used the YAML manifests. A ConfigMap instance has to be created with the configuration for MetalLB, mainly the layer 2 protocol (ARP) and the list of IPs to be used by LoadBalancer instances.

> kubectl apply -f https://raw.githubusercontent.com/metallb/metallb/v0.10.3/manifests/namespace.yaml
> kubectl apply -f https://raw.githubusercontent.com/metallb/metallb/v0.10.3/manifests/metallb.yaml
> cat metallb-config.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  namespace: metallb-system
  name: config
data:
  config: |
    address-pools:
    - name: default
      protocol: layer2
      addresses:
      - 10.95.208.83-10.95.208.84
> kubectl apply -f metallb-config.yaml

In order to test, you have to create a Service of type LoadBalancer with a selector that actually matches an existing pod. For example:

> cat load-balancer-example.yaml
apiVersion: v1
kind: Service
metadata:
  name: load-balancer-service
spec:
  selector:
    app: example
  type: LoadBalancer
  ports:
  - name: http
    port: 80
    targetPort: 80
    protocol: TCP
> cat pod-example.yaml
apiVersion: v1
kind: Pod
metadata:
  name: static-web
  labels:
    app: example
spec:
  containers:
    - name: web
      image: nginx
      ports:
        - name: web
          containerPort: 80
          protocol: TCP
> kubectl apply -f load-balancer-example.yaml
service/load-balancer-service created
> kubectl apply -f pod-example.yaml
pod/static-web created
> kubectl get services
NAME                             TYPE           CLUSTER-IP      EXTERNAL-IP    PORT(S)                     AGE
load-balancer-service            LoadBalancer   172.19.66.192   10.95.208.83   80:32676/TCP                19s

From the previous output, you can see that a LoadBalancer Service is created and assigned the IP address 10.95.208.83. If we arping 10.95.208.83, we can see in the speaker logs that the MetalLB speaker gets the ARP request and responds:

arping -I br0 10.95.208.83
ARPING 10.95.208.83 from 10.95.208.80 br0
Unicast reply from 10.95.208.83 [A4:BF:01:74:EA:12]  1.553ms
Unicast reply from 10.95.208.83 [A4:BF:01:74:EA:12]  1.381ms

[...]

> kubectl logs -l component=speaker -n metallb-system --since=1m

{"caller":"arp.go:102","interface":"br0","ip":"10.95.208.83","msg":"got ARP request for service IP, sending response","responseMAC":"a4:bf:01:74:ea:12","senderIP":"10.95.208.80","senderMAC":"a4:bf:01:74:e9:9b","ts":"2021-10-15T14:31:14.314805023Z"}

And we can check that we can hit the test pod on the LoadBalancer IP and port:

wget 10.95.208.83
--2021-10-15 16:31:50-- http://10.95.208.83/
Connecting to 10.95.208.83:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 615 [text/html]
Saving to: ‘index.html’

index.html 100%[============================>] 615 --.-KB/s in 0s

2021-10-15 16:31:50 (196 MB/s) - ‘index.html’ saved [615/615]

Once MetalLB is installed, we can proceed to install the NGINX Ingress Controller. We install MetalLB first because the Ingress Controller is exposed by means of a LoadBalancer or NodePort Service:

helm repo add ingress-nginx https://kubernetes.github.io/ingress-nginx
helm repo update
helm install ingress-nginx ingress-nginx/ingress-nginx

After installation, we can check the Services deployed and we will find a LoadBalancer instance:

kubectl get services -A

t001-u000003             ingress-nginx-controller                     LoadBalancer   172.19.164.118   10.95.208.83   80:30813/TCP,443:30232/TCP                                                                            19s
t001-u000003             ingress-nginx-controller-admission           ClusterIP      172.19.88.243    <none>         443/TCP

We can finally deploy an Ingress:

> cat ingress-test.yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: example-ingress
  annotations:
    nginx.ingress.kubernetes.io/rewrite-target: /$1
    kubernetes.io/ingress.class: "nginx"
spec:
  rules:
    - http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: service-example
                port:
                  number: 80
> kubectl apply -f ingress-test.yaml

ingress.networking.k8s.io/example-ingress created

> kubectl get ingress -A
NAMESPACE      NAME              CLASS    HOSTS   ADDRESS   PORTS   AGE
default        example-ingress   <none>   *                 80      8s

And check again that we can hit the service on the LoadBalancer IP and port 80:

[labuser@tip-dev-1 ~]$ wget 10.95.208.83:80
--2021-10-18 16:25:25--  http://10.95.208.83/
Connecting to 10.95.208.83:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 562 [text/html]
Saving to: ‘index.html.2’

100%[============================>] 562         --.-K/s   in 0s

2021-10-18 16:25:25 (55.1 MB/s) - ‘index.html’ saved [562/562]

Helm basics…

For a typical cloud-native application with a 3-tier architecture, each tier consists of a Deployment and Service object, and may additionally define ConfigMap or Secret objects. Each of these objects is typically defined in a separate YAML file and fed into the kubectl command line tool.

A Helm chart encapsulates each of these YAML definitions, provides a mechanism for configuration at deploy-time and allows you to define metadata and documentation.

https://docs.bitnami.com/tutorials/create-your-first-helm-chart/

Helm charts are structured like this:
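Running helm create produces roughly the following layout (some generated template files omitted):

mychart/
  Chart.yaml        # chart metadata: name, version, description
  values.yaml       # default configuration values
  charts/           # subcharts / dependencies
  templates/        # files rendered by the Go template engine
    _helpers.tpl
    deployment.yaml
    service.yaml
    NOTES.txt
  .helmignore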

The templates/ directory is for template files. When Helm evaluates a chart, it will send all of the files in the templates/ directory through the template rendering engine. It then collects the results of those templates and sends them on to Kubernetes.

The values.yaml file is also important to templates. This file contains the default values for a chart. These values may be overridden by users during helm install or helm upgrade.

The Chart.yaml file contains a description of the chart. You can access it from within a template.

The charts/ directory may contain other charts (which we call subcharts). Later in this guide we will see how those work when it comes to template rendering.

Use this command to create a new chart named mychart in a new directory:

helm create mychart

https://helm.sh/docs/chart_template_guide/getting_started/

Helm runs each file in [template] directory through a Go template rendering engine.
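For instance, a minimal templates/configmap.yaml along the lines of the Helm getting-started guide, pulling data from the release, the chart metadata and values.yaml (favoriteDrink is an illustrative key):

apiVersion: v1
kind: ConfigMap
metadata:
  name: {{ .Release.Name }}-configmap
data:
  chart: {{ .Chart.Name }}
  # overridable with --set favoriteDrink=... at install/upgrade time
  drink: {{ .Values.favoriteDrink | default "tea" | quote }}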

We can do a dry-run of a helm install and enable debug to inspect the generated definitions:

helm install --dry-run --debug ./mychart

We can use helm package to create the tar package:

helm package ./mychart

Helm will create a mychart-0.1.0.tgz package in our working directory, using the name and version from the metadata defined in the Chart.yaml file. A user can install from this package instead of a local directory by passing the package as the parameter to helm install.

helm install example3 mychart-0.1.0.tgz

Helm allows you to specify sub-charts that will be created as part of the same release. To define a dependency, create a requirements.yaml file in the chart root directory (with Helm 3, the same dependencies block goes directly in Chart.yaml).

https://docs.bitnami.com/tutorials/create-your-first-helm-chart/
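A dependency entry looks roughly like this (chart name, version and repository URL are illustrative):

dependencies:
  - name: mariadb
    version: 7.3.14
    repository: https://charts.example.com/stable

Running helm dependency update ./mychart then downloads the declared charts into the charts/ directory.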

Template files should have the extension .yaml if they produce YAML output. The extension .tpl may be used for template files that produce no formatted content.

https://helm.sh/docs/chart_best_practices/templates/

Kubernetes CSI, CDI, openEBS

The Container Storage Interface (CSI) is a standard for exposing storage to workloads on Kubernetes. To enable automatic creation/deletion of volumes for CSI Storage, a Kubernetes resource called StorageClass must be created and registered within the Kubernetes cluster.

Associated with the StorageClass is a CSI provisioner plugin that does the heavy lifting at the disk and storage management layers to provision storage volumes based on the various attributes defined in the StorageClass. Kubernetes CSI was introduced in the Kubernetes v1.9 release, promoted to beta in the Kubernetes v1.10 release as CSI v0.3, followed by a GA release in Kubernetes v1.13 as CSI v1.0.

https://docs.robin.io/storage/latest/storage.html
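A StorageClass tying a CSI provisioner to a set of attributes looks roughly like this (the provisioner name and parameters depend entirely on the CSI driver in use):

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: fast-csi
provisioner: csi.example.vendor.com      # name registered by the CSI driver
parameters:
  type: ssd                              # driver-specific attributes
reclaimPolicy: Delete
volumeBindingMode: WaitForFirstConsumer

A PersistentVolumeClaim then simply references storageClassName: fast-csi and the provisioner creates the backing volume on demand.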

Containerized-Data-Importer (CDI) is a persistent storage management add-on for Kubernetes. Its primary goal is to provide a declarative way to build Virtual Machine Disks on PVCs for Kubevirt VMs.

CDI works with standard core Kubernetes resources and is storage device agnostic; while its primary focus is to build disk images for Kubevirt, it’s also useful outside of a Kubevirt context for initializing your Kubernetes Volumes with data.

The kubevirt content type indicates that the data being imported should be treated as a Kubevirt VM disk. CDI will automatically decompress and convert the file from qcow2 to raw format if needed. It will also resize the disk to use all available space.

CDI is designed to be storage agnostic.

https://github.com/kubevirt/containerized-data-importer
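With CDI installed, importing a VM disk image is typically expressed as a DataVolume; a hedged sketch (URL and size are illustrative):

apiVersion: cdi.kubevirt.io/v1beta1
kind: DataVolume
metadata:
  name: example-dv
spec:
  source:
    http:
      url: "https://example.com/images/disk.qcow2"   # qcow2 gets converted to raw
  pvc:
    accessModes:
      - ReadWriteOnce
    resources:
      requests:
        storage: 10Gi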

OpenEBS is a Kubernetes-native Container Attached Storage solution that makes it possible for stateful applications to easily access Dynamic Local PVs or Replicated PVs.

OpenEBS can be used across all Kubernetes distributions – On-premise and Cloud.

OpenEBS turns any storage available on the Kubernetes worker nodes into local or distributed Kubernetes Persistent Volumes.

OpenEBS is the leading choice for NVMe based storage deployments. OpenEBS is completely Open Source and Free.

The stateful pod writes the data to the OpenEBS engine, which synchronously replicates the data to multiple nodes in the cluster. The OpenEBS engine itself is deployed as a pod and orchestrated by Kubernetes.

https://openebs.io/docs/