KubeEdge originates from Huawei's Intelligent Edge Fabric (IEF), a commercial IoT edge platform built on Huawei IoT PaaS. A large part of IEF was modified and open sourced to create KubeEdge.
https://thenewstack.io/how-kubernetes-is-becoming-the-universal-control-plane-for-distributed-applications/
Author: dvelben
eBPF, XDP, Cilium…
User-space networking achieves high-speed performance by moving packet processing out of the kernel's realm into user space. XDP does in fact the opposite: it moves user-space networking programs (filters, mappers, routing, etc.) into the kernel's realm. XDP allows us to execute our network function as soon as a packet hits the NIC, and before it starts moving upwards into the kernel's networking subsystem.
http://blogs.igalia.com/dpino/2019/01/07/a-brief-introduction-to-xdp-and-ebpf/
The typical workflow is that BPF programs are written in C, compiled by LLVM into object / ELF files, which are parsed by user space BPF ELF loaders (such as iproute2 or others), and pushed into the kernel through the BPF system call. The kernel verifies the BPF instructions and JITs them, returning a new file descriptor for the program, which then can be attached to a subsystem (e.g. networking). If supported, the subsystem could then further offload the BPF program to hardware (e.g. NIC).
clang -O2 -Wall -target bpf -c xdp-example.c -o xdp-example.o
ip link set dev em1 xdp obj xdp-example.o
https://docs.cilium.io/en/stable/bpf/
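For reference, a minimal xdp-example.c of the kind the two commands above build and attach could look like the sketch below (assumptions: the section name "prog" is the default that iproute2 loads, and passing every packet is just the simplest possible verdict):

#include <linux/bpf.h>

#ifndef __section
# define __section(NAME) __attribute__((section(NAME), used))
#endif

/* Runs for every frame on the device's RX path, before the kernel stack. */
__section("prog")
int xdp_example(struct xdp_md *ctx)
{
    return XDP_PASS; /* let the packet continue into the kernel */
}

char __license[] __section("license") = "GPL";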
JIT compilers speed up execution of the BPF program significantly since they reduce the per instruction cost compared to the interpreter. Often instructions can be mapped 1:1 with native instructions of the underlying architecture.
Maps are efficient key / value stores that reside in kernel space. They can be accessed from a BPF program in order to keep state among multiple BPF program invocations. They can also be accessed through file descriptors from user space and can be arbitrarily shared with other BPF programs or user space.
https://docs.cilium.io/en/stable/bpf/
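A hedged sketch of what that looks like in practice, using libbpf's BTF-style map declaration (the map name pkt_count and the counting logic are illustrative, not from the quoted docs):

#include <linux/bpf.h>
#include <bpf/bpf_helpers.h>

/* Key/value store living in kernel space; user space can read it via
 * the map's file descriptor, and state survives across invocations. */
struct {
    __uint(type, BPF_MAP_TYPE_ARRAY);
    __uint(max_entries, 1);
    __type(key, __u32);
    __type(value, __u64);
} pkt_count SEC(".maps");

SEC("xdp")
int count_packets(struct xdp_md *ctx)
{
    __u32 key = 0;
    __u64 *val = bpf_map_lookup_elem(&pkt_count, &key);

    if (val)
        __sync_fetch_and_add(val, 1); /* state kept between BPF runs */
    return XDP_PASS;
}

char __license[] SEC("license") = "GPL";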
XDP allows you to attach an eBPF program to a lower-level hook inside the kernel. Such a hook is implemented by the network device driver, inside the ingress traffic processing function.
Not all network device drivers implement the XDP hook. In such a case, you may fall back to the generic XDP hook, implemented by the core kernel.
Since such a hook takes place later in the networking stack, the performance observed there is much lower.
https://developers.redhat.com/blog/2018/12/06/achieving-high-performance-low-latency-networking-with-xdp-part-1
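With iproute2 the attach mode can be requested explicitly: xdpdrv asks for the native driver hook, xdpgeneric for the slower generic fallback, and xdpoffload for full NIC offload where the hardware supports it (em1 and the object file are the example names from above):

ip link set dev em1 xdpdrv obj xdp-example.o
ip link set dev em1 xdpgeneric obj xdp-example.o
ip link set dev em1 xdpoffload obj xdp-example.o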
XDP runs in the kernel network driver; it can read the Ethernet frames from the RX ring of the NIC and take action immediately. XDP plugs into the eBPF infrastructure through an RX hook implemented in the driver.
https://developer.nvidia.com/blog/accelerating-with-xdp-over-mellanox-connectx-nics/
Networking programs in BPF, in particular for tc and XDP, have an offload interface to hardware in the kernel in order to execute BPF code directly on the NIC.
Currently, the nfp driver from Netronome has support for offloading BPF through a JIT compiler which translates BPF instructions to an instruction set implemented against the NIC. This includes offloading of BPF maps to the NIC as well, thus the offloaded BPF program can perform map lookups, updates and deletions.
https://docs.cilium.io/en/stable/bpf/
We present a solution to run Linux’s eXpress Data Path programs written in eBPF on FPGAs, using only a fraction of the available hardware resources while matching the performance of high-end CPUs.
https://fosdem.org/2021/schedule/event/sdn_hxdp_fpga/
LLVM (formerly Low Level Virtual Machine) is a compiler infrastructure designed to optimize the compilation, linking and execution times of programs written in various programming languages.
The LLVM project began in 2000 at the University of Illinois at Urbana–Champaign under the direction of Vikram Adve and Chris Lattner. In 2005, Apple Inc. hired Lattner and formed a team to work on the LLVM system for various uses in Apple's development systems.
https://en.terminalroot.com.br/gcc-vs-llvm-which-is-the-best-compiler/
When no configuration is provided, Cilium automatically runs in [encapsulation] mode as it is the mode with the fewest requirements on the underlying networking infrastructure.
In this mode, all cluster nodes form a mesh of tunnels using the UDP-based encapsulation protocols
VXLAN or Geneve. All traffic between Cilium nodes is encapsulated.

In native routing mode, Cilium will delegate all packets which are not addressed to another local endpoint to the routing subsystem of the Linux kernel. This means that the packet will be routed as if a local process had emitted it.
https://docs.cilium.io/en/v1.8/concepts/networking/routing/
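As a rough sketch of how the mode is selected in the cilium-config ConfigMap (option names follow the v1.8 docs cited above; treat exact spellings as an assumption, as they have been renamed in later releases):

tunnel: vxlan                      # encapsulation mode, the default (or: geneve)
# tunnel: disabled                 # switches to native routing
# native-routing-cidr: 10.0.0.0/8  # CIDR the underlying network can route itself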
Replacing iptables with eBPF in Kubernetes with Cilium
https://archive.fosdem.org/2020/schedule/event/replacing_iptables_with_ebpf/attachments/slides/3622/export/events/attachments/replacing_iptables_with_ebpf/slides/3622/Cilium_FOSDEM_2020.pdf
Main actions [outgoing from pod]:
1. Service load balancing: select a proper Pod from the backend list; we assume POD4 on NODE2 is selected.
2. Create or update connection tracking (CT or conntrack) record.
3. Perform DNAT: replace the ServiceIP with POD4_IP in the dst_ip field of the IP header (see the sketch below the list).
4. Perform egress network policy checking.
5. Perform encapsulation if in tunnel mode, or pass the packet to kernel stack if in direct routing mode.
https://arthurchiao.github.io/blog/cilium-life-of-a-packet-pod-to-service/
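Step 3 in isolation, as a hedged C fragment for a tc-BPF program (offsets assume Ethernet plus option-less IPv4; the helper calls are real kernel helpers, but backend selection, conntrack and the TCP/UDP pseudo-header checksum fixup are elided):

#include <stddef.h>
#include <linux/bpf.h>
#include <linux/if_ether.h>
#include <linux/ip.h>
#include <bpf/bpf_helpers.h>

/* Rewrite dst_ip from the ServiceIP to the chosen backend (e.g. POD4_IP). */
static __always_inline int dnat_to_backend(struct __sk_buff *skb, __be32 backend_ip)
{
    const int dst_off = ETH_HLEN + offsetof(struct iphdr, daddr);
    const int csum_off = ETH_HLEN + offsetof(struct iphdr, check);
    __be32 old_ip;

    if (bpf_skb_load_bytes(skb, dst_off, &old_ip, sizeof(old_ip)) < 0)
        return -1;
    if (bpf_skb_store_bytes(skb, dst_off, &backend_ip, sizeof(backend_ip), 0) < 0)
        return -1;
    /* Incrementally patch the IPv4 header checksum for the 4-byte change. */
    return bpf_l3_csum_replace(skb, csum_off, old_ip, backend_ip, sizeof(backend_ip));
}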
WTF is SASE (Secure Access Service Edge)
SASE combines SD-WAN with computer security functions, including cloud access security brokers (CASB), secure web gateways (SWG), antivirus/malware inspection, virtual private networking (VPN), firewall as a service (FWaaS), and data loss prevention (DLP), all delivered by a single cloud service at the network edge.
[…] typically delivered as a single service at globally dispersed SASE points of presence (PoPs) located as close as possible to dispersed users, branch offices and cloud services. To access SASE services, edge locations or users connect to the closest available PoP. SASE vendors may contract with several backbone providers and peering partners to offer customers fast, low-latency WAN performance for long-distance PoP-to-PoP connections.
Edge connections to the local PoP may vary from an SD-WAN for a branch office, to a VPN client or clientless web access for a mobile user, to multiple tunnels from the cloud or direct cloud connections inside a global data center.
https://en.wikipedia.org/wiki/Secure_access_service_edge
CBRS, Citizens Broadband Radio Service
In 2015, the Commission adopted rules for shared commercial use of the 3550-3700 MHz band (3.5 GHz band). The Commission established the Citizens Broadband Radio Service (CBRS) and created a three-tiered access and authorization framework.
https://www.fcc.gov/35-ghz-band-overview
1. Incumbents: Existing users (e.g. US Naval Radar, DoD personnel) get permanent priority as well as site-specific protection for registered sites.
2. Priority Access Licenses (PAL): Organizations can pay a fee to request up to four PALs in a limited geographic area for three years. Only the lower 100 MHz of the CBRS band will be auctioned off, with a maximum of seven concurrent 10 MHz PALs allowed in any given area.
3. General Authorized Access (GAA): The rest of the spectrum will be open to GAA use and coexistence issues will be determined by SAS providers for spectrum allocation.
https://www.leverege.com/research-papers/cellular-lpwa
A network of sensors—the Environmental Sensing Capability (ESC)—detects use of CBRS. Devices that want to use the CBRS band first put in requests to a cloud-based Spectrum Access System (SAS) to reserve unused channels in a particular geographic area. If channels are free, SAS can grant the requests. When devices that have been granted permission to use channels are done with them, the channels are put back into the pool that the SAS can draw from to grant further requests.
https://www.networkworld.com/article/3180615/faq-what-in-the-wireless-world-is-cbrs.html
When someone wants to use the CBRS spectrum, they must contact a SAS administrator and use the cloud-based SAS database to indicate where they want to deploy their CBRS access points.
They input precise data about latitude, longitude, and height into the SAS database. And then the SAS administrator determines if the spectrum is available. The SAS administrator can then assign spectrum channels and grant authority for CBRS Devices (CBSDs) to operate in the channel. The SAS administrator also authorizes the appropriate transmit power levels.
https://www.fiercewireless.com/private-wireless/what-a-cbrs-spectrum-access-system
A CBRS device (CBSD) needs authorization from the SAS before it starts to transmit in the CBRS band. CBSDs communicate with the SAS using the SAS-CBSD API.
https://support.google.com/sas/answer/9539282?hl=en
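For flavor, the SAS-CBSD protocol is JSON over HTTPS; a grant request has roughly the shape below (a from-memory sketch of the WInnForum message format, with invented values, so verify field names against the spec):

{
  "grantRequest": [{
    "cbsdId": "example-cbsd-id",
    "operationParam": {
      "maxEirp": 20,
      "operationFrequencyRange": {
        "lowFrequency": 3550000000,
        "highFrequency": 3560000000
      }
    }
  }]
}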
Five companies operate Spectrum Access Systems (SAS): Federated Wireless, Google, CommScope, Amdocs, and Sony. A sixth, Key Bridge Wireless, has a system in field trials.
Every SAS provider must develop or license an Environmental Sensing Capability (ESC).
https://www.fiercewireless.com/private-wireless/cbrs-users-learn-ins-and-outs-spectrum-access-systems
Because there is more than one SAS operating, the SAS Administrators must coordinate spectrum not only among all of the transmitting devices within their purview, but also devices being managed by the others. This occurs during each overnight period, known as a Coordinated Periodic Activity Among SASs (CPAS), so the entire system is aware of all devices operating in the band.
A User Equipment (UE) device that operates under 23 dBm EIRP (typically a handset) is not required to be registered to a SAS, but can communicate via a CBRS network. Anything that's higher power needs to register and maintain a connection to a SAS, and is known as a CBSD (CBRS Device).
https://www.cambiumnetworks.com/wp-content/uploads/2020/02/CBRS_FAQ_v1_.pdf
CPAS currently runs from 7 AM to 10 AM UTC each day. During CPAS, the SAS can't issue new grants. At the end of CPAS, the SAS notifies the CBSD of any required changes through the heartbeat response.
https://support.google.com/sas/answer/9554929?hl=en
EKS Anywhere
EKS-A is Amazon's own version of Anthos. Just like Anthos, it is tightly integrated with vSphere and can be installed on bare metal or any other cloud. The key difference is that there is no meta control plane to manage all the EKS-A clusters from a single pane of glass.
There is nothing open source about EKS-A. It’s an opaque installer that rolls out an EKS-like cluster on a set of compute nodes. If you want to customize the cluster components, switch to EKS-D, and assemble your own stack.
Combined with a new addition called EKS Console, multiple EKS-A clusters can be managed from the familiar AWS Console.
https://www.forbes.com/sites/janakirammsv/2020/12/06/aws-responds-to-anthos-and-azure-arc-with-amazon-eks-anywhere/
EKS Console, the web-based interface originally launched to manage EKS clusters in the cloud, now supports registering EKS Anywhere. Branded as EKS Connector […]
https://www.forbes.com/sites/janakirammsv/2021/09/09/amazon-announces-the-general-availability-of-eks-anywhere/
Amazon EKS displays connected clusters in the Amazon EKS console for workload visualization only and does not manage them. […] [The EKS Connector] is in preview release for Amazon EKS and is subject to change.
https://docs.aws.amazon.com/eks/latest/userguide/eks-connector.html
EKS Distro is the same Kubernetes distribution used by Amazon EKS in AWS, and it includes out-of-the-box optional defaults for node OS, container runtime, service load balancer, container network interface (CNI), and ingress and storage classes.
https://cloud.netapp.com/blog/cvo-blg-eks-anywhere-and-ecs-anywhere-multicloud-services
EKS on AWS Outposts was one of the options where users were able to run Kubernetes in their own data centers, but on AWS infrastructure. With EKS Anywhere, users can deploy Kubernetes on their own infrastructure, managed by the customer, with a consistent AWS management experience in their data center.
The cluster registration process involves two steps: registering the cluster with Amazon EKS and applying a connector YAML manifest file in the target cluster to enable connectivity.
The eks-connector is deployed as a StatefulSet and consists of two containers – amazon-ssm-agent (connector-agent) and eks-connector (connector-proxy). Amazon EKS leverages AWS Systems Manager’s agent to connect to AWS services.
The 'Workloads' section displays all objects of type Deployment, DaemonSet and StatefulSet. […] other objects of the cluster (like services, ingress, secrets, etc.) are not available for visualization.
https://www.linkedin.com/pulse/amazon-eks-anywhere-connector-gokul-chandra/
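The two steps sketched as commands (flag syntax is from memory, and the role ARN and cluster name are placeholders; verify against the EKS Connector docs):

aws eks register-cluster --name onprem-cluster \
   --connector-config roleArn=arn:aws:iam::111122223333:role/eks-connector-role,provider=EKS_ANYWHERE
kubectl apply -f eks-connector.yaml   # manifest populated with the activation code returned above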
To show you how easy it is to get started, let's install the EKS Anywhere CLI, create a local development cluster [docker provider], and deploy an example workload with only four commands.
brew install aws/tap/eks-anywhere
eksctl anywhere generate clusterconfig local-cluster \
   --provider docker > local-cluster.yaml
eksctl anywhere create cluster -f local-cluster.yaml
https://aws.amazon.com/blogs/containers/introducing-general-availability-of-amazon-eks-anywhere/
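The generated local-cluster.yaml is a Cluster object plus a provider-specific config; roughly (a trimmed, from-memory sketch of the docker-provider output, not verbatim):

apiVersion: anywhere.eks.amazonaws.com/v1alpha1
kind: Cluster
metadata:
  name: local-cluster
spec:
  kubernetesVersion: "1.21"
  controlPlaneConfiguration:
    count: 1
  workerNodeGroupConfigurations:
    - count: 1
  datacenterRef:
    kind: DockerDatacenterConfig
    name: local-cluster
---
apiVersion: anywhere.eks.amazonaws.com/v1alpha1
kind: DockerDatacenterConfig
metadata:
  name: local-cluster
spec: {}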
Three-tier Intel Smart Edge
Intel acquired Smart Edge in 2019 to continue to drive its vision of networks built on open, industry-standard edge computing. Built on OpenNESS (openness.org), the Intel® Smart Edge offering is a multi-access edge computing (MEC) platform.
https://www.intel.com/content/www/us/en/design/technologies-and-topics/edge-cloud-computing/smart-edge-software.html
Smart Edge Open Kubernetes Control Plane: This node consists of microservices and Kubernetes extensions, enhancements, and optimizations that provide the functionality to configure one or more Smart Edge Open Edge Nodes and the application services that run on those nodes (application pod placement, configuration of the core network, etc.).
Smart Edge Open Edge Node: This node consists of microservices and Kubernetes extensions, enhancements, and optimizations that are needed for edge application and network function deployments. It also consists of APIs that are often used for the discovery of application services.
Edge Multi-Cluster Orchestration (EMCO) is a geo-distributed application orchestrator for Kubernetes*. Its main objective is to automate the deployment of applications and services across clusters. It acts as a central orchestrator that can manage edge services and network functions across geographically distributed edge clusters from different third parties.
https://github.com/smart-edge-open/specs/blob/master/doc/architecture.md
EMCO operates at a higher level than Kubernetes* and interacts with multiple edge and cloud clusters running Kubernetes.
https://github.com/smart-edge-open/specs/blob/master/doc/building-blocks/emco/smartedge-open-emco.md
You can build and deploy the EMCO components in your local environment (and use them to deploy your workload in a set of remote Kubernetes clusters).
Alternatively, you can build EMCO locally and deploy EMCO components in a Kubernetes cluster using Helm charts (and use them to deploy your workload in another set of Kubernetes clusters).
https://github.com/smart-edge-open/EMCO
Kubernetes at the edge
The biggest question to ask about running Kubernetes at the edge is whether your IT organization’s edge resources are comparable to its cloud resources. If they are, a standard Kubernetes deployment — with set node affinities and related pod-assignment parameters to steer edge pods to edge nodes — is the more effective setup. If the edge and cloud environments are symbiotic, rather than unified, consider KubeEdge. Most edge users should consider this to be the default option.
The more dissimilar the edge and cloud environments or requirements are, the more logical it is to keep the two separated — particularly if edge resources are too limited to run standard Kubernetes. If you want common orchestration of both edge and cloud workloads so the cloud can back up the edge, for example, use MicroK8s or a similar distribution. If latency or resource specialization at the edge eliminates the need for that cohesion, K3s is a strong choice.
https://searchitoperations.techtarget.com/tip/Run-Kubernetes-at-the-edge-with-these-K8s-distributions
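The "node affinities to steer edge pods to edge nodes" part of the first option is plain Kubernetes scheduling; for instance (the edge label is an assumption, use whatever label your edge nodes actually carry):

apiVersion: v1
kind: Pod
metadata:
  name: edge-workload
spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
          - matchExpressions:
              - key: node-role.kubernetes.io/edge   # assumed label on edge nodes
                operator: Exists
  containers:
    - name: app
      image: registry.example.com/edge-app:latest   # placeholder image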
nVidia vGPUs for containers
If you want a container to use a vGPU, you have to run it in a dedicated VM!?!
The document describes how to set up a VM configured with NVIDIA virtual GPU software to run NGC containers.
https://docs.nvidia.com/ngc/pdf/ngc-vgpu-setup-guide.pdf
Reduce VM disk size
Virt-sparsify is a tool which can make a virtual machine disk (or any disk image) sparse a.k.a. thin-provisioned. This means that free space within the disk image can be converted back to free space on the host.
Virt-sparsify tries to zero and sparsify free space on every filesystem it can find within the source disk image.
However if a virtual machine has multiple disks and uses volume management, then virt-sparsify will work but not be very effective.
https://libguestfs.org/virt-sparsify.1.html
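Typical invocations (file names are illustrative; --in-place avoids needing space for a second copy, at the cost of not being able to change the image format):

virt-sparsify indisk.qcow2 outdisk.qcow2
virt-sparsify --in-place disk.qcow2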
Sparse files are created differently than normal (non-empty) files. Whenever a sparse file is created, metadata representing the empty blocks (bytes) of the disk is written to disk, rather than the actual bytes which make up the block, using less disk space.
When reading sparse files, the file system transparently converts metadata representing empty blocks into “real” blocks filled with null bytes at runtime.
https://www.geeksforgeeks.org/sparse-files/
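A quick way to see this on any Linux box (file name is arbitrary):

truncate -s 10G sparse.img   # created instantly, no data written
ls -lh sparse.img            # shows the 10G apparent size
du -h sparse.img             # shows ~0: no blocks actually allocated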
Sparse disk image formats such as qcow2 only consume the physical disk space which they need. For example, if a guest is given a qcow2 image with a size of 100GB but has only written to 10GB then only 10GB of physical disk space will be used.
https://www.jamescoyle.net/how-to/323-reclaim-disk-space-from-a-sparse-image-file-qcow2-vmdk
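Deleting data inside the guest does not shrink the image by itself; re-converting rewrites only the allocated clusters and drops the rest (zeroing free space inside the guest first, e.g. with fstrim, improves the result):

qemu-img convert -O qcow2 bloated.qcow2 compact.qcow2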
Only the qcow2 format supports encryption or compression. qcow2 encryption uses the AES format with secure 128-bit keys.
https://docs.fedoraproject.org/en-US/Fedora/18/html/Virtualization_Administration_Guide/sect-Virtualization-Tips_and_tricks-Using_qemu_img.html
The resize [raw image] doesn’t actually allocate any disk blocks; the image file is left as a sparse file. If you have expanded the partition and filesystem in the image, you will be able to write more data to files inside, and as data gets written to new blocks of the image file for the first time, actual disk blocks will be allocated.
https://superuser.com/questions/981113/raw-img-file-virtual-space-extended-but-disk-size-still-same
A virtual disk in thin format uses only as much space on the datastore as needed. This means that, if you create a 10 GB virtual disk and place 3 GB of data in it, only the 3 GB of space on the datastore will be used, but the performance will not be as good as with the other two disk types (thick lazy-zeroed and thick eager-zeroed).
You can convert a thin disk to a thick disk by inflating it to its full size.
https://geek-university.com/vmware-esxi/inflate-thin-disk/
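On ESXi the inflation is done with vmkfstools; as far as I recall the switch is --inflatedisk (alias -j), and the path here is illustrative:

vmkfstools --inflatedisk /vmfs/volumes/datastore1/vm1/vm1.vmdk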
OpenCV and OpenVINO
Deep learning models can be easily trained on NVIDIA GPUs since there is a vast availability of popular frameworks that support them.
One of the most convenient parts of OpenVINO is that it comes with OpenCV, already compiled and built to support Intel GPUs and the Intel NCS 2.
https://link.medium.com/K6bLpTezdkb
Neither OpenCV nor OpenVINO provides tools to train a neural network.
[…] instead of directly using the trained model for inference, OpenVINO requires us to create an optimized model which they call Intermediate Representation (IR) using a Model Optimizer tool.
OpenVINO optimizes running this model on specific hardware through the Inference Engine plugin. This plugin is available for all Intel hardware (CPUs, GPUs, VPUs, FPGAs).
While OpenCV DNN in itself is highly optimized, with the help of Inference Engine we can further increase its performance.
https://learnopencv.com/using-openvino-with-opencv/
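The IR step from the quote above is a single Model Optimizer invocation, roughly (flags from memory; whether the entry point is mo or mo.py depends on the OpenVINO release):

mo --input_model model.onnx --output_dir ir/

This emits a model.xml/model.bin pair, which OpenCV's dnn module (or the Inference Engine directly) then loads for inference.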