Building a Kubernetes ML Cluster on SBCs - Part 1

For some reason I decided to use Fedora CoreOS for the control plane (we don’t call them masters anymore), so let’s try to set things up.

Getting started with Fedora CoreOS

CoreOS is an immutable operating system, which means you don’t log in to administer it. Instead, we use a declarative approach: in the best DevOps tradition, we write a YAML file that describes the OS, and then deploy it. Coming from an Ansible background that should seem natural, but somehow it is scary at first.
Let’s not go the full monty straight away, but rather start with a first experiment: a single VM running a basic FCOS.

The Experiment

We need an Ignition file, and that is generated from a Butane config, so let’s grab a minimal one from the documentation:

variant: fcos
version: 1.4.0
passwd:
  users:
    - name: core
      ssh_authorized_keys:
        - ssh-rsa AAAA...

In my case I need to use an ssh-key generated on the host for now, because the VM will connect to the default libvirt network, and that doesn’t allow access from remote systems. We’ll work on that later.
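Generating such a key on the host could look roughly like the sketch below. The key path, the comment string and the empty passphrase are assumptions made for a throwaway test VM, not something taken from the FCOS documentation. The key type is RSA only so that the output matches the ssh-rsa entry in the config above.

```shell
#!/bin/bash
# Sketch: create a dedicated key pair on the host and print the public
# half for pasting under ssh_authorized_keys.
# KEY_FILE defaults to a temporary directory (an assumption made so that
# nothing in ~/.ssh gets overwritten); set it to a permanent location
# if the key should survive a reboot.
KEY_FILE="${KEY_FILE:-$(mktemp -d)/fcos_test_key}"

if command -v ssh-keygen >/dev/null 2>&1; then
    # -N "" means no passphrase, acceptable for a test VM only
    ssh-keygen -q -t rsa -b 4096 -N "" -C "fcos-test" -f "${KEY_FILE}"
    cat "${KEY_FILE}.pub"
else
    echo "ssh-keygen not found"
fi
```

The printed line replaces the ssh-rsa AAAA... placeholder in the config. When connecting later, the private key can be selected with ssh -i followed by the value of KEY_FILE.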

(… in the meantime let me debug an ethernet dropout problem with NetworkManager autonegotiation on RTL8125B. Everything is defective by default …)

(… and then debug an error with sssd on Fedora 35: ‘DB version too old’ when adding a user to a group. Everything is always broken …)

Using butane (installed from the repos), we can turn this config into an Ignition file. Ignore the .bu extension in the docs; a .yaml file works perfectly.
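For reference, the transpile step might look like the sketch below. The file names simple.yaml and simple.ign are assumptions (simple.ign simply matches the path used in the run script further down). The --pretty and --strict options ask butane for indented output and for warnings to be treated as errors. The script only calls butane if it is actually installed.

```shell
#!/bin/bash
# Sketch: turn the Butane config into an Ignition file.
# Both file names are assumptions; adjust them to your setup.
BUTANE_CONFIG="${BUTANE_CONFIG:-simple.yaml}"
IGNITION_CONFIG="${IGNITION_CONFIG:-simple.ign}"

transpile() {
    # --strict fails on warnings, --pretty emits indented JSON
    butane --pretty --strict --output "${IGNITION_CONFIG}" "${BUTANE_CONFIG}"
}

if ! command -v butane >/dev/null 2>&1; then
    echo "butane not found, install it first (e.g. 'sudo dnf install butane')"
elif [ ! -f "${BUTANE_CONFIG}" ]; then
    echo "no such file: ${BUTANE_CONFIG}"
else
    transpile && echo "wrote ${IGNITION_CONFIG}"
fi
```

The resulting .ign file is plain JSON, so a quick look at it is an easy way to confirm that the SSH key ended up where expected.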

The Result

After giving qemu r/x permissions on my entire home folder, it finally starts. Here is the shell script I use to run it:

#!/bin/bash
IGNITION_CONFIG="/absolute/path/to/simple.ign"
IMAGE="/absolute/path/to/image.qcow"
VM_NAME="fcos-test-01"
VCPUS="2"
RAM_MB="2048"
DISK_GB="10"
STREAM="stable"

virt-install --connect="qemu:///system" --name="${VM_NAME}" --vcpus="${VCPUS}" --memory="${RAM_MB}" \
        --os-variant="fedora-coreos-${STREAM}" --import --graphics=vnc \
        --disk="size=${DISK_GB},backing_store=${IMAGE}" \
        --qemu-commandline="-fw_cfg name=opt/com.coreos/config,file=${IGNITION_CONFIG}"

Basically straight out of the documentation, but I added the ‘--graphics=vnc’ option. That makes virt-install issue a warning because virt-manager isn’t installed, but allows cockpit to access the VM’s console. But for some reason the install command never completes, even though I can now access the VM over ssh. Everything is broken by default.
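One possible explanation for the hanging command: when virt-install cannot open a console, it falls back to waiting for the guest to shut down, which an imported FCOS image never does on its own. virt-install has a --noautoconsole option that skips the console and should let the command return once the domain is started. The sketch below has not been tested against this exact setup. It assembles the same command as above with that option added, and only prints it unless RUN=1 is set.

```shell
#!/bin/bash
# Sketch: same invocation as the run script, plus --noautoconsole so that
# virt-install should return once the domain has been started.
IGNITION_CONFIG="/absolute/path/to/simple.ign"
IMAGE="/absolute/path/to/image.qcow"
VM_NAME="fcos-test-01"
STREAM="stable"

args=(
    --connect="qemu:///system" --name="${VM_NAME}"
    --vcpus=2 --memory=2048
    --os-variant="fedora-coreos-${STREAM}" --import
    --graphics=vnc --noautoconsole
    --disk="size=10,backing_store=${IMAGE}"
    --qemu-commandline="-fw_cfg name=opt/com.coreos/config,file=${IGNITION_CONFIG}"
)

if [ "${RUN:-0}" = "1" ]; then
    virt-install "${args[@]}"
else
    # Dry run: show the command without executing it
    printf 'virt-install'; printf ' %q' "${args[@]}"; printf '\n'
fi
```

Keeping the arguments in an array makes it easy to check the command in a dry run first and to add or drop single options without fighting line continuations.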

[jan@h2fed fcosdeploy]$ ssh core@192.168.122.248

Fedora CoreOS 34.20211031.3.0
Tracker: https://github.com/coreos/fedora-coreos-tracker
Discuss: https://discussion.fedoraproject.org/c/server/coreos/

[core@localhost ~]$

Here is a screenshot of cockpit-machines accessing the VM. Not very spectacular, but that’s the whole point.

[Screenshot: cockpit-machines accessing the VM]

That’s enough success for this installment. All these little things cost a lot of time. But at least now I know that I can run FCOS and access the VMs with cockpit. In the next part, I’ll try automating this simple setup with Ansible.

