Building a Kubernetes ML Cluster on SBCs - Part 2

As I mentioned in part 1 of this series, I want to automate the deployment of the FCOS control plane machines with Ansible. Actually, I want to automate the entire Kubernetes deployment, but that is a future project. So let's get started.

Ansible and KVM

There are a number of Ansible modules for KVM/libvirt/qemu available; there are even complete playbooks (slightly out of date) and roles that do exactly what I am trying to do here. So if you are looking for a simple and complete solution, GIYF. Personally, I prefer building these things myself, using others' work as reference only, simply because once I want to expand on something, it really helps to know exactly what is happening. Which is the entire reason for building this cluster in the first place.

Prerequisites

The first thing our playbook needs to do is ensure the prerequisites are met, so let's first install the required packages:

- name: Install stuff
  hosts: controlserver
  become: true # I always set become=false in the ansible.cfg
  vars:
    packages:
      - '@virtualization' # there is a group for kvm, virsh, etc.
      - python3-libvirt # for the virt ansible module
      - butane
    services:
      - libvirtd
  tasks:
    - name: Ensure required packages are present
      dnf:
        name: "{{ packages }}"
        state: present

    - name: Start and enable services
      systemd:
        name: "{{ item }}"
        state: started
        enabled: true
      loop: "{{ services }}" # only one service for now
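
A quick sanity check that Ansible can actually talk to libvirt might look like the following. This is a hypothetical task, not part of my playbook; the virt module lives in the community.libvirt collection and needs the python3-libvirt package we just installed:

- name: List existing VMs # hypothetical sanity check, not in my playbook
  virt:
    command: list_vms
  register: existing_vms

- name: Show what libvirt knows about
  debug:
    var: existing_vms.list_vms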

All super simple stuff, but we need to start somewhere. Now, I realise I could (and probably should) create a role for this, but as much as I like Ansible roles, I find it better to pack everything into a playbook first. Later, when I find I need to reuse or share things, I'll transform it into a role.
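
For orientation, the main playbook then simply imports the task files we are about to write, roughly like this (a sketch; the task file names match the snippets below, the rest is from memory):

- name: Provision FCOS control plane VMs
  hosts: controlserver
  become: true
  tasks:
    - import_tasks: tasks/getversion.yaml
    - import_tasks: tasks/download.yaml
    # the ignition tasks from the end of this post slot in here as well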

Getting the CoreOS image

Before we can download the image, we need to know which version to use. I will grab the latest version. Which one is that? Well, this is DevOps, so there is a JSON for that: https://builds.coreos.fedoraproject.org/streams/stable.json Let's grab the JSON and extract the latest version number and URL. Basically we are replicating the coreos-installer command. Because we can.

# tasks/getversion.yaml
- name: Get releases json # we could set the stream (stable/testing) here
  uri:
    url: https://builds.coreos.fedoraproject.org/streams/stable.json
  register: releases

- name: Get latest version
  set_fact:
    fcos_latest: "{{ releases.json.architectures.x86_64.artifacts.qemu.release }}"
    fcos_latest_url: "{{ releases.json.architectures.x86_64.artifacts.qemu.formats['qcow2.xz'].disk.location }}"

Yes, there is an architecture hardcoded in there, so I could template that. But my Raspberry Pis will have to run on bare metal anyway, so we won't reuse this playbook there.
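
If I did want to template it, it would be a small change, something like this (a sketch; fcos_arch is a made-up variable, ansible_architecture is the standard Ansible fact):

- name: Get latest version for a given architecture
  set_fact:
    fcos_latest: "{{ releases.json.architectures[fcos_arch].artifacts.qemu.release }}"
    fcos_latest_url: "{{ releases.json.architectures[fcos_arch].artifacts.qemu.formats['qcow2.xz'].disk.location }}"
  vars:
    fcos_arch: "{{ ansible_architecture | default('x86_64') }}" # aarch64 on the Pis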

A small hint regarding the uri module: even without 'return_content: true' we still get the response, just under variable.json instead of variable.content. Better yet, .json holds the already parsed data, so it is much easier to use. I had a huge hassle getting at the data under .content; with .json it just works.
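
To illustrate, the expression below works with the plain uri call above; the commented-out variant is the long way round you would need with return_content: true and the raw string:

- name: Show the release via the parsed body
  debug:
    msg: "{{ releases.json.architectures.x86_64.artifacts.qemu.release }}"
    # with return_content: true, we would have to parse it ourselves:
    # "{{ (releases.content | from_json).architectures.x86_64.artifacts.qemu.release }}"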

Now we need to download the image, unpack it, and so on, just like we did with coreos-installer in part 1, but now with a playbook. I am using Nick Curry's excellent role as a template here, but of course adapting things to my personal needs.

# tasks/download.yaml
- name: Ensure directory is in place
  file:
    state: directory
    path: "{{ domain_path }}"
    mode: '1777' # yes, globally read/writeable
    setype: virt_image_t

- name: Check image
  stat:
    path: "{{ domain_path }}/{{ image_name }}"
  register: image_status

- name: Block to download and unpack image
  when: not image_status.stat.exists
  block:
    - name: Download qcow2 image
      get_url:
        url: "{{ fcos_latest_url }}"
        dest: "{{ domain_path }}"

    - name: Uncompress the image
      command: # ansible can archive xz, but not unarchive it. WTF?
        cmd: "unxz {{ image_name }}.xz"
        chdir: "{{ domain_path }}"

No funny tricks here, except that the variables are passed from the main playbook. If we do an ls on the domain_path, we now see the image in all its unpacked glory:

[jan@h2fed ~]$ ls -l /opt/kvm/fcos
total 1536724
-rw-r--r--. 1 root root 1573715968 Nov 13 21:16 fedora-coreos-34.20211031.3.0-qemu.x86_64.qcow2
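
The variables used above are defined in the main playbook. Based on this listing they would look something like this (my values; fcos_latest is set by getversion.yaml):

domain_path: /opt/kvm/fcos
image_name: "fedora-coreos-{{ fcos_latest }}-qemu.x86_64.qcow2"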

Ignition!

(Markdown has no sense for the dramatic. The linter complains about trailing punctuation in the heading.) Now for the fun (and currently simple) part: we need an Ignition file. For now we'll just use the simple one from part 1. I'll copy the YAML file over as a template and run the butane command on the remote host. I realise I could simply convert the YAML to JSON with Ansible, but Butane also validates the config and translates the Butane schema into the Ignition schema, so this way I can be sure the .ign file is definitely correct.
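
In case you don't have part 1 handy: simple.yaml is essentially the minimal Butane config from the FCOS docs, an SSH key for the core user. Sketched from memory below; ssh_pubkey is a made-up variable that the template task fills in, and your details will differ:

# templates/simple.yaml
variant: fcos
version: 1.4.0
passwd:
  users:
    - name: core
      ssh_authorized_keys:
        - "{{ ssh_pubkey }}" # hypothetical variable, rendered by the template task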

- name: Copy ignition template
  template:
    src: templates/simple.yaml
    dest: "{{ domain_path }}/simple.yaml"

- name: Run butane
  command:
    cmd: butane --pretty --strict simple.yaml -o simple.ign
    chdir: "{{ domain_path }}"
  become: false # no reason to run as root

Enough for this part. Next we will get to the fiddly issue of running VMs with qemu/libvirt/kvm, so I need a lot of coffee first. All the files created here can be found in my GitHub repository, in case you really want to use them.

