This repository has been archived by the owner on Nov 20, 2023. It is now read-only.

Libvirt provider - OpenShift installation (step 3/4) #354

Open. Wants to merge 14 commits into base: master.
186 changes: 186 additions & 0 deletions docs/PROVISIONING_LIBVIRT.adoc
@@ -0,0 +1,186 @@
= OpenShift on Libvirt using CASL
:MYWORKDIR: ~/src
// FIXME: how to get variables rendered in code blocks?

== Introduction

The aim of this setup is to get a flexible OpenShift installation that is as unintrusive on the host as possible, under the assumption that a Libvirt installation will mostly be used on laptops or workstations, which also need to keep working well for other purposes.

CAUTION: THIS DOCUMENT AND THE ARTEFACTS PERTAINING TO IT ARE STILL UNDER _HEAVY_ DEVELOPMENT!!!

== Control Host Setup (one time only)

NOTE: These steps are a canned set of steps serving as an example, and may be different in your environment.

Before getting started following this guide, you'll need the following:

FIXME:: address docker installation and usage at a later stage.

* Docker installed
** RHEL/CentOS: `yum install -y docker`
** Fedora: `dnf install -y docker`
** **NOTE:** If you plan to run docker as yourself (non-root), your username must be added to the `docker` user group.

* Ansible 2.7 or later installed
** link:https://docs.ansible.com/ansible/latest/installation_guide/intro_installation.html[See Installation Guide]
* python3-libvirt and/or python2-libvirt

* The `casl-ansible` repository cloned into your working directory:

[source,bash]
----
cd {MYWORKDIR}/
git clone https://github.com/redhat-cop/casl-ansible.git
----

* Run `ansible-galaxy` to pull in the necessary requirements for the CASL provisioning of OpenShift:

NOTE: The target directory ( `galaxy` ) is **important** as the playbooks know to source roles and playbooks from that location.

[source,bash]
----
cd {MYWORKDIR}/casl-ansible
ansible-galaxy install -r casl-requirements.yml -p galaxy
----

== Libvirt setup

The following needs to be set up on your Libvirt server before you can start:

=== Setup a local dnsmasq

Create a dummy network interface:

------------------------------------------------------------------------
sudo modprobe dummy
sudo ip link add dummy0 type dummy
sudo ip address add 192.168.123.254 dev dummy0 # <1>
sudo ip address show dev dummy0 up
------------------------------------------------------------------------
<1> the IP address must be the one you've entered as the forwarder for the apps wildcard DNS in your network XML (an example network definition is shown in the next section).

Reviewer comment (Contributor): No explanation of what that Network XML should be or what it should contain?


Start dnsmasq against this interface, defining our wildcard DNS domain `*.apps.local`:

------------------------------------------------------------------------
sudo dnsmasq --interface=dummy0 --no-daemon --log-queries=extra \
     --bind-interfaces --clear-on-reload \
     --address=/apps.local/192.168.123.123 # <1>
------------------------------------------------------------------------
<1> the IP address must be the one of the VM on which the OpenShift router will be running, and the domain must of course be the one configured as the apps wildcard.

Reviewer comment (Contributor): Explain that this is only required if you do not have an existing DNS server with records for your hosts?

NOTE: dnsmasq is therefore only running on demand, but since that is the case for my OpenShift cluster as well, it's no big deal.

CAUTION: I had presumably already opened the firewall accordingly and had integrated Satellite 6 with my libvirtd beforehand (e.g. `LIBVIRTD_ARGS="--listen"` in `/etc/sysconfig/libvirtd`), so there might be more to it than the steps above.

=== Create a separate network

Call `sudo virsh net-create --file libvirt-network-definition.xml`

Reviewer comment (Contributor): Provide an example for what this XML file should look like?
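
For illustration only, a minimal `libvirt-network-definition.xml` consistent with the addresses used above might look like the following sketch; the network, bridge, and domain names are placeholders and must match what your inventory and the dnsmasq setup expect:

[source,xml]
----
<network>
  <!-- placeholder name; use whatever your inventory expects -->
  <name>ocp-net</name>
  <forward mode='nat'/>
  <bridge name='virbr-ocp' stp='on' delay='0'/>
  <domain name='local'/>
  <dns>
    <!-- forward the apps wildcard domain to the dnsmasq listening on dummy0 -->
    <forwarder domain='apps.local' addr='192.168.123.254'/>
  </dns>
  <ip address='192.168.123.1' netmask='255.255.255.0'>
    <!-- keep reserved addresses such as 192.168.123.254 (dummy0) outside the DHCP range -->
    <dhcp>
      <range start='192.168.123.2' end='192.168.123.200'/>
    </dhcp>
  </ip>
</network>
----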


CAUTION: the network definition isn't persistent (on purpose, for now) and needs to be re-created before each start.
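
If you prefer a persistent network instead, a sketch of the usual virsh equivalent (assuming the placeholder network name `ocp-net` from the example above):

[source,bash]
----
sudo virsh net-define --file libvirt-network-definition.xml   # persistent definition instead of net-create
sudo virsh net-autostart ocp-net                              # start the network whenever libvirtd starts
sudo virsh net-start ocp-net                                  # start it right away
----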

TODO:: continue description !!!

Cool! Now you're ready to provision OpenShift clusters on Libvirt.

== Provision an OpenShift Cluster

As an example, we'll provision the `sample.libvirt.example.com` cluster defined in the `{MYWORKDIR}/casl-ansible/inventory` directory.

NOTE: Unless you already have a working inventory, it is recommended that you make a copy of the above mentioned sample inventory and keep it somewhere outside of the casl-ansible directory. This allows you to update/remove/change your casl-ansible source directory without losing your inventory. Also note that it may take some effort to get the inventory just right, hence it is very beneficial to keep it around for future use without having to redo everything.

FIXME:: the instructions are written _for now_ as individual steps to be run locally on the libvirt host as root. This might/should change in the future, but it is the current state of the implementation. Each sub-section is named after the playbook step that makes up the corresponding part of the end-to-end playbook.


=== provision-instances

- make sure `/dev/loopN` isn't mounted on `/var/www/html/installXXX`, and remove it from your `/etc/fstab` if you try multiple times with errors (something to FIXME).

Reviewer comment (Contributor): What is this for?

- copy and adapt the sample directory with files and inventory:
* adapt the Libvirt specific parameters to make them compatible with your setup (especially the network)
- export the variable `LIBVIRT_INV_VM_FILTER` to fit the libvirt names defined for your cluster's VMs, e.g. `export LIBVIRT_INV_VM_FILTER=^ocp_`.
- if your network isn't persistent, create it (see above).
- make sure that `/tmp/authorized_keys` exists. FIXME: not sure yet for which use cases it is required, I just copy for now my own authorized keys.
- call `ansible-playbook -i ../../inventory/sample.libvirt.example.com.d/inventory libvirt/provision.yml` (a consolidated sketch of these steps follows this list).

Reviewer comment (Contributor): May need to add -b to the ansible-playbook command if your user does not have permissions to virsh on the target KVM host.

Reviewer comment (Contributor): Also, you should be clear about which directory to run this command from or rewrite this step to be run from the root of the project.

+
IMPORTANT: virt-install only runs synchronously because a virt-viewer UI pops up. Close each virt-viewer shortly after the corresponding installation has completed.

Reviewer comment (Contributor): What if your KVM host is remote?

+
- identify the IP address of the infrastructure VM on which the router will run and start the separate dnsmasq responsible for the wildcard DNS resolution accordingly (see above).
- log in to one of the new VMs and validate that DNS is working correctly:
* `dig master.local` gives the correct IP address (same for all VMs)
* `dig -x <master-ip>` works as well
* `dig -x xxx.apps.local` gives the IP of the route/infranode.
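
Putting the provisioning steps above together, a consolidated sketch (assuming the commands are run from the `playbooks/openshift` directory of the casl-ansible checkout, as the relative inventory path suggests, and that your VM names start with `ocp_`):

[source,bash]
----
cd {MYWORKDIR}/casl-ansible/playbooks/openshift
export LIBVIRT_INV_VM_FILTER=^ocp_                     # match the libvirt names of your cluster VMs
cp ~/.ssh/authorized_keys /tmp/authorized_keys         # see the FIXME above about this file
ansible-playbook -i ../../inventory/sample.libvirt.example.com.d/inventory \
    libvirt/provision.yml                              # add -b if your user can't manage libvirt directly
----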

NOTE: up till now, I've worked as root to avoid complications. From here on, I'm working again as a normal user on the control host.

=== pre-install

IMPORTANT: you need to have SSH-ed once into each node to make sure that their SSH host keys are already in your known_hosts file.
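
If you prefer not to log in to every node interactively, a hedged alternative is to collect the host keys in one go (hostnames taken from the sample kickstart files; adjust them to your actual node names):

[source,bash]
----
# append the SSH host keys of all cluster nodes to known_hosts
for h in master.local infranode.local appnode.local; do
    ssh-keyscan -H "$h" >> ~/.ssh/known_hosts
done
----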

Things to consider:

- make sure the above preparations are still active (network, DNS, environment variables)
- define the environment variables `RHSM_USER` and `RHSM_PASSWD` or use an activation key (TODO describe activation key / Satellite 6 approach).

Reviewer comment (Contributor): Couldn't these be defined in the inventory hosts file as variables?

+
CAUTION: because there is no trace of OpenShift on the system yet, it is fairly certain that auto-attach will fail. Hence make sure `rhsm_pool` or `rhsm_pool_ids` is defined in the inventory (or on the command line).

Then call `ansible-playbook -i ../../inventory/sample.libvirt.example.com.d/inventory/ pre-install.yml -e rhsm_pool='^{POOL_NAME}$'`.

Reviewer comment (Contributor): Looks like you're running this from a different directory than the last ansible-playbook command? Perhaps default to running commands from the root of the project?
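
As a sketch of the whole pre-install step, again assuming it is run from the `playbooks/openshift` directory (credential values and pool name are placeholders):

[source,bash]
----
cd {MYWORKDIR}/casl-ansible/playbooks/openshift
export RHSM_USER='my-rhsm-user'            # placeholder credentials
export RHSM_PASSWD='my-rhsm-password'
ansible-playbook -i ../../inventory/sample.libvirt.example.com.d/inventory/ pre-install.yml \
    -e rhsm_pool='^My OpenShift Pool Name$'
----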


=== install

- make sure the credentials are set in the environment, either using your RHSM credentials or adding specific ones (`OREG_AUTH_USER` and `OREG_AUTH_PASSWORD`, see the inventory's `OSEv3.yml` for details).
- call `ansible-playbook -i ../../inventory/sample.libvirt.example.com.d/inventory/ install.yml` and wait...

Reviewer comment (Contributor): From which directory do we run this?

- once the playbook has run successfully, you need to enter `master.local` into the `/etc/hosts` of your workstation (unless it's using the right DNS), and then you can point your browser to `https://master.local:8443` and log in as admin (using the IP address directly won't work).
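
As a sketch of this step (again run from `playbooks/openshift`; the registry credentials are placeholders, see the inventory's `OSEv3.yml` for the exact variables it expects):

[source,bash]
----
cd {MYWORKDIR}/casl-ansible/playbooks/openshift
export OREG_AUTH_USER='my-registry-user'           # placeholder registry credentials
export OREG_AUTH_PASSWORD='my-registry-password'
ansible-playbook -i ../../inventory/sample.libvirt.example.com.d/inventory/ install.yml
----

The corresponding `/etc/hosts` entry on the workstation could then look like this (the IP address is a placeholder for your master VM's actual address):

----
192.168.123.110   master.local
----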


TODO:: continue to adapt / complete the following lines for Libvirt

Run the `end-to-end` provisioning playbook via our link:../images/casl-ansible/[??? installer container image].

[source,bash]
----
docker run -u `id -u` \
-v $HOME/.ssh/id_rsa:/opt/app-root/src/.ssh/id_rsa:Z \
-v $HOME/src/:/tmp/src:Z \
-e AWS_ACCESS_KEY_ID=$AWS_ACCESS_KEY_ID \
-e AWS_SECRET_ACCESS_KEY=$AWS_SECRET_ACCESS_KEY \
-e INVENTORY_DIR=/tmp/src/casl-ansible/inventory/sample.libvirt.example.com.d/inventory \
-e PLAYBOOK_FILE=/tmp/src/casl-ansible/playbooks/openshift/end-to-end.yml \
-e OPTS="-e libvirt_key_name=my-key-name" -t \
quay.io/redhat-cop/casl-ansible
----

Reviewer comment (Contributor): Why do we need an AWS Access key/secret? Why are we running this from Docker when all other Ansible commands have been run without Docker?

NOTE: The above bind-mounts will map files and source directories to the correct locations within the control host container. Update the local paths per your environment for a successful run.

NOTE: Depending on the SELinux configuration on your OS, you may or may not need the `:Z` at the end of the volume mounts.

Done! Wait till the provisioning completes and you should have an operational OpenShift cluster. If something fails along the way, either update your inventory and re-run the above `end-to-end.yml` playbook, or it may be better to link:https://github.com/redhat-cop/casl-ansible#deleting-a-cluster[delete the cluster] and start over.

== Updating a Cluster

Once provisioned, a cluster may be adjusted/reconfigured as needed by updating the inventory and re-running the `end-to-end.yml` playbook.

== Scaling Up and Down

A cluster's Infra and App nodes may be scaled up and down by editing the following parameters in the `all.yml` file and then re-running the `end-to-end.yml` playbook as shown above.

[source,yaml]
----
appnodes:
  count: <REPLACE WITH NUMBER OF INSTANCES TO CREATE>
infranodes:
  count: <REPLACE WITH NUMBER OF INSTANCES TO CREATE>
----

== Deleting a Cluster

A cluster can be decommissioned/deleted by re-using the same inventory with the `delete-cluster.yml` playbook found alongside the `end-to-end.yml` playbook.

[source,bash]
----
docker run -it -u `id -u` \
-v $HOME/.ssh/id_rsa:/opt/app-root/src/.ssh/id_rsa:Z \
-v $HOME/src/:/tmp/src:Z \
-e AWS_ACCESS_KEY_ID=$AWS_ACCESS_KEY_ID \
-e AWS_SECRET_ACCESS_KEY=$AWS_SECRET_ACCESS_KEY \
-e INVENTORY_DIR=/tmp/src/casl-ansible/inventory/sample.casl.example.com.d/inventory \
-e PLAYBOOK_FILE=/tmp/src/casl-ansible/playbooks/openshift/delete-cluster.yml \
-e OPTS="-e libvirt_key_name=my-key-name" -t \
quay.io/redhat-cop/casl-ansible
----
102 changes: 102 additions & 0 deletions docs/TODO_LIBVIRT.adoc
@@ -0,0 +1,102 @@
= Improvements and corrections for Libvirt as infra

The following isn't a settled list of things to do, just a rather random list of things that might (or might not) end up in a future version of the Libvirt integration. It is simply easier to document these things as I work on the topic than to create individual issues that might never be addressed.

== Make the VM provisioning purely local

It seems we can avoid spinning up a web server to provide the ISO content by using a command line like the following:

------------------------------------------------------------------------
virt-install \
--name rhel7.5-Ceph-test-singlenode \
--os-variant rhel7 \
--initrd-inject=/home/pcfe/work/git/HouseNet/kickstart/RHEL75-Ceph-singlenode-ks.cfg \
--location /mnt/ISO_images/rhel-server-7.5-x86_64-dvd.iso \
--extra-args "ks=file:/RHEL75-Ceph-singlenode-ks.cfg console=ttyS0,115200" \
--ram 2048 \
--disk pool=default,boot_order=1,format=qcow2,bus=virtio,discard=unmap,sparse=yes,size=10 \
--disk pool=default,boot_order=2,format=qcow2,bus=virtio,discard=unmap,sparse=yes,size=5 \
--disk pool=default,boot_order=3,format=qcow2,bus=virtio,discard=unmap,sparse=yes,size=5 \
--disk pool=default,boot_order=4,format=qcow2,bus=virtio,discard=unmap,sparse=yes,size=5 \
--controller scsi,model=virtio-scsi \
--rng /dev/random \
--boot useserial=on \
--vcpus 1 \
--cpu host \
--nographics \
--accelerate \
--network network=default,model=virtio
------------------------------------------------------------------------

This would probably be a less intrusive and less resource-intensive approach.

== Make the dynamic inventory script more flexible

It could be useful to make it easier to add or remove nodes without having to adjust multiple places in the (static) inventory. Two ideas:

- either add metadata to the created VMs (but how does it work exactly?).
- or use the description field to pack information, e.g. using an ini or json format (quoting might be an issue here).
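
As an illustration of the second idea, packing JSON into the VM description could look like this (the VM name and field names are hypothetical):

------------------------------------------------------------------------
virsh desc ocp_master1 --config \
    '{"casl_group": "masters", "docker_storage_block_device": "/dev/vdb"}'
------------------------------------------------------------------------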

== Split the inventory in infra and cluster inventories

I'm thinking that it would be much more flexible and easier to maintain to split the inventory into a "cluster type" and an "infra type" inventory and combine them with multiple `-i` options, e.g. `-i libvirt_inv/ -i 3_nodes_cluster_inv/`.

Just an idea at this stage and I'm not sure it's easily possible to get the expected flexibility, but with the right dynamic script, it might be feasible.
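
A hypothetical invocation combining the two inventories would then be:

------------------------------------------------------------------------
ansible-playbook -i libvirt_inv/ -i 3_nodes_cluster_inv/ playbooks/openshift/end-to-end.yml
------------------------------------------------------------------------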

== Make the Libvirt inventory more robust

. the inventory shouldn't fail if title or description is missing

== Improve the playbooks / roles using ideas from others

Following sources have been identified and could be used:

- https://github.com/nmajorov/libvirt-okd
- https://docs.google.com/document/d/1Mbd2v6j3AQlbiY_zbZF5fWWwXvenQf8EDw7oW5907Hs/edit?usp=drivesdk

== Improve the CONTRIBUTE_PROVISIONER.md

Just taking notes for future improvements as I work my way through casl; feel free to review them already, I'll add them to the existing document once I'm finished here:

NOTE: `{provisioner}` and `{PROVISIONER}` stand for your provisioner (e.g. libvirt), written in lower case or upper case respectively.

- create following directories in the `casl-ansible` repository:
* `playbooks/openshift/{provisioner}`
* `inventory/sample.{provisioner}.example.com.d/inventory` (and optionally `files` or others)
- create a playbook which creates VMs from your provisioner as `playbooks/openshift/{provisioner}/provision.yml` and make sure it is called from `playbooks/openshift/provision-instances.yml` based on the variable `hosting_infrastructure` set to your provisioner.
* re-use as much as possible roles from the `infra-ansible` repo or add new generic roles that support your infrastructure provider independently from `casl-ansible`.
- create a sample inventory respecting following requirements:
* it respects the usual OpenShift inventory settings and makes sure that the nodes created during the provisioning phase are neatly put into the right groups.
* this most probably requires a dynamic inventory script that pulls the information about the VMs from the provisioner. This script is created in `inventory/scripts/` and linked into `inventory/sample.{provisioner}.example.com.d/inventory` (FIXME: why this complexity?). Your script must especially make sure that `ansible_host` is defined, so that the Ansible connection doesn't rely on the name in the inventory, which is recommended to be independent from the provisioning.
* it defines following variables required during the next steps of the end-to-end process:
** `hosting_infrastructure` set to `{provisioner}`.
** `docker_storage_block_device` e.g. `/dev/vdb` (implying that each created VM has two disks, one being reserved for Docker).
- create a document `docs/PROVISIONING_{PROVISIONER}.adoc` explaining how to adapt and use your provisioner. Some notes about the considerations you've made during implementation is surely not a bad idea.
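
As a minimal illustration of the variables required by the end-to-end process (the file location and values are examples only):

------------------------------------------------------------------------
# e.g. in inventory/sample.{provisioner}.example.com.d/inventory/group_vars/all.yml
hosting_infrastructure: libvirt
docker_storage_block_device: /dev/vdb
------------------------------------------------------------------------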

== Perhaps fix inventory to avoid unwanted actions

The installer executed 1 action on cloud-host.local, which doesn't seem right, but I have no clue which one it was (it was marked OK). It would also be good to review the 22 OK actions on localhost.

== Fix warnings and errors in oc adm diagnostics

The output is quite verbose and I don't have enough experience yet to decide which errors are relevant and which are not:

- `E0121 12:41:32.229577 80499 helpers.go:134] Encountered config error json: unknown field "masterCount" in object *config.MasterConfig, raw JSON:`
- errors about AggregatedLogging, Kibana, etc...
- many notes/infos about extra permissions
- error about missing iptables:
+
------------------------------------------------------------------------
ERROR: [DS3002 from diagnostic UnitStatus@openshift/origin/pkg/oc/cli/admin/diagnostics/diagnostics/systemd/unit_status.go:59]
systemd unit atomic-openshift-node depends on unit iptables, which is not loaded.

iptables is used by nodes for container networking.
Connections to a container will fail without it.
An administrator probably needs to install the iptables unit with:

# yum install iptables

If it is already installed, you may need to reload the definition with:

# systemctl reload iptables
------------------------------------------------------------------------
37 changes: 37 additions & 0 deletions inventory/sample.libvirt.example.com.d/files/ks/appnode.ks
@@ -0,0 +1,37 @@
install
lang en_US.UTF-8
keyboard --vckeymap=us --xlayouts='us'
firstboot --enable
auth --enableshadow --passalgo=sha512
services --enabled=chronyd
eula --agreed
reboot

# network
network --bootproto=dhcp --device=eth0 --noipv6 --activate --hostname=appnode.local

# System timezone
timezone US/Eastern --isUtc

# Disks
bootloader --location=mbr --boot-drive=vda
ignoredisk --only-use=vda
zerombr
clearpart --all --initlabel --drives=vda
part /boot/efi --fstype="vfat" --size=200 --ondisk=vda
part /boot --fstype="ext2" --size=512 --ondisk=vda --asprimary
part pv.10 --fstype="lvmpv" --size=1 --grow --ondisk=vda

# LVMs
volgroup vg1 pv.10
logvol / --fstype=xfs --name=root --vgname=vg1 --size=1 --grow
logvol swap --fstype=swap --size=2048 --vgname=vg1

rootpw --plaintext redhat

%packages
@base
net-tools
wget

%end
37 changes: 37 additions & 0 deletions inventory/sample.libvirt.example.com.d/files/ks/infranode.ks
@@ -0,0 +1,37 @@
install
lang en_US.UTF-8
keyboard --vckeymap=us --xlayouts='us'
firstboot --enable
auth --enableshadow --passalgo=sha512
services --enabled=chronyd
eula --agreed
reboot

# network
network --bootproto=dhcp --device=eth0 --noipv6 --activate --hostname=infranode.local

# System timezone
timezone US/Eastern --isUtc

# Disks
bootloader --location=mbr --boot-drive=vda
ignoredisk --only-use=vda
zerombr
clearpart --all --initlabel --drives=vda
part /boot/efi --fstype="vfat" --size=200 --ondisk=vda
part /boot --fstype="ext2" --size=512 --ondisk=vda --asprimary
part pv.10 --fstype="lvmpv" --size=1 --grow --ondisk=vda

# LVMs
volgroup vg1 pv.10
logvol / --fstype=xfs --name=root --vgname=vg1 --size=1 --grow
logvol swap --fstype=swap --size=2048 --vgname=vg1

rootpw --plaintext redhat

%packages
@base
net-tools
wget

%end