OpenShift 4

As all of you know, we are in the process of migrating to OpenShift 4. Here is an update on the progress and some details about the changes you can expect.

Note that most documentation changes will be delayed until the end of the migration. Meanwhile, refer to this document to inform yourself about the differences.

Migration Plan and Schedule

Schedule

With support for OpenShift 3 expiring, we are on a tight schedule.

Due date     Description
2022-05-03   Finish migrating all test systems
2022-05-05   Start prod migrations
2022-05      Migrate address service
2022-05      Migrate thumbnail service
2022-05      Migrate commit info service (Jira integration)
2022-06      Migrate Sonar
2022-06      Migrate manual
2022-06-30   Prod migrations completed

Migration Details

There are two migration scenarios:

a) When we control all DNS records:

This is the case for all test systems and numerous production systems.

In this scenario, the migration path is straightforward:

  • Set up installation on OpenShift 4
  • Copy TLS certificate
  • Deploy and start Nice on new platform
  • Adjust DNS and wait for TTL to expire (1h)
  • Stop installation on OpenShift 3
  • Enable renewal of TLS certificates via ACME

During the migration, requests are spread between OpenShift 3 and OpenShift 4, on both of which Nice is running.

No downtime is expected and the full migration is completed within about an hour.
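
Once the TTL has expired, the cutover can be verified by checking the A record, for example with dig; hostname and address here are taken from the DNS examples further below:

$ dig +short example.net A
5.102.151.37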

b) When the customer controls DNS records:

The migration path here is a bit more involved:

  • Set up installation on OpenShift 4
  • Forward traffic to /.well-known/acme-challenge/ from OS3 to OS4 employing a reverse proxy
  • Issue TLS certificates via ACME on OpenShift 4
  • Deploy and start Nice on new platform
  • Forward all traffic from OS3 to OS4
  • Wait for customer to update DNS (may take days or weeks)
  • Remove reverse proxy on OS3

All traffic from OpenShift 3 is forwarded to OpenShift 4 to give the customer time to adjust the DNS records.
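
To illustrate the forwarding described above, here is a minimal sketch using nginx configuration; this is for illustration only, the actual reverse proxy setup on OpenShift 3 differs:

server {
    listen 80;
    server_name extranet.example.net;

    # First step: forward only ACME challenges so certificates
    # can be issued on OpenShift 4.
    location /.well-known/acme-challenge/ {
        proxy_pass https://os4.tocco.ch;
        proxy_set_header Host $host;
    }

    # Final step: forward all remaining traffic as well.
    location / {
        proxy_pass https://os4.tocco.ch;
        proxy_set_header Host $host;
    }
}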

Migration takes as long as the customer needs to adjust the DNS records. Here too, no downtime is expected; deploying new versions, however, will not be possible for about an hour.

Accessing OpenShift 4

Terminal

Login:

oc login -u <username> https://api.c-tocco-ocp4.tocco.ch:6443

Note that the toco- prefix has been dropped on OpenShift 4. That is, the project behind master is now called nice-master rather than toco-nice-master:

oc project nice-master

On OpenShift 4, access to nodes is limited to some of you; see Nodes / Resources. Those with access can also fetch resources across all namespaces using --all-namespaces.

List pods in all namespaces:

oc get pods --all-namespaces

List resource usage of all pods in all namespaces:

kubectl top pods --all-namespaces --sort-by cpu

Or list all pods associated with a failed deployment in all namespaces:

oc get pods --all-namespaces --field-selector 'status.phase==Failed' -o custom-columns='Namespace:metadata.namespace,Pod Name:metadata.name,Reason:status.reason'

OpenShift Web Console

The Web Console is available at https://console.apps.openshift.tocco.ch.

On OpenShift 4, access to nodes is limited to some of you. See Nodes / Resources.

Changes in Ansible

In order to support OpenShift 4, many changes have been made to Ansible and TeamCity. These changes are at a lower level of abstraction and are thus transparent to users of Ansible.

The one and only change required to run an installation on OpenShift 4 is an explicit location:

location: cloudscale-os4

Once everything is moved, cloudscale-os4 will be made the default and removed from the installations’ configurations again.

Let me also point out another change, which isn’t specific to OpenShift 4. Some of you used to change the DOCKER_PULL_URL parameter in TeamCity manually. This parameter is now managed by Ansible to ensure the image is fetched from the right platform. With the new naming scheme, there is no need to adjust it manually anymore: the Docker image for a production deployment is now unconditionally fetched from <installation_name>test, provided such an installation exists.

Nodes / Resources

Those of you with admin access (the same people who have root access) can now access node details.

List nodes:

$ oc get nodes
NAME          STATUS   ROLES          AGE     VERSION
infra-a5b4    Ready    infra,worker   35d     v1.22.5+5c84e52
infra-c235    Ready    infra,worker   35d     v1.22.5+5c84e52
infra-fc11    Ready    infra,worker   35d     v1.22.5+5c84e52
master-c946   Ready    master         35d     v1.22.5+5c84e52
master-d7ca   Ready    master         35d     v1.22.5+5c84e52
master-fb50   Ready    master         35d     v1.22.5+5c84e52
worker-0188   Ready    app,worker     6d17h   v1.22.5+5c84e52
worker-565d   Ready    app,worker     35d     v1.22.5+5c84e52
worker-61aa   Ready    app,worker     6d17h   v1.22.5+5c84e52

The nodes prefixed with worker- are the ones that run instances of Nice.
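
To list only those, you can filter by the role label from which the ROLES column is derived (assuming the standard node-role labels):

oc get nodes -l node-role.kubernetes.io/app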

Show resource consumption:

$ kubectl top nodes
NAME          CPU(cores)   CPU%   MEMORY(bytes)   MEMORY%
infra-a5b4    1275m        17%    15970Mi         51%
infra-c235    584m         7%     11260Mi         36%
infra-fc11    826m         11%    16377Mi         52%
master-c946   1051m        30%    10937Mi         73%
master-d7ca   795m         22%    10103Mi         67%
master-fb50   658m         18%    8345Mi          56%
worker-0188   2257m        34%    32650Mi         51%
worker-565d   1134m        17%    17556Mi         27%
worker-61aa   1687m        25%    18507Mi         29%

Show resource requests and limits:

$ oc describe node worker-0188
…
Allocated resources:
  (Total limits may be over 100 percent, i.e., overcommitted.)
  Resource           Requests       Limits
  --------           --------       ------
  cpu                3374m (51%)    200m (3%)
  memory             35298Mi (55%)  73964Mi (116%)
  ephemeral-storage  0 (0%)         0 (0%)
  hugepages-2Mi      0 (0%)         0 (0%)
…

Resource utilization is also available, in the form of graphs, in the Web Console:

CPU/Memory Graphs

Logging

Kibana can be accessed at https://kibana-openshift-logging.apps.openshift.tocco.ch.

The main difference is that search is no longer segregated by project/namespace. Filter by kubernetes.namespace_name to search the logs of a specific installation:

Kibana

As a result of this change, it’s now possible to list or visualize log messages across all installations or a selected subset of them.
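
For instance, a search like this shows the logs of two installations at once (the second namespace name is hypothetical):

kubernetes.namespace_name: ("nice-master" OR "nice-test")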

DNS

Installations on OpenShift 4 require different DNS records.

Type A Records

This (OpenShift 3):

example.net.     3600 IN A      5.102.151.2
example.net.     3600 IN A      5.102.151.3

becomes (OpenShift 4):

example.net.     IN A     5.102.151.37

Type CNAME/ANAME/ALIAS Records

This (OpenShift 3):

extranet.example.net   IN CNAME   ha-proxy.tocco.ch.

becomes (OpenShift 4):

extranet.example.net   IN CNAME   os4.tocco.ch.
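
Once the records have been updated, they can be verified like this:

$ dig +short extranet.example.net CNAME
os4.tocco.ch.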

See the DNS section in Tocco Docs for details.

Projects / Namespaces

OpenShift Projects, which are built on top of Kubernetes Namespaces, can now be created via the Kubernetes API.

Create a project:

oc new-project <project_name>

This creates a fresh project and grants you access to it. To grant everyone else access, it’s recommended to add role bindings for the groups tocco-admin and tocco-dev:

Grant access to tocco-admin:

oc create -f - <<EOF
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: tocco-admin
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: admin
subjects:
- apiGroup: rbac.authorization.k8s.io
  kind: Group
  name: tocco-admin
EOF

Grant access to tocco-dev:

oc create -f - <<EOF
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: tocco-dev
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: admin
subjects:
- apiGroup: rbac.authorization.k8s.io
  kind: Group
  name: tocco-dev
EOF

Whenever Ansible is used to manage a service, it needs access too:

oc create -f - << EOF
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: ansible-admin
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: admin
subjects:
- kind: ServiceAccount
  name: ansible
  namespace: serviceaccounts
EOF
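
To verify the access grants in a project, list its role bindings:

oc get rolebindings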

Similarly, remove a project like this:

oc delete project <project_name>

Monitoring

For the time being, monitoring stays at https://monitoring.vshn.net; only the label has been adjusted, to allow easily distinguishing between OpenShift 3 and 4:

Monitoring label

Long term, the plan is to switch to Prometheus for monitoring, which will also allow monitoring metrics like memory usage, queue sizes, or thread pool usage.

Persistent Volumes

Persistent volumes are used to store data persistently and are made available to pods via the filesystem. On OpenShift 3, Gluster-based storage was used, which added additional, unwanted complexity. On OpenShift 4, volumes are obtained directly from the storage provided by Cloudscale. This is potentially faster and more reliable.

Single Writer

The drawback of this setup is that only so-called ReadWriteOnce storage is supported. That is, any number of users can read a volume but only one can have write access. On OpenShift 3, multiple concurrent writers could exist.

Use of volumes within Nice is currently very limited:

  • Some legacy web sites store resources on the filesystem.
  • LMS module, before 3.0, stored e-tests on the filesystem.

Installations that use a volume can no longer run multiple instances. Consequently, rolling deployments can no longer be used for them. During a rolling deployment, a new pod is started and verified to be online before any old pod is shut down, which would lead to concurrent write access. Hence, such installations have to use a different deployment strategy, namely recreate. During a recreate deployment, the installation is stopped first, schema changes are applied, and only then is the installation started again.
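
For illustration, switching an existing DeploymentConfig to the recreate strategy could be done like this (a sketch; for Nice installations this is presumably handled via Ansible):

oc patch dc/nice -p '{"spec": {"strategy": {"type": "Recreate"}}}'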

It’s to be noted that new installations are not affected by this. Currently, I expect that two customers will have to use this strategy until their installations can be updated.

EDIT:

On second thought, only customers with the LMS module are affected: namely iffp, sfb, and spi, of which sfb isn’t running on our infrastructure at all. The aforementioned volumes for web resources are unaffected; those can be stored on read-only volumes.

This will lead to downtime during code and configuration deployments. First measurements indicate that simple configuration changes cause < 2 minutes of downtime and minor schema upgrades < 4 minutes.

See also Using deployment strategies in the OpenShift 4 documentation.

Storage Classes

There are also advantages. In addition to the previously mentioned reduction in complexity, the storage is much cheaper, as no additional Gluster service is needed, and we can pick between SSD and even cheaper bulk (HDD) storage:

Request SSD volume:

oc set volume dc/nice -c nice --add --claim-class=ssd --name=lms --claim-name=lms --claim-size=10Gi --mount-path=/app/var/lms

The default storage class is used when --claim-class is omitted.

Request HDD volume:

oc set volume dc/nice -c nice --add --claim-class=bulk --name=lms --claim-name=lms --claim-size=10Gi --mount-path=/app/var/lms
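
The storage classes available on the cluster can be listed like this:

oc get storageclass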

Memory / Heap Dumps

In order to preserve heap dumps across an application crash, a persistent volume was used. With OpenShift 4, /app/var/heap_dumps/, where heap dumps go, has been converted to an emptyDir volume. Such volumes are ephemeral and bound to a single pod. Yet, importantly, they survive an application crash and restart.
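
For reference, such a volume looks roughly like this in the pod spec (a minimal sketch; the actual definition is created for us):

containers:
- name: nice
  volumeMounts:
  - name: heap-dumps
    mountPath: /app/var/heap_dumps    # where heap dumps are written
volumes:
- name: heap-dumps
  emptyDir: {}                        # tied to the pod, but survives container crashes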

This means that, in order to enable an automatic heap dump on OOM, this is now sufficient:

oc set env dc/nice NICE2_DUMP_ON_OOM=true

There is no need to create a volume for automatic or manual dumps.
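
To copy dumps off the running pod, oc rsync can be used (pod name placeholder):

oc rsync <pod_name>:/app/var/heap_dumps/ ./heap_dumps/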

Note that, while emptyDir volumes are preserved across restarts, they vanish together with the pod. Hence, do not delete the pod or stop it by scaling down before any needed dumps have been retrieved.

Ingress and ACME

On OpenShift 3, we used routes:

$ oc get route
NAME                      HOST/PORT                    PATH   SERVICES   PORT     TERMINATION     WILDCARD
nice                      master.tocco.ch ... 1 more          nice       80-tcp   edge/Redirect   None
nice-tocco.bitserver.ch   tocco.bitserver.ch                  nice       80-tcp   edge/Redirect   None

On OpenShift 4, ingresses are used instead:

$ oc get ingress
NAME                      CLASS    HOSTS                ADDRESS                                  PORTS     AGE
nice                      <none>   master.tocco.ch      router-default.apps.openshift.tocco.ch   80, 443   9d
nice-tocco.bitserver.ch   <none>   tocco.bitserver.ch   router-default.apps.openshift.tocco.ch   80, 443   9d

Routes are OpenShift-specific while ingresses are what native Kubernetes uses. The reason we are switching is that the new ACME integration only supports ingresses. ACME is the protocol used by Let’s Encrypt, and others, to fully automate TLS certificate issuance.

In the background, a route is created automatically for every ingress:

$ oc get route
NAME                            HOST/PORT            PATH   SERVICES   PORT     TERMINATION     WILDCARD
nice-7v864                      master.tocco.ch      /      nice       80-tcp   edge/Redirect   None
nice-tocco.bitserver.ch-zn4hv   tocco.bitserver.ch   /      nice       80-tcp   edge/Redirect   None

However, using ingress directly is preferred.
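
For reference, a minimal ingress for Nice might look roughly like this (a sketch based on the output above; the TLS secret name is hypothetical, and for Nice installations these objects are created by Ansible anyway):

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: nice
spec:
  rules:
  - host: master.tocco.ch
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: nice
            port:
              number: 80
  tls:
  - hosts:
    - master.tocco.ch
    secretName: nice-tls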

Enabling ACME also differs slightly; namely, a different annotation needs to be set:

oc annotate ingress/<name> cert-manager.io/cluster-issuer=letsencrypt-production

Of course, Ansible still does this automatically for Nice installations.

I do not yet have any experience troubleshooting the new ACME integration; no failure to issue a certificate has occurred yet. If needed, in addition to the Troubleshooting section in Tocco Docs, you may want to check the upstream Troubleshooting guide.
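
Should it come to that, a plausible first step is inspecting the Certificate objects that cert-manager creates for annotated ingresses; their status shows how far issuance got (assuming standard cert-manager behavior):

oc get certificates
oc describe certificate <name>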

On a side note, other objects, too, are based on native Kubernetes objects. For instance, OpenShift’s Project is an extension of Kubernetes’ Namespace, and DeploymentConfig of Deployment. As a general rule, the native Kubernetes object should be preferred whenever none of the features of the OpenShift object are needed. The reason for this is that we want to keep open the possibility of switching to alternatives, like SUSE’s Rancher, in the future. Staying as close as possible to native Kubernetes will ease any such transition considerably.