Diffstat (limited to 'docs/projects')
1 files changed, 255 insertions, 0 deletions
diff --git a/docs/projects/katrindb.txt b/docs/projects/katrindb.txt
new file mode 100644
index 0000000..0a14a25
--- /dev/null
+++ b/docs/projects/katrindb.txt
@@ -0,0 +1,255 @@
+# Steps to set up the KDB infrastructure in OpenShift
+Web interface:
+Command-line interface:
+oc login
+oc project katrin
+## Overview
+The setup uses (at least) three containers:
+* `kdb-backend` is a MySQL/MariaDB container that provides the database backend
+ used by KDB server. It hosts the `katrin` and `katrin_run` databases.
+* `kdb-server` runs the KDB server process inside an Apache environment. It
+ provides the web interface (`kdb-admin.fcgi`) and the KaLi service
+ (`kdb-kali.fcgi`).
+* `run-processing` periodically retrieves run files from several DAQ machines
+ and adds the processed files to the KDB runlist. This process could be
+ distributed over several containers for the individual systems (`fpd` etc.)
+> The ADEI server hosting the `adei` MySQL database runs in an independent project with hostname `mysql.adei.svc`.
+A persistent storage volume is needed for the MySQL data (volume group `db`)
+and for the copied/processed run files (volume group `katrin`). The latter is
+shared between the KDB server and run processing applications.
+## MySQL backend
+### Application
+This container is based on the official Red Hat MariaDB Docker image. The
+OpenShift application is created via the CLI:
+oc new-app -e MYSQL_ROOT_PASSWORD=XXX --name=kdb-backend
+Because KDB uses two databases (`katrin`, `katrin_run`) and must be permitted
+to create/edit database users, a root password must be defined here.
+### Volumes
+This container needs a persistent storage volume for the database content. In
+OpenShift this is done by removing the default storage and adding a persistent
+volume `kdb-backend` for MySQL data: `db: /kdb/mysql/data -> /var/lib/mysql/data`
+### Final steps
+It makes sense to add readiness/liveness probes as well: TCP socket, port 3306.
+> It is possible to access the MySQL server from inside a container: `mysql -h kdb-backend.katrin.svc -u root -p -A`
+## KDB server
+### Application
+The container is created from a `Dockerfile` available in GitLab:
+The app is created via the CLI, but manual changes are necessary later on:
+oc new-app --name=kdb-server
+> The build fails because the branch name and user credentials are not defined.
+The build settings must be adapted before the image can be created.
+* Set the git branch name to `kdbserver`.
+* Add a source secret `katrin-gitlab` that provides the git user credentials,
+ i.e. the `katrin` username and corresponding password for read-only access.
+When a container instance (pod) is created in OpenShift, the main script
+`/` starts the Apache webserver with the KDB fastcgi module.
+### Volumes
+Just like the MySQL backend, the container needs persistent storage enabled: `katrin: /data -> /mnt/katrin/data`
+### Config Maps
+Some default configuration files for the Apache web server and the KDB server
+installation are provided with the Dockerfile. The webserver config should
+work correctly as it is. The main config must be updated so that the correct
+servers/databases are used. A config map `kdbserver-config` is created with
+mountpoint `/config` in the container:
+* `kdbserver.conf` is the main config for the KDB server instance. For the
+ steps outlined here, it should contain the following entries:
+sql_server = kdb-backend.katrin.svc
+sql_adei_server = mysql.adei.svc
+sql_katrin_dbname = katrin
+sql_run_dbname = katrin_run
+sql_adei_dbname = adei_katrin
+sql_user = root
+sql_password = XXX
+sql_adei_user = katrin
+sql_adei_password = XXX
+use_adei_cache = true
+adei_service_url =
+adei_public_url =
+* `` defines the terminal/logfile output settings. By default,
+ all log output is shown on `stdout` (and visible in the OpenShift log).
+> Files in `/config` are symlinked to the respective files inside the container by `/`.
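Before starting the server, it can help to confirm that the mounted `kdbserver.conf` contains every entry listed above. The following is a minimal sketch, not part of the KDB sources: the function name `check_kdb_config` is hypothetical, and it assumes the `key = value` layout shown above with one entry per line. It only checks that each key is present, not that its value is valid.

```shell
# Hypothetical helper (not part of KDB): report keys missing from a config file.
# Assumes the "key = value" layout shown above, one entry per line.
check_kdb_config() {
    conf_file=$1
    rc=0
    for key in sql_server sql_adei_server sql_katrin_dbname sql_run_dbname \
               sql_adei_dbname sql_user sql_password sql_adei_user \
               sql_adei_password use_adei_cache adei_service_url \
               adei_public_url; do
        # Match the key at the start of a line, then optional blanks and "=".
        if ! grep -Eq "^${key}[[:space:]]*=" "$conf_file"; then
            echo "missing: $key"
            rc=1
        fi
    done
    return "$rc"
}
```

Inside the container this could be called as `check_kdb_config /config/kdbserver.conf`; a non-zero return status means that at least one entry is missing.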
+### Database setup
+The KDB server sources provide a SQL dump file to initialize the database. To
+create an empty database with all necessary tables, run the `mysql` command:
+mysql -h kdb-backend.katrin.svc -u root -p < /src/kdbserver/Data/katrin-db.sql
+Alternatively, a full backup of the existing database can be imported:
+tar -xJf /src/kdbserver/Data/katrin-db-bkp.sql.xz -C /tmp
+mysql -h kdb-backend.katrin.svc -u root -p < /tmp/katrin-db-bkp.sql
+> To clean a database table, execute a MySQL `drop table` statement and re-initialize the dropped tables from the `katrin-db.sql` file.
+### IDLE storage
+IDLE provides a local storage on the server-side file system. An empty IDLE
+repository with default datasets is created by executing this command:
+/opt/kasper/bin/idle SetupPublicDatasets
+This creates a directory `.../storage/idle/KatrinIdle` on the storage volume
+that can be filled with contents from a backup archive. The `oc rsync` command
+can transfer files to a running container (pod) in OpenShift.
+> After restoring, fix the permissions so that KDB can access the data.
+### Final steps
+Again, a readiness/liveness probe can be added: TCP socket, port 80.
+To make the KDB server interface accessible to the outside, a route must be
+added in OpenShift: ` -> kdb-server:80`
+> The web interface is now available at
+## Run processing
+### Application
+The setup for the run processing service is similar to the KDB server, with
+the container being created from a GitLab `Dockerfile` as well:
+The app is created via the CLI, but manual changes are necessary later on:
+oc new-app --name=run-processing
+> The build fails because the branch name and user credentials are not defined.
+The build settings must be adapted before the image can be created.
+* Set the git branch name to `inlineprocessing`.
+* Use the source secret `katrin-gitlab` that was created before.
+#### Run environment
+When a container instance (pod) is created in OpenShift, the main script
+`/` starts the main processing script ``. It
+is executed in a continuous loop with a user-defined delay. The script
+is configured by the following environment variables that can be defined
+in the OpenShift configuration:
+* `PROCESS_SYSTEMS` defines one or more DAQ systems configured in the file
+ ``: `fpd`, `mos`, etc.
+* `PROCESS_FLAGS` defines additional options passed to the script, e.g.
+ `--pull` to automatically retrieve run files from configured DAQ machines.
+* `REFRESH_INTERVAL` defines the waiting time between consecutive executions.
 Note that the `/` script waits until `` has finished
 before the next loop iteration is started, so the full delay always elapses
 between runs, regardless of how long the script takes to process all files.
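The loop behaviour described above can be sketched as follows. This is an illustration only, not the actual container script: `process_runs` is a hypothetical stand-in for the real processing script, while `MAX_ITERATIONS` and the 60 second default for `REFRESH_INTERVAL` are assumptions added so that the loop can be tried out safely.

```shell
# Sketch of the loop described above; not the actual container script.
# process_runs is a hypothetical stand-in for the real processing script.
process_runs() {
    echo "processing systems: ${PROCESS_SYSTEMS:-fpd} ${PROCESS_FLAGS:-}"
}

# MAX_ITERATIONS is an assumption added for illustration; 0 means loop forever.
run_loop() {
    iterations=0
    while :; do
        process_runs                      # returns only after processing is done
        sleep "${REFRESH_INTERVAL:-60}"   # the full delay follows every run
        iterations=$((iterations + 1))
        if [ "${MAX_ITERATIONS:-0}" -gt 0 ] && [ "$iterations" -ge "$MAX_ITERATIONS" ]; then
            break
        fi
    done
}
```

Because the sleep starts only after the processing step returns, the time between two starts is the processing time plus `REFRESH_INTERVAL`, not `REFRESH_INTERVAL` alone.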
+### Volumes
+The run processing stores files that need to be accessible by the KDB server
+application. Hence, the same persistent volume is used in this container:
+`katrin: /data -> /mnt/katrin/data`
+To ensure that all processes can read/write correctly, the file permissions are
+relaxed (this can be done in an OpenShift terminal or remote shell):
+mkdir -p /mnt/katrin/data/{inbox,archive,storage,workspace,logs,tmp}
+chown -R katrin: /mnt/katrin/data
+chmod -R ug+rw /mnt/katrin/data
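To check the result of the commands above, or to repeat the directory setup on a different base path, a small helper can be used. This is a sketch under assumptions: the function name `prepare_data_dirs` is hypothetical, and it only tests write access for the user running it, so it does not replace the `chown`/`chmod` step.

```shell
# Hypothetical helper: create the data directories under a given base path
# and report any that are not writable by the current user.
prepare_data_dirs() {
    base=$1
    rc=0
    for dir in inbox archive storage workspace logs tmp; do
        mkdir -p "$base/$dir" || rc=1
        if [ ! -w "$base/$dir" ]; then
            echo "not writable: $base/$dir"
            rc=1
        fi
    done
    return "$rc"
}
```

In the container, `prepare_data_dirs /mnt/katrin/data` would print nothing and return 0 if all six directories exist and are writable.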
+### Config Maps
+Just like with the KDB server, a config map `run-processing-config` with
+mountpoint `/config` should be added, which defines the configuration of the
+processing script:
+* `` is the main config where the DAQ machines are defined
+ with their respective storage paths. The file also defines a list of
+ processing steps to be executed for each run file; these steps may have
+ to be adapted where necessary.
+* `datamanager.cfg` defines the interface to the KaLi web service. It must be
+ configured so that the KDB server instance from above is used:
+url = http://kdb-server.katrin.svc/kdb-kali.fcgi
+user = katrin
+password = XXX
+timeout_seconds = 300
+cache_age_hours = -1
+* `rsync-filter` is applied with the `rsync` command that copies run files
+ from the DAQ machines. It can be adapted to exclude certain directories,
+ e.g. old run files that do not need to be processed.
+* `` configures terminal/logfile output, see above.
+> Files in `/config` are symlinked to the respective files inside the container by `/`.
+#### SSH keys
+A second config map `run-processing-ssh` is required to provide SSH keys that
+are used to authenticate remote connections to the DAQ machines. The map with
+mountpoint `/.ssh` should contain the files `id_dsa`, `` and
+`known_hosts` and must be adapted as necessary.
+> This assumes that the SSH credentials have been added to the respective machines beforehand!
+> The contents of `known_hosts` should be updated with the output of `ssh-keyscan` for the configured DAQ machines.
+### Notes
+The script `/` pulls files from the DAQ machines and processes
+them automatically, newest first. Where necessary, run files can be copied
+manually (FPD example; adapt the options and `rsync-filter` file as required):
+rsync -rltD --verbose --append-verify --partial --stats --compare-dest=/mnt/katrin/data/archive/FPDComm_530 --filter='. /opt/processing/system/rsync-filter' --log-file='/mnt/katrin/data/logs/rsync_fpd.log' katrin@ /mnt/katrin/data/inbox/FPDComm_530
+If runs were not processed correctly, one can trigger manual reprocessing
+from an OpenShift terminal (with run numbers `START`, `END` as necessary):
+./ -s fpd -r START END