# OpenShift-Ansible - CFME Role

# PROOF OF CONCEPT - Alpha Version

This role is based on the work in the upstream
[manageiq/manageiq-pods](https://github.com/ManageIQ/manageiq-pods)
project. For additional literature on configuration specific to
ManageIQ (optional post-installation tasks), visit the project's
[upstream documentation page](http://manageiq.org/docs/get-started/basic-configuration).

Please submit a
[new issue](https://github.com/openshift/openshift-ansible/issues/new)
if you run into bugs with this role or wish to request enhancements.

# Important Notes

This is an early *proof of concept* role to install the Cloud Forms
Management Engine (ManageIQ) on OpenShift Container Platform (OCP).

* This role is still in **ALPHA STATUS**
* Many options are still hard-coded (e.g. the NFS setup)
* Not many configurable options yet
* **Should** be run on a dedicated cluster
* **Will not run** on undersized infra
* The terms *CFME* and *MIQ* / *ManageIQ* are interchangeable

## Requirements

**NOTE:** These requirements are copied from the upstream
[manageiq/manageiq-pods](https://github.com/ManageIQ/manageiq-pods)
project.

### Prerequisites:

* [OpenShift Origin 1.5](https://docs.openshift.com/container-platform/3.5/welcome/index.html) or [higher](https://docs.openshift.com/container-platform/latest/welcome/index.html) provisioned
* NFS or other compatible volume provider
* A cluster-admin user (created by the role if required)

### Cluster Sizing

In order to avoid random deployment failures due to resource
starvation, we recommend a minimum cluster size for a **test**
environment.

| Type           | Size    | CPUs     | Memory   |
|----------------|---------|----------|----------|
| Masters        | `1+`    | `8`      | `12GB`   |
| Nodes          | `2+`    | `4`      | `8GB`    |
| PV Storage     | `25GB`  | `N/A`    | `N/A`    |


![Basic CFME Deployment](img/CFMEBasicDeployment.png)

**CFME has hard requirements for memory. CFME will NOT install if your
  infrastructure does not meet or exceed the requirements given
  above. Do not run this playbook if you do not have the required
  memory; you will only waste your time.**


### Other sizing considerations

* Recommendations assume MIQ will be the **only application running**
  on this cluster.
* Alternatively, you can provision an infrastructure node to run
  registry/metrics/router/logging pods.
* Each MIQ application pod will consume at least `3GB` of RAM on initial
  deployment (a blank deployment without providers).
* RAM consumption will ramp up as the appliance is used; once
  providers are added, expect higher resource consumption.


### Assumptions

1) You meet/exceed the [cluster sizing](#cluster-sizing) requirements
1) Your NFS server is on your master host
1) Your PV backing NFS storage volume is mounted on `/exports/`

Required directories that NFS will export to back the PVs:

* `/exports/miq-pv0[123]`

If the required directories are not present at install-time, they will
be created using the recommended permissions per the
[upstream documentation](https://github.com/ManageIQ/manageiq-pods#make-persistent-volumes-to-host-the-miq-database-and-application-data):

* UID/GID: `root`/`root`
* Mode: `0775`

**IMPORTANT:** If you are using a separate volume (`/dev/vdX`) for NFS
  storage, **ensure** it is mounted on `/exports/` **before** running
  this role.
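
If you would rather create the export directories yourself before
running the role, here is a minimal sketch (the `make_miq_exports`
helper is hypothetical, not part of the role):

```
# Hypothetical helper: create the PV export directories with the
# recommended mode. Takes the NFS export root (normally /exports) as $1.
make_miq_exports() {
    local root="$1"
    mkdir -p "$root"/miq-pv01 "$root"/miq-pv02 "$root"/miq-pv03
    chmod 0775 "$root"/miq-pv01 "$root"/miq-pv02 "$root"/miq-pv03
}

# On the NFS server (as root):
#   make_miq_exports /exports
```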



## Role Variables

Core variables in this role:

| Name                          | Default value | Description   |
|-------------------------------|---------------|---------------|
| `openshift_cfme_install_app`  | `False`       | `True`: Install everything and create a new CFME app, `False`: Just install all of the templates and scaffolding |


Variables you may override have defaults defined in
[defaults/main.yml](defaults/main.yml).
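
For example, to have the role deploy the CFME application itself
(rather than just the templates and scaffolding), your inventory might
contain:

```
[OSEv3:vars]
openshift_cfme_install_app=True
```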


# Important Notes

This role is presently in **tech preview** status. Use it with the same
caution you would give any other pre-release software.

**Most importantly**, follow this one rule: do not re-run the entrypoint
playbook multiple times in a row without cleaning up after previous
runs if any of the CFME steps have run. This is a known
flake. Cleanup instructions are provided at the bottom of this README.


# Usage

This section describes the basic usage of this role. All parameters
will use their [default values](defaults/main.yml).

## Pre-flight Checks

**IMPORTANT:** As documented above in [the prerequisites](#prerequisites),
  you **must already** have your OCP cluster up and running.

**Optional:** The ManageIQ pod is fairly large (about 1.7 GB), so to
save some spin-up time post-deployment, you can begin pre-pulling the
docker image to each of your nodes now:

```
root@node0x # docker pull docker.io/manageiq/manageiq-pods:app-latest-fine
```

## Getting Started

1) The *entry point playbook* to install CFME is located in
[the BYO playbooks](../../playbooks/byo/openshift-cfme/config.yml)
directory

2) Update your existing `hosts` inventory file and ensure the
parameter `openshift_cfme_install_app` is set to `True` under the
`[OSEv3:vars]` block.

3) Using your existing `hosts` inventory file, run `ansible-playbook`
with the entry point playbook:

```
$ ansible-playbook -v -i <INVENTORY_FILE> playbooks/byo/openshift-cfme/config.yml
```

## Next Steps

Once complete, the playbook will let you know:


```
TASK [openshift_cfme : Status update] *********************************************************
ok: [ho.st.na.me] => {
    "msg": "CFME has been deployed. Note that there will be a delay before it is fully initialized.\n"
}
```

This will take several minutes (*possibly 10 or more*, depending on
your network connection). However, you can get some insight into the
deployment process during initialization.

### oc describe pod manageiq-0

*Some useful information about the output you will see if you run the
`oc describe pod manageiq-0` command*

**Readiness probes** - These will take a while to become
`Healthy`. The initial health probes won't even run for at least 8
minutes, depending on how long it takes to pull down the large
images. ManageIQ is a large application, so it may take a considerable
amount of time for it to deploy and be marked as `Healthy`.

If you go to the node you know the application is running on (check
for `Successfully assigned manageiq-0 to <HOST|IP>` in the `describe`
output) you can run a `docker pull` command to monitor the progress of
the image pull:

```
[root@cfme-node ~]# docker pull docker.io/manageiq/manageiq-pods:app-latest-fine
Trying to pull repository docker.io/manageiq/manageiq-pods ...
sha256:6c055ca9d3c65cd694d6c0e28986b5239ba56bbdf0488cccdaa283d545258f8a: Pulling from docker.io/manageiq/manageiq-pods
Digest: sha256:6c055ca9d3c65cd694d6c0e28986b5239ba56bbdf0488cccdaa283d545258f8a
Status: Image is up to date for docker.io/manageiq/manageiq-pods:app-latest-fine
```

The example above demonstrates the case where the image has been
successfully pulled already.

If the image isn't completely pulled already then you will see
multiple progress bars detailing each image layer download status.


### rsh

*Useful inspection/progress monitoring techniques with the `oc rsh`
command.*


On your master node, switch to the `cfme` project (or whatever you
named it if you overrode the `openshift_cfme_project` variable) and
check on the pod states:

```
[root@cfme-master01 ~]# oc project cfme
Now using project "cfme" on server "https://10.10.0.100:8443".

[root@cfme-master01 ~]# oc get pod
NAME                 READY     STATUS    RESTARTS   AGE
manageiq-0           0/1       Running   0          14m
memcached-1-3lk7g    1/1       Running   0          14m
postgresql-1-12slb   1/1       Running   0          14m
```

Note how the `manageiq-0` pod says `0/1` under the **READY**
column. After some time (depending on your network connection) you'll
be able to `rsh` into the pod to find out more of what's happening in
real time. First, the easy-mode command: run this once `rsh` is
available and then watch until it says `Started Initialize Appliance
Database`:

```
[root@cfme-master01 ~]# oc rsh manageiq-0 journalctl -f -u appliance-initialize.service
```

For the full explanation of what this means, and for more interactive
inspection techniques, keep reading.

To obtain a shell on the `manageiq` pod, use this command:

```
[root@cfme-master01 ~]# oc rsh manageiq-0 bash -l
```

The `rsh` command opens a shell in your pod for you. In this case it's
the pod called `manageiq-0`. `systemd` is managing the services in
this pod so we can use the `list-units` command to see what is running
currently: `# systemctl list-units | grep appliance`.

If you see the `appliance-initialize` service running, this indicates
that basic setup is still in progress. We can monitor the process with
the `journalctl` command like so:


```
[root@manageiq-0 vmdb]# journalctl -f -u appliance-initialize.service
Jun 14 14:55:52 manageiq-0 appliance-initialize.sh[58]: == Checking deployment status ==
Jun 14 14:55:52 manageiq-0 appliance-initialize.sh[58]: No pre-existing EVM configuration found on region PV
Jun 14 14:55:52 manageiq-0 appliance-initialize.sh[58]: == Checking for existing data on server PV ==
Jun 14 14:55:52 manageiq-0 appliance-initialize.sh[58]: == Starting New Deployment ==
Jun 14 14:55:52 manageiq-0 appliance-initialize.sh[58]: == Applying memcached config ==
Jun 14 14:55:53 manageiq-0 appliance-initialize.sh[58]: == Initializing Appliance ==
Jun 14 14:55:57 manageiq-0 appliance-initialize.sh[58]: create encryption key
Jun 14 14:55:57 manageiq-0 appliance-initialize.sh[58]: configuring external database
Jun 14 14:55:57 manageiq-0 appliance-initialize.sh[58]: Checking for connections to the database...
Jun 14 14:56:09 manageiq-0 appliance-initialize.sh[58]: Create region starting
Jun 14 14:58:15 manageiq-0 appliance-initialize.sh[58]: Create region complete
Jun 14 14:58:15 manageiq-0 appliance-initialize.sh[58]: == Initializing PV data ==
Jun 14 14:58:16 manageiq-0 appliance-initialize.sh[58]: == Initializing PV data backup ==
Jun 14 14:58:16 manageiq-0 appliance-initialize.sh[58]: sending incremental file list
Jun 14 14:58:16 manageiq-0 appliance-initialize.sh[58]: created directory /persistent/server-deploy/backup/backup_2017_06_14_145816
Jun 14 14:58:16 manageiq-0 appliance-initialize.sh[58]: region-data/
Jun 14 14:58:16 manageiq-0 appliance-initialize.sh[58]: region-data/var/
Jun 14 14:58:16 manageiq-0 appliance-initialize.sh[58]: region-data/var/www/
Jun 14 14:58:16 manageiq-0 appliance-initialize.sh[58]: region-data/var/www/miq/
Jun 14 14:58:16 manageiq-0 appliance-initialize.sh[58]: region-data/var/www/miq/vmdb/
Jun 14 14:58:16 manageiq-0 appliance-initialize.sh[58]: region-data/var/www/miq/vmdb/REGION
Jun 14 14:58:16 manageiq-0 appliance-initialize.sh[58]: region-data/var/www/miq/vmdb/certs/
Jun 14 14:58:16 manageiq-0 appliance-initialize.sh[58]: region-data/var/www/miq/vmdb/certs/v2_key
Jun 14 14:58:16 manageiq-0 appliance-initialize.sh[58]: region-data/var/www/miq/vmdb/config/
Jun 14 14:58:16 manageiq-0 appliance-initialize.sh[58]: region-data/var/www/miq/vmdb/config/database.yml
Jun 14 14:58:16 manageiq-0 appliance-initialize.sh[58]: server-data/
Jun 14 14:58:16 manageiq-0 appliance-initialize.sh[58]: server-data/var/
Jun 14 14:58:16 manageiq-0 appliance-initialize.sh[58]: server-data/var/www/
Jun 14 14:58:16 manageiq-0 appliance-initialize.sh[58]: server-data/var/www/miq/
Jun 14 14:58:16 manageiq-0 appliance-initialize.sh[58]: server-data/var/www/miq/vmdb/
Jun 14 14:58:16 manageiq-0 appliance-initialize.sh[58]: server-data/var/www/miq/vmdb/GUID
Jun 14 14:58:16 manageiq-0 appliance-initialize.sh[58]: sent 1330 bytes  received 136 bytes  2932.00 bytes/sec
Jun 14 14:58:16 manageiq-0 appliance-initialize.sh[58]: total size is 770  speedup is 0.53
Jun 14 14:58:16 manageiq-0 appliance-initialize.sh[58]: == Restoring PV data symlinks ==
Jun 14 14:58:16 manageiq-0 appliance-initialize.sh[58]: /var/www/miq/vmdb/REGION symlink is already in place, skipping
Jun 14 14:58:16 manageiq-0 appliance-initialize.sh[58]: /var/www/miq/vmdb/config/database.yml symlink is already in place, skipping
Jun 14 14:58:16 manageiq-0 appliance-initialize.sh[58]: /var/www/miq/vmdb/certs/v2_key symlink is already in place, skipping
Jun 14 14:58:16 manageiq-0 appliance-initialize.sh[58]: /var/www/miq/vmdb/log symlink is already in place, skipping
Jun 14 14:58:28 manageiq-0 systemctl[304]: Removed symlink /etc/systemd/system/multi-user.target.wants/appliance-initialize.service.
Jun 14 14:58:29 manageiq-0 systemd[1]: Started Initialize Appliance Database.
```

Most of what we see here (above) is the initial database seeding
process. This process isn't very quick, so be patient.

At the bottom of the log there is a special line from `systemctl`:
`Removed symlink
/etc/systemd/system/multi-user.target.wants/appliance-initialize.service`. The
`appliance-initialize` service is no longer marked as enabled. This
indicates that the base application initialization is complete.

We're not done yet though; there are other ancillary services which
run in this pod to support the application. *Still in the rsh shell*,
use the `ps` command to monitor for the `httpd` processes
starting. You will see output similar to the following when that stage
has completed:

```
[root@manageiq-0 vmdb]# ps aux | grep http
root       1941  0.0  0.1 249820  7640 ?        Ss   15:02   0:00 /usr/sbin/httpd -DFOREGROUND
apache     1942  0.0  0.0 250752  6012 ?        S    15:02   0:00 /usr/sbin/httpd -DFOREGROUND
apache     1943  0.0  0.0 250472  5952 ?        S    15:02   0:00 /usr/sbin/httpd -DFOREGROUND
apache     1944  0.0  0.0 250472  5916 ?        S    15:02   0:00 /usr/sbin/httpd -DFOREGROUND
apache     1945  0.0  0.0 250360  5764 ?        S    15:02   0:00 /usr/sbin/httpd -DFOREGROUND
```

Furthermore, you can find other related processes by just looking for
ones with `MIQ` in their name:

```
[root@manageiq-0 vmdb]# ps aux | grep miq
root        333 27.7  4.2 555884 315916 ?       Sl   14:58   3:59 MIQ Server
root       1976  0.6  4.0 507224 303740 ?       SNl  15:02   0:03 MIQ: MiqGenericWorker id: 1, queue: generic
root       1984  0.6  4.0 507224 304312 ?       SNl  15:02   0:03 MIQ: MiqGenericWorker id: 2, queue: generic
root       1992  0.9  4.0 508252 304888 ?       SNl  15:02   0:05 MIQ: MiqPriorityWorker id: 3, queue: generic
root       2000  0.7  4.0 510308 304696 ?       SNl  15:02   0:04 MIQ: MiqPriorityWorker id: 4, queue: generic
root       2008  1.2  4.0 514000 303612 ?       SNl  15:02   0:07 MIQ: MiqScheduleWorker id: 5
root       2026  0.2  4.0 517504 303644 ?       SNl  15:02   0:01 MIQ: MiqEventHandler id: 6, queue: ems
root       2036  0.2  4.0 518532 303768 ?       SNl  15:02   0:01 MIQ: MiqReportingWorker id: 7, queue: reporting
root       2044  0.2  4.0 519560 303812 ?       SNl  15:02   0:01 MIQ: MiqReportingWorker id: 8, queue: reporting
root       2059  0.2  4.0 528372 303956 ?       SNl  15:02   0:01 puma 3.3.0 (tcp://127.0.0.1:5000) [MIQ: Web Server Worker]
root       2067  0.9  4.0 529664 305716 ?       SNl  15:02   0:05 puma 3.3.0 (tcp://127.0.0.1:3000) [MIQ: Web Server Worker]
root       2075  0.2  4.0 529408 304056 ?       SNl  15:02   0:01 puma 3.3.0 (tcp://127.0.0.1:4000) [MIQ: Web Server Worker]
root       2329  0.0  0.0  10640   972 ?        S+   15:13   0:00 grep --color=auto -i miq
```

Finally, *still in the rsh shell*, to test if the application is
running correctly, we can request the application homepage. If the
page is available the page title will be `ManageIQ: Login`:

```
[root@manageiq-0 vmdb]# curl -s -k https://localhost | grep -A2 '<title>'
<title>
ManageIQ: Login
</title>
```

**Note:** The `-s` flag makes `curl` operations silent, and the `-k`
flag tells it to ignore errors about untrusted certificates.
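
If you want to script this check rather than eyeball it, here is a
small sketch (the `wait_for_login` function is hypothetical, not part
of the appliance):

```
# Hypothetical helper: poll a URL until the ManageIQ login page is
# served, checking every 10 seconds up to a maximum number of tries.
wait_for_login() {
    local url="$1" tries="${2:-60}" i
    for i in $(seq "$tries"); do
        if curl -s -k "$url" | grep -q 'ManageIQ: Login'; then
            echo "ready"
            return 0
        fi
        sleep 10
    done
    return 1
}

# Inside the pod:
#   wait_for_login https://localhost
```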



# Additional Upstream Resources

Below are some useful resources from the upstream project
documentation. You may find these of value.

* [Verify Setup Was Successful](https://github.com/ManageIQ/manageiq-pods#verifying-the-setup-was-successful)
* [POD Access And Routes](https://github.com/ManageIQ/manageiq-pods#pod-access-and-routes)
* [Troubleshooting](https://github.com/ManageIQ/manageiq-pods#troubleshooting)


# Manual Cleanup

At this time uninstallation/cleanup is still a manual process. You
will have to follow a few steps to fully remove CFME from your
cluster.

Delete the project:

* `oc delete project cfme`

Delete the PVs:

* `oc delete pv miq-pv01`
* `oc delete pv miq-pv02`
* `oc delete pv miq-pv03`

Clean out the old PV data:

* `cd /exports/`
* `find miq* -type f -delete`
* `find miq* -type d -delete`

Remove the NFS exports:

* `rm /etc/exports.d/openshift_cfme.exports`
* `exportfs -ar`

Delete the user:

* `oc delete user cfme`

**NOTE:** The `oc delete project cfme` command will return quickly;
however, the deletion will continue in the background. Continue
running `oc get project` after you've completed the other steps to
monitor the pods and final project termination progress.
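
The steps above can be collected into one helper. This is a sketch:
`cleanup_cfme` is hypothetical, assumes the default project/user name
`cfme`, and relies on `oc` and `exportfs` being on the `PATH`:

```
# Hypothetical helper: run the manual cleanup steps in order. Takes the
# NFS export root (normally /exports) as $1. Run it on the master/NFS
# host as root.
cleanup_cfme() {
    local export_root="$1"
    oc delete project cfme
    oc delete pv miq-pv01 miq-pv02 miq-pv03
    # Clean out the old PV data (files first, then the empty directories)
    (cd "$export_root" && find miq* -type f -delete && find miq* -type d -delete)
    # Remove the NFS exports and re-sync the export table
    rm -f /etc/exports.d/openshift_cfme.exports
    exportfs -ar
    oc delete user cfme
}

# Usage: cleanup_cfme /exports
```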