From 44cef2cb16dd2bc55ad34d0b8313f7f314b0107a Mon Sep 17 00:00:00 2001
From: "Suren A. Chilingaryan"
Date: Mon, 27 Jan 2020 05:30:53 +0100
Subject: Various docs about UFO, ROOF, and further plans

---
 docs/architecture.txt | 18 +++++++++++++++++
 docs/hardware.txt | 6 ++++++
 docs/network.txt | 28 ++++++++++++++++++++++++++
 docs/todo.txt | 28 ++++++++++++++++++++++++++
 docs/ufo.txt | 36 +++++++++++++++++++++++++++++++++
 src/DEVELOPMENT | 56 ---------------------------------------------------
 6 files changed, 116 insertions(+), 56 deletions(-)
 create mode 100644 docs/architecture.txt
 create mode 100644 docs/hardware.txt
 create mode 100644 docs/network.txt
 create mode 100644 docs/todo.txt
 create mode 100644 docs/ufo.txt
 delete mode 100644 src/DEVELOPMENT

diff --git a/docs/architecture.txt b/docs/architecture.txt
new file mode 100644
index 0000000..2b00ab3
--- /dev/null
+++ b/docs/architecture.txt
@@ -0,0 +1,18 @@
+Configuration
+=============
+ x Pre-configured values in C sources, or shall everything be specified in the configuration? [ full config, less prone to errors if all is in one place ]
+   - The default header size can be computed from the defined 'struct', but that's it.
+   - Do not try to compute the maximum packet size, etc.
+ x Drop all computable parameters from the config [ with a few exceptions ]
+   - It should be possible to run the network receiver without configuring the rest of the ROOF.
+   - It should be possible to keep the dataset size both in the network config (to avoid rewriting) and in the ROOF hardware configuration, but the two must match.
+   - n_streams vs. n_modules: allows multiple streams per module in the future.
+   - samples_per_rotations vs. sample_rate / image_rate: unclear what happens with switch rate control, etc.
+ x How precisely should we verify configuration consistency? [ implement a JSON schema at some point ]
+ x Propagate broken frames by default? Or just drop them, marking the missing frames with metadata. [ ingest when necessary ]
+   - I.e.
+     provide a filter that removes broken frames, or one that generates them when necessary (so that the standard UFO filters can ingest an uninterrupted flow)?
+   - How to handle partially broken frames?
+ x How to handle data planes? [ metadata passes through processors, but not reductors ]
+   - Metadata seems to be preserved while passing through the standard UFO filters, so this is the easiest way out.
+   - We can build a stand-alone subgraph for each plane, but this likely involves a full data copy.
+   - This is probably OK for the simple use case of "raw storage + visualization" with only two instances, but could be too challenging for multi-plane analysis.

diff --git a/docs/hardware.txt b/docs/hardware.txt
new file mode 100644
index 0000000..a293887
--- /dev/null
+++ b/docs/hardware.txt
@@ -0,0 +1,6 @@
+ - Jumbo frames are not currently supported; the maximum packet size is 1500 bytes.
+   * The maximum number of samples per packet can be computed as
+     n = (1500 - header_size) / sample_size, where sample_size = pixels_per_module * bpp, i.e. 46 = floor(1492 / 32)
+   * With 46 samples per packet, however, we cannot split a full rotation into a whole number of packets.
+     So, we need to find the maximal number m such that
+     (m <= n) and (samples_per_rotation % m == 0), i.e. 40

diff --git a/docs/network.txt b/docs/network.txt
new file mode 100644
index 0000000..e7e2a34
--- /dev/null
+++ b/docs/network.txt
@@ -0,0 +1,28 @@
+Problems
+========
+ - When streaming at high speed (~ 16 data streams; 600 Mbit & 600 kpck each), the data streams quickly get
+   desynchronized (but all packets are delivered).
+   * It is unclear whether the problem is on the receiver side (no CPU cores are overloaded) or the
+     desynchronization first appears on the simulated sender. A test with real hardware is required.
+   * For border-case scenarios, increasing the number of buffers from 2 to 10-20 helps. But at full speed, even
+     1000s of buffers are not enough. Packet counts quickly drift apart.
+   * Further increasing the packet buffer provided to 'recvmmsg' does not help (even if blocking is enforced
+     until all packets are received).
+   * At the speed specified above, the system also works without libvma.
+   * Actually, with libvma a larger buffer is required. In the beginning, the performance of libvma gradually
+     speeds up (this has always been the case), and during this period a significant desynchronization happens.
+     To compensate for it, we need about 400 buffers with libvma, compared to only 10 when the standard Linux
+     networking is used.
+ - In any case (libvma or not), some packets are lost in the beginning when high-speed communication is tested.
+   * Usually, the first packets are transferred OK, but then a few packets are lost occasionally here and there
+     (resulting in broken frames). This basically breaks the use case of grabbing a few frames and exiting.
+     It is unclear whether this is a server- or client-side problem (makes sense to see how real hardware behaves).
+   * Can we pre-heat to avoid this speeding-up problem (increase the pre-allocated buffers, disable power-saving
+     mode, ...)? Or will it also not be a problem with real hardware? We can send UDP packets (they should be
+     sent from another host), but packets are still lost:
+       for i in $(seq 4000 4015); do echo "data" > /dev/udp/192.168.34.84/$i; done
+   * The following does not help: a new version of libvma, tuning of its options.
+ - Communication breaks with small MTU sizes (below 1500), but this is probably not important (packets are
+   delivered, but with extreme latencies; some tuning of the network stack is probably required).
+ - Technically, everything should work if we start the UFO server when the data is already streaming. However,
+   the first dataset could be any. Therefore, the check fails, as the data is shifted by a random number of
+   datasets.
diff --git a/docs/todo.txt b/docs/todo.txt
new file mode 100644
index 0000000..8497a69
--- /dev/null
+++ b/docs/todo.txt
@@ -0,0 +1,28 @@
+Main
+====
+ + Add plane/frame-number/broken metadata in the UFO buffers. Check propagation through the standard UFO filters. [ propagates through processors, but not reductors ]
+ + Plane selector filter
+ - Handle packets with data from multiple datasets
+ - Filter to ingest zero-padded broken frames
+ - Try the UFO 'flat-field' correction filter
+ - Cone-beam to parallel-beam resampling ?
+ - Full reconstruction chain
+ - Try the UFO visualization filter
+ - "Reconstructed data storage" and "Visualization + raw data storage" modes
+
+Optional
+========
+ - Try online compression of the ROOF data @Gregoire
+ - Introduce a JSON schema and a JSON schema validator @Zhassulan
+ - If necessary, reconstruct slightly broken frames (currently dropped) @Zhassulan
+ - Implement a fast-writer filter for quick raw data storage
+   * Either include the plane in the file name or pass invalid frames through (so we can compute the frame number)?
+ ? Support multi-plane configuration in roof.py
+
+Ufo bugs
+========
+ - Complain if the TIF-writer is used with 1D data.
+
+Ufo features
+============
+ ? Support non-fp data in the UFO writer

diff --git a/docs/ufo.txt b/docs/ufo.txt
new file mode 100644
index 0000000..64ffb13
--- /dev/null
+++ b/docs/ufo.txt
@@ -0,0 +1,36 @@
+ROOF on UFO
+===========
+ - The current implementation follows the UFO architecture: the reader and the dataset builder are split into two filters.
+   * The reader is multi-threaded. However, only a single instance of the builder can be scheduled.
+     This could limit the maximum throughput on dual-head, or even single-head but many-core, systems.
+   * Another problem here is timing. All events in the builder are initiated from the reader. Consequently,
+     it seems we cannot time out on a semi-complete dataset if no new data is arriving.
+   * Besides performance, this is also critical for stability.
+     With continuous streaming there is no problem; however, if a finite number of frames is requested and some
+     packets are lost, the software will wait forever for the missing bits.
+
+UFO Architecture
+================
+
+
+Questions
+=========
+ - Can we pre-allocate several UFO buffers for forthcoming events? Currently, we need to buffer out-of-order
+   packets and copy them later (or buffer everything for simplicity). We could avoid this data copy if we could
+   get at least one packet in advance.
+
+ - How can I execute the 'generate' method on a 'reductor' filter if no new data arrives on the input for a
+   specified amount of time? One option is sending an empty buffer with metadata indicating the timeout. But
+   this is again hackish.
+
+ - Can we use 16-bit buffers? I can set the dimensions to 1/4 of the correct value to address this. But is it
+   possible to do it in a clean way?
+   * ufo-write definitely only supports fp32 input
+
+ - Is there a standard way to enable automatic propagation of dataset metadata through the chain of filters? [ solved ]
+   * Metadata from all input buffers is copied to the output automatically in 'processors'.
+   * Metadata should be copied with 'ufo_buffer_copy_metadata' in 'reductors'.
+
+ - We can create a separate subgraph for each plane. But this requires an initial copy. Can this be zero-copy?
+
+ - What is the 'ufotools' python package mentioned in the documentation? Just a typo?

diff --git a/src/DEVELOPMENT b/src/DEVELOPMENT
deleted file mode 100644
index 18a8011..0000000
--- a/src/DEVELOPMENT
+++ /dev/null
@@ -1,56 +0,0 @@
-Architecture
-===========
- - Current implementation follows UFO architecture: reader and dataset-builder are split in two filters.
-   * The reader is multi-threaded. However, only a single instance of the builder is possible to schedule.
-     This could limit maximum throughput on dual-head or even signle-head, but many-core systems.
-   * Another problem here is timing. All events in the builder are initiaded from the reader.
-     Consequently,
-     as it seems we can't timeout on semi-complete dataset if no new data is arriving.
-   * Besides, performance this is also critical for stability. With continuous streaming there is no problem,
-     however, if a finite number of frames requested and some packets are lost, the software will wait forever
-     for missing bits.
-
-
-Problems
-========
- - When streaming at high speed (~ 16 data streams; 600 Mbit & 600 kpck each), the data streams quickly get
-   desynchronized (but all packets are delivered).
-   * It is unclear if problem is on the receiver side (no overloaded CPU cores) or de-synchronization is first
-     appear on the simmulation sender. The test with real hardware is required.
-   * For border case scenarios, increasing number of buffers from 2 to 10-20 helps. But at full speed, even 1000s
-     buffers are not enough. Packets counts are quickly going appart.
-   * Further increase of packet buffer provided to 'recvmmsg' does not help (even if blocking is enforced until
-     all packets are received)
-   * At the speed specified above, the system works also without libvma.
-   * Actually, with libvma a larger buffer is required. In the beginning the performance of libvma is gradually
-     speeding up (that was always like that). And during this period a significant desynchronization happens. To
-     compensate it, we need about 400 buffers with libvma as compared to only 10 required if standard Linux
-     networking is utilized.
- - In any case (LibVMA or not), some packets will be lost in the beginning if high-speed communication is tested.
-   * Usually, first packets are transferred OK, but, then, a few packets will be lost occasionally here and there
-     (resulting in broken frames). This basically breaks grabbing a few packets and exitig. Unclear if server- or
-     client-side problem (makes sense to see how real-hardware will behave).
-   * Can we pre-heat to avoid this speeding-up problem (increase pre-allocated buffers, disable power-saving
-     mode, ??)
-     Or it will be also not a problem with hardware? We can send UDP packets (should be send from another
-     host), but packets are still lost:
-       for i in $(seq 4000 4015); do echo "data" > /dev/udp/192.168.34.84/$i; done
-   * The following doesn't help: new version of libvma, tunning of the options.
- - Communication breaks with small MTU sizes (bellow 1500), but this is probably not important (Packets are
-   delivered but with extreme latencies. Probably some tunning of network stack is required).
- - Technically, everything should work if we start UFO server when data is already streamed. However, the first
-   dataset could be any. Therefore, the check fails as the data is shifted by a random number of datasets.
-
-
-Questions
-=========
- - Can we pre-allocate several UFO buffers for forth-comming events. Currently, we need to buffer out-of-order
-   packets and copy them later (or buffer everything for simplicity). We can avoid this data copy if we can get
-   at least one packet in advance.
-
- - How I can execute 'generate' method on 'reductor' filter if no new data on the input for the specified
-   amount of time. One option is sending empty buffer with metadata indicating timeout. But this is again
-   hackish.
-
- - Can we use 16-bit buffers? I can set dimmensions to 1/4 of the correct value to address this. But is it
-   possible to do in a clean way?
-
- - What is 'ufotools' python package mentioned in documentation? Just a typo?
--
cgit v1.2.1