From 44cef2cb16dd2bc55ad34d0b8313f7f314b0107a Mon Sep 17 00:00:00 2001
From: "Suren A. Chilingaryan"
Date: Mon, 27 Jan 2020 05:30:53 +0100
Subject: Various docs about UFO, ROOF, and further plans

---
 docs/architecture.txt | 18 +++++++++++++++++
 docs/hardware.txt | 6 ++++++
 docs/network.txt | 28 ++++++++++++++++++++++++++
 docs/todo.txt | 28 ++++++++++++++++++++++++++
 docs/ufo.txt | 36 +++++++++++++++++++++++++++++++++
 src/DEVELOPMENT | 56 ---------------------------------------------------
 6 files changed, 116 insertions(+), 56 deletions(-)
 create mode 100644 docs/architecture.txt
 create mode 100644 docs/hardware.txt
 create mode 100644 docs/network.txt
 create mode 100644 docs/todo.txt
 create mode 100644 docs/ufo.txt
 delete mode 100644 src/DEVELOPMENT

diff --git a/docs/architecture.txt b/docs/architecture.txt
new file mode 100644
index 0000000..2b00ab3
--- /dev/null
+++ b/docs/architecture.txt
@@ -0,0 +1,18 @@
+Configuration
+=============
+ x Pre-configured values in C sources, or shall everything be specified in the configuration? [ full config, less prone to errors if all is in one place ]
+   - The default header size can be computed from the defined 'struct', but that's it.
+   - Do not try to compute the maximum packet size, etc.
+ x Drop all computable parameters from the config [ with a few exceptions ]
+   - It should be possible to run the network receiver without configuring the rest of the ROOF.
+   - It should be possible to keep the dataset size both in the network config (to avoid rewriting) and in the ROOF hardware configuration, but the two must match.
+   - n_streams vs. n_modules: allows multiple streams per module in the future.
+   - samples_per_rotations vs. sample_rate / image_rate: unclear what happens with switch rate control, etc.
+ x How precisely should we verify configuration consistency? [ implement a JSON schema at some point ]
+ x Propagate broken frames by default? Or just drop them, marking the missing frames with metadata. [ ingest when necessary ]
+   - I.e.
+     provide a filter that removes broken frames, or one that generates them when necessary (so that the standard UFO filters can ingest an uninterrupted flow)?
+   - How to handle partially broken frames?
+ x How to handle data planes? [ metadata passes through processors, but not reductors ]
+   - Metadata seems to be preserved while passing through the standard UFO filters, so this is the easiest way out.
+   - We can build a stand-alone subgraph for each plane, but this likely involves a full data copy.
+   - This is probably OK for the simple use case of "raw storage + visualization" with only two instances, but could be too challenging for multi-plane analysis.

diff --git a/docs/hardware.txt b/docs/hardware.txt
new file mode 100644
index 0000000..a293887
--- /dev/null
+++ b/docs/hardware.txt
@@ -0,0 +1,6 @@
+ - Jumbo frames are not currently supported; the maximum packet size is 1500 bytes.
+   * The maximum number of samples per packet can be computed as
+     n = (1500 - header_size) / sample_size, where sample_size = pixels_per_module * bpp, i.e. 46 = floor(1492 / 32)
+   * With 46 samples per packet, however, we cannot split a full rotation into a whole number of packets.
+     So, we need to find the maximal number m such that
+     (m <= n) and (samples_per_rotation % m == 0), i.e. 40

diff --git a/docs/network.txt b/docs/network.txt
new file mode 100644
index 0000000..e7e2a34
--- /dev/null
+++ b/docs/network.txt
@@ -0,0 +1,28 @@
+Problems
+========
+ - When streaming at high speed (~ 16 data streams; 600 Mbit & 600 kpck each), the data streams quickly get
+   desynchronized (but all packets are delivered).
+   * It is unclear whether the problem is on the receiver side (no CPU cores are overloaded) or the
+     desynchronization first appears on the simulated sender. A test with real hardware is required.
+   * For border-case scenarios, increasing the number of buffers from 2 to 10-20 helps. But at full speed, even
+     1000s of buffers are not enough. Packet counts quickly drift apart.
+   * Further increasing the packet buffer provided to 'recvmmsg' does not help (even if blocking is enforced
+     until all packets are received).
+   * At the speed specified above, the system also works without libvma.
+   * Actually, with libvma a larger buffer is required. In the beginning, the performance of libvma gradually
+     speeds up (this has always been the case), and during this period a significant desynchronization happens.
+     To compensate for it, we need about 400 buffers with libvma, compared to only 10 when the standard Linux
+     networking is used.
+ - In any case (libvma or not), some packets are lost in the beginning when high-speed communication is tested.
+   * Usually, the first packets are transferred OK, but then a few packets are lost occasionally here and there
+     (resulting in broken frames). This basically breaks the use case of grabbing a few frames and exiting.
+     It is unclear whether this is a server- or client-side problem (makes sense to see how real hardware behaves).
+   * Can we pre-heat to avoid this speeding-up problem (increase the pre-allocated buffers, disable power-saving
+     mode, ...)? Or will it also not be a problem with real hardware? We can send UDP packets (they should be
+     sent from another host), but packets are still lost:
+       for i in $(seq 4000 4015); do echo "data" > /dev/udp/192.168.34.84/$i; done
+   * The following does not help: a new version of libvma, tuning of its options.
+ - Communication breaks with small MTU sizes (below 1500), but this is probably not important (packets are
+   delivered, but with extreme latencies; some tuning of the network stack is probably required).
+ - Technically, everything should work if we start the UFO server when the data is already streaming. However,
+   the first dataset could be any. Therefore, the check fails, as the data is shifted by a random number of
+   datasets.
diff --git a/docs/todo.txt b/docs/todo.txt
new file mode 100644
index 0000000..8497a69
--- /dev/null
+++ b/docs/todo.txt
@@ -0,0 +1,28 @@
+Main
+====
+ + Add plane/frame-number/broken metadata in the UFO buffers. Check propagation through the standard UFO filters. [ propagates through processors, but not reductors ]
+ + Plane selector filter
+ - Handle packets with data from multiple datasets
+ - Filter to ingest zero-padded broken frames
+ - Try the UFO 'flat-field' correction filter
+ - Cone-beam to parallel-beam resampling ?
+ - Full reconstruction chain
+ - Try the UFO visualization filter
+ - "Reconstructed data storage" and "Visualization + raw data storage" modes
+
+Optional
+========
+ - Try online compression of the ROOF data @Gregoire
+ - Introduce a JSON schema and a JSON schema validator @Zhassulan
+ - If necessary, reconstruct slightly broken frames (currently dropped) @Zhassulan
+ - Implement a fast-writer filter for quick raw data storage
+   * Either include the plane in the file name or pass invalid frames through (so we can compute the frame number)?
+ ? Support multi-plane configuration in roof.py
+
+Ufo bugs
+========
+ - Complain if the TIF-writer is used with 1D data.
+
+Ufo features
+============
+ ? Support non-fp data in the UFO writer

diff --git a/docs/ufo.txt b/docs/ufo.txt
new file mode 100644
index 0000000..64ffb13
--- /dev/null
+++ b/docs/ufo.txt
@@ -0,0 +1,36 @@
+ROOF on UFO
+===========
+ - The current implementation follows the UFO architecture: the reader and the dataset builder are split into two filters.
+   * The reader is multi-threaded. However, only a single instance of the builder can be scheduled.
+     This could limit the maximum throughput on dual-head, or even single-head but many-core, systems.
+   * Another problem here is timing. All events in the builder are initiated from the reader. Consequently,
+     it seems we cannot time out on a semi-complete dataset if no new data is arriving.
+   * Besides performance, this is also critical for stability.
+     With continuous streaming there is no problem; however, if a finite number of frames is requested and some
+     packets are lost, the software will wait forever for the missing bits.
+
+UFO Architecture
+================
+
+
+Questions
+=========
+ - Can we pre-allocate several UFO buffers for forthcoming events? Currently, we need to buffer out-of-order
+   packets and copy them later (or buffer everything for simplicity). We could avoid this data copy if we could
+   get at least one packet in advance.
+
+ - How can I execute the 'generate' method on a 'reductor' filter if no new data arrives on the input for a
+   specified amount of time? One option is sending an empty buffer with metadata indicating the timeout. But
+   this is again hackish.
+
+ - Can we use 16-bit buffers? I can set the dimensions to 1/4 of the correct value to address this. But is it
+   possible to do it in a clean way?
+   * ufo-write definitely only supports fp32 input
+
+ - Is there a standard way to enable automatic propagation of dataset metadata through the chain of filters? [ solved ]
+   * Metadata from all input buffers is copied to the output automatically in 'processors'.
+   * Metadata should be copied with 'ufo_buffer_copy_metadata' in 'reductors'.
+
+ - We can create a separate subgraph for each plane. But this requires an initial copy. Can this be zero-copy?
+
+ - What is the 'ufotools' python package mentioned in the documentation? Just a typo?

diff --git a/src/DEVELOPMENT b/src/DEVELOPMENT
deleted file mode 100644
index 18a8011..0000000
--- a/src/DEVELOPMENT
+++ /dev/null
@@ -1,56 +0,0 @@
-Architecture
-===========
- - Current implementation follows UFO architecture: reader and dataset-builder are split in two filters.
-   * The reader is multi-threaded. However, only a single instance of the builder is possible to schedule.
-     This could limit maximum throughput on dual-head or even signle-head, but many-core systems.
-   * Another problem here is timing. All events in the builder are initiaded from the reader.
-     Consequently,
-     as it seems we can't timeout on semi-complete dataset if no new data is arriving.
-   * Besides, performance this is also critical for stability. With continuous streaming there is no problem,
-     however, if a finite number of frames requested and some packets are lost, the software will wait forever
-     for missing bits.
-
-
-Problems
-========
- - When streaming at high speed (~ 16 data streams; 600 Mbit & 600 kpck each), the data streams quickly get
-   desynchronized (but all packets are delivered).
-   * It is unclear if problem is on the receiver side (no overloaded CPU cores) or de-synchronization is first
-     appear on the simmulation sender. The test with real hardware is required.
-   * For border case scenarios, increasing number of buffers from 2 to 10-20 helps. But at full speed, even 1000s
-     buffers are not enough. Packets counts are quickly going appart.
-   * Further increase of packet buffer provided to 'recvmmsg' does not help (even if blocking is enforced until
-     all packets are received)
-   * At the speed specified above, the system works also without libvma.
-   * Actually, with libvma a larger buffer is required. In the beginning the performance of libvma is gradually
-     speeding up (that was always like that). And during this period a significant desynchronization happens. To
-     compensate it, we need about 400 buffers with libvma as compared to only 10 required if standard Linux
-     networking is utilized.
- - In any case (LibVMA or not), some packets will be lost in the beginning if high-speed communication is tested.
-   * Usually, first packets are transferred OK, but, then, a few packets will be lost occasionally here and there
-     (resulting in broken frames). This basically breaks grabbing a few packets and exitig. Unclear if server- or
-     client-side problem (makes sense to see how real-hardware will behave).
-   * Can we pre-heat to avoid this speeding-up problem (increase pre-allocated buffers, disable power-saving
-     mode, ??)
-     Or it will be also not a problem with hardware? We can send UDP packets (should be send from another
-     host), but packets are still lost:
-       for i in $(seq 4000 4015); do echo "data" > /dev/udp/192.168.34.84/$i; done
-   * The following doesn't help: new version of libvma, tunning of the options.
- - Communication breaks with small MTU sizes (bellow 1500), but this is probably not important (Packets are
-   delivered but with extreme latencies. Probably some tunning of network stack is required).
- - Technically, everything should work if we start UFO server when data is already streamed. However, the first
-   dataset could be any. Therefore, the check fails as the data is shifted by a random number of datasets.
-
-
-Questions
-=========
- - Can we pre-allocate several UFO buffers for forth-comming events. Currently, we need to buffer out-of-order
-   packets and copy them later (or buffer everything for simplicity). We can avoid this data copy if we can get
-   at least one packet in advance.
-
- - How I can execute 'generate' method on 'reductor' filter if no new data on the input for the specified
-   amount of time. One option is sending empty buffer with metadata indicating timeout. But this is again
-   hackish.
-
- - Can we use 16-bit buffers? I can set dimmensions to 1/4 of the correct value to address this. But is it
-   possible to do in a clean way?
-
- - What is 'ufotools' python package mentioned in documentation? Just a typo?
--
cgit v1.2.1