Diffstat (limited to 'docs/architecture.txt')
-rw-r--r-- docs/architecture.txt | 98
1 files changed, 98 insertions, 0 deletions
diff --git a/docs/architecture.txt b/docs/architecture.txt
index 2b00ab3..e38da42 100644
--- a/docs/architecture.txt
+++ b/docs/architecture.txt
@@ -16,3 +16,101 @@ Configuration
- Metadata seems to be preserved while passing through standard UFO filters. So, this is the easiest way out.
- We can build a stand-alone subgraph for each plane, but this likely involves a full data copy.
- This is probably OK for the simple use case ("raw storage + visualization" as the only two instances), but could be too challenging for multi-plane analysis.
+
+
+Overloaded Mode
+===============
+ x How/when do we stop capturing in fixed-frame mode? [ After building the required number of frames ]
+ - After capturing the theoretical number of packets required for reconstruction (or plus one frame to address half)
+ * Initial packets are likely broken, as our experiments with LibVMA show... Likely will be less...
+ * As a work-around, we can _always_ skip the first few frames to allow the system to become responsive.
+ * Overall, this seems a less reliable approach.
+ => After initial building of the required number of packets.
+ * Unlike the previous case, this may never finish if processing is the bottleneck, as buffers will be overwritten...
+ * The alternative is to stop capturing frames if the buffers are exhausted...
+ x Do we pause receiving when the buffer is exhausted or do we start overwriting (the original ufo variant overwrites)? [ stop ]
+ - For streaming, overwriting is better, as it is better to skip older frames than newer ones.
+ => But when capturing a fixed number of frames, we need to stop streaming (in the overloaded case) or the frames will be continuously overwritten.
+ * This is a more repeatable way to handle the frames. Something has to be optimized anyway if we are too slow in streaming mode,
+ but in this mode we can reliably get the first frames even if event-building is overloaded.
+ x Do we always skip the first frames to keep the system responsive or do we process normally? [ Not now, TBD later if necessary ]
+ - Technically, skipping first frames could allow faster stabilization.
+ - This could only cause problems if streaming is started by some trigger and the first frames are really important.
+ - This would be mandatory in fixed-frame mode if stopping is independent of reconstruction.
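
The two overflow policies discussed above (overwrite for streaming, stop for fixed-frame capture) can be illustrated with a small single-threaded sketch. All names here (`frame_ring_t`, `ring_push`, `OVERFLOW_STOP`, ...) are hypothetical and not taken from the actual code; a real receiver would also need synchronization between the reader and builder threads.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical policy names, chosen for this sketch only. */
typedef enum { OVERFLOW_OVERWRITE, OVERFLOW_STOP } overflow_policy_t;

typedef struct {
    size_t capacity;          /* number of frame slots */
    size_t head;              /* frames written so far (monotonic) */
    size_t tail;              /* frames consumed or discarded so far (monotonic) */
    size_t dropped;           /* frames lost: overwritten or refused */
    overflow_policy_t policy;
} frame_ring_t;

static void ring_init(frame_ring_t *r, size_t capacity, overflow_policy_t policy) {
    assert(capacity > 0);
    r->capacity = capacity;
    r->head = r->tail = r->dropped = 0;
    r->policy = policy;
}

/* Returns the slot index for a new frame, or -1 if the ring is full
 * and the policy is to stop (fixed-frame mode). */
static long ring_push(frame_ring_t *r) {
    if (r->head - r->tail == r->capacity) {
        r->dropped++;
        if (r->policy == OVERFLOW_STOP)
            return -1;        /* keep the oldest frames, refuse the new one */
        r->tail++;            /* streaming mode: discard the oldest frame */
    }
    return (long)(r->head++ % r->capacity);
}

/* Returns the slot index of the oldest buffered frame, or -1 if empty. */
static long ring_pop(frame_ring_t *r) {
    if (r->head == r->tail)
        return -1;
    return (long)(r->tail++ % r->capacity);
}
```

With `OVERFLOW_STOP` the first frames survive an overloaded builder, which matches the "[ stop ]" decision for fixed-frame mode; with `OVERFLOW_OVERWRITE` the newest frames survive, which is what streaming wants.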
+
+Data Flow Model
+===============
+ x MP-RQ vs recvfrom_zcopy vs SocketXtreme
+ => MP-RQ is the lowest-overhead method (seems to remove one memcpy), with requirements fitting our use case
+
+ x How to abstract read/LibVMA read access?
+ => Return a pointer to the buffer, the number of packets, and the padding between packets.
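
A minimal sketch of what such a read abstraction could look like. The struct and function names are hypothetical, and the `packet_size` field is an assumption added here so that the stride between packets can be computed; the notes only name the pointer, the packet count, and the padding.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical result of one read call, independent of the backend
 * (plain sockets or LibVMA). */
typedef struct {
    const uint8_t *data;   /* start of the first packet */
    size_t n_packets;      /* number of packets in this batch */
    size_t packet_size;    /* payload bytes per packet (assumed fixed) */
    size_t padding;        /* bytes between the end of one packet and the next */
} packet_batch_t;

/* Returns a pointer to packet i of the batch without copying anything. */
static const uint8_t *batch_packet(const packet_batch_t *b, size_t i) {
    assert(i < b->n_packets);
    return b->data + i * (b->packet_size + b->padding);
}
```

A caller iterates `i` from 0 to `n_packets - 1` and never needs to know whether the padding comes from a ring-buffer stride or is simply zero.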
+
+ x Possible Models: Buffer independent streams or buffer sinograms
+ - Sinogram-buffer (push-model)
+ * Sinograms are buffered. The readers write directly to the appropriate place in the sinogram buffers and increment the fragment
+ counter for each buffer.
+ * Readers pause on the first item outside the current buffer and wait until the buffer start is advanced.
+ * After receiving a fragment for a new sinogram, the reader informs the controller about the number of missing fragments in the buffer.
+ E.g. counting 'missing' fragments using an atomic array (on top of completed).
+ * The controller advances the buffer start after 'missing' rises above the specified threshold.
+ - Stream-buffers (pull-model)
+ * Data is directly buffered in the independent receive-buffers. So, there is no memcpy in the receiving part of the code.
+ * The master builder thread determines the sinogram to build (maximum amongst buffers).
+ * Builder threads skip to the required sinogram and start copying data until a missing fragment is detected.
+ * On a missing fragment, a new sinogram is determined and the operation is restarted (when to skip a large amount?)
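
The pull-model selection step can be sketched as follows. This assumes each stream delivers sinogram ids in ascending order (no re-ordering); the function names are hypothetical and the id arrays stand in for the real receive-buffers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Master builder: the sinogram to build is the maximum among the ids
 * currently at the head of each stream-buffer. */
static uint64_t select_sinogram(const uint64_t *head_ids, size_t n_streams) {
    assert(n_streams > 0);
    uint64_t target = head_ids[0];
    for (size_t i = 1; i < n_streams; i++)
        if (head_ids[i] > target)
            target = head_ids[i];
    return target;
}

/* Builder thread: skip fragments older than the target sinogram.
 * Returns the new position in the stream (may equal len). */
static size_t skip_to(const uint64_t *ids, size_t len, size_t pos, uint64_t target) {
    while (pos < len && ids[pos] < target)
        pos++;
    return pos;
}

/* Nonzero if the fragment at pos belongs to the target sinogram; zero
 * means the fragment is missing and a new target has to be selected. */
static int fragment_present(const uint64_t *ids, size_t len, size_t pos, uint64_t target) {
    return pos < len && ids[pos] == target;
}
```

When `fragment_present` fails for some stream, the id found at the new position is a natural input to the next `select_sinogram` round.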
+
+ x Locking with global buffer (No problems if buffering independent streams)
+ - We can synchronize "Advances of Buffer Start" and "Copy out" with locks as these are low-frequency events, but we need to ensure that
+ copying fragments is lock-less.
+ - In the 'counting' scenario this is problematic:
+ - Different threads write to different parts of the buffer. If the buffer start moves, it is OK to finish the old fragment. It will be
+ rewritten by the same thread with new data later. No problems here.
+ - But how to handle fragment counting? Atomics are fine for concurrent threads. But if we move the buffer, the fragment count should not
+ be increased. This means we need to execute 'if' and 'inc' atomically (increase only if the buffer has not moved, and the move could happen between
+ 'if' and 'inc'). This is the main problem.
+ - We can push increases to a pipe and use the main thread (which also advances the start of the buffer) to read from the pipe and
+ increase counts (or ignore if the buffer moved). Is it faster than locking (or do Linux pipes perform locking or kernel/user-space
+ switches anyway)?
+ ? Can we find alternative ways without locks/pipes? E.g. using hashes? Really large counting arrays?
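
One possible lock-free answer to the 'if' + 'inc' problem, offered only as a sketch and not as something the notes have decided on: pack a per-slot generation number and the fragment count into a single 64-bit atomic word and update it with a compare-and-swap loop. The names and the 32/32 bit split are assumptions; the count must stay below 2^32 so it cannot carry into the generation bits.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Upper 32 bits: generation of the slot (changes when the buffer start
 * moves past it); lower 32 bits: fragments received so far. */
typedef _Atomic(uint64_t) slot_counter_t;

static uint64_t pack(uint32_t gen, uint32_t count) {
    return ((uint64_t)gen << 32) | count;
}

/* Reader thread: count a fragment only if the slot still belongs to the
 * generation the fragment was written for. Check and increment happen
 * in one atomic step. */
static bool count_fragment(slot_counter_t *c, uint32_t expected_gen) {
    uint64_t old = atomic_load(c);
    for (;;) {
        if ((uint32_t)(old >> 32) != expected_gen)
            return false;     /* buffer has moved: do not count */
        if (atomic_compare_exchange_weak(c, &old, old + 1))
            return true;
        /* CAS failed: 'old' now holds the current value, retry. */
    }
}

/* Controller thread: reuse the slot for a new sinogram. */
static void advance_slot(slot_counter_t *c, uint32_t new_gen) {
    atomic_store(c, pack(new_gen, 0));
}

static uint32_t fragment_count(slot_counter_t *c) {
    return (uint32_t)atomic_load(c);
}
```

Whether this is actually faster than a pipe or a lock under the real packet rate would have to be measured.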
+
+ ? Performance considerations
+ - Sinogram-buffer advantage
+ - Sinogram-buffer works fine with re-ordered data streams. The stream-buffer approach can handle re-ordered streams in theory, but
+ with a large performance penalty and increased complexity. But in fact, there is little reason why re-ordering could happen, and experiments
+ with the existing switch don't show any re-ordering.
+ - There are always uncached reads. However, with sinogram-buffer it is an uncached copy of a large image, while with stream-buffer
+ many small images are accessed uncached.
+ ? With sinogram-buffer we can prepare data in advance and only a single memcpy will be required. With stream-buffer all steps are performed
+ only once the buffer is available. But does it have any impact on performance?
+ ? With stream-buffer, building _seems_ more complex and requires more synchronization (but only if we could find a simple method to avoid
+ locking in the sinogram-buffer scenario).
+
+ => Stream-buffer advantage
+ - With MP-RQ we can use Mellanox ring-buffer (and probably Mellanox in general) as stream-buffer.
+ - Stream-buffer incurs significantly less load during the reading phase. If building overloads the system, with this approach we can
+ buffer as much data as memory permits and process it later. This is not possible with the sinogram-buffer approach. But we still have
+ socket buffers...
+ - Stream-buffer removes one 'memcpy' unless zero-copy SocketXtreme is used. We also need to store raw data with fastwriter, and the
+ new external sinogram buffer could also be used to store data for fastwriter, hence removing the necessity to memcpy there.
+ I.e. this is solved with SocketXtreme; otherwise there is either a performance penalty or additional complexity here.
+ ? Stream-buffer simplifies lock management. We don't need to provide additional pipes to reduce the amount of required locking. Or is
+ there a simple solution?
+
+ ? Single-thread builder vs multi-thread builder?
+
+ ? LRO (Large Receive Offload). Can we use it? How does it compare with LibVMA?
+
+
+
+
+
+
+Dependencies
+============
+ x C11 threads vs pthreads vs glib threads
+ * C11 threads are only supported starting with glibc 2.28. Ubuntu 18.04 ships 2.27. There are work-arounds, but there is
+ little advantage over pthreads.
+ * We will still try to use atomics from C11.
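
The combination chosen above (pthreads for threading, C11 `<stdatomic.h>` for counters) can be sketched like this. The worker count, iteration count, and all names are arbitrary values picked for the example.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

#define N_WORKERS 4
#define N_INCREMENTS 10000

/* Shared counter updated without any mutex. */
static atomic_ulong fragments_seen;

static void *worker(void *arg) {
    (void)arg;
    for (int i = 0; i < N_INCREMENTS; i++)
        atomic_fetch_add_explicit(&fragments_seen, 1, memory_order_relaxed);
    return NULL;
}

/* Starts the workers with pthreads, waits for them, and returns the total. */
static unsigned long run_workers(void) {
    pthread_t threads[N_WORKERS];
    atomic_store(&fragments_seen, 0);
    for (size_t i = 0; i < N_WORKERS; i++) {
        int rc = pthread_create(&threads[i], NULL, worker, NULL);
        assert(rc == 0);
        (void)rc;
    }
    for (size_t i = 0; i < N_WORKERS; i++)
        pthread_join(threads[i], NULL);
    return atomic_load(&fragments_seen);
}
```

Relaxed ordering is enough here because only the final total is read, and `pthread_join` already orders it after all increments. On older glibc versions this needs to be built with `-pthread`.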
+
+