Skip to content

Concepts

Warning

Unsorted notes

Parallel

Parallel are operations running simultaneously yet independently in multiple threads, processes or machines. For CPU-intensive workloads, past the overhead of arranging such a system, a speedup equal to the number of CPU cores is possible.

Parallel access with Scenario
Lock Resource A (locked) -> Process 1
No Lock Resource A \(\longrightarrow\) Process 1
Resource A \(\longrightarrow\) Process 2

Concurrent

Concurrent are multiple operations managed during overlapping periods yet not necessarily executed at the exact same instant. For the particular case of cloud storage, the latency to get the first byte of a read can be comparable or dominate the total time for a request. Practically, launching many requests, and only pay the overhead cost once (they all wait together), enables a large speedup.

Execution Task
Non-Concurrent A \(\rightarrow\) Run \(\rightarrow\) Complete \(\longrightarrow\) B \(\rightarrow\) Run \(\rightarrow\) Complete
Concurrent A \(\rightarrow\) .. \(\rightarrow\) Complete
.. B \(\longrightarrow\) .. \(\rightarrow\) Complete
C \(\longrightarrow\) .. \(\longrightarrow\) Complete
.. .. D \(\longrightarrow\) .. \(\rightarrow\) Complete
.. E \(\rightarrow\) .. \(\longrightarrow\) Complete

Chunks

Reading a chunk?

Byte range Index
Original [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
Selection [3, 4, 5, 6]

Compression

Warning

To Do !

Descriptive metadata

Compression Size %
Decompressed [0, 1, 2, 3, 4, 5, 6, 7, 8, 9] 0
Compressed [0, 1, 2, 3, 4, 5, 6] 30

Consolidation

A single indexible aggregate dataset

Consolidation Data Parts
Scattered [-] [-] [-] 3
Consolidated [-----------] 1
Aggregation Simple Simple Virtual
File A B V
Points to \(\rightarrow\) A B A, B

Metadata consolidation in a Zarr context, is the combination of all separate metadata files associated with the different arrays and groups within a Zarr hierarchy into a single metadata file. It is a performance optimization technique that reduces the number of read operations required to access metadata. It can be particularly beneficial when working with remote or distributed storage systems.

Asynchronous

Asynchronous is a mechanism performing tasks without waiting for other tasks to complete.

Operation Execution
Sequential A \(\longrightarrow\) Complete \(\rightarrow\) B \(\longrightarrow\) Complete \(\rightarrow\) C \(\longrightarrow\) Complete
Asynchronous A \(\longrightarrow\) .. \(\longrightarrow\) Complete
B \(\longrightarrow\) .. .. \(\longrightarrow\) Complete
C \(\longrightarrow\) .. \(\longrightarrow\) Complete

Serverless

Serverless is a deployment of code in a cloud service which in turn handles server maintenance, scaling, updates and more.

Deployment Management
Traditional Manual server, maintenance, scaling, updates, ..
Serverless Automated cloud service, deployment, scaling, ..

Front- and Back-end

Component Role
Backend Data storage, algorithms, API, processing, serving
Frontend User interface & experience using a browsers or application