Cost models for geo-distributed massively parallel streaming analytics [article]

Anna-Valentini Michailidou, Anastasios Gounaris, Konstantinos Tsichlas
2021 arXiv pre-print
This report is part of the DataflowOpt project on the optimization of modern dataflows. It aims to introduce a data quality-aware cost model that covers the following aspects in combination: (1) heterogeneity in compute nodes, (2) geo-distribution, (3) massive parallelism, (4) complex DAGs, and (5) streaming applications. Such a cost model can then be leveraged to devise cost-based optimization solutions for task placement and operator configuration.
arXiv:2105.12507v1 fatcat:su5qm2arerf2ddadfm6ui6j5gm