Number of results: 328
Generated: Tue, 16 Jul 2024 23:16:49 +0200 in 0.0947 sec


Jibril, Muhammad Attahir; Baumstark, Alexander; Sattler, Kai-Uwe
Adaptive update handling for graph HTAP. - In: Distributed and parallel databases, ISSN 1573-7578, Vol. 41 (2023), 3, pp. 331-357

Hybrid transactional/analytical processing (HTAP) workloads on graph data can significantly benefit from GPU accelerators. However, to exploit the full potential of GPU processing, dedicated graph representations are necessary, and these usually make in-place updates difficult. In this paper, we discuss an adaptive update handling approach in a graph database system for HTAP workloads. We discuss and evaluate strategies for propagating transactional updates from an update-friendly table storage to a GPU-optimized sparse matrix format for analytics.



https://doi.org/10.1007/s10619-023-07428-y
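The Jibril et al. (2023) abstract above describes propagating updates from an update-friendly store into a sparse matrix format. As a rough illustration only, the sketch below assumes a CSR (compressed sparse row) adjacency layout and a simple append-only buffer of edge inserts that is merged into the CSR once it grows past a threshold. The names `merge_delta`, `DeltaBufferedGraph` and `merge_ratio` are hypothetical, and the threshold rule is an assumed policy, not one of the strategies evaluated in the paper.

```python
from typing import Dict, List, Tuple

Edge = Tuple[int, int]  # (source vertex, destination vertex)


def merge_delta(row_ptr: List[int], col_idx: List[int],
                delta: List[Edge]) -> Tuple[List[int], List[int]]:
    """Merge buffered edge inserts into a CSR adjacency structure.

    Returns a new (row_ptr, col_idx) pair; the neighbors of each row are kept sorted.
    """
    num_vertices = len(row_ptr) - 1
    # Group the buffered inserts by source vertex.
    pending: Dict[int, List[int]] = {}
    for src, dst in delta:
        pending.setdefault(src, []).append(dst)

    new_row_ptr = [0]
    new_col_idx: List[int] = []
    for v in range(num_vertices):
        # Existing neighbors of v plus the newly inserted ones.
        neighbors = col_idx[row_ptr[v]:row_ptr[v + 1]] + pending.get(v, [])
        neighbors.sort()
        new_col_idx.extend(neighbors)
        new_row_ptr.append(len(new_col_idx))
    return new_row_ptr, new_col_idx


class DeltaBufferedGraph:
    """Hypothetical hybrid store: inserts go to an append-only buffer
    (the update-friendly side) and are merged into the CSR copy (the
    analytics side) once the buffer exceeds merge_ratio * |edges|."""

    def __init__(self, num_vertices: int, merge_ratio: float = 0.1) -> None:
        self.row_ptr = [0] * (num_vertices + 1)
        self.col_idx: List[int] = []
        self.delta: List[Edge] = []
        self.merge_ratio = merge_ratio

    def insert_edge(self, src: int, dst: int) -> None:
        self.delta.append((src, dst))
        # Merge eagerly only when the buffer is large relative to the CSR.
        if len(self.delta) > self.merge_ratio * max(len(self.col_idx), 1):
            self.flush()

    def flush(self) -> None:
        """Force propagation of all buffered inserts into the CSR copy."""
        if self.delta:
            self.row_ptr, self.col_idx = merge_delta(self.row_ptr, self.col_idx, self.delta)
            self.delta = []
```

A small `merge_ratio` keeps the CSR fresh at the price of frequent rebuilds; a large one batches more inserts per merge but leaves analytics reading a staler matrix unless `flush()` is called first.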
Jibril, Muhammad Attahir; Baumstark, Alexander; Sattler, Kai-Uwe
GTPC: towards a hybrid OLTP-OLAP graph benchmark. - In: Datenbanksysteme für Business, Technologie und Web (BTW 2023), (2023), pp. 105-117

Graph databases are gaining increasing relevance not only for pure analytics but also for full transactional support. Business requirements are evolving to demand analytical insights on fresh transactional data, thereby triggering the emergence of graph systems for hybrid transactional-analytical graph processing (HTAP). In this paper, we present our ongoing work on GTPC, a hybrid graph benchmark targeting such systems, based on the TPC-C and TPC-H benchmarks.



Baumstark, Alexander; Jibril, Muhammad Attahir; Sattler, Kai-Uwe
Accelerating large table scan using Processing-In-Memory technology. - In: Datenbanksysteme für Business, Technologie und Web (BTW 2023), (2023), pp. 797-814

Today’s systems are capable of storing large amounts of data in main memory, and in-memory DBMSs in particular can benefit from this development. However, data held in main memory must still be processed by the CPU. This creates a bottleneck that limits the achievable performance of the DBMS. Processing-In-Memory (PIM) is a paradigm for overcoming this problem, but it was not available in commercial systems for a long time. With UPMEM, a commercial system that provides PIM technology in hardware is finally available. This work focuses on optimizing the table scan, a fundamental, memory-bound operation. It presents an approach for optimizing this operation with PIM. The method was evaluated in benchmarks with different table sizes with respect to parallelism and execution time, and compared with the conventional table scan. The result is a table scan that significantly outperforms the conventional scan on the CPU.
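The general idea of a PIM table scan, splitting a column into chunks that each memory-side unit filters locally so that only matching positions travel back to the host, can be sketched in plain Python. This is a simulation under stated assumptions: threads stand in for UPMEM DPUs, the predicate is an assumed range filter `lo <= value < hi`, and `partition`, `scan_chunk` and `pim_style_scan` are hypothetical names. It does not use the UPMEM SDK, is not the approach from the paper, and says nothing about performance.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List, Sequence, Tuple


def partition(n_rows: int, n_units: int) -> List[Tuple[int, int]]:
    """Split row positions 0..n_rows-1 into n_units contiguous [start, end) ranges."""
    base, extra = divmod(n_rows, n_units)
    ranges, start = [], 0
    for unit in range(n_units):
        end = start + base + (1 if unit < extra else 0)
        ranges.append((start, end))
        start = end
    return ranges


def scan_chunk(column: Sequence[int], start: int, end: int, lo: int, hi: int) -> List[int]:
    """Work of one memory-side unit: evaluate lo <= value < hi on its own
    chunk and return only the matching row positions."""
    return [i for i in range(start, end) if lo <= column[i] < hi]


def pim_style_scan(column: Sequence[int], lo: int, hi: int, n_units: int = 4) -> List[int]:
    """Host side: hand one chunk to each unit, then concatenate the
    per-unit results in chunk order so the positions stay sorted."""
    ranges = partition(len(column), n_units)
    with ThreadPoolExecutor(max_workers=n_units) as pool:
        futures = [pool.submit(scan_chunk, column, s, e, lo, hi) for s, e in ranges]
        result: List[int] = []
        for future in futures:
            result.extend(future.result())
    return result
```

On real PIM hardware the chunks would first be copied into each unit's local memory and the per-chunk filter would run as a separate program on the unit; the host-side split and merge shown here is the part that carries over.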



Schlegel, Marius; Sattler, Kai-Uwe
Management of machine learning lifecycle artifacts: a survey. - In: ACM SIGMOD record, Vol. 51 (2023), 4, pp. 18-35

The explorative and iterative nature of developing and operating ML applications leads to a variety of artifacts, such as datasets, features, models, hyperparameters, metrics, software, configurations, and logs. In order to enable comparability, reproducibility, and traceability of these artifacts across the ML lifecycle steps and iterations, systems and tools have been developed to support their collection, storage, and management. It is often unclear what precise functional scope such systems offer, which makes comparing candidates and estimating synergy effects between them quite challenging. In this paper, we aim to give an overview of systems and platforms that support the management of ML lifecycle artifacts. Based on a systematic literature review, we derive assessment criteria and apply them to a representative selection of more than 60 systems and platforms.



https://doi.org/10.1145/3582302.3582306
Kläbe, Steffen; DeSantis, Bobby; Hagedorn, Stefan; Sattler, Kai-Uwe
Accelerating Python UDFs in vectorized query execution. - [USA?] : CIDR Conference. - 1 online resource (7 pages). Publication produced in the context of the event: CIDR 2022: 12th Annual Conference on Innovative Data Systems Research (CIDR ’22), January 9-12, 2022, Chaminade, USA

https://doi.org/10.22032/dbt.59388
Lasch, Robert; Moghaddamfar, Mehdi; May, Norman; Demirsoy, Suleyman S.; Färber, Christian; Sattler, Kai-Uwe
Bandwidth-optimal relational joins on FPGAs. - Konstanz : University of Konstanz. - 1 online resource (pages 1:27-1:39). Online edition: Proceedings of the 25th International Conference on Extending Database Technology, EDBT 2022, Edinburgh, UK, March 29 - April 1, 2022. - OpenProceedings.org 2022, ISBN 978-3-89318-086-8

https://doi.org/10.5441/002/edbt.2022.03
Schlegel, Marius; Sattler, Kai-Uwe
Cornucopia: tool support for selecting machine learning lifecycle artifact management systems. - Setúbal : Scitepress. - 1 online resource (pages 444-450). Online edition: Proceedings of the 18th International Conference on Web Information Systems and Technologies, WEBIST, October 25-27, 2022, in Valletta, Malta / editors: Stefan Decker ... - Setúbal : Scitepress, 2022, ISBN 978-989-758-613-2

The explorative and iterative nature of developing and operating machine learning (ML) applications leads to a variety of ML artifacts, such as datasets, models, hyperparameters, metrics, software, and configurations. To enable comparability, traceability, and reproducibility of ML artifacts across the ML lifecycle steps and iterations, platforms, frameworks, and tools have been developed to support their collection, storage, and management. Selecting the best-suited ML artifact management systems (AMSs) for a particular use case is often challenging and time-consuming due to the plethora of AMSs, their different focus, and imprecise specifications of features and properties. Based on assessment criteria and their application to a representative selection of more than 60 AMSs, this paper introduces an interactive web tool that enables the convenient and time-efficient exploration and comparison of ML AMSs.



https://doi.org/10.5220/0011591700003318
Al-Sayeh, Hani; Jibril, Muhammad Attahir; Bin Saeed, Muhammad Waleed; Sattler, Kai-Uwe
SparkCAD: caching anomalies detector for spark applications. - In: Proceedings of the VLDB Endowment, ISSN 2150-8097, Vol. 15 (2022), 12, pp. 3694-3697

Developers of Apache Spark applications can accelerate their workloads by caching suitable intermediate results in memory and reusing them rather than recomputing them every time they are needed. However, as scientific workflows become more complex, application developers become more prone to wrong caching decisions that lead to poor performance; we refer to these as caching anomalies. We present and demonstrate the Spark Caching Anomalies Detector (SparkCAD), a developer decision support tool that visualizes the logical plan of Spark applications and detects caching anomalies.



https://doi.org/10.14778/3554821.3554877
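To make the notion of a caching anomaly from the SparkCAD abstract above more concrete, the sketch below walks a simplified logical plan, counts how many actions (jobs) need each dataset, and flags two assumed anomaly kinds: a dataset reused by several jobs without being cached, and a cached dataset used by at most one job. Both rules, the `Node` structure, and the function names are hypothetical simplifications for illustration; they are not SparkCAD's actual detection logic and do not use any Spark API.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple


@dataclass
class Node:
    """One dataset (e.g. an RDD or DataFrame) in a simplified logical plan."""
    name: str
    parents: List[str] = field(default_factory=list)
    cached: bool = False


def count_references(plan: Dict[str, Node], actions: List[str]) -> Dict[str, int]:
    """Count how many actions (jobs) need each dataset.

    Each action walks the lineage of its target. Once a cached dataset has
    been materialized by an earlier job, later jobs read it from the cache
    and do not walk (recompute) its parents.
    """
    refs = {name: 0 for name in plan}
    materialized: Set[str] = set()
    for target in actions:
        stack = [target]
        visited: Set[str] = set()
        while stack:
            name = stack.pop()
            if name in visited:
                continue
            visited.add(name)
            refs[name] += 1
            node = plan[name]
            if node.cached:
                if name in materialized:
                    continue  # served from cache: lineage is not recomputed
                materialized.add(name)
            stack.extend(node.parents)
    return refs


def find_anomalies(plan: Dict[str, Node], actions: List[str]) -> Tuple[List[str], List[str]]:
    """Return (reused but not cached, cached but used by at most one job)."""
    refs = count_references(plan, actions)
    missing = sorted(n for n, c in refs.items() if c > 1 and not plan[n].cached)
    unneeded = sorted(n for n, c in refs.items() if c <= 1 and plan[n].cached)
    return missing, unneeded
```

This naive rule reports every uncached ancestor of a reused dataset, so a practical tool would still have to decide which of the flagged datasets is worth caching.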
Sattler, Kai-Uwe; Härder, Theo
Editorial. - In: Datenbank-Spektrum, ISSN 1610-1995, Vol. 22 (2022), 1, pp. 1-4

https://doi.org/10.1007/s13222-022-00405-2
Al-Sayeh, Hani; Jibril, Muhammad Attahir; Memishi, Bunjamin; Sattler, Kai-Uwe
Blink: lightweight sample runs for cost optimization of big data applications. - In: New Trends in Database and Information Systems, (2022), pp. 144-154

Distributed in-memory data processing engines accelerate iterative applications by caching datasets in memory rather than recomputing them in each iteration. Selecting a suitable cluster size for caching these datasets plays an essential role in achieving optimal performance. We present Blink, an autonomous sampling-based framework that predicts the sizes of cached datasets and selects the optimal cluster size without relying on historical runs. We evaluate Blink on iterative, real-world machine learning applications. With sample runs costing on average 4.6% of the cost of the optimal runs, Blink selects the optimal cluster size, saving up to 47.4% of execution cost compared to the average cost.



https://doi.org/10.1007/978-3-031-15743-1_14
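The Blink abstract above describes predicting cached-dataset sizes from sample runs instead of historical runs. A minimal sketch of that idea, assuming the cached size grows linearly with the input fraction and assuming a simple rule of picking the smallest cluster whose combined storage memory holds the predicted cache, could look as follows. The linear model, the memory-fit rule, and all function names are assumptions for illustration, not Blink's published model, and the numbers in the usage note are made up.

```python
from math import ceil
from typing import Sequence, Tuple


def fit_linear(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least-squares fit y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("need sample runs at two or more distinct input fractions")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def predict_cached_size(sample_fractions: Sequence[float],
                        sample_sizes_gb: Sequence[float],
                        full_fraction: float = 1.0) -> float:
    """Extrapolate the cached-dataset size measured in sample runs to the full input."""
    slope, intercept = fit_linear(sample_fractions, sample_sizes_gb)
    return slope * full_fraction + intercept


def smallest_fitting_cluster(predicted_gb: float, storage_mem_per_node_gb: float,
                             max_nodes: int) -> int:
    """Smallest node count whose combined storage memory holds the cached data."""
    needed = max(ceil(predicted_gb / storage_mem_per_node_gb), 1)
    if needed > max_nodes:
        raise ValueError("cached data does not fit even on the largest cluster")
    return needed
```

For example, hypothetical sample runs on 1%, 2% and 4% of the input that cache 0.5, 1.0 and 2.0 GB extrapolate to roughly 50 GB for the full input, which with 12 GB of storage memory per node calls for 5 nodes. A cost-oriented selection would additionally weigh the predicted runtime on each candidate cluster size.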