Title:
Extending SQL for Machine Learning
Publication Year:
2023
Subject Terms:
Document Type:
Master's thesis
File Description:
application/pdf
Language:
English
DOI:
10.35096/othr/pub-6059
Rights:
https://creativecommons.org/licenses/by-nc-nd/4.0/deed.de ; info:eu-repo/semantics/openAccess
Accession Number:
edsbas.D07979AA
Database:
BASE

Further Information

As machine learning becomes ever more popular, the question of how to enable it on more platforms becomes increasingly important. This thesis explores extending SQL and databases for machine learning. To this end, a framework for extending the Exasol database for machine learning is created. The framework uses Exasol's existing support for scripting languages and its in-database file system to integrate existing machine learning libraries into SQL; so far, only Scikit-Learn has been integrated. The framework prioritizes simplicity, usability, smooth integration into SQL, and expressive power, with runtime efficiency and speed as secondary goals. Further benefits of the framework are reduced communication overhead, increased data security, simpler data synchronization, and the use of core database strengths. Compared with other in-database machine learning approaches, namely Apache MADlib and Oracle Machine Learning, our framework is most likely slower and less efficient, but it has the advantage of relying on well-integrated, well-tested libraries. This thesis also provides an overview of related work on extending SQL and databases for machine learning and discusses future directions for the framework. The framework is freely available at https://github.com/christoph-grossmann/Exasol_DB_ML_Framework.