A distributed RAG-based framework for automated extraction of information from multiple types of resources
Accessing authoritative information in areas such as healthcare, cybersecurity, and artificial intelligence remains a challenge due to the heterogeneity of data sources and the varying credibility of content. With the increasing integration of advanced technologies into daily life, there is an urgent need for systems that can streamline the retrieval of information and the extraction of knowledge from different formats. In this paper, we present a distributed framework based on Retrieval-Augmented Generation (RAG) that aims to automate the extraction and structuring of information from multimodal resources, such as websites, PDFs, images, audio, and video. The framework supports real-time data processing and is optimized for the creation of open data sets in any subject area. To validate our approach, we applied it to cigars and beverages, using content from online articles, reviews, and posts. Our results show the framework’s potential to simplify data integration, improve usability, and enable scalable, contextual knowledge generation.