Result: Big Data Storage and Query Optimization Based on Distributed Computing Algorithm.
As data volumes grow, traditional storage architectures struggle to meet the storage needs of big data, and query efficiency drops sharply. An efficient storage and query optimization strategy is therefore urgently needed to improve big data processing capability and system performance. To this end, this study first adopts the Hadoop HDFS distributed file system as the underlying storage architecture, in which multiple machines cooperate to store data across multiple nodes. Data sharding is used to split the data into many small blocks and distribute them to different nodes, which enables parallel data processing and load balancing. For query optimization, this study improves query efficiency through index optimization, query rewriting, and parallel querying. It builds a suitable index structure over the data to accelerate retrieval, and it rewrites complex query statements into more efficient equivalent forms. In addition, it uses the parallel processing capability of the distributed computing framework to dispatch query tasks to multiple nodes for parallel execution. Compared with traditional methods, the proposed distributed storage architecture performs well in storage space utilization, reaching a maximum of 97.6%. It also recovers from faults faster than traditional methods; for example, the recovery time in process 1 is shortened from 29.9 seconds to 15 seconds. The average query response time of the method over 30 processes is only 297.2 milliseconds, much lower than the 560 milliseconds of traditional methods. [ABSTRACT FROM AUTHOR]
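The abstract does not give implementation details, so the following is only a minimal illustrative sketch of the three ideas it names (sharding, indexing, and parallel querying), not the authors' method. In-memory Python lists stand in for HDFS nodes, and every name here (`NUM_SHARDS`, `shard_of`, `build_shards`, `query_shard`, `parallel_query`, the `id` and `category` fields) is a hypothetical chosen for illustration.

```python
import zlib
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor

NUM_SHARDS = 4  # assumed node count; the abstract does not state one


def shard_of(key: str, num_shards: int = NUM_SHARDS) -> int:
    """Map a record key to a shard using a stable hash (CRC32)."""
    return zlib.crc32(key.encode("utf-8")) % num_shards


def build_shards(records, num_shards: int = NUM_SHARDS):
    """Distribute records over shards; each shard indexes rows by 'category'."""
    shards = [{"rows": [], "index": defaultdict(list)} for _ in range(num_shards)]
    for rec in records:
        shard = shards[shard_of(rec["id"], num_shards)]
        # Record the row position under its category before appending the row.
        shard["index"][rec["category"]].append(len(shard["rows"]))
        shard["rows"].append(rec)
    return shards


def query_shard(shard, category):
    """Answer the query on one shard via its index instead of a full scan."""
    return [shard["rows"][pos] for pos in shard["index"].get(category, [])]


def parallel_query(shards, category):
    """Fan the query out to all shards in parallel and merge the partial results."""
    with ThreadPoolExecutor(max_workers=len(shards)) as pool:
        partials = list(pool.map(lambda s: query_shard(s, category), shards))
    return sorted((row for part in partials for row in part), key=lambda r: r["id"])


# Toy data set: 20 records, half tagged "even" and half "odd".
records = [
    {"id": f"rec-{i}", "category": "even" if i % 2 == 0 else "odd"}
    for i in range(20)
]
shards = build_shards(records)
hits = parallel_query(shards, "even")
```

A stable hash such as CRC32 is used here rather than Python's built-in `hash`, which is randomized per process for strings and would place the same key on different shards across runs. In a real deployment the threads would be replaced by tasks scheduled on the cluster's compute framework.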