The hidden complexities of Android TPL detection: An empirical analysis of techniques, challenges, and effectiveness
Third-party libraries (TPLs) play a crucial role in Android application (app) development and have become an indispensable part of the Android ecosystem. However, TPLs also introduce security risks, as they may propagate 1-day vulnerabilities or even malicious code into apps. Moreover, several downstream tasks, such as app clone detection, license violation identification, and patch presence testing, require accurate TPL detection as a prerequisite. Consequently, TPL detection has gained increasing importance over the past decade as a means of improving maintainability and security within the software supply chain. To remain robust against external factors and to support precise vulnerability identification, modern library detection tools must not only recognize a wide variety of TPLs but also be resilient to code obfuscation and optimization and accurately identify library versions. Although recent studies have reported progress on these issues, none have comprehensively evaluated whether the proposed methods actually overcome these challenges. Furthermore, critical aspects such as tool performance on real-world apps and the generalizability of existing approaches are frequently overlooked in current research. To gain deeper insight into TPL detection research, we conducted a comprehensive empirical analysis of state-of-the-art approaches in this domain. This study first summarizes the common technologies used at each stage of the TPL detection process, then analyzes the prevalence of code obfuscation and optimization in real-world apps to identify the key external factors that hinder effective library detection. Next, we evaluate cutting-edge tools on multiple ground-truth datasets to validate our findings.
Specifically, we systematically analyze the methodologies employed by these tools and assess their capabilities in TPL variety detection, version identification, and resilience to common obfuscation and optimization techniques, as well as the underlying causes of their failures. Finally, we assess the generalizability of these tools by comparing their performance across diverse datasets and validating them on real-world data. Our findings confirm that obfuscation and optimization are indeed prevalent in real-world apps. However, the code transformations these techniques introduce often exceed the scope of scenarios considered in prior TPL detection studies. We also observe that even the most advanced detection features struggle to distinguish accurately between library versions. Beyond the errors caused by obfuscation and optimization, overly simplistic library features can also lead to false positives. Moreover, although most tools perform well on their own curated datasets and show reduced performance on external datasets, their effectiveness on real-world apps shows no substantial disparity. Overall, this paper presents a comprehensive analysis and evaluation of current TPL detection techniques, providing a solid foundation for future research in this area. [ABSTRACT FROM AUTHOR]
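To make the obfuscation point concrete, the following Python sketch contrasts a naive name-based library fingerprint with a name-independent structural one. It is a toy model under stated assumptions, not the method of the paper or of any particular tool. A library is represented as a hypothetical dict mapping class names to method signatures. Obfuscation is simulated as pure identifier renaming of the kind ProGuard/R8 apply. The helper names (`normalize_type`, `name_features`, `structural_features`, `similarity`) are illustrative only.

```python
from collections import Counter

# Hypothetical toy representation (illustration only): class name -> list of
# (method_name, parameter_types, return_type).
Library = dict[str, list[tuple[str, tuple[str, ...], str]]]

PRIMITIVES = {"int", "long", "boolean", "void", "byte", "char", "short", "float", "double"}
# Assumption: framework types are not renamed by the obfuscator.
FRAMEWORK_PREFIXES = ("java.", "javax.", "android.")


def normalize_type(t: str) -> str:
    """Keep primitives and framework types; replace renameable types with 'X'."""
    if t in PRIMITIVES or t.startswith(FRAMEWORK_PREFIXES):
        return t
    return "X"


def name_features(lib: Library) -> Counter:
    """Naive feature: fully qualified class names (breaks under renaming)."""
    return Counter(lib.keys())


def structural_features(lib: Library) -> Counter:
    """Name-independent feature: normalized method signatures, names dropped."""
    feats: Counter = Counter()
    for methods in lib.values():
        for _name, params, ret in methods:
            sig = "(" + ",".join(normalize_type(p) for p in params) + ")" + normalize_type(ret)
            feats[sig] += 1
    return feats


def similarity(a: Counter, b: Counter) -> float:
    """Weighted Jaccard similarity over two feature multisets."""
    inter = sum((a & b).values())
    union = sum((a | b).values())
    return inter / union if union else 1.0


original: Library = {
    "com.example.net.HttpClient": [
        ("get", ("java.lang.String",), "com.example.net.Response"),
        ("setTimeout", ("int",), "void"),
    ],
    "com.example.net.Response": [
        ("body", (), "java.lang.String"),
        ("code", (), "int"),
    ],
}

# Same library after simulated identifier renaming.
obfuscated: Library = {
    "a.b.a": [
        ("a", ("java.lang.String",), "a.b.b"),
        ("b", ("int",), "void"),
    ],
    "a.b.b": [
        ("a", (), "java.lang.String"),
        ("b", (), "int"),
    ],
}

print(similarity(name_features(original), name_features(obfuscated)))  # 0.0
print(similarity(structural_features(original), structural_features(obfuscated)))  # 1.0
```

Renaming drives the name-based similarity to 0.0 while the structural fingerprint still matches fully. The same name independence is also a weakness: small libraries whose methods have generic signatures such as `()int` can yield near-identical fingerprints, which is one way overly simplistic features can produce false positives. Real optimizations (shrinking, inlining, class merging) also change structure and are not modeled here.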
Copyright of Computers & Security is the property of Pergamon Press - An Imprint of Elsevier Science and its content may not be copied or emailed to multiple sites without the copyright holder's express written permission. Additionally, content may not be used with any artificial intelligence tools or machine learning technologies. However, users may print, download, or email articles for individual use. This abstract may be abridged. No warranty is given about the accuracy of the copy. Users should refer to the original published version of the material for the full abstract. (Copyright applies to all Abstracts.)