Result: Identifying Affected Third-Party Java Libraries from Textual Descriptions of Vulnerabilities and Libraries.

Title:
Identifying Affected Third-Party Java Libraries from Textual Descriptions of Vulnerabilities and Libraries.
Source:
ACM Transactions on Software Engineering & Methodology; May 2025, Vol. 34 Issue 4, p1-27, 27p
Database:
Complementary Index

More Information

To address security vulnerabilities arising from third-party libraries, security researchers maintain databases that monitor and curate vulnerability reports. Application developers can identify libraries affected by vulnerability reports (in short, affected libraries) by directly querying these databases with the libraries they use. However, the query results are not reliable because vulnerability reports are incomplete. Thus, current approaches model the task of identifying affected libraries as a named-entity-recognition (NER) task or an extreme multi-label learning (XML) task. These approaches suffer from highly inaccurate results when identifying affected libraries with complex and similar names, e.g., Java libraries. To address these limitations, in this article, we propose VulLibMiner, the first approach to identify affected libraries from textual descriptions of both vulnerabilities and libraries, together with VulLib, a dataset of Java vulnerabilities and their affected libraries. VulLibMiner consists of a TF-IDF matcher to efficiently screen out a small set of candidate libraries and a BERT-FNN model to effectively identify affected libraries from these candidates. We evaluate VulLibMiner against four state-of-the-art/practice approaches for identifying affected libraries on both their dataset, VeraJava, and our VulLib dataset. Our evaluation results show that VulLibMiner can effectively identify affected libraries with an average F1 score of 0.669, while the state-of-the-art/practice approaches achieve only 0.547. [ABSTRACT FROM AUTHOR]
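
The candidate-screening step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a plain TF-IDF cosine-similarity ranking over library descriptions, and the function names (`screen_candidates`, `build_idf`, and so on), the `top_k` parameter, and the three `org.example` libraries are hypothetical, invented for illustration. The second stage (the BERT-FNN model that scores the screened candidates) is omitted here.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Lowercase and split on non-alphanumerics, so "json-binder" -> ["json", "binder"].
    return [t for t in re.split(r"[^a-z0-9]+", text.lower()) if t]


def build_idf(docs):
    # Smoothed inverse document frequency over the tokenized library descriptions.
    n = len(docs)
    df = Counter()
    for tokens in docs:
        df.update(set(tokens))
    return {term: math.log((1 + n) / (1 + count)) + 1.0 for term, count in df.items()}


def tfidf_vector(tokens, idf):
    # Terms that never occur in the library corpus are ignored.
    tf = Counter(t for t in tokens if t in idf)
    return {term: count * idf[term] for term, count in tf.items()}


def cosine(a, b):
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def screen_candidates(vuln_description, libraries, top_k=2):
    """Return up to top_k library names ranked by TF-IDF cosine similarity.

    Libraries that share no term with the vulnerability description are dropped.
    """
    names = list(libraries)
    docs = [tokenize(name + " " + libraries[name]) for name in names]
    idf = build_idf(docs)
    query = tfidf_vector(tokenize(vuln_description), idf)
    scored = [(cosine(query, tfidf_vector(doc, idf)), name)
              for doc, name in zip(docs, names)]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [name for score, name in scored[:top_k] if score > 0]


# Hypothetical library names and descriptions, invented for this example.
LIBRARIES = {
    "org.example:xml-parser": "XML parser and document builder for Java",
    "org.example:json-binder": "JSON data binding and serialization for Java objects",
    "org.example:http-client": "HTTP client for sending requests and handling responses",
}

candidates = screen_candidates(
    "Deserialization of untrusted JSON data allows remote code execution",
    LIBRARIES,
)
print(candidates)
```

In this toy corpus only the JSON library shares terms with the vulnerability description, so it is the only candidate that survives screening. In a two-stage design of the kind the abstract describes, a cheap lexical filter like this narrows a large library index to a short list, and a more expensive learned model then decides which of those candidates are actually affected.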
