Result: Oracles for the Equivalence of Java Bytecode

Title:
Oracles for the Equivalence of Java Bytecode
Publisher Information:
Zenodo
Publication Year:
2024
Collection:
Zenodo
Document Type:
dataset
Language:
English
DOI:
10.5281/zenodo.13381845
Rights:
Creative Commons Attribution 4.0 International ; cc-by-4.0 ; https://creativecommons.org/licenses/by/4.0/legalcode
Accession Number:
edsbas.7079ABF0
Database:
BASE

Further Information

Incidents like Log4Shell and SolarWinds have led to an increased focus on software supply chain security. A particular concern is the detection and prevention of compromised builds. A common approach is to independently rebuild projects and compare the results. This makes different binaries built from the same sources available, and raises the question of how to compare them (to confirm the integrity of builds, to detect compromised builds, etc.). It is, however, not clear how to do this: naive bitwise comparison is often too strict, because independent builds of the same sources can differ byte for byte, and establishing the behavioural equivalence of two binaries is undecidable. A pragmatic step towards a solution is to provide a benchmark that can be used to test and train equivalence relations.

We present such a benchmark for Java bytecode, consisting of pairs of binaries (compiled Java classes), each labelled as to whether the two classes are equivalent or not. We refer to these pairs as equivalence and non-equivalence oracles, respectively. We derive equivalence oracles from building 56 projects and project versions using 32 dockerised build environments (with different compilers, compiler versions and configurations). Non-equivalence oracles are derived from three different sources: (1) proven breaking API changes, (2) semantic code changes synthesised by means of bytecode mutations, and (3) code changes extracted from vulnerability patches.

A detailed description of the dataset can be found in: Jens Dietrich, Tim White, Mohammad Mahdi Abdollahpour, Elliott Wen and Behnaz Hassanshahi: BenEq -- A Benchmark of Compiled Java Programs to Assess Alternative Builds. Proceedings of the ACM Workshop on Software Supply Chain Offensive Research and Ecosystem Defenses (SCORED '24).
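
To illustrate how labelled pairs could be used to test a candidate equivalence relation, the following is a minimal sketch rather than the dataset's actual tooling. The `Oracle` class, the `sampleOracles` helper and the synthetic byte arrays are hypothetical stand-ins; they do not reflect the dataset's file layout or real class files. The candidate relation is the naive bitwise comparison mentioned in the description.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.BiPredicate;

class OracleEvaluation {

    /** One labelled pair of binaries. Hypothetical shape, not the dataset's format. */
    static final class Oracle {
        final byte[] left;
        final byte[] right;
        final boolean equivalent; // true: equivalence oracle, false: non-equivalence oracle

        Oracle(byte[] left, byte[] right, boolean equivalent) {
            this.left = left;
            this.right = right;
            this.equivalent = equivalent;
        }
    }

    /** Naive candidate relation: equivalent only if byte-for-byte identical. */
    static boolean bitwiseEqual(byte[] a, byte[] b) {
        return Arrays.equals(a, b);
    }

    /** Counts the oracles on which the candidate relation agrees with the label. */
    static long countAgreements(List<Oracle> oracles,
                                BiPredicate<byte[], byte[]> relation) {
        return oracles.stream()
                .filter(o -> relation.test(o.left, o.right) == o.equivalent)
                .count();
    }

    /** Synthetic stand-ins for compiled classes (assumption: not real bytecode). */
    static List<Oracle> sampleOracles() {
        return List.of(
                // identical bytes, labelled equivalent
                new Oracle(new byte[] {1, 2, 3}, new byte[] {1, 2, 3}, true),
                // different bytes, still labelled equivalent (e.g. another compiler)
                new Oracle(new byte[] {1, 2, 3}, new byte[] {1, 3, 2}, true),
                // different bytes, labelled non-equivalent
                new Oracle(new byte[] {1, 2, 3}, new byte[] {9, 9, 9}, false));
    }

    public static void main(String[] args) {
        List<Oracle> oracles = sampleOracles();
        long agreed = countAgreements(oracles, OracleEvaluation::bitwiseEqual);
        System.out.println(agreed + " of " + oracles.size() + " oracles matched");
    }
}
```

On this synthetic sample the bitwise relation disagrees with the second pair, which is labelled equivalent although its bytes differ. This is the sense in which bitwise comparison is too strict. Any other relation can be evaluated the same way by passing a different `BiPredicate`.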