Alwin Berger 2025-08-16 07:31:41 +00:00
parent 0ee6f02a89
commit 045709877e

63
AE.md

@ -10,21 +10,23 @@ This document provides instructions for reproducing the experimental results fro
### Claims Supported by Artifact
If you run the benchmarks as described below, they will produce the following files, which correspond to figures in the paper.

1. **Figure 3**: (Files: `sql_waters_seq_bytes`, `sql_polycopter_seq_dataflow_full`) This scenario mutates only input values. While multiple techniques find the worst case for both scenarios, FRET is the fastest to reach the maximum, particularly in the second case. FRET also achieves the highest median result.
2. **Figure 4**: (Files: `sql_waters_seq_int`, `sql_release_seq_int`) This scenario mutates only interrupt times. While multiple techniques find the worst case for the second scenario, FRET achieves the highest response time on the first one.
3. **Figure 5**: (Files: `sql_release_seq_full`, `sql_waters_seq_full`) This scenario mutates both kinds of inputs simultaneously. Only FRET achieves the worst possible time when looking at the median results of the first scenario. For the second one, FRET alone reaches the highest response times. This demonstrates that FRET's advantage over other techniques is particularly pronounced when both kinds of inputs are combined.
4. **Figure 6**: (File: `all_tasks`) This compares FRET against the best other technique on each task of the `waters` scenario. FRET ends up above every other technique on each task, validating the comparison in Figure 5 b).
## Getting Started
### Option 1: VirtualBox Images (Recommended)
Download our ready-made VM image from [https://sys-sideshow.cs.tu-dortmund.de/downloads/rtss25/fret.ova](https://sys-sideshow.cs.tu-dortmund.de/downloads/rtss25/fret.ova)

- **VM Configuration**: Allocate as much RAM as possible (minimum 32GB, recommended 512GB)
- **CPU Allocation**: One core per 4-8GB of RAM (recommended 64 cores, 256-512GB RAM)
- **Disk Space**: At least 100GB free space for results
- **Login**: Username: `osboxes.org`, Password: `osboxes.org`
- **See ~/FRET**
- **Ensure you have the right version**: ``git checkout RTSS25-AE && git submodule update --init``
### Option 2: Setup From Scratch
**Prerequisites**: Linux x86_64 system with Nix package manager installed
@ -128,11 +130,12 @@ open $DUMP/show_job.html
```
This script will reproduce Figures 3-6 from the evaluation section of the paper.
You can edit the environment variables at the top to change the following parameters:

- `CORES`: The number of (physical) cores of the VM / fuzzers running in parallel. You will need about 8GB of RAM per fuzzer.
- `RUNTIME`: Time spent on each fuzzing run in seconds (the default is 24h). 8h should be sufficient to see results similar to the paper.
- `TARGET_REPLICA_NUMBER`: The number of replicas for each regular configuration.
- `RANDOM_REPLICA_NUMBER`: The number of replicas for configurations with random fuzzing. These usually deviate very little from each other, so this number can be reduced without affecting the results.
- `MULTIJOB_REPLICA_NUMBER`: The number of replicas for the figure that compares all techniques for each task of a system. This evaluation consists of many configurations, so you can reduce this number to save a lot of time.
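
As a rough aid for picking `CORES`, the 8GB-per-fuzzer guideline can be turned into a small calculation. This is only a sketch: `cores_for_ram` is a hypothetical helper that is not part of `run_eval.sh`, and the 8GB default is taken from the guideline above rather than measured.

```bash
#!/usr/bin/env bash
# Sketch: pick a CORES value that respects the ~8GB-of-RAM-per-fuzzer guideline.
# cores_for_ram is a hypothetical helper, not part of the artifact.

# Usage: cores_for_ram <ram_gb> <physical_cores> [gb_per_fuzzer]
cores_for_ram() {
    local ram_gb=$1 cpus=$2 per_fuzzer=${3:-8}
    # How many fuzzers fit into RAM (integer division rounds down).
    local by_ram=$(( ram_gb / per_fuzzer ))
    # Never start more fuzzers than there are cores.
    local cores=$(( by_ram < cpus ? by_ram : cpus ))
    # Always run at least one fuzzer.
    if (( cores < 1 )); then
        cores=1
    fi
    echo "$cores"
}

# Example: a VM with 256GB of RAM and 64 cores.
cores_for_ram 256 64
```

With 256GB of RAM and 64 cores, the RAM limit dominates and the helper suggests 32 fuzzers. With 512GB it would allow all 64 cores.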
For complete reproduction of the paper results, you can use the following configuration, which takes about 5 days on a 64-core machine:
```bash
@ -147,12 +150,13 @@ export MULTIJOB_REPLICA_NUMBER=3
## Results
All results can be found in `LibAFL/fuzzers/FRET/benchmark` inside a directory `eval_xx-xx-xx`.
The content should be the following:

- Plots inside the top-level of the directory. You can compare them to the paper according to the hints under "Claims Supported by Artifact" above.
- A directory `timedump`, which contains subdirectories for each fuzzer.
  - Inside, you find files for each configuration with different seeds.
    - `.time` contains the response times of each execution.
    - `.case` contains the worst case found by the fuzzer.
    - `.trace.ron` contains tracing data of the worst case. Such data can be plotted into a Gantt chart using the tool ``gantt_driver``
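
To check quickly whether a run produced the expected outputs, you can count the files of each kind below `timedump`. The following is a minimal sketch: `count_results` is a hypothetical helper, and it assumes that `.time`, `.case` and `.trace.ron` are file name suffixes.

```bash
#!/usr/bin/env bash
# Sketch: count the result files of one kind below a timedump directory.
# count_results is a hypothetical helper, not part of the artifact.
# Assumption: the kinds listed above are file name suffixes.

# Usage: count_results <timedump_dir> <suffix>
count_results() {
    local dir=$1 suffix=$2
    # Search all fuzzer subdirectories and print only the number of matches.
    find "$dir" -type f -name "*${suffix}" | wc -l | tr -d ' '
}

# Example (the path is a placeholder for your actual eval_xx-xx-xx directory):
# for s in .time .case .trace.ron; do
#     echo "$s: $(count_results path/to/timedump "$s")"
# done
```

If every run completed, the three counts should normally match, because each run is described as producing one file of each kind.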
An archive of our results is also provided under [https://sys-sideshow.cs.tu-dortmund.de/downloads/rtss25/results.zip](https://sys-sideshow.cs.tu-dortmund.de/downloads/rtss25/results.zip).
@ -163,7 +167,7 @@ An archive of our results is also provided under [https://sys-sideshow.cs.tu-dor
#### Out of Memory Errors
- Reduce the `CORES` parameter to match the available RAM (each fuzzer needs up to 8GB)
- Consider using fewer replicas per configuration
#### Build Failures
```bash
@ -192,14 +196,13 @@ nix-shell
### Directory Layout
```
FRET/
+-- LibAFL/fuzzers/FRET/ # Main FRET fuzzer implementation
|   +-- src/ # Source code
|   +-- benchmark/ # Evaluation framework
+-- FreeRTOS/FreeRTOS/Demo/CORTEX_M3_MPS2_QEMU_GCC # Our FreeRTOS Demos
+-- one_time_setup.sh # Initial setup script
+-- run_eval.sh # Main evaluation script
+-- AE.md # This document
```
### Key Components