Automated Cyber Attack & Defense Logs from ADS-24/Crate
doi-10-71775-kth-4mh6z-zw065-0
https://doi.org/10.71775/KTH.4MH6Z-ZW065
Nyberg, Jakob; Sommestad, Teodor; Ekstedt, Mathias; Johnson, Pontus
Swedish Defence Research Agency
MSB 2021-00896
Swedish National Data Service (Svensk nationell datatjänst)

This is event log data of activities generated by autonomous agents in the computer network "ADS-24", implemented in FOI's cyber range Crate. The main goal of the experiment was to measure the performance of automated defensive agents when the network is attacked for a set period of time. We also tested how variations to the problem, in the form of user agents, attacker strategies and network topology, affected the returns of the agents. Please note that ADS-24 is an entirely fictional network, and all content (such as hostnames, usernames, email contents, etc.) was constructed for this experiment.

The data was collected over the course of two months by running repeated episodes, where each episode contains one attack. The two-hour episodes contain different combinations of an attacker agent, various defender agents and simulated users. The data is grouped by month and episode identifier, where the episode ID is a combination of the date and hour the episode started.

Network

The network we have implemented is based on a scenario description written by domain experts at FMV. The scenario primarily consists of a specification of a fictional maintenance management system, named “AIR-DELIVERY-SYSTEM24” (ADS-24). The intended functions of ADS-24 are to keep track of maintenance needs, purchase new spare parts, and store costs and salaries. ADS-24 consists of four subnets, identified as “CLIENT”, “DMZ”, “SRV” and “SOC”, which are all connected through a shared firewall server.
The 37 machines across the subnets run either Linux or Windows, with Windows being the most common. The SOC network contains machines related to monitoring and is inaccessible to all agents, so as not to disturb logging during experiments. The layout of ADS-24 is illustrated as a graph in [ads-24.png].

Alert Rules & Active Responses

A Wazuh database server was run in the SOC section of the network, and each host except “flightlogs” had a Wazuh agent service that sends events to the central database. The parser rules used by the Wazuh agents were selected based on a combination of reviews, recommendations and standards. This includes the log policy of the Swedish Armed Forces and widely used configurations such as the Sysmon configuration by SwiftOnSecurity. The rules cover log data from Snort for IP packets, Auditd and Syslog for Linux hosts, and Sysmon and event logs for Windows systems. Along with Wazuh’s default rule set, 586 rules from the Sigma repository were included. The Sigma rules were selected based on their relevance to the scenario, meaning that rules related to services not part of the scenario were excluded. Every host in ADS-24 runs Osquery to collect host information, such as user accounts and network interfaces. Information about these elements is sent to Wazuh at regular intervals.

We used the Wazuh feature “Active Response” to allow defense agents to execute a set of prepared commands on hosts in the network by calling the Wazuh REST API. Two commands were implemented: one that powers off a given machine, and one that blocks traffic between a given host and other subnets in both directions. Blocking is done by adding new firewall rule entries to the host “fw1”. The commands are parametrized with a single argument: the identifier of the Wazuh agent service that should execute the command.
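To illustrate what the blocking command implies, the following is a minimal sketch of how bidirectional block entries for one host could be derived. It is not the rule format used on “fw1”: the subnet address ranges, the `BlockRule` type and the `block_rules` function are all hypothetical, and the SOC subnet is left out because it is inaccessible to agents.

```python
from dataclasses import dataclass
from ipaddress import ip_address, ip_network

# Hypothetical address ranges; the real ADS-24 addressing is not given here.
SUBNETS = {
    "CLIENT": ip_network("10.0.1.0/24"),
    "DMZ": ip_network("10.0.2.0/24"),
    "SRV": ip_network("10.0.3.0/24"),
}


@dataclass(frozen=True)
class BlockRule:
    """One hypothetical firewall entry that drops traffic from src to dst."""
    src: str
    dst: str


def block_rules(host_ip: str) -> list[BlockRule]:
    """Return drop rules between a host and every subnet other than its own."""
    host = ip_address(host_ip)
    home = next(name for name, net in SUBNETS.items() if host in net)
    rules = []
    for name, net in SUBNETS.items():
        if name == home:
            continue  # traffic inside the host's own subnet stays allowed
        rules.append(BlockRule(src=host_ip, dst=str(net)))  # outbound
        rules.append(BlockRule(src=str(net), dst=host_ip))  # inbound
    return rules


rules = block_rules("10.0.1.15")
for rule in rules:
    print(f"drop {rule.src} -> {rule.dst}")
```

The sketch makes one property of the command explicit: a blocked host can still be reached from machines in its own subnet.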
Network Activity

Regular activity in ADS-24 is composed of periodic updates to the log database from external IP addresses, computer-to-computer communication within the Windows domain, and actions performed by simulated users. The simulated users follow predefined schedules in which they exchange emails, access internal web interfaces, use remote desktop applications, and open files of various types. Occasionally, a simulated system administrator connects to machines and executes commands using Remote Desktop, PsExec, or WMI.

We also generate threat actor activity using the red-team automation tool Lore. The initial entry point machine of Lore was always the host “flightlogs”. As mentioned, “flightlogs” does not run a Wazuh agent, making it functionally invisible to and untouchable by defender agents. Lore can therefore never be fully expelled from the network. Lore attempts to compromise systems by selecting actions, which may or may not succeed, from a pool of available options according to its configuration. Two configurations for Lore were used, which we call “Guided” and “Exploratory”. With the guided configuration, Lore is configured with blacklists to ignore machines that are not along the fastest path between the entry point and the designated crown jewels in the DMZ segment. With the exploratory configuration, Lore may prioritize attacking machines that do not take it closer to the DMZ, but it is also more flexible and less predictable. Lore records its activity in a log stored in the control plane. The log, and thus the “true” attacker state, is therefore not accessible from the event plane, from which the defender agent receives its data.

MAL Modeling

We assume the defender agent makes decisions based on the MAL data model of the network. To model the components of ADS-24, we created a smaller version of the MAL language CoreLang, titled CadsLang.
The language models two attack vectors leading to access to a host’s data: one through a software vulnerability, and one where access is gained by brute-forcing credentials. The language contains two defense steps, “Application.NotPresent” and “ConnectionRule.Restricted”, which correspond to the two commands implemented in Wazuh. Functionally, “NotPresent” blocks both attack vectors for a host, making it impossible for an attacker to access its data. The “Restricted” step blocks the “ConnectionRule” attack steps representing access to the host from a different network, but still allows an attacker to access the host through internal subnet connections.

Each MAL attack step was associated with a set of Wazuh rule identifiers, a set of Wazuh rule groups and a set of rules that should be ignored. Since attack steps are associated with assets, the event being mapped needs to contain an identifier that was also encountered in the instance model creation procedure, such as an IP address, username or host identifier. The mapping is done by the Wazuh/MAL Interface.

We tested the alert mappings by collecting multiple two-hour periods of data from the network with and without user agents, and without any attacker or defender agents. Under the assumption that no adversarial actions are taken in the network during this time, we treat all observed attack steps as false positives. This yielded an average false positive probability per time step for the alert mapping of 1.7% per attack step without users and 3.3% with users.

Data Collection Method

Blue agent interaction and logging were handled by the Wazuh/MAL Interface, which also handled experiment scheduling. The Wazuh/MAL Interface was run on a laptop running virtual machines, with VPN access to Crate. The machine setup we used is shown in [experiment_setup.png]. Vejde agents were trained with reinforcement learning in the MAL Simulator, through the Vejde/MAL Simulator interface.
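The alert mapping and the false positive estimate described in the MAL Modeling section can be sketched as follows. This is an illustration under stated assumptions, not the Wazuh/MAL Interface itself: the `StepMapping` class, the rule identifiers and the group name are hypothetical, the ignore set is assumed to take precedence over the other two sets, and the rate is assumed to be the number of observed attack steps divided by the number of (time step, attack step) pairs. Matching events to assets by IP address, username or host identifier is omitted.

```python
from dataclasses import dataclass, field


@dataclass
class StepMapping:
    """Hypothetical mapping from Wazuh rules to one MAL attack step."""
    rule_ids: set = field(default_factory=set)
    rule_groups: set = field(default_factory=set)
    ignored_rule_ids: set = field(default_factory=set)

    def matches(self, rule_id: str, groups: list) -> bool:
        # Assumption: ignored rules win over both identifier and group matches.
        if rule_id in self.ignored_rule_ids:
            return False
        return rule_id in self.rule_ids or bool(self.rule_groups & set(groups))


def false_positive_rate(observed_per_step: list, n_attack_steps: int) -> float:
    """Average probability that an attack step fires in a time step.

    observed_per_step holds one set of observed attack steps per time step,
    collected while no attacker was active, so every observation counts as
    a false positive.
    """
    total = sum(len(observed) for observed in observed_per_step)
    return total / (len(observed_per_step) * n_attack_steps)


mapping = StepMapping(
    rule_ids={"100001"},                   # hypothetical rule identifier
    rule_groups={"authentication_failed"},  # hypothetical rule group
    ignored_rule_ids={"100002"},
)
print(mapping.matches("100002", ["authentication_failed"]))
print(false_positive_rate([{"a"}, set(), {"a", "b"}, set()], n_attack_steps=2))
```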
The Wazuh logs and Lore JSON files were produced in Crate, with FOI tooling. Agents using policies optimized with RL were trained using the MAL Simulator, with the Vejde library for agent architectures. Training was done using 2 million transitions, sampled from a combination of simulation environments using different attacker policies, attacker entry points and network topology variations for each episode.

Experiment Procedure

We ran experiments in an episodic fashion, with each episode lasting two hours. The Crate snapshot functionality was used to start each episode from the same system state. To allow the system to settle after being restored, episodes were started half an hour after the network was restored. We evaluated the following agents:
“Vejde”: a policy trained in the MAL Simulator with reinforcement learning.
“Vejde w/ Noise”: same as “Vejde”, but trained with a 1% false positive and false negative rate per attack step.
“Heuristic”: a policy that selects an associated defense step of an asset if an associated attack step is observed.
“NoOp”: a policy that does nothing.
All agents use the same MAL data model for their input. Each episode used a single defender agent, sampled without replacement from the set of available agents.

During episodes, Wazuh was queried for new events at a fixed 30-second interval, and if any returned events were matched with an attack step, an instance of the step was appended to the observation database. To select an action for the time step, the current database was fed to the defender agent, producing a single action in accordance with its policy. When there was no recorded change to the database between two time steps, no action was requested from the current agent. If the agent selected an action other than waiting, the MAL defense step was added to the observation database, mapped to a corresponding active response, and sent to the Wazuh server through the REST API.
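The polling procedure above can be summarized in a short sketch. It is a simplified model rather than the Wazuh/MAL Interface: all names are hypothetical, events are plain dictionaries, a "change" is assumed to mean that the current poll produced at least one new attack step, and sending the active response is replaced by a comment.

```python
from typing import Callable, Iterable, Optional

WAIT = None  # sentinel for "the agent chose to wait" or was not queried


def run_polling_loop(
    polls: Iterable[list],
    match_steps: Callable[[dict], list],
    policy: Callable[[list], Optional[str]],
) -> tuple:
    """Process one list of events per 30-second poll; return (observations, actions)."""
    observations: list = []
    actions: list = []
    for events in polls:
        new_steps = [step for event in events for step in match_steps(event)]
        if not new_steps:
            actions.append(WAIT)  # no change to the database: agent not queried
            continue
        observations.extend(new_steps)
        action = policy(observations)
        if action is not WAIT:
            observations.append(action)
            # The real system would map the defense step to an active
            # response here and send it to the Wazuh server.
        actions.append(action)
    return observations, actions


def match_steps(event: dict) -> list:
    # Hypothetical mapping: any alert on a host counts as one attack step.
    return [f"{event['host']}:attack"] if event.get("alert") else []


def heuristic(observations: list) -> Optional[str]:
    # Defend the first attacked host that has no defense recorded yet.
    defended = {o.split(":")[0] for o in observations if o.endswith(":defense")}
    for obs in observations:
        host, kind = obs.split(":")
        if kind == "attack" and host not in defended:
            return f"{host}:defense"
    return WAIT


polls = [
    [],
    [{"host": "ws1", "alert": True}],
    [{"host": "ws1", "alert": False}],
    [{"host": "ws1", "alert": True}],
]
observations, actions = run_polling_loop(polls, match_steps, heuristic)
print(observations)
print(actions)
```

The `heuristic` function mirrors the “Heuristic” agent only loosely. It is included so that the loop has a policy to call.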
As in the simulator, assets with defense steps were removed from the observation along with any associations to other assets or attack steps they were involved in. If an asset that had been removed appeared in an alert at a later time, it was temporarily reintroduced to the model for a single step. The command used to run the experiments with the Vejde/MAL Simulator interface was:

    uv run scripts/run_on_repeat.py \
        --config-file configs/tired_hope.yml \
        --start-time "00:30" \
        --run_length_minutes 120 \
        --period_minutes 180 \
        --polling_time_seconds 30 \
        --number-runs 8 \
        --agent_selection.method "shuffle"

Episode Variations

Experiments were run with a set of variable factors, which were selected at random before the start of each episode.

Simulated Users

To test the defender agent’s robustness to noise, we ran episodes with and without the simulated user agents. Each host in the client section of ADS-24 is assigned a simulated user agent, which performs actions based on a given policy.

Attacker Strategy

To test how the defender agent handles different attack profiles, we used the two configurations described above. The configuration determines which machines Lore prioritizes.

Network Topology

To test how the agents handle variations to the network topology, we randomly removed hosts, selected from “rootca”, “timereporter” and “print”, from the network before the episode. These machines are not part of the list of machines Lore is directed at with the “Guided” policy.

Data Description

What follows are descriptions of what the data contains. The file [utils.py] includes a number of functions to facilitate manipulation of the data. Also included are [cia_values.json], which lists the Confidentiality, Integrity and Availability priorities for each host, as well as [to_agent_ip_mapping.json] and [agent_to_subnet.json], which list which network segments hosts belong to.
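The timing parameters in the run command shown earlier imply a daily schedule, which the following sketch spells out. The reading of the flags is an assumption: `--start-time` is taken as the start of the first episode, `--period_minutes` as the distance between episode starts, and the 30-minute settle time comes from the procedure description rather than from a flag. The function `episode_schedule` is hypothetical and is not part of [utils.py].

```python
from datetime import date, datetime, timedelta


def episode_schedule(
    day: date,
    start_time: str = "00:30",
    run_length_minutes: int = 120,
    period_minutes: int = 180,
    number_runs: int = 8,
    settle_minutes: int = 30,
) -> list:
    """Return reset, start and end times for each episode on one day."""
    first_start = datetime.combine(day, datetime.strptime(start_time, "%H:%M").time())
    schedule = []
    for i in range(number_runs):
        start = first_start + timedelta(minutes=i * period_minutes)
        schedule.append({
            "reset": start - timedelta(minutes=settle_minutes),  # snapshot restore
            "start": start,
            "end": start + timedelta(minutes=run_length_minutes),
        })
    return schedule


for episode in episode_schedule(date(2026, 4, 1)):
    print(episode["reset"].strftime("%H:%M"),
          episode["start"].strftime("%H:%M"),
          episode["end"].strftime("%H:%M"))
```

Under these assumptions, each episode ends half an hour before the next reset.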
Defender Agents

These are the identifiers for the different defender agents that appear in episodes:
WhackAMoleAgent: Heuristic agent that enables defenses on assets with observed attack steps.
NOPAgent: Defender agent that does nothing.
nebulous-snail-417: Vejde agent trained with no false positives.
bustling-doe-36: Vejde agent trained with false positives.

Episode Folder Contents

Episodes were generated between 00:00 and 21:00 each day in three-hour intervals. The 00:00 episode always uses the "do nothing" defender, meaning that Lore can run uninterrupted. Each episode folder contains a number of files from Wazuh, Lore and the ADS Monitor. The main files we have used for analysis are:
wazuh.json: Exported Wazuh data, in the OpenSearch document format.
summary.json: Episode summary from Crate. Contains info about starting times, shutdown machines and Lore configurations used for the episode.
blue_agent_recording.json: Blue agent observations and actions from the Monitor, expressed in CadsLang attack and defense steps.
lore.json: Full Lore action log.

The following files are included for debugging purposes:
lore_knowledge.json: Lore knowledge log. Contains assets Lore discovered.
sved_log_*: Simulated user agent action log. Contains actions performed by the simulated users, if they were present.
lore_standard.json: Shortened Lore action log.
episode.log: Log file for experiment setup from Crate.

Summary File

The file summary.json is a processed summary of all the episodes, combining fields from the files listed above with some derived data, such as the blue agent return. The following list briefly describes each field:
episode_id: Episode identifier composed of the date and Crate start time.
schedule: Times Crate was reset and Lore _should_ start.
time_start: Time Lore actually started. This was set to half an hour after the Crate reset for all episodes.
time_end: Time Lore ended.
lore_scenario_id: Lore scenario ID, 1-4.
Combination of entrypoint and priority.
lore_scenario_instance_id: Not relevant.
bot_game_id: Not relevant.
user_agents: Whether simulated users were active or not in the episode.
remaining_goals: Goals that Lore has been given but not achieved in the episode. These are hosts that Lore attempts to reach.
all_goals_met: Whether Lore achieved all goals.
goals_remaining: Names of remaining goals.
points: Points Lore received according to its own scoring.
blue_agent: Which blue agent, out of those listed above, was used for the episode.
blue_agent_start_time: When the blue agent was started. This was set to half an hour after the Crate reset for all episodes.
blue_agent_return: Score of the blue agent, as measured by the Monitor interface. Note that this counts false positives as real alerts, and is therefore not a reliable metric.
lore_entrypoint: Machine Lore used as entry. Lore will always start from flightlogs, but will prioritize the machine listed here.
lore_priority: Lore policy.
malicious_users: Whether users turn on machines or not. This can be considered a bug, so episodes where this is true should be treated as such.
block_blinds_blue: Whether the block IP action also blocks machines from sending alerts. This can be considered a bug, so episodes where this is true should be treated as such.
shutdown_machines: Machines that were turned off for the episode.
blue_agent_attack_penalty: Blue agent penalty for being attacked. This is calculated as the sum of the C + I priorities for each compromised host, per minute the host is compromised. A machine is considered compromised when Lore has successfully run an elevated shell command on it.
blue_agent_defense_penalty: Blue agent penalty for defending. The penalty is the A priority of an asset, also calculated per minute. Shutting down a host incurs the full penalty; blocking IP traffic incurs half.
blue_agent_score: Blue agent return. Sum of the attack and defense penalties.
blue_agent_compromised_hosts: Machines Lore gained an elevated shell on.
blue_agent_defended_hosts: Assets that had defenses run on them.
blue_agent_episode_length: How many steps the blue agent took. Dependent on the polling rate of the Monitor. The wall time of an episode is always two hours.

Start of collection: 2026-04-01
End of collection: 2026-05-10
Access to data through an external actor.
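The penalty fields of the summary file (blue_agent_attack_penalty, blue_agent_defense_penalty and blue_agent_score) can be illustrated with a small calculation. This is a sketch under assumptions, not the code that produced the data: the dictionary layout below is hypothetical and need not match [cia_values.json], "per minute" is read as multiplying by the number of minutes, and penalties are reported as non-negative numbers because the sign convention is not stated.

```python
# Hypothetical priorities; the real values are in cia_values.json.
CIA = {
    "ws1": {"C": 2, "I": 1, "A": 3},
    "srv1": {"C": 5, "I": 4, "A": 5},
}

# Shutting down incurs the full A priority, blocking IP traffic half of it.
DEFENSE_FACTOR = {"shutdown": 1.0, "block": 0.5}


def attack_penalty(compromised_minutes: dict, cia: dict) -> float:
    """Sum of (C + I) per compromised host, times minutes compromised."""
    return sum(
        (cia[host]["C"] + cia[host]["I"]) * minutes
        for host, minutes in compromised_minutes.items()
    )


def defense_penalty(defenses: list, cia: dict) -> float:
    """Sum of A per defended host, scaled by action type and minutes active."""
    return sum(
        cia[host]["A"] * DEFENSE_FACTOR[action] * minutes
        for host, action, minutes in defenses
    )


def blue_agent_score(compromised_minutes: dict, defenses: list, cia: dict) -> float:
    """Sum of the attack and defense penalties."""
    return attack_penalty(compromised_minutes, cia) + defense_penalty(defenses, cia)


compromised = {"ws1": 10, "srv1": 2}                        # minutes compromised
defenses = [("ws1", "shutdown", 20), ("srv1", "block", 10)]  # minutes active
print(attack_penalty(compromised, CIA))
print(defense_penalty(defenses, CIA))
print(blue_agent_score(compromised, defenses, CIA))
```

In this example the two defenses cost more than the attack itself, which shows the trade-off the penalties are meant to capture: defending an asset with a high availability priority for a long time can outweigh the damage it prevents.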