The agent learns the basics: scan → detect a vulnerable service → execute the correct exploit. Rewards are given immediately.
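The scan → detect → exploit loop with immediate rewards can be sketched as a toy environment. Everything here is a hypothetical illustration: the class name, actions, and reward values are assumptions, not the API of CybORG or any real framework.

```python
import random

class ToyPentestEnv:
    """Hypothetical toy environment: scan -> detect -> exploit, with
    rewards delivered immediately after each action."""

    ACTIONS = ("scan", "exploit")

    def __init__(self, seed=None):
        self.rng = random.Random(seed)
        self.reset()

    def reset(self):
        self.vulnerable = self.rng.random() < 0.5  # is the target vulnerable?
        self.scanned = False
        self.done = False
        return "start"

    def step(self, action):
        """Return (observation, reward, done). Rewards arrive immediately."""
        assert not self.done, "episode finished; call reset()"
        if action == "scan":
            self.scanned = True
            obs = "vulnerable" if self.vulnerable else "patched"
            return obs, 0.1, False          # small shaping reward for recon
        if action == "exploit":
            self.done = True
            if self.scanned and self.vulnerable:
                return "shell", 1.0, True   # correct exploit: full reward now
            return "alert", -1.0, True      # blind or wrong exploit: penalty
        raise ValueError(action)
```

A policy trained against this loop only has to associate the "vulnerable" observation with the exploit action, which is why the basics are learned quickly.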
Simulators are imperfect. They do not model network latency jitter, packet loss, or ephemeral service failures. An agent that thrives in CybORG may freeze when a real web server occasionally drops a FIN packet, interpreting it as a firewall.
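One common mitigation is domain randomization: wrap the clean simulator and inject the noise it doesn't model, so the policy cannot assume every timeout is a firewall. The sketch below assumes a simple `reset`/`step` environment interface; the wrapper name, noise rates, and sentinel observations are illustrative, not taken from any framework.

```python
import random

class NoisyNetworkWrapper:
    """Inject packet loss and transient service failures into an
    otherwise-idealized simulator (a minimal sketch)."""

    def __init__(self, env, drop_prob=0.05, fail_prob=0.02, seed=None):
        self.env = env
        self.drop_prob = drop_prob      # chance a response is lost entirely
        self.fail_prob = fail_prob      # chance of a transient service error
        self.rng = random.Random(seed)

    def reset(self):
        return self.env.reset()

    def step(self, action):
        obs, reward, done = self.env.step(action)
        if self.rng.random() < self.drop_prob:
            obs = "timeout"             # packet loss: the agent sees nothing
        elif self.rng.random() < self.fail_prob:
            obs = "connection_reset"    # ephemeral failure, not a firewall
        return obs, reward, done
```

Training against the wrapped environment forces the policy to retry or re-probe after a `"timeout"` rather than concluding it has been blocked.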
Defenders deploy simple firewalls and IDS alerts. The agent learns to add random delays or route through decoys.
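The timing-evasion behavior can be illustrated in a few lines: space probes with randomized gaps so the inter-action interval no longer matches a rate-based IDS signature. The function name and delay bounds below are assumptions for illustration only.

```python
import random

def jittered_schedule(actions, min_delay=0.5, max_delay=3.0, rng=None):
    """Yield (delay_before_seconds, action) pairs with uniformly random
    gaps, breaking up the regular cadence a rate-based IDS keys on."""
    rng = rng or random.Random()
    for action in actions:
        yield rng.uniform(min_delay, max_delay), action

# An executor would sleep for `delay` before firing each action, e.g.:
#   for delay, action in jittered_schedule(["scan", "exploit"]):
#       time.sleep(delay)
#       env.step(action)
```

In an RL setting the agent does not call this helper explicitly; it discovers the equivalent behavior because delayed, irregular probing avoids the negative reward attached to IDS alerts.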
Training a single robust policy requires 50,000 to 200,000 episodes. In real time, at 30 seconds per episode (optimistic for a small network), that is roughly 17 to 70 days of continuous simulation. Distributed training on GPU clusters cuts this to days, but hyperparameter tuning remains an art.
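The wall-clock figures follow directly from the episode counts and the 30 s/episode estimate given above:

```python
def wall_clock_days(episodes, seconds_per_episode=30):
    """Back-of-the-envelope wall-clock cost of sequential training."""
    return episodes * seconds_per_episode / 86_400  # 86,400 seconds per day

low = wall_clock_days(50_000)    # 1.5M seconds  -> about 17.4 days
high = wall_clock_days(200_000)  # 6.0M seconds  -> about 69.4 days
```

This is why parallelized rollouts matter: running even a few hundred environment instances concurrently collapses months of sequential simulation into days.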