Detecting Runtime Attacks

This project, titled "Runtime Software Attacks," was my second-semester cybersecurity project. It focused on implementing and detecting runtime control flow attacks, specifically buffer overflow exploits and Return-Oriented Programming (ROP). These attacks are particularly dangerous because they hijack a program's execution at runtime, in the ROP case using only code that is already present in memory, without dropping new files or modifying the binary itself. The central research question was: "How can we implement and detect runtime attacks that manipulate execution flow without altering the binary?"

Key Project Components

The Vulnerable Target

To have something to attack and defend, the team built a deliberately vulnerable 32-bit Linux binary called "AAU Grader", a fake grade retrieval service. It contained two intentional weaknesses: a format string vulnerability (used to leak memory addresses and bypass stack canaries) and a stack-based buffer overflow (used to overwrite the return address and redirect execution). Two versions were compiled: one with an executable stack and one without.

The Attacks

Two distinct exploitation techniques were implemented using Python and the Pwntools library:

  • Shellcode injection: on the binary with an executable stack, custom assembly shellcode was injected directly onto the stack and executed to spawn a shell.
  • Return-Oriented Programming (ROP): on the binary with a non-executable stack (NX enabled), existing code fragments ("gadgets") were chained together to call system("/bin/sh") from libc, bypassing NX entirely. This required leaking a libc address at runtime to defeat ASLR and PIE.

Both exploits first used the format string bug to leak the stack canary and a binary base address, then overflowed the username buffer to overwrite the return address.
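The layout of such an overflow can be illustrated with a minimal sketch that uses only Python's standard library rather than Pwntools. The offsets, the canary value, and the target address below are hypothetical placeholders, not the values from the AAU Grader binary; in the project they would come from the format string leak and from offset discovery in a debugger.

```python
import struct

def build_overflow_payload(buf_to_canary: int, canary: int,
                           canary_to_ret: int, ret_addr: int) -> bytes:
    """Lay out a 32-bit stack overflow payload that keeps the canary intact."""
    payload = b"A" * buf_to_canary          # fill the buffer up to the canary
    payload += struct.pack("<I", canary)    # write the leaked canary back unchanged
    payload += b"B" * canary_to_ret         # padding over saved registers / frame pointer
    payload += struct.pack("<I", ret_addr)  # overwrite the saved return address
    return payload

# Hypothetical values for illustration only.
payload = build_overflow_payload(
    buf_to_canary=64,      # assumed distance from buffer start to canary
    canary=0xDEADBE00,     # assumed leaked value (low byte is zero on Linux)
    canary_to_ret=12,      # assumed distance from canary to return address
    ret_addr=0x08049216,   # assumed target address
)
print(len(payload), payload[64:68].hex())
```

Because the canary is written back with its original value, the stack protector check passes and the function returns to the attacker-chosen address.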

Detection Framework 1: System Call Anomaly Detection

The first detection approach used strace to record every system call made during a program run. These logs were parsed into structured JSON summaries and sent to a web-based IDS dashboard. A scoring model compared each session's system call profile against a verified baseline, flagging unfamiliar syscalls, new file accesses, and unusual PID counts with weighted penalties. All four attack scenarios (ranging from quietly exiting the shell, to modifying files, to running a full privilege escalation script) produced scores far above the normal threshold of ~35, with the noisiest attack (running LinPEAS) reaching scores in the millions.
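The scoring idea can be illustrated with a minimal sketch. Only the general approach (summarize a trace, compare it to a baseline, add weighted penalties) comes from the description above; the weights, the set of file-related syscalls, the function names, and the sample trace lines are assumptions made for illustration and do not reproduce the project's actual model or data.

```python
import re
from collections import Counter

# Matches lines such as: [pid 101] openat(AT_FDCWD, "/etc/passwd", O_RDONLY) = 3
LINE_RE = re.compile(r'^(?:\[pid\s+(\d+)\]\s+)?(\w+)\((.*)')
PATH_RE = re.compile(r'"([^"]*)"')
FILE_SYSCALLS = {"open", "openat", "execve"}  # assumed subset
WEIGHTS = {"new_syscall": 50, "new_file": 20, "extra_pid": 10}  # hypothetical

def summarize(trace_lines):
    """Reduce strace-style lines to syscall counts, touched files, and PID count."""
    syscalls, files, pids = Counter(), set(), set()
    for line in trace_lines:
        match = LINE_RE.match(line.strip())
        if not match:
            continue  # skip exit markers, signals, and resumed-call fragments
        pid, name, args = match.groups()
        syscalls[name] += 1
        if pid:
            pids.add(pid)
        if name in FILE_SYSCALLS:
            path = PATH_RE.search(args)
            if path:
                files.add(path.group(1))
    return {"syscalls": dict(syscalls), "files": sorted(files), "pid_count": len(pids)}

def score(summary, baseline):
    """Add a weighted penalty for everything not seen in the baseline."""
    new_syscalls = set(summary["syscalls"]) - set(baseline["syscalls"])
    new_files = set(summary["files"]) - set(baseline["files"])
    extra_pids = max(0, summary["pid_count"] - baseline["pid_count"])
    return (WEIGHTS["new_syscall"] * len(new_syscalls)
            + WEIGHTS["new_file"] * len(new_files)
            + WEIGHTS["extra_pid"] * extra_pids)

# Made-up trace lines in strace's output style.
normal_trace = [
    'execve("./grader", ["./grader"], 0x7ffc /* 20 vars */) = 0',
    'openat(AT_FDCWD, "/lib/libc.so.6", O_RDONLY|O_CLOEXEC) = 3',
    'read(0, "alice\\n", 1024) = 6',
    'write(1, "Grade: 10\\n", 10) = 10',
    '+++ exited with 0 +++',
]
attack_trace = normal_trace[:-1] + [
    '[pid 101] execve("/bin/sh", ["sh"], 0x7ffc /* 20 vars */) = 0',
    '[pid 101] openat(AT_FDCWD, "/etc/passwd", O_RDONLY) = 3',
    '[pid 101] getuid() = 0',
]

baseline = summarize(normal_trace)
normal_score = score(summarize(normal_trace), baseline)
attack_score = score(summarize(attack_trace), baseline)
print(normal_score, attack_score)
```

A session would then be flagged when its score exceeds a chosen threshold. A score built this way grows with every unfamiliar syscall and file, which is consistent with a noisy enumeration script scoring far higher than a quiet shell exit.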

Detection Framework 2: Hash Chain CFG Validation

The second approach was more deterministic. Hooks were added to each function in the binary, printing the function name to stderr as it was called. After each execution, the sequence of function calls was hashed using SHA-256 chaining, with each hash incorporating the previous one, producing a unique fingerprint for that execution path. This fingerprint was then compared against a precomputed list of trusted hashes. Any deviation, such as the ROP exploit looping back through main and getUsername, produced an untrusted hash and triggered an alert. Every attack scenario across both exploits was successfully detected with a clear true/false output.
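A minimal sketch of the chaining step, using Python's hashlib, is shown here. The empty seed, the exact way each name is combined with the previous digest, and the function name printGrade are assumptions for illustration; main and getUsername are the names mentioned above.

```python
import hashlib

def chain_hash(function_calls):
    """Fold a sequence of function names into one SHA-256 fingerprint."""
    digest = b""  # assumed empty seed for the first link
    for name in function_calls:
        # Each link hashes the previous digest together with the next name,
        # so both the set of calls and their order affect the result.
        digest = hashlib.sha256(digest + name.encode()).digest()
    return digest.hex()

def is_trusted(function_calls, trusted_hashes):
    """Return True only if the execution path matches a known-good fingerprint."""
    return chain_hash(function_calls) in trusted_hashes

# Precomputed fingerprint for one assumed legitimate path.
trusted_hashes = {chain_hash(["main", "getUsername", "printGrade"])}

normal_run = ["main", "getUsername", "printGrade"]
rop_run = ["main", "getUsername", "main", "getUsername", "printGrade"]

print(is_trusted(normal_run, trusted_hashes))
print(is_trusted(rop_run, trusted_hashes))
```

Since every link depends on all earlier links, a single extra or reordered call changes the final fingerprint, which is what gives the check its true/false character.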

[Figure: Overall system overview]

Tools & Technologies

The project used Python and Pwntools for exploit development, GDB with pwndbg for debugging and offset discovery, strace for system call tracing, Docker Compose for isolated and reproducible test environments, and angr for generating trusted control flow graphs. Frida was briefly explored as a runtime instrumentation tool but was set aside due to scope and technical challenges.

Results & Takeaways

The project concluded that both detection methods could successfully identify runtime attacks. The anomaly-based approach was flexible and required no modification to the monitored program, but produced probabilistic scores rather than definitive answers. The hash chain approach was rigid by design: any deviation from a known-good execution path was immediately flagged. However, it required manual enumeration of all valid execution paths upfront, which becomes increasingly difficult with larger, more complex programs.

The main insight was that neither approach alone is complete. Combining a statistical anomaly detector (for broad coverage) with a deterministic control flow validator (for high-confidence alerts) offers a more robust defense than either system individually.

Future directions include integrating machine learning to improve anomaly scoring, adding loop compression to make hash-chain validation scale to larger binaries, and pairing the system with honeypots for more confident threat attribution.