This write-up is intended to be a bit of a behind-the-scenes glimpse at the making of the 2022 NSA Codebreaker Challenge, in hopes that it will be of interest to the community and, perhaps, future Codebreaker leads or designers of other competitions. I don’t intend this to be a walkthrough of how to solve the 2022 Challenge. There are already a number of great write-ups in the #2022-writeups channel on the Discord, but I may touch on some alternate approaches that I haven’t seen anyone mention there.
Up front, I need to make it clear that Codebreaker is far from a one-man show. I was the front-facing tech lead and fielded all your questions, but behind the scenes none of this would be possible without the efforts of a wonderful group of people. In the interests of anonymity (we are, still, a bit of a shadowy government agency at heart) I shouldn’t name everyone, but know that there was a team of really smart folks who put together this year’s challenge for you. As I like to say, anything you enjoyed about the challenge can be credited fully to them, and anything you dislike is my fault.
Since they’re public on this Discord, I’d like to especially thank Michelle and Bev from our Academic Engagement office, who not only get answers for all your career questions but also handled all the administration, finances, and publicity that I’m frankly no good at. It’s no secret at all that there would have been no 2022 Challenge without their hard work.
Our design process started at the end of February, when a virtual group of volunteers got together to hash out what our theme would be for the 2022 Challenge. The general design process is fairly straightforward: we started by looking at recent cybersecurity news stories for an interesting theme without too much overlap with our recent Challenges. Ransomware is, unfortunately, a perennial news story, and we thought the ransomware-as-a-service angle would let us cover some ground we hadn’t before, particularly with web hacking.
Once we settled on an overall theme, we put together a few proposals for the plot of the challenge. That includes both what happened in the fictional scenario that you’re investigating, and the path your investigation will take. We knew from the beginning that the final task would involve recovering the victim’s files, and it became pretty clear that there were multiple logical starting points for your investigation. Since we want tasks to (as much as possible) build on what came before to tell a whole story, we decided on this year’s unusual bifurcated structure.
With the overall design of the challenge figured out, we settled on a proposal for a set of nine tasks, which carried through essentially unchanged to the final version of the challenge. The next few months were taken up with developing the generators for the artifacts for the Challenge. A general design principle of the Codebreaker Challenge is that each student gets a unique set of artifacts. Although the underlying problem is the same for each student, separate sets of artifacts mean that students can’t just trade answers and run up the leaderboard. Since our goal was to have a coherent story for each student, we had to take some care to make sure everything was consistent across a set of artifacts.
We settled fairly quickly on building Docker containers to generate the actual artifacts, which let us ignore a lot of the complexity of developers working on disparate systems with different system packages installed. Each student gets a set of about 20 randomly-generated variables that define everything that we needed constant across different artifacts (like the time of the intrusion, the usernames for various actors, and so on), which are passed to the Docker containers as environment variables. A master orchestration script pre-generates all the artifacts for the challenge. We decided to build 10000 sets this year, which took up over 200 gigabytes of storage space. We could have reduced that somewhat (for example, each person has an identical-but-separate copy of ssh-agent), but storage is cheap.
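To make the per-student variable idea concrete, here is a minimal sketch of how such a set could be derived and handed to a container. This is not the actual generator code: the function names, the `CBC_*` variable names, the master secret, and the value ranges are all hypothetical. The only real interface assumed is Docker's `-e NAME=value` option for setting environment variables.

```python
import hashlib
import random
from datetime import datetime, timedelta


def student_variables(student_id: str, master_secret: str = "example-secret") -> dict:
    """Derive a reproducible set of per-student values.

    Every name and range here is made up for illustration.
    """
    # Seeding from a hash means the same student always gets the same story,
    # so every container that builds an artifact sees consistent values.
    seed = hashlib.sha256(f"{master_secret}:{student_id}".encode("utf-8")).digest()
    rng = random.Random(seed)
    intrusion = datetime(2022, 3, 1) + timedelta(seconds=rng.randrange(30 * 24 * 3600))
    return {
        "CBC_INTRUSION_TIME": intrusion.isoformat(),
        "CBC_VICTIM_USER": rng.choice(["alice", "bob", "carol"]) + str(rng.randrange(100, 1000)),
        "CBC_ATTACKER_USER": "user" + format(rng.getrandbits(32), "08x"),
    }


def docker_env_args(variables: dict) -> list:
    """Turn the variables into `-e NAME=value` arguments for `docker run`."""
    args = []
    for name, value in sorted(variables.items()):
        args += ["-e", f"{name}={value}"]
    return args


if __name__ == "__main__":
    print(docker_env_args(student_variables("student-0001")))
```

Deriving everything from one seed per student is one simple way to keep a set of artifacts consistent: any generator can be re-run on its own and still agree with the others.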
The final step before taking the challenge live was internal testing, which took place over a couple of weeks in early July. We had a great team of volunteers from across the Agency with widely varying skill levels, which was perfect for testing this challenge. Our testers provided great feedback and found a number of embarrassing bugs that, thankfully, never made it to production. Playtesting is, I think, the key secret ingredient to running a successful event like this. Don’t cut corners here.
At this point, I’ll give some quick thoughts on each of this year’s tasks, starting with Task 0: join the Discord. This task proves what I was just saying about playtesting. For anonymity reasons, we didn’t ask our playtesters to join the Discord, and just assumed that the same thing would work this year as last. It would have, except that I forgot to actually generate the backend file that made the task work, resulting in a deeply embarrassing first-hour patch. Always test!
Also, a surprising number of participants failed to actually read the entire email they got from the system that included the Discord link. Come on, read the whole email, guys.
This task asks the student to identify a compromised account from the company’s VPN log. This task suffered a bit from the random generation process. If I were re-building the challenge, I’d try to eliminate spurious anomalies by giving each user a daily schedule that they approximately stick to, with some noise added on top. As it stands the huge variation in session times makes the intended answer (the user with two simultaneous sessions) harder to pick out than is probably appropriate for the first real task.
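The intended check (one user with two sessions open at the same time) can be sketched in a few lines. The record layout below, a tuple of username, start time, and duration in seconds, is an assumption for illustration and not the actual log format.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def users_with_overlapping_sessions(sessions):
    """Return the set of users who have two sessions open at once.

    `sessions` is an iterable of (username, start, duration_seconds) tuples;
    this layout is hypothetical, so adapt the parsing to the real log.
    """
    by_user = defaultdict(list)
    for user, start, duration in sessions:
        by_user[user].append((start, start + timedelta(seconds=duration)))

    flagged = set()
    for user, intervals in by_user.items():
        intervals.sort()  # order by start time
        latest_end = intervals[0][1]
        for start, end in intervals[1:]:
            if start < latest_end:  # began before an earlier session ended
                flagged.add(user)
                break
            latest_end = max(latest_end, end)
    return flagged


if __name__ == "__main__":
    day = datetime(2022, 3, 14)
    sample = [
        ("alice", day.replace(hour=9), 3600),
        ("alice", day.replace(hour=11), 1800),
        ("bob", day.replace(hour=9), 7200),
        ("bob", day.replace(hour=10), 600),  # overlaps bob's first session
    ]
    print(users_with_overlapping_sessions(sample))
```

Sorting each user's sessions by start time means only the latest end time seen so far has to be tracked, which keeps the check cheap even on a long log.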
I’d also consider swapping the “A” and “B” branches to put the current B1 first. The order that they wound up in is an accidental reflection of the original proposals, which were much more VPN-centric. As the challenge evolved, the VPN pieces were mostly dropped, so the B branch is more thematically connected to the main body of the challenge than the A branch. And B1 has a more obvious entry point than A1.
This was our token forensics task, another skill that we wanted to include but didn’t want to re-tread too much ground from the 2021 Challenge. We considered having students analyze a filesystem image, but making that work in the Docker-based framework we had settled on proved to be more annoying than it was worth.
Wireshark proved to be the major pain point here: lots of students had trouble getting Wireshark to decrypt the TLS session, and for some unfathomable reason saving the results of a Follow > TLS Stream saves only the packets that have been loaded into the window for display, rather than the entire stream. I have to think that’s a bug, rather than intentional behavior. Tshark (the command-line version of Wireshark) works much more reliably.
We expected that many students would load the key, do a Follow > TLS Stream, see a printable string that looks username-ish, throw that in the answer box, and call the task done. Those of you who made it to Task 9 know that skipping the full analysis was a mistake.
I was pretty surprised how happy the lawyers were to let me register “unlockmyfiles.biz” and “ransommethis.net”. I guess there are no rules against the government registering shady-looking sites.
This task asked the students to find the login page for the ransomware-as-a-service site, with pretty much no further directions. This proved to be quite challenging for folks, although I’m not sure how we could have signposted it better.
We knew right away that we wanted to disclose the source code for the website, to make the final tasks more tractable, and this was the obvious place to do it. In testing, we were surprised to discover that dirbuster and sibling tools don’t seem to try “/.git” as a directory, even though source code disclosure by exposed .git directory is a fairly well-known and common vulnerability. The warning to steer clear of those tools was for exactly the reasons we said: they don’t make that guess (even though they probably should), and several students got themselves temporarily blocked by hitting AWS’s automatic DDoS protection features with aggressive scanning.
To try to better signpost the solution, we added the x-git-commit-hash header, with the actual git commit hash for that student’s repo. We hoped that, along with the clue in the task description, would spur folks to google for things like “web site git exposure” and find the right technique. Since we just made the header up out of whole cloth, we knew searches directly for that name wouldn’t go anywhere useful, but the hope was to get you thinking.
If you look for it, the code to inject that x-git-commit-hash header isn’t in the repo you downloaded. It just appears by magic. The code that you download is a real, slightly modified, version of the actual code running on the RaaS website. The major differences are some additional mitigations to the vulnerability you exploit in Task 8 that prevent leaking anything actually sensitive, and a bunch of spots where we look up per-student values in a database, rather than having everything hardcoded. We were also careful to prevent students from doing anything on the live site that changes state: there are actually several backend copies of the RaaS server running behind a load balancer, and we side-stepped the problem of synchronizing state among them by making sure students could read but not actually write anything server-side.
I’m a little bummed that no one noticed my hint in the task title. “Getting deeper.” “Getting”. Git it?
The eagle-eyed among you will have noticed that this task was originally Task A3, and narrative-wise probably belongs there. But we decided that with the massive step up in difficulty, it deserves to be in the main path. We know that 6 and 7 are definitely easier than this task (hence the point values), but narratively this is where it had to go. Even with that, the story is a bit of a stretch, but I think this is such a good challenge that I don’t mind the kind of dumb story.
Many of you will have found Piergiovanni Cipolloni’s excellent blog post, with his method for extracting private keys from ssh-agent. When I first ran across it while building this task I was afraid that we’d have to scrap the whole thing, but luckily the script he provides is dependent on knowing the key comment. Randomizing that saved the task, but I know at least a few of you guessed that the weird base64 string in the core dump must be the key comment and got it that way. I wonder if he’s figured out why his blog post has had so much traffic in the last few months?
One surprise I had was that many folks didn’t seem to realize that ssh-agent is open source. Using the source code as your guide is much easier than trying to just dump the binaries into Ghidra and start poking.
Many of the write-ups seem to start with another blog post’s trick of using the socket name as a signpost to find the idtable structure. If you didn’t see that trick, you can also find references to the idtable structure in the identity-management functions in ssh-agent. That makes things a little tricky, since the code is fairly heavily optimized, with lots of inlining, and you have to manually rebase the binary to account for the ASLR offset when the core dump was taken. But it’s completely do-able. Similarly, re-implementing the key shielding algorithm in Python (or another programming language) isn’t hard, and is probably easier to understand than directly calling functions in ssh-agent from gdb.
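The rebasing step is just arithmetic: subtract the base address the disassembler assumed, then add the base address the binary was actually loaded at in the core dump. A minimal sketch follows; every address in it is made up for illustration and none comes from a real core dump.

```python
def rebase(static_addr: int, static_base: int, runtime_base: int) -> int:
    """Translate an address from the disassembler's view to the core dump's view."""
    offset = static_addr - static_base  # position of the item inside the image
    return runtime_base + offset


if __name__ == "__main__":
    # Hypothetical values: the disassembler loaded the image at 0x100000, and
    # the core dump's memory map shows it was loaded at 0x55d4c2a00000.
    runtime_addr = rebase(0x152340, static_base=0x100000, runtime_base=0x55D4C2A00000)
    print(hex(runtime_addr))
```

The same function works in reverse (swap the two bases) when you want to look up a pointer found in the core dump in the disassembly.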
This one was pretty straightforward: using the cookie they decrypted at the end of Task 5, students needed to forge a non-expired token that allows them to authenticate to the server. I know a few folks were stuck on Task 5 so long that they forgot they have source code (which makes this far harder), but other than that there are no tricks here. We debated whether to allow alg=none or not, but decided even the morons who run this site aren’t that dumb. You’d have to be completely incompetent to make that mistake.
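For readers who haven’t forged a token by hand, here is a minimal standard-library sketch of signing and checking a JWT-style token. It assumes an HMAC-SHA256 (HS256) signature, which is a guess rather than something stated above, and the key and claim names are hypothetical placeholders for whatever the site’s source code actually uses.

```python
import base64
import hashlib
import hmac
import json
import time


def b64url_encode(data: bytes) -> str:
    """Base64url without padding, as used in JWTs."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def b64url_decode(text: str) -> bytes:
    """Inverse of b64url_encode; restores the stripped padding first."""
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def sign_hs256(claims: dict, key: bytes) -> str:
    """Build header.payload.signature for the given claims."""
    header = {"alg": "HS256", "typ": "JWT"}
    parts = [
        b64url_encode(json.dumps(obj, separators=(",", ":")).encode("utf-8"))
        for obj in (header, claims)
    ]
    signing_input = ".".join(parts)
    signature = hmac.new(key, signing_input.encode("ascii"), hashlib.sha256).digest()
    return signing_input + "." + b64url_encode(signature)


def verify_hs256(token: str, key: bytes) -> dict:
    """Return the claims if the signature and expiry check out, else raise ValueError."""
    header_b64, payload_b64, signature_b64 = token.split(".")
    if json.loads(b64url_decode(header_b64)).get("alg") != "HS256":
        raise ValueError("unexpected algorithm")  # in particular, rejects alg=none
    expected = hmac.new(
        key, f"{header_b64}.{payload_b64}".encode("ascii"), hashlib.sha256
    ).digest()
    if not hmac.compare_digest(expected, b64url_decode(signature_b64)):
        raise ValueError("bad signature")
    claims = json.loads(b64url_decode(payload_b64))
    if claims.get("exp", 0) <= time.time():
        raise ValueError("token expired")
    return claims


if __name__ == "__main__":
    key = b"hypothetical-key-read-from-the-source"  # placeholder, not a real key
    old_claims = {"uid": 1234, "exp": 1660000000}  # hypothetical expired claims
    fresh = dict(old_claims, exp=int(time.time()) + 3600)  # push the expiry forward
    print(verify_hs256(sign_hs256(fresh, key), key))
```

The point is that once the signing key is known, nothing stops you from re-signing the same claims with a later expiry; the verifier cannot tell the difference.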
Here’s our SQL injection task, although we didn’t bill it as such. We knew with a web-hacking theme SQL injection was an absolute must. There was some debate over how hard to make it: can we set up a completely blind SQL injection, so users are forced to do a binary search? We eventually settled on forcing everything through a conversion to an integer, meaning that you can’t just ask for the admin’s secret value and get the answer. We wanted to make things at least a little tricky, but I don’t think most folks found this too hard.
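Both variants of the idea can be sketched without touching a real server. In the code below, `leak_int` stands in for an injection that returns one character code per request as an integer, and `is_greater` stands in for a fully blind yes/no oracle. Both callables are hypothetical and simulated locally; no real query syntax or endpoint is implied.

```python
def recover_via_integers(leak_int, max_len=64):
    """Rebuild a string when each request can leak one integer.

    `leak_int(pos)` is assumed to return the character code at 1-based
    position `pos`, or 0 past the end of the string.
    """
    chars = []
    for pos in range(1, max_len + 1):
        code = leak_int(pos)
        if code == 0:
            break
        chars.append(chr(code))
    return "".join(chars)


def binary_search_code(is_greater, lo=0, hi=127):
    """Find one character code using only yes/no answers (the fully blind case).

    `is_greater(guess)` is assumed to answer "is the code greater than guess?".
    """
    while lo < hi:
        mid = (lo + hi) // 2
        if is_greater(mid):
            lo = mid + 1
        else:
            hi = mid
    return lo


def recover_via_binary_search(oracle, max_len=64):
    """`oracle(pos, guess)` answers "is the code at position pos greater than guess?"."""
    chars = []
    for pos in range(1, max_len + 1):
        code = binary_search_code(lambda guess: oracle(pos, guess))
        if code == 0:
            break
        chars.append(chr(code))
    return "".join(chars)


if __name__ == "__main__":
    secret = "s3cr3t"  # simulated server-side value

    def code_at(pos):
        return ord(secret[pos - 1]) if pos <= len(secret) else 0

    print(recover_via_integers(code_at))
    print(recover_via_binary_search(lambda pos, guess: code_at(pos) > guess))
```

With an integer channel, each request recovers a whole character; with only a yes/no oracle, each 7-bit character costs seven requests, which is why the fully blind version would have been noticeably more tedious.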
I know a few students went down the path of trying to guess the admin’s password, rather than forge another token. It’s a reasonable approach, but we didn’t really give you any basis for guessing their password. They do have a password in the database, but it’s strongly random and I don’t expect anyone will be able to brute force it before the heat-death of the universe. I’m not even 100% sure that the login function works on the website. It’s not intentionally broken or unimplemented, but it’s also not tested.
Task 8 is intended to be the more challenging reverse engineering task for this year’s challenge. Most students who have done some reverse engineering are used to looking at C programs, and may have seen C++, but we expected Go would be a new challenge for most participants. Go binaries look extremely different from C binaries. They’re statically linked, so typically quite large, use a different calling convention that varies from version to version, and lay out strings and data very differently from a typical binary.
On the other hand, Go binaries also have lots of built-in information roughly equivalent to traditional debugging symbols, which can’t be stripped out without breaking the binary. That means these binaries should be easier to reverse engineer than a traditional binary, if tooling supports it. When I started building this task, Google had just released Go 1.18 which changed the layout of these symbols enough that existing tools no longer functioned. I originally wrote a quite simple program, with the hope that students would have to manually parse the internal data structures to recover function names. Tooling quickly caught up, so we had to go back and add more obfuscation to the program.
We did try to pull a small trick by asking for a base64-encoded key, and including a few tempting base64 strings in the binary. None of these were the actual key, although they do feed into the key derivation process.
For the final task, you’re asked to decrypt one of the victim’s encrypted files and save the day. We intentionally don’t give a lot of guidance on how to do that, but leave it up to students to put together all the pieces scattered across the previous tasks that lead to a successful solution. The intended clues were:
We went back and forth on how hard to make the brute force, but ultimately decided that too large a keyspace wouldn’t really add much to the challenge, except in an optimization sense.
Huge congratulations to Georgia Tech for an overwhelming win in this year’s challenge, but even more so congratulations to all the participants, no matter how far you got. This challenge is not intended to be easy, and just taking it on is something to be proud of. I hope you’ve all had some fun and maybe learned something – I absolutely learned a ton while putting this together for you.
If you had fun with this challenge, check out our job postings at https://intelligencecareers.gov/nsa. I can’t promise that everything in the challenge was completely realistic, but the same sorts of skills you used for this challenge are applicable to many of our jobs. And, if you join us, you could get the opportunity to run a future Codebreaker Challenge.
I’ll be handing over the reins for next year’s challenge to a new lead, but I’ll be rooting for you all and checking in on the Discord occasionally. If there are any questions, feel free to ping me there or at email@example.com.
2022 NSA Codebreaker Challenge Lead Developer