TarPit – a Proof-of-Work HTTP Ratelimiting Scheme

I’m pleased to announce that my employer, SeaWorld Parks and Entertainment, has graciously granted me permission to open-source a research and development project that I wrote on their dime called TarPit: a proof-of-work-based API rate limiting .NET Core middleware.

https://github.com/seaworld-parks-and-entertainment/TarPit

It’s not a protocol, and it’s not really a library (technically it’s a library in the sense that the source code is open for use and reuse, but I don’t really encourage that at all); it’s a notion that’s been floating around for some time in several different contexts, and I finally scraped together the cycles to actually implement it. I’m not a .NET guy (I’m barely qualified to call myself a software guy), so when I found myself with some breathing room, I undertook to boot up on the CLR in earnest, and C# in particular. Some goals, in no particular order: write a load of C#, write tests, work on something intellectually stimulating, do something that had never been done before, solve a real problem, familiarize myself with the .NET Core HTTP stack, familiarize myself with .NET Core middleware nuances, lean into Dependency Injection, and wrap my head around the popular (in .NET-land) Clean architecture. I also wanted to get to a point where I could load test the same application with both a Redis and a Postgres data store, but decided to park and publish this code before I got to the load testing.

Overview

Ratelimiting API and website access is tricky. Folks can trivially overwhelm API and website providers merely by deploying money (and buying a zillion IP addresses), and in the case of state-sponsored actors they don’t even need to spend that much money. Therefore our goal is to reduce the cost to systems operators for running a rate limiting system, and to impose asymmetrical and arbitrary costs on our attackers.

System Design

We are to build a system that will track calls from attackers, calculate the number of calls from an individual attacker within an arbitrary lookback window, and adjust the difficulty that their supplied nonces must hash to. If the number of calls in the lookback window exceeds the allowed count for that caller, increment the difficulty for that caller. If the number of calls in the lookback window is below a specified “cooldown” threshold, decrement the difficulty for the caller by one. Communicate the current difficulty to callers in response headers. Retrieve nonces and caller identifiers from request headers. Store nonces for each caller to prevent replay attacks.
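A minimal sketch of that adjustment rule in C#; the names, and the choice to raise difficulty by exactly one, are my shorthand for illustration rather than the library’s exact behavior:

```csharp
// A minimal sketch of the adjustment rule described above; names are illustrative.
static int NextDifficulty(int currentDifficulty, int callsInWindow, int allowedCalls, int cooldownThreshold)
{
    if (callsInWindow > allowedCalls)
        return currentDifficulty + 1;        // over the limit: raise the bar
    if (callsInWindow < cooldownThreshold && currentDifficulty > 0)
        return currentDifficulty - 1;        // cooled off: lower the bar by one
    return currentDifficulty;                // otherwise, hold steady
}
```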

Glossary

  • Nonce
    An arbitrary value provided by a caller alongside a request; the caller varies it until its hash meets the current difficulty.
  • Hash
    Output of a one-way (or “hash” or “hashing”) function.
  • Difficulty
    Given a one-way function (commonly called a hash function) and an input, the difficulty determines how many leading zeros the OWF’s output must have; a short code sketch of the check follows the glossary. For example:

    • difficulty=0
      Hash of a nonce does not have to have any leading zeros. Ex: cfd3…
    • difficulty=1
      Hash of a nonce must have at least 1 leading zero. Ex: 0fd3…
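
To make the leading-zeros check concrete, here is a small example; SHA-256 and hex encoding are assumptions I’ve made for illustration, not necessarily the hash or encoding TarPit actually uses.

```csharp
using System;
using System.Linq;
using System.Security.Cryptography;
using System.Text;

// Illustration only: hash a nonce and require `difficulty` leading zero characters
// in the hex encoding. SHA-256 and hex are assumptions, not TarPit's actual choices.
static bool MeetsDifficulty(string nonce, int difficulty)
{
    byte[] digest = SHA256.HashData(Encoding.UTF8.GetBytes(nonce));
    string hex = Convert.ToHexString(digest).ToLowerInvariant();  // e.g. "cfd3..." or "0fd3..."
    return hex.Take(difficulty).All(c => c == '0');               // difficulty=0 always passes
}
```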

Implementation

Assumptions

  • All callers are identifiable
    Productionizing this assumption may prove troublesome. Some APIs are intended to be used by the anonymous public, yet this scheme insists on every caller having a unique ID. To support that use case, we could expose a signup endpoint with an out-of-the-gate difficulty level high enough to discourage folks from spamming it to parallelize their botnet throughput.
  • Design for distribution
    I expect nodes running a productionized version of this library to run concurrently, which demands a shared datastore that all of the edge-tier nodes talk to.

Overview

In the context of the .NET Core stack, we will implement a middleware, an abstraction layer over storage implementations, and some lightweight domain objects so the compiler can help us reason about our project.

The Domain namespace merely exposes the IClock and ICryptographicHasher interfaces, abstractions that are used in the Library namespace but not exposed from it.
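
If you want a mental model before reading the source, the two interfaces are roughly this shape (my paraphrase; the member names in the repo may differ):

```csharp
using System;

// Paraphrased shapes, not copied from the repo: the point is that both the clock
// and the hasher are injected abstractions rather than static calls.
public interface IClock
{
    DateTimeOffset Now { get; }     // "what time is it?" and nothing else
}

public interface ICryptographicHasher
{
    string Hash(string input);      // one-way function; the algorithm hides behind the interface
}
```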

The storage abstraction layer (“Infrastructure” in .NET Clean architecture parlance) will abstract over storage, and give callers of the infrastructure layer the ability to ask about the following (a rough interface sketch appears after the list):

  • the number of calls made within an arbitrary time window
  • the current difficulty for a caller
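
Roughly, and with names of my own choosing rather than the repo’s, that abstraction looks something like this:

```csharp
using System;
using System.Threading.Tasks;

// A rough sketch of the storage abstraction's shape; the interface and member
// names here are placeholders, not necessarily those in the repo.
public interface ICallerCallStore
{
    // How many calls did this caller make between `from` and `to`?
    Task<int> CountCallsAsync(string callerId, DateTimeOffset from, DateTimeOffset to);

    // What difficulty is this caller currently being held to?
    Task<int> GetDifficultyAsync(string callerId);

    // Persist a difficulty adjustment for this caller.
    Task SetDifficultyAsync(string callerId, int difficulty);

    // Record a call and its nonce, so replayed nonces can be rejected.
    Task RecordCallAsync(string callerId, string nonce, DateTimeOffset at);
}
```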

The middleware itself will be responsible for the following (sketched in code after the list):

  • extracting caller identifiers from HTTP requests
  • computing the hash of a nonce
  • comparing the difficulty of provided nonces against the caller’s current required difficulty
  • updating difficulty for callers
  • adding difficulty headers to responses
  • blocking requests from callers that exceed the rate limit
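
An abbreviated sketch of what that looks like as ASP.NET Core middleware; the header names, the 429 response, and the helper interfaces (the IClock, ICryptographicHasher, and ICallerCallStore shapes sketched above) are illustrative placeholders rather than TarPit’s actual contract.

```csharp
using System.Threading.Tasks;
using Microsoft.AspNetCore.Http;

// Abbreviated sketch of the middleware flow; header names and status codes here
// are placeholders, not TarPit's actual contract.
public sealed class ProofOfWorkMiddleware
{
    private readonly RequestDelegate _next;
    private readonly ICryptographicHasher _hasher;
    private readonly IClock _clock;

    public ProofOfWorkMiddleware(RequestDelegate next, ICryptographicHasher hasher, IClock clock)
        => (_next, _hasher, _clock) = (next, hasher, clock);

    // Scoped dependencies (the store) are resolved per-request via InvokeAsync,
    // one of those .NET Core middleware nuances mentioned earlier.
    public async Task InvokeAsync(HttpContext context, ICallerCallStore store)
    {
        var callerId = context.Request.Headers["X-Caller-Id"].ToString();   // placeholder header name
        var nonce    = context.Request.Headers["X-Nonce"].ToString();       // placeholder header name

        var difficulty = await store.GetDifficultyAsync(callerId);
        context.Response.Headers["X-Difficulty"] = difficulty.ToString();   // advertise the current bar

        await store.RecordCallAsync(callerId, nonce, _clock.Now);
        // (Difficulty adjustment against the lookback window is elided here.)

        if (LeadingZeros(_hasher.Hash(nonce)) < difficulty)
        {
            context.Response.StatusCode = StatusCodes.Status429TooManyRequests;
            return;                                                          // block the request
        }

        await _next(context);                                                // otherwise, carry on
    }

    private static int LeadingZeros(string hexHash)
    {
        var count = 0;
        while (count < hexHash.Length && hexHash[count] == '0') count++;
        return count;
    }
}
```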

Detailed review

The nitty-gritty of the implementation follows, section by section.

Tests

I’m reasonably pleased with test coverage on this project. I will admit (begrudgingly) that the Clean patterns make it very clear when one is writing unit vs. integration tests: if you’re testing the root node of the dependency graph, you’re clearly writing integration tests; if you’re testing any of the leaves, unit tests.

I’m particularly pleased with the abstract test suite that both the Postgres and Redis test suites inherit from. There is one encoding of library behavior, and the test suites for both backing storage implementations inherit from that and add a few of their own tests to cover edge cases.

Pretty pleased with this testing setup!

Domain

The Domain namespace is pretty lightweight! It contains 2 interfaces (with 1 implementation apiece):

  • ICryptographicHasher
    Very simple interface, merely leveraged to shove the hashing system through the DI stack.
  • IClock
  • IClock
    IClock is an interface that I’ve been dragging around with me since I worked with Benjamin van der Veen many years ago, and he taught me the pattern. We were working on a system that needed to schedule notifications and then fire those notifications at some point in the future. Those notification events had the ability to reschedule themselves, based on complicated criteria. Everything had to be done in-process, and he’d designed the system such that it didn’t need to poll the DB, merely read current state at startup and instantiate its timers. We used the Java Timer at the time, because it’s there and battle-tested, but I was a bit baffled at the time about how to go about actually testing this system. The ever-wise van der Veen breezed through my workspace around that time, and said something like this:
    “Go make an interface that handles scheduling timers and cancelling timers and understands how to talk about their callbacks, and implement that interface with a simple ConcreteTimer class that itself maintains timer dispatching and timer cancellation. Then, over in the test harness, implement the same interface in MockTimer, and add some methods to set the current time and to advance the current time by arbitrary amounts. Now you’ve got a completely testable timer that handles system Timer objects in production, and otherwise you can control very precisely in the test harness. Also, write tests for the MockTimer so that you are certain your implementation is correct.”
    All of which was great design and advice, and I’ve dragged it along with me ever since. Never use a system Clock or Timer (or many other things, really); instead, wrap them and provide an interface you can override in the test harness. You can see the vestiges of that system in IClock, Clock and MockClock: IClock defines a simple interface of one function, now, that returns system now. MockClock implements the same interface, augmenting it with two additional functions to set the current time and advance the current time by arbitrary amounts. This is one of those tools that builds towards compounding returns in software development, since you can now write strong tests about the time-domain behavior of your code. IClock and MockClock, as we’ll see, are how I wrote tests for this system that confirm its ratelimiting behavior as the arrow of time moves ever forward. A sketch of that Clock/MockClock pair follows.
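
Hedged, and with names that only approximate the repo’s, that pair looks like this:

```csharp
using System;

// Hedged sketch of the pattern described above; names approximate the repo's.
public sealed class Clock : IClock
{
    public DateTimeOffset Now => DateTimeOffset.UtcNow;      // production: just ask the system
}

public sealed class MockClock : IClock
{
    private DateTimeOffset _now;

    public MockClock(DateTimeOffset start) => _now = start;

    public DateTimeOffset Now => _now;

    public void Set(DateTimeOffset to) => _now = to;         // pin the clock to an exact instant
    public void Advance(TimeSpan by) => _now += by;          // march time forward inside a test
}
```

In a test, you advance the MockClock past the lookback window and assert that the difficulty comes back down, without ever sleeping.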

Infrastructure

In the Infrastructure namespace, you’ll see 2 main namespaces: Postgres and Redis (there’s also a Migrations directory that should really live in the Postgres namespace, but let’s not concern ourselves overly with that). At a high level, the Postgres implementation relies on Postgres’ native timestamp and B-Tree indices to do as much calculation DB-side as possible, while the Redis implementation is forced to pull a potentially very large datastructure into memory to run calculations.

Infrastructure is responsible for answering 2 questions:

  1. what is the current difficulty for this caller
  2. how many calls has the caller made in an arbitrary time window
    Technically this last requirement could be slimmed down to “how many calls has this caller made in the last N time units”, but I didn’t find that that tightening of requirements bought me any performance improvements given my indexing strategies.

Redis

Poor, beloved Redis. How are we to abuse a high-performance key/value store into doing anything like the kind of operations we need to do here? If you go and look at everyone else’s rate-limiting implementations, they tend to rely on atomic incrementing and appending to existing collections of call records in Redis-like “databases”. Mostly for hoots and hollers, I decided that it would be more entertaining to implement the timestamp lookup with Redis by using a trie to store callers’ call histories.

The trie is implemented as a prefix-tree lookup, where the current time is modeled as Unix millis-since-epoch and then serialized to a string. This gives us the ability to find nodes in O(L) time (where L is the length of the timestamp string), and therefore to sum the count of nodes between two other nodes very quickly. Unix epoch millis monotonically increase, so we get strong guarantees that a timestamp B after timestamp A will always insert into the trie to the right of timestamp A. At that point, all we need to do is breadth-first walk the rest of the tree to the right of A, counting the nodes that lie to the right of A and to the left of B, until we have the answer to “how many calls did we hear in between A and B”.
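
Here is a toy version of the idea, stripped down for this post: instead of the breadth-first walk described above it keeps a per-node subtree count, which is enough to show why fixed-width timestamp strings make range counting cheap. None of these names come from the repo.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Toy version, not TarPit's trie: timestamps become fixed-width decimal strings
// (Unix millis fit in 13 digits until the year 2286), so lexicographic order over
// the digits matches chronological order and a prefix walk can count a time range.
public sealed class TimestampTrie
{
    private sealed class Node
    {
        public readonly SortedDictionary<char, Node> Children = new SortedDictionary<char, Node>();
        public int SubtreeCount;                     // calls whose timestamps pass through this node
    }

    private readonly Node _root = new Node();

    private static string Key(DateTimeOffset t) => t.ToUnixTimeMilliseconds().ToString("D13");

    public void RecordCall(DateTimeOffset at)
    {
        var node = _root;
        foreach (var digit in Key(at))
        {
            if (!node.Children.TryGetValue(digit, out var child))
                node.Children[digit] = child = new Node();
            child.SubtreeCount++;
            node = child;
        }
    }

    // Calls with from <= timestamp <= to, visiting at most ten children per digit.
    public int CountBetween(DateTimeOffset from, DateTimeOffset to) =>
        Count(Key(to), inclusive: true) - Count(Key(from), inclusive: false);

    private int Count(string key, bool inclusive)
    {
        var node = _root;
        var total = 0;
        foreach (var digit in key)
        {
            // Everything branching off to the "left" of this digit is strictly earlier.
            total += node.Children.Where(kv => kv.Key < digit).Sum(kv => kv.Value.SubtreeCount);
            if (!node.Children.TryGetValue(digit, out node))
                return total;
        }
        return inclusive ? total + node.SubtreeCount : total;
    }
}
```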

This implementation has really bad concurrency implications. Node A can read the serialized Trie, B can write to it, and A will have no idea that this ever happened. I attempt to address this by reading the trie into memory, writing the new call into the datastructure, and then saving that datastructure back to Redis as quickly as possible, but with this architecture we just can’t solve concurrency.

Postgres

The Postgres implementation, in comparison to the Redis implementation, is awesomely simple. We record calls in a table of calls with an FK over to a table of callers, indexing the calls with a compound index on caller and timestamp (considering cardinality to get the best performance out of the index). All of the heavy lifting happens DB side in this implementation, which gives us several opportunities to squeeze a bit more performance out of the DB by using prepared statements, stored procedures, or views depending on what profiling would tell us.
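
For flavor, here is roughly what that count query looks like through Npgsql; the table, column, and parameter names (and the caller key type) are assumptions of mine for illustration, not the repo’s actual schema.

```csharp
using System;
using System.Threading.Tasks;
using Npgsql;

// Sketch only: table, column and parameter names are assumptions, not the repo's schema.
// The compound index is what lets Postgres answer this without scanning, e.g.:
//   CREATE INDEX ix_calls_caller_called_at ON calls (caller_id, called_at);
public static class PostgresCallCounter
{
    public static async Task<long> CountCallsAsync(
        string connectionString, Guid callerId, DateTimeOffset from, DateTimeOffset to)
    {
        const string sql = @"
            SELECT count(*)
            FROM calls
            WHERE caller_id = @caller
              AND called_at BETWEEN @from_ts AND @to_ts;";

        await using var conn = new NpgsqlConnection(connectionString);
        await conn.OpenAsync();

        await using var cmd = new NpgsqlCommand(sql, conn);
        cmd.Parameters.AddWithValue("caller", callerId);
        cmd.Parameters.AddWithValue("from_ts", from.UtcDateTime);   // UTC keeps timestamptz happy
        cmd.Parameters.AddWithValue("to_ts", to.UtcDateTime);

        return (long)(await cmd.ExecuteScalarAsync() ?? 0L);
    }
}
```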

Tradeoffs

Redis, everyone will tell you, is way faster than Postgres. I originally intended this project to give me a bit of a lever to benchmark the two systems, but it turned out that the ratelimiting algo itself is too efficient to make this a useful benchmark: difficulty climbs much faster than an attacker can keep up with, so the attacker never gets to impose meaningful load on the database.

Postgres is relational, and that, some folks say, is just too much of a cognitive burden. I don’t really buy this argument, as relational datastores are a core component of software development and I generally try to get everyone who works with me to at least some level of familiarity with them if only to patch over the hole in their knowledge. They’re just hard to beat in terms of data modeling, persistence, and a lingua franca for all of the above. Plus, “choose boring technology”.

Library

Library has all of the boring HTTP concerns: what header names to use, how to identify callers, how to signal to callers that they’ve exhausted their quota, and so on. This is the area of the code that would need the most attention during productionization, as we would want to make caller identification, hash selection, header naming, etc. all user-configurable.

Takeaways

DI is good. “Clean Architecture” feels a lot like “the dependency graph should be directed and acyclic”, but having the paradigm out there is useful, as folks are more willing to listen when you say stuff like “Clean Architecture” and more prone to falling asleep when you say “dependencies should form a directed acyclic graph”. I still have just the barest grasp of sophisticated programming with generics (and have a bit more sympathy for the original Go team’s attitude of “yagni, bro”).

Areas for Improvement

  • Hash function should be configurable at runtime
  • HTTP header names for nonces and difficulty should be specified at runtime
  • User identification should be refactored for configurability at runtime

Coda

I enjoyed writing this project inordinately. It took a concrete problem we have at work and completely solved it (but so did the teams involved, well before I wrote this, and at far lower implementation and complexity cost than this solution would entail). I’m grateful to my colleagues Richard June and Brandon Schultz for their critiques, suggestions, and input on general .NET project structuring; to my friends Michael Trinque and Benjamin van der Veen for mentorship and collaboration over the years, and in the case of this project a relentless willingness to ask “why” and question every architectural decision (until quite frequently they got a frustrated “because I want to!” out of me, which is not the ideal response out of a software engineer). I’m also grateful to my boss Kou Raghavan and SeaWorld’s CIO Jay Smith for supporting me in open-sourcing this project, and to Jeff Schwartz at SeaWorld for being a curious listener, open to doing new things with company intellectual property that yields us no competitive edge and that benefits the rest of the world when published.
