For a very small instance with only a couple of concurrent users, a CDN might not make much difference. But if you look at your web server logs you’ll quickly notice that every post, like or vote triggers a storm of requests from other instances to yours, looking up lots of different things. It’s easy to imagine how quickly this would overwhelm an instance once it gets even a little busy.
One of the first web performance tools people reach for is a CDN such as Cloudflare. But how much difference does it make? In this video I show you my web server logs before and after, and compare them.
The short answer: 720 requests before the CDN, 100 requests after.
Usually, just turning on a CDN with default settings will not help very much; you’ll need to configure some caching rules or settings. By watching your server logs for a while you’ll get a sense of what needs to be cached, but check out mine for a starting point:
All these are frequently requested on my instance. Depending on the fediverse platform you have installed, you’ll probably see different patterns and so need different caching settings.
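If you want something more systematic than eyeballing the logs, a short script can rank the most-requested paths. This is a minimal sketch, assuming your web server writes the common or combined log format (the nginx and Apache default); `top_paths` is a hypothetical helper name, and query strings are stripped so /post/1?page=2 is counted as /post/1.

```python
import re
from collections import Counter

# Assumes common/combined log format, where the request line is quoted,
# e.g. ... "GET /post/123 HTTP/1.1" 200 ...
REQUEST_RE = re.compile(r'"(?P<method>[A-Z]+) (?P<path>\S+) HTTP/[\d.]+"')


def top_paths(lines, n=10, strip_query=True):
    """Count requests per path and return the n most common as (path, count) pairs."""
    counts = Counter()
    for line in lines:
        match = REQUEST_RE.search(line)
        if not match:
            continue  # skip malformed or non-request lines
        path = match.group("path")
        if strip_query:
            path = path.split("?", 1)[0]  # drop the query string
        counts[path] += 1
    return counts.most_common(n)
```

For example, passing an open file such as `/var/log/nginx/access.log` (adjust for your server) to `top_paths` returns a ranked list. Paths containing IDs, like /post/123, are counted individually, so you may want to group them by prefix before deciding on a rule.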
Beware of caching by URI Path, because fediverse software often returns different data depending on the Accept header the requester sets. For example, on PieFed and Lemmy instances a request from a web browser to /post/123 returns HTML that shows the post to a person. But when that same URL is requested with the Accept: application/ld+json header set, the response is an ActivityPub representation of the post! You don’t want people getting ActivityPub data in their browser, and you don’t want to be serving HTML to other instances. Once you spot a URL you want to cache, use a tool like Postman to set the Accept header, make a fake ActivityPub request to your instance, and see whether you get back HTML or JSON.
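If you’d rather script this check than use Postman, Python’s standard library is enough. The sketch below sends a request with a given Accept header and reports whether the Content-Type that comes back looks like HTML or ActivityPub. The helper names are hypothetical, and servers that require signed fetches may answer an unsigned request with an error instead.

```python
import urllib.request

# The Accept value the ActivityPub spec tells clients to send.
AP_ACCEPT = 'application/ld+json; profile="https://www.w3.org/ns/activitystreams"'


def classify_content_type(content_type):
    """Return 'activitypub', 'html' or 'other' for a Content-Type header value."""
    media_type = (content_type or "").split(";", 1)[0].strip().lower()
    if media_type in ("application/activity+json", "application/ld+json"):
        return "activitypub"
    if media_type == "text/html":
        return "html"
    return "other"


def probe(url, accept):
    """Fetch url with the given Accept header and classify the response type."""
    request = urllib.request.Request(url, headers={"Accept": accept})
    with urllib.request.urlopen(request, timeout=10) as response:
        return classify_content_type(response.headers.get("Content-Type"))
```

Call `probe(url, "text/html")` and `probe(url, AP_ACCEPT)` on a URL from your instance; if the two results differ, that path can’t safely be cached by URI Path alone.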
Another problem is that a response often varies depending on whether the viewer is logged in, or on who is logged in. If you can figure out how to configure the CDN to pay attention to cookies, or to whatever headers your platform uses for authentication, then you might be able to cache things like /post/*… I couldn’t.
The things I’ve chosen to cache by URI Path above are ones that I know don’t vary by HTTP header or by authentication.
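The usual workaround for authenticated responses is to skip the cache for any request that carries credentials. This sketch shows that decision in Python rather than in any particular CDN’s rule language. It assumes headers arrive as a dict with lower-case names, and the cookie name `session` is a placeholder; check which cookie your platform actually sets.

```python
from http.cookies import CookieError, SimpleCookie

# Hypothetical session cookie name: replace with the one your platform uses.
SESSION_COOKIE = "session"


def should_bypass_cache(headers, session_cookie=SESSION_COOKIE):
    """True if the request looks authenticated, so a shared cache must not serve it."""
    if "authorization" in headers:
        return True  # API token or similar credentials
    raw = headers.get("cookie", "")
    if not raw:
        return False  # anonymous request: safe to consider for caching
    cookie = SimpleCookie()
    try:
        cookie.load(raw)
    except CookieError:
        return True  # cookie header could not be parsed: play safe, go to origin
    return session_cookie in cookie
```

Anonymous visitors still get cached pages, while anyone carrying the session cookie always reaches the origin.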
Although we can’t use URI Path a lot of the time, we can cache ActivityPub requests by detecting the Accept: application/ld+json header:
This will cache all ActivityPub requests, regardless of URL. People browsing the same URLs as those used by ActivityPub will be unaffected as their requests won’t have the special HTTP header. I used a short TTL to avoid serving stale data when someone quickly edits a post straight after creating it.
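The logic behind an Accept-based rule can be sketched as follows. This illustrates the decision a CDN rule makes and is not a Cloudflare configuration. It assumes you also want to match `application/activity+json`, the other media type ActivityPub servers commonly use, and the 60-second TTL is a placeholder for whatever short value you choose.

```python
AP_MEDIA_TYPES = ("application/ld+json", "application/activity+json")


def wants_activitypub(accept_header):
    """True if the Accept header asks for an ActivityPub (JSON-LD) representation."""
    # Naive split on commas; adequate for the Accept values fediverse servers send.
    for part in (accept_header or "").split(","):
        media_type = part.split(";", 1)[0].strip().lower()
        if media_type in AP_MEDIA_TYPES:
            return True
    return False


def cache_key_and_ttl(path, headers, ap_ttl=60):
    """Return (cache_key, ttl_seconds) for cacheable requests, or None to go to origin."""
    if wants_activitypub(headers.get("accept")):
        # Key on representation as well as path so HTML and JSON never share an entry.
        return (("activitypub", path), ap_ttl)
    return None
```

Including the representation in the key is what keeps browsers and remote instances from receiving each other’s responses, even when they request the same URL.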
There seems to be a deep vein of optimization here which I’ve only just started to dig into. These changes have made a huge difference already and for now my instance is under very little load so I’ll leave things as they are. I look forward to learning more about this in future.
@piefedadmin “often fediverse servers return different data”
You mean that the servers don’t set Vary headers for the headers that can change per response ???
That sounds like some tickets need to be filed.
PieFed doesn’t set that header because I only just learned about Vary (TIL). Thanks!
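On the origin side, the Vary fix amounts to one extra response header. The sketch below is a bare WSGI app (no framework) serving a single hypothetical post as either HTML or ActivityPub JSON depending on Accept, and it sends `Vary: Accept` so a shared cache knows to store the two versions separately.

```python
import json


def post_app(environ, start_response):
    """Minimal WSGI app: one URL, two representations, chosen by the Accept header."""
    accept = environ.get("HTTP_ACCEPT", "")
    if "application/ld+json" in accept or "application/activity+json" in accept:
        body = json.dumps({"type": "Note", "content": "Hello"}).encode()
        content_type = "application/activity+json"
    else:
        body = b"<p>Hello</p>"
        content_type = "text/html; charset=utf-8"
    start_response("200 OK", [
        ("Content-Type", content_type),
        # Tell shared caches that the response depends on the Accept request header.
        ("Vary", "Accept"),
        ("Content-Length", str(len(body))),
    ])
    return [body]
```

If the HTML also changes for logged-in users, `Cookie` would belong in the Vary list too. Some CDNs only partly honour Vary, so check your provider’s documentation before relying on it alone.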
I would like to see fediverse platforms introduce a random amount of delay to their requests so they don’t hammer origin servers all at once. It really wouldn’t matter if a post was ingested 0 to 30 seconds later, would it?
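A randomised delay like this could be as simple as the sketch below. `fetch_with_jitter` is a hypothetical helper; it assumes the platform passes in an asyncio-based fetch coroutine, and it uses the 0 to 30 second window mentioned above.

```python
import asyncio
import random


async def fetch_with_jitter(fetch, url, max_delay=30.0, rng=random):
    """Wait a random 0..max_delay seconds, then call the fetch coroutine for url."""
    delay = rng.uniform(0, max_delay)  # uniform spread across the window
    await asyncio.sleep(delay)
    return await fetch(url)
```

If, say, 500 instances reacted to the same post, spreading their fetches uniformly over 30 seconds would turn one spike into roughly 17 requests per second.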
Also, cache the results of queries for longer. Most of the requests are looking up data that probably hasn’t changed since last time it was queried.
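Caching lookups on the requesting side can be sketched with a small time-to-live cache. This is a minimal in-memory version: `TTLCache` is a hypothetical class, `fetch` stands in for whatever function performs the remote lookup, and the clock is injectable so expiry is easy to test.

```python
import time


class TTLCache:
    """Tiny in-memory cache whose entries expire ttl seconds after they are stored."""

    def __init__(self, ttl, clock=time.monotonic):
        self.ttl = ttl
        self.clock = clock
        self._entries = {}  # key -> (expires_at, value)

    def get_or_fetch(self, key, fetch):
        """Return the cached value for key, calling fetch(key) only on a miss or expiry."""
        now = self.clock()
        entry = self._entries.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]  # still fresh: no remote request
        value = fetch(key)
        self._entries[key] = (now + self.ttl, value)
        return value
```

For example, `TTLCache(ttl=3600)` would query the remote server at most once an hour per key. A real deployment would also need a size limit or eviction policy, which this sketch leaves out.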
Also, use a caching mechanism like ETags or something so origin servers can return HTTP 304 to indicate it hasn’t changed.
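Here is a minimal sketch of ETag-based revalidation on the origin side. It derives a strong ETag from a hash of the body and answers 304 Not Modified when the requester’s If-None-Match header already matches; the function names are hypothetical. The requesting instance would need to remember the ETag from its previous fetch and send it back.

```python
import hashlib


def make_etag(body: bytes) -> str:
    """Strong ETag derived from a SHA-256 hash of the response body."""
    return '"' + hashlib.sha256(body).hexdigest()[:32] + '"'


def conditional_response(body: bytes, if_none_match=None):
    """Return (status, headers, body), using 304 when the client's copy is current."""
    etag = make_etag(body)
    headers = {"ETag": etag}
    if if_none_match:
        # If-None-Match uses weak comparison, so a W/ prefix is ignored.
        candidates = [tag.strip().removeprefix("W/") for tag in if_none_match.split(",")]
        if etag in candidates or "*" in candidates:
            return 304, headers, b""  # unchanged: send no body
    return 200, headers, body
```

The origin still builds the body to compute the hash, so the saving is bandwidth and work on the requester’s side rather than database load; storing an ETag or last-modified value per post would avoid even that.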
Simple stuff like that would make a huge difference.