Improving performance by understanding user behaviour

Typically when you’re trying to optimise a system, you aim to do one (or more) of these things:

Parallelise work
Make the logic faster (e.g. by using better algorithms)
Do less work

But there is another way: exploiting the user’s interaction patterns. This won’t necessarily make your program faster on paper, but it will feel faster, which is what matters to the user anyway.

To understand this, let me first give a historical example, then an example from my own work.

Instagram’s really fast image upload

When Kevin Systrom and Michel Krieger were building Instagram, they used a clever trick to make posting (the quintessential user action) seem near-instant. An Instagram post consists of a photo and usually a description. When a user selected a photo, they were also very likely to write a description for it. That description could be short or long, but the point is that the user would take some time to come up with it before posting. Noticing this pattern, the engineers decided to take the image the user selected from their gallery and upload it to their server immediately, in the background. If the user cancelled the post, they could simply remove the photo. If the user wrote the description and then clicked “Post”, there was a very high likelihood that the image was already fully uploaded by that point, so the post itself was just a light payload of text and finished extremely fast.
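To make the idea concrete, here is a minimal sketch of that flow. This is my own illustration, not Instagram’s code: the `Transport` interface, its three methods, and the `PostDraft` class are all hypothetical names standing in for real API calls.

```typescript
// Hypothetical transport layer: these functions stand in for real API calls.
interface Transport {
  uploadImage(image: Uint8Array): Promise<string>; // resolves to an image id
  deleteImage(imageId: string): Promise<void>;
  createPost(imageId: string, caption: string): Promise<void>;
}

class PostDraft {
  private readonly transport: Transport;
  private readonly upload: Promise<string>;

  // Start uploading as soon as the photo is picked, before any caption exists.
  constructor(transport: Transport, image: Uint8Array) {
    this.transport = transport;
    this.upload = transport.uploadImage(image);
    // Avoid an unhandled rejection if the upload fails before anyone awaits it.
    this.upload.catch(() => undefined);
  }

  // By the time the user taps "Post" the upload has usually finished,
  // so only the small text payload is left to send.
  async post(caption: string): Promise<void> {
    const imageId = await this.upload;
    await this.transport.createPost(imageId, caption);
  }

  // The user backed out: remove the speculatively uploaded image.
  async cancel(): Promise<void> {
    try {
      const imageId = await this.upload;
      await this.transport.deleteImage(imageId);
    } catch {
      // The upload itself failed, so there is nothing to delete.
    }
  }
}
```

You would create a `PostDraft` in the photo picker’s selection handler, then call `post` or `cancel` depending on what the user does next. The trade-off is some wasted bandwidth for posts that get abandoned.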

Predictive Prefetching

At work, I was building a part of our product’s UI that renders a tree-like directory explorer, with data coming from an API. The directory contents are lazy loaded of course, but there can be hundreds of root-level directories that have to be loaded up front, which meant a loading screen most of the time. But let’s think about how this UI is even triggered. The user first has to open a panel, scroll down to this option, then click a button to pop up the directory explorer. So there are two things we can do: prefetch the root-level directories when the button becomes visible in the viewport (i.e. the user scrolls to it), or prefetch them when the user’s mouse hovers over the button. I found that hover was too late: users would usually click before the prefetch completed and still see the loading UI. Prefetching when the button became visible, though, was very successful, and most of the time led to no loading screen at all.
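Here is roughly what the visibility-triggered version can look like in the browser, using the standard `IntersectionObserver` API. Treat it as a sketch rather than our production code: `createPrefetcher`, `prefetchWhenVisible`, and the loader you pass in are names I made up for this post.

```typescript
// Wraps a loader so the request is made at most once and its promise is shared.
function createPrefetcher<T>(load: () => Promise<T>) {
  let inflight: Promise<T> | undefined;
  return {
    // Safe to call many times; only the first call triggers the request.
    prefetch(): Promise<T> {
      if (inflight === undefined) {
        inflight = load().catch((err) => {
          inflight = undefined; // allow a retry after a failure
          throw err;
        });
      }
      return inflight;
    },
  };
}

// Starts the prefetch the first time `target` scrolls into the viewport.
// Browser-only: relies on the IntersectionObserver API.
function prefetchWhenVisible(target: Element, prefetch: () => Promise<unknown>): void {
  const observer = new IntersectionObserver((entries) => {
    if (entries.some((entry) => entry.isIntersecting)) {
      observer.disconnect(); // one-shot: we only need the first sighting
      void prefetch().catch(() => undefined); // errors surface later, on click
    }
  });
  observer.observe(target);
}
```

The button’s click handler calls the same `prefetch()` and awaits it. If the data already arrived, the explorer opens immediately; if the request is still in flight, the user only waits for the remainder instead of the whole round trip.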

Figuring out user interaction patterns

So you want to understand how your users interact with your app, and then infer what kind of unique optimisations those patterns allow. But how do you know what your users’ interactions with your app even look like? This is where metric aggregators come in: providers like Statsig, PostHog, LaunchDarkly, etc. There are a lot of them, so pick any you like. I personally enjoy using Statsig for this. You simply track events you think might be of interest. There’s no silver bullet. It’s like looking for treasure: you try different places in the UI until something interesting pops up. Points of interest might be things that trigger events, like buttons or toggles. Once you have a measurable user interaction, decide how to optimise performance around it. Don’t add something prematurely, as it may hurt your UX or even worsen performance in unexpected ways. In my anecdote, I knew the optimisation would work because metrics showed that more than 80% of users who opened that panel would also click the button to open the directory explorer.
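Most of these tools will compute a number like that for you in a funnel view, but the underlying calculation is simple enough to sketch. The `TrackedEvent` shape and the event names below are assumptions for illustration, not any provider’s actual export format, and the sketch ignores event ordering for brevity.

```typescript
// Hypothetical event shape; not tied to any particular analytics provider.
interface TrackedEvent {
  userId: string;
  name: string;
}

// Fraction of users who fired `fromEvent` and also fired `toEvent`.
function conversionRate(events: TrackedEvent[], fromEvent: string, toEvent: string): number {
  const started = new Set<string>();
  const finished = new Set<string>();
  for (const e of events) {
    if (e.name === fromEvent) started.add(e.userId);
    if (e.name === toEvent) finished.add(e.userId);
  }
  if (started.size === 0) return 0; // no one reached the first step
  let converted = 0;
  started.forEach((userId) => {
    if (finished.has(userId)) converted += 1;
  });
  return converted / started.size;
}
```

A high rate between something like `panel_opened` and `explorer_opened` suggests that work done speculatively at the first step will rarely be wasted.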

Bonus: Backend design around user interaction

Optimising around user interactions doesn’t have to be limited to the UI layer. In the backend, a common technique for improving performance is caching. When you cache something, you give it a TTL (time-to-live) after which the entry expires. You can make this TTL dynamic using a sliding window (sometimes called sliding expiration), where the TTL resets every time the cached resource is accessed. This keeps hot resources (things users access often) in the cache while cold ones age out. So once again, look at the usage metrics, see which GET endpoints are hit most often, and consider using a sliding window TTL when caching those.
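As a rough in-memory illustration of the sliding window, here is a small sketch. The `SlidingTtlCache` class is hypothetical and leaves out things a real cache needs, like a size limit and a cap on total lifetime.

```typescript
class SlidingTtlCache<V> {
  private readonly entries = new Map<string, { value: V; expiresAt: number }>();
  private readonly ttlMs: number;
  private readonly now: () => number;

  // `now` is injectable so the expiry logic can be tested without waiting.
  constructor(ttlMs: number, now: () => number = Date.now) {
    this.ttlMs = ttlMs;
    this.now = now;
  }

  set(key: string, value: V): void {
    this.entries.set(key, { value, expiresAt: this.now() + this.ttlMs });
  }

  get(key: string): V | undefined {
    const entry = this.entries.get(key);
    if (entry === undefined) return undefined;
    if (entry.expiresAt <= this.now()) {
      this.entries.delete(key); // expired: evict lazily on access
      return undefined;
    }
    entry.expiresAt = this.now() + this.ttlMs; // hit: slide the window forward
    return entry.value;
  }
}
```

With a shared cache like Redis, the equivalent is to refresh the key’s expiry whenever you read it. Either way, consider capping how long an entry can live in total, otherwise a very hot key could serve stale data indefinitely.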