Infinite scrolling on the web is complexity layered on top of complexity layered on top of complexity

I’m putting this in its own blog post, because keeping it buried in a comments section would make it painful to link to.

First, read this. It’s a list of things that you should think about when you decide to add infinite scrolling to your site: http://adrianroselli.com/2014/05/so-you-think-you-built-good-infinite.html

Does all that stuff sound hard? Sorry, but it’s worse.

# Database pagination is inherently hard

Here’s a use case: you want an API that returns a list of results. For this discussion, it doesn’t matter whether it’s backing an infinite scroll, a web site with multiple pages of lists, or just an API. Breaking the results up so that the user can run the query, get a partial result, and then come back an arbitrary amount of time later to get more is inherently hard.

If you just use “page numbers”, and new content is added between the initial request and the request for the second page, you might show the user the same item twice. A typical workaround is to paginate relative to the last item the user saw, using something like its date or database ID. Instead of https://lobste.rs/moderations/page/2, where 2 is the page after the initial page 1, you use https://notriddle-more-interesting.herokuapp.com/mod-log?after=1024, where 1024 is the ID of the last item on the previous page.

But if your sort order is anything more complicated than `ORDER BY id DESC`, there might not be a clear replacement for page numbers. Suppose you sort by something like `last_comment_datetime`, which is what the front page of https://meta.discourse.org does. A post might be just below the page break when the user opens page 1; then it gets bumped to the top; then the user opens page 2. The post was never on page 1 as the user saw it, and it is no longer on page 2, so your pagination has made it vanish completely.
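The duplicate-item problem and the keyset workaround fit in a few lines. This is a minimal in-memory sketch, not a real database layer: `Item`, `pageByOffset`, and `pageAfterId` are hypothetical names, and a plain array stands in for a table sorted with `ORDER BY id DESC`.

```typescript
// Hypothetical names: `Item`, `pageByOffset`, and `pageAfterId` are for
// illustration only. The array stands in for a table sorted newest-first,
// i.e. ORDER BY id DESC.
interface Item {
  id: number;
  title: string;
}

function newestFirst(items: readonly Item[]): Item[] {
  return [...items].sort((a, b) => b.id - a.id);
}

// Page-number pagination, the equivalent of
// ORDER BY id DESC LIMIT :size OFFSET (:page - 1) * :size
function pageByOffset(items: readonly Item[], page: number, size: number): Item[] {
  return newestFirst(items).slice((page - 1) * size, page * size);
}

// Keyset pagination: resume strictly after the last id the client saw, the
// equivalent of WHERE id < :after ORDER BY id DESC LIMIT :size
function pageAfterId(items: readonly Item[], after: number | null, size: number): Item[] {
  const sorted = newestFirst(items);
  const rest = after === null ? sorted : sorted.filter((item) => item.id < after);
  return rest.slice(0, size);
}

// Usage: page 1 is served, then a new item arrives before page 2 is requested.
const log: Item[] = [1, 2, 3, 4, 5, 6].map((id) => ({ id, title: `entry ${id}` }));
const firstPage = pageByOffset(log, 1, 3); // ids 6, 5, 4
log.push({ id: 7, title: "entry 7" });
const byOffset = pageByOffset(log, 2, 3); // ids 4, 3, 2: id 4 is shown twice
const byKeyset = pageAfterId(log, firstPage[firstPage.length - 1].id, 3); // ids 3, 2, 1
```

The keyset version only works because an item’s `id` never changes. When the sort key can change after page 1 is served, as with `last_comment_datetime`, a bumped item can still fall through the gap.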

There’s a scalability problem here, too. With a complicated pagination scheme, each page can get slower to serve the deeper the user digs into the later pages. https://dzone.com/articles/why-most-programmers-get-pagination-wrong explains how that performance cost can bite you.

# Infinite scrolling is inherently hard

But let’s assume for a moment that you’ve implemented a backend that can incrementally serve up a boundless number of items at a fixed size per chunk, without skipping or duplicating a result. Now, how do you allow the user to scroll through it? Ideally, you incrementally load it as they scroll.

This means your app needs to talk to the network while it performs the scroll animation, and once it gets a chunk, it needs to render it. That includes line breaking and computing image sizes, since you might not know in advance how big each item should be, so the work is probably inherently serial. While all of this happens in the background, the scrolling animation should stay smooth. You can’t just wait until the scrolling animation ends to do the layout and rendering work, because the user doesn’t want to reach the end of your result set only to have the next chunk pop in the second they stop; the goal is to create the illusion that it was always there. Smooth scrolling is a soft real-time problem, and avoiding stutter matters for psychomechanical reasons, not just aesthetics (see https://pavelfatin.com/scrolling-with-pleasure/).
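To keep the next chunk from popping in late, a loader can start fetching while the user is still some distance from the bottom, and refuse to start a second request while one is in flight. This is a hedged sketch: `ChunkLoader`, `FetchChunk`, and the 800px default threshold are hypothetical, and the cursor is assumed to come from a backend like the one described above.

```typescript
// Hypothetical sketch: `ChunkLoader`, `FetchChunk`, and the 800px default
// threshold are illustrative assumptions, not a real library's API.
type Chunk<T> = { items: T[]; next: string | null };
type FetchChunk<T> = (cursor: string | null) => Promise<Chunk<T>>;

class ChunkLoader<T> {
  readonly items: T[] = [];
  private cursor: string | null = null;
  private inFlight = false;
  private done = false;
  private readonly fetchChunk: FetchChunk<T>;
  private readonly thresholdPx: number;

  constructor(fetchChunk: FetchChunk<T>, thresholdPx = 800) {
    this.fetchChunk = fetchChunk;
    this.thresholdPx = thresholdPx;
  }

  // Call on scroll with the container's scrollTop, clientHeight, and
  // scrollHeight. Starts a fetch once the user is within `thresholdPx` of the
  // bottom, so the next chunk can be laid out before they reach it.
  // Returns null when no fetch was started.
  maybeLoad(scrollTop: number, viewportHeight: number, contentHeight: number): Promise<void> | null {
    const remaining = contentHeight - (scrollTop + viewportHeight);
    if (this.done || this.inFlight || remaining > this.thresholdPx) {
      return null;
    }
    this.inFlight = true;
    return this.fetchChunk(this.cursor).then(
      (chunk) => {
        this.items.push(...chunk.items);
        this.cursor = chunk.next;
        this.done = chunk.next === null; // null cursor = end of the result set
        this.inFlight = false;
      },
      (err: unknown) => {
        this.inFlight = false; // allow a retry on a later scroll event
        throw err;
      },
    );
  }
}
```

The threshold only buys time for the network and layout work; it does nothing about how often the browser actually delivers scroll events.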

Giant lists also bring navigation problems. With a giant scrolling screen, the user will probably want to bookmark a position in it. They’ll also want smart ways to navigate it; a simple, tiny scrollbar isn’t going to cut it. And if they navigate elsewhere, you need to be able to bring them back to where they were without keeping everything in RAM or reloading everything (though your magic pagination backend should already let you pull in an arbitrary “page” of data from the middle). Object permanence is paramount.
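One way to make a position bookmarkable is to record it as “which item is at the top of the viewport, and how many pixels into it” rather than as a raw scroll offset. A minimal sketch, assuming stable numeric item IDs; `ScrollAnchor`, the helper functions, and the `at`/`offset` query parameter names are all hypothetical.

```typescript
// Hypothetical sketch: a scroll position stored as an item plus an offset,
// serialized into a query string. The `at` and `offset` names are assumptions.
interface ScrollAnchor {
  itemId: number; // stable item key, e.g. a database id
  offsetPx: number; // distance from the item's top edge to the viewport's top
}

function anchorToQuery(anchor: ScrollAnchor): string {
  const params = new URLSearchParams();
  params.set("at", String(anchor.itemId));
  params.set("offset", String(Math.round(anchor.offsetPx)));
  return params.toString();
}

// Returns null when the query doesn't describe a usable anchor.
function anchorFromQuery(query: string): ScrollAnchor | null {
  const params = new URLSearchParams(query); // a leading "?" is ignored
  const rawId = params.get("at");
  if (rawId === null) return null;
  const itemId = Number(rawId);
  const offsetPx = Number(params.get("offset") ?? "0");
  if (!Number.isSafeInteger(itemId) || !Number.isFinite(offsetPx)) return null;
  return { itemId, offsetPx };
}
```

In a browser, the string could be written into the URL with `history.replaceState` as the user scrolls, so bookmarks and the Back button both carry it. Restoring it still requires the backend to fetch the chunk around that item, not the chunks before it.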

You also don’t want ever-rising memory requirements. Android’s ListView shows how to alleviate some of this: to let users scroll through large lists without loading the entire thing into memory at once, it has to pull tricks like object reuse and a “smart” treatment of the scrollbar. ListView was designed for large lists on tiny devices, and that makes it the right thing to study for utterly enormous lists on any device. You need to avoid leaking memory or thrashing the allocator, you need to track position in terms of items rather than pixels scrolled, and scrolling and layout should be coupled so that recomputing things doesn’t make the content jump around. Some Android apps don’t use ListView, or use it incorrectly, but the infrastructure to do it right is there.
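The core of that idea, tracking position in items and only materializing the ones on screen, can be sketched independently of any platform. This is not ListView’s API; `VirtualWindow` and its methods are hypothetical, and heights of items that haven’t been laid out yet fall back to an estimate.

```typescript
// Hypothetical sketch of list virtualization in the spirit of ListView; the
// class and method names are assumptions, not an Android or browser API.
// Position is tracked in item indices; pixel offsets are derived from item
// heights, using `estimatedHeight` for items that haven't been laid out yet.
class VirtualWindow {
  private readonly heights: number[];
  private readonly estimatedHeight: number;

  constructor(count: number, estimatedHeight: number) {
    this.heights = new Array<number>(count).fill(NaN); // NaN = not measured
    this.estimatedHeight = estimatedHeight;
  }

  // Record an item's real height once it has been rendered and measured.
  measure(index: number, height: number): void {
    this.heights[index] = height;
  }

  private heightOf(index: number): number {
    const h = this.heights[index];
    return Number.isNaN(h) ? this.estimatedHeight : h;
  }

  // Pixel offset of the top of item `index`. O(n) for clarity; a real
  // implementation would cache prefix sums or use a Fenwick tree.
  offsetOf(index: number): number {
    let top = 0;
    for (let i = 0; i < index; i++) top += this.heightOf(i);
    return top;
  }

  totalHeight(): number {
    return this.offsetOf(this.heights.length);
  }

  // Half-open index range [start, end) of items overlapping the viewport,
  // widened by `overscan` items on each side. Only these get DOM nodes; the
  // rest are stand-in spacers of offsetOf(start) pixels above and
  // totalHeight() - offsetOf(end) pixels below.
  visibleRange(scrollTop: number, viewportHeight: number, overscan = 2): { start: number; end: number } {
    const count = this.heights.length;
    let start = 0;
    let top = 0;
    // Skip items that end at or above the top of the viewport.
    while (start < count && top + this.heightOf(start) <= scrollTop) {
      top += this.heightOf(start);
      start++;
    }
    // Take items until one starts at or below the bottom of the viewport.
    let end = start;
    const bottom = scrollTop + viewportHeight;
    while (end < count && top < bottom) {
      top += this.heightOf(end);
      end++;
    }
    return { start: Math.max(0, start - overscan), end: Math.min(count, end + overscan) };
  }
}
```

Note what happens when `measure` replaces an estimate for an item above the viewport: every pixel offset below it shifts. That is the scroll-up problem described in the next section.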

# The infrastructure to do infinite scrolling right isn’t there on the web

Imagine that the user skips to the middle of your giant list. Naturally, you want to download and render the middle items without downloading or rendering the ones before or after.

It’s pretty easy to imagine how you handle scrolling down: put some big gray placeholder in there and replace it with the new items once they load. It’s hard to make sure that layout is fast enough that it doesn’t stutter, but whatever; the real problem is when the user scrolls *up*.

Web pages are laid out top to bottom, and scrolling is handled by the browser. Coordinates in the CSS layout system are measured in pixels, with the origin at the top-left corner of the page canvas, and scrolling works by moving the viewport across that canvas. Inserting an item earlier in the DOM moves every item after it. There is no way to update the scroll position and the DOM atomically, so whichever one you update first, the user sees the intermediate state as a gross-looking jump. Offering a way to update both atomically would probably introduce stutter of its own: because the coordinate system is expressed in terms of canvas position instead of screen position, which is wrong for this problem domain, a lot of values would be recomputed unnecessarily, and the scroll could end up frozen while the JavaScript garbage collector runs.
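The usual manual workaround for the scroll-up case is: measure where the first visible item is, insert the content above it, measure again, and shift `scrollTop` by the difference. This is a hedged sketch; `prependPreservingPosition`, `ScrollBox`, and `Measurable` are hypothetical names. It is still the two separate updates the paragraph describes, and how visible the intermediate state is depends on the browser’s scrolling implementation.

```typescript
// Hypothetical helper. In a browser, `box` would be the scrolling element and
// `anchor` the first item still on screen; HTMLElement satisfies both
// interfaces structurally.
interface ScrollBox {
  scrollTop: number;
}
interface Measurable {
  getBoundingClientRect(): { top: number };
}

// Insert content above the viewport, then shift the scroll position by however
// far the anchor moved on screen. Reading the rect after the insert forces a
// synchronous layout in a browser.
function prependPreservingPosition(box: ScrollBox, anchor: Measurable, insert: () => void): void {
  const before = anchor.getBoundingClientRect().top;
  insert(); // e.g. replace a placeholder above the anchor with loaded items
  const after = anchor.getBoundingClientRect().top;
  box.scrollTop += after - before;
}
```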

So you’re stacking a bunch of incidental complexity on top of a problem space that already has plenty of inherent complexity.

# So now what?

Besides just dumping engineering time and CPU cycles into the problem, like www.discourse.org does, and still having trouble with scrolling up, or with browsers that throttle scroll events so much that skimming becomes a pain in the butt? See https://lobste.rs/s/mmxkgx/so_you_think_you_ve_built_good_infinite#c_d1nyon

Why not just truncate? Give them a thousand headlines (or two hundred entire blog posts, like this site does). A typical headline is about a hundred bytes, so a thousand headlines is about 100K, which is smaller than the ember_jquery payload that Discourse serves. Add the overhead of wrapping each one in a link, and you’re still under a typical site’s page size. That’s more content than most people would ever want, and if they don’t find what they want in that much stuff, they can fix their search query or use the Random button.

Of course… the previous paragraph is a terrible oversimplification, because typical websites (like Discourse) spend far more bandwidth on images than on almost anything else. If you want to tag every post with an avatar, or worse, allow users to include their own giant pictures, a thousand 1MiB photographs would balloon your initial page load from 100K to a gigabyte. Even small avatars add up: if you’re curious, I checked the avatars on meta.discourse.org, and they ranged from 50 bytes to 3KiB depending on whether they were letter avatars or photographs, so a thousand photo avatars would still push you into the megabytes range.
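The back-of-the-envelope numbers behind the last two paragraphs, assuming exactly one image per headline and the sizes quoted above:

$$
\begin{aligned}
1000 \times 100\,\text{B} &= 100\,\text{KB} && \text{(headlines only)}\\
1000 \times 50\,\text{B} &= 50\,\text{KB} && \text{(smallest letter avatars)}\\
1000 \times 3\,\text{KiB} &= 3000\,\text{KiB} \approx 2.9\,\text{MiB} && \text{(largest photo avatars)}\\
1000 \times 1\,\text{MiB} &= 1000\,\text{MiB} \approx 1\,\text{GiB} && \text{(one large photo per post)}
\end{aligned}
$$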

So more-interesting doesn’t have images, because having images would require either pagination, which gets in the way of uninterrupted reading and is difficult to implement, or infinite scrolling, which is better for uninterrupted reading but an INFINITE SINKHOLE OF PAIN to implement. There’s also the option of serving all the text but only loading the images as they scroll into view, but then you still have to deal with heavily throttled scroll events…
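For that last option, the per-event work is a simple overlap test: which unloaded images intersect the viewport, widened by a margin so downloads start before the image is actually on screen. A minimal sketch; `ImageSlot` and `slotsToLoad` are hypothetical, and the image positions are assumed to be known (for example, because width and height are reserved in the markup). The throttled scroll events still decide how often this runs; the margin only buys time.

```typescript
// Hypothetical sketch: `ImageSlot` and `slotsToLoad` are illustrative names.
// Each slot records where an image sits, in pixels from the top of the
// scrolling content, and whether it has been loaded yet.
interface ImageSlot {
  top: number;
  height: number;
  loaded: boolean;
}

// Indices of unloaded images that overlap the viewport grown by `marginPx` on
// both sides. A generous margin starts downloads before an image is on
// screen, which helps when scroll events arrive infrequently.
function slotsToLoad(
  slots: readonly ImageSlot[],
  scrollTop: number,
  viewportHeight: number,
  marginPx: number,
): number[] {
  const from = scrollTop - marginPx;
  const to = scrollTop + viewportHeight + marginPx;
  const result: number[] = [];
  slots.forEach((slot, index) => {
    const overlaps = slot.top < to && slot.top + slot.height > from;
    if (!slot.loaded && overlaps) result.push(index);
  });
  return result;
}
```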
