Design Doc: Inline Preview Images

Inline images which are above a certain threshold (InlinePreviewImages filter)

Pulkit Goyal, November 2011

Background

The InlinePreviewImages filter delays the loading of high quality images until after the onload event, replacing them in the meantime with low quality images embedded within the HTML page. It is not a good idea to generate low quality images for all images: because the low quality images are inlined in the HTML, the HTML may bloat, which has a negative effect on performance. So it is very important to identify a suitable set of images that can be delayed. This set of images is called the critical images. One good heuristic is that the critical image set contains all the images which are above the fold and whose size is greater than X (probably 20KB).

A headless browser is used to determine, for each page, which images are above the fold.

Objectives:

Identify images which are above the fold and whose size is greater than X KB.
Pass this critical set of images to the InlinePreviewImages filter, which delays only these critical images.

Non-Goal:

Identify the client’s browser viewport.

Design:

Several components are needed to make this work end-to-end:

Identifying critical images:

This component will generate the list of indices of the image tags in the DOM of the given HTML content which are above the fold. The advantage of storing indices rather than image URLs is that layouts are more stable than the images themselves.

The information generated above should be stored in the metadata cache, with the HTML page URL as the key and the list of indices as its value. The cache expiration timeout will be set to MAX(HTML page cache timeout, Y (= 60) mins).
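
The expiration rule above can be sketched as follows. This is only an illustration: the helper name CriticalImagesCacheTtlMinutes, the constant name, and the choice of minutes as the unit are assumptions, not part of the design.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Y from the text above: the minimum lifetime of a cached index list, in minutes.
constexpr std::int64_t kMinTtlMinutes = 60;

// Hypothetical helper: returns MAX(HTML page cache timeout, Y), in minutes.
std::int64_t CriticalImagesCacheTtlMinutes(std::int64_t html_cache_timeout_minutes) {
  assert(html_cache_timeout_minutes >= 0);
  return std::max(html_cache_timeout_minutes, kMinTtlMinutes);
}
```

With this rule, a page that is cacheable for 5 minutes keeps its index list for 60 minutes, while a page cacheable for 2 hours keeps it for 2 hours.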

This component will also interact with the headless browser to get the images which are above the fold.
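
One way the headless browser output could be turned into the index list is sketched below. It assumes the browser reports a layout record per image tag in DOM order and that a fixed viewport height is used (identifying the real client viewport is a non-goal). The ImageLayout struct, the function name, and the "at least partly visible" definition of above the fold are all hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical layout record reported by the headless browser for one image tag.
struct ImageLayout {
  int top_px;     // Distance from the top of the page to the image's top edge.
  int height_px;  // Rendered height of the image.
};

// Returns the DOM indices of the images that are at least partly visible
// within the first viewport_height_px pixels of the page.
std::vector<int> AboveTheFoldIndices(
    const std::vector<ImageLayout>& images_in_dom_order,
    int viewport_height_px) {
  assert(viewport_height_px > 0);
  std::vector<int> indices;
  for (std::size_t i = 0; i < images_in_dom_order.size(); ++i) {
    const ImageLayout& img = images_in_dom_order[i];
    const bool visible = img.height_px > 0 &&
                         img.top_px < viewport_height_px &&
                         img.top_px + img.height_px > 0;
    if (visible) indices.push_back(static_cast<int>(i));
  }
  return indices;
}
```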

The design of this class will look like this:

class CriticalImages {
 public:
   // isCritical returns true if the provided image index is above the fold and the image size is
   // greater than X KB. This function only looks up the cache and returns the result based on that.
   // It does not call the headless browser explicitly.
   bool isCritical(int image_tag_index_number_in_html, int size_of_image);

   // The constructor looks up the cache for the given HTML URL. If an entry exists, it does nothing.
   // Otherwise, it triggers an async request to update the cache with the relevant information. That
   // request interacts with the headless browser to get the critical images.
   CriticalImages(StringPiece html_url, StringPiece html_content);

 private:
   // List of indices of the image tags which are above the fold for html_url.
   std::vector<int> indices_list;
   std::string html_url;
};
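
A minimal, self-contained sketch of the lookup path is shown below. It follows the Background definition of a critical image (above the fold and larger than X). The in-memory map stands in for the metadata cache, X is assumed to be 20KB, and the class name CriticalImagesSketch is hypothetical. The async headless-browser request on a cache miss is left out.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <unordered_map>
#include <vector>

// Assumed value of X; the text only says "probably 20KB".
constexpr int kSizeThresholdBytes = 20 * 1024;

// Stand-in for the metadata cache: HTML page URL -> above-the-fold image tag indices.
using MetadataCache = std::unordered_map<std::string, std::vector<int>>;

class CriticalImagesSketch {
 public:
  // Copies the cached index list for html_url, if any. On a miss the list stays
  // empty; a real implementation would trigger the async cache update here.
  CriticalImagesSketch(const MetadataCache& cache, const std::string& html_url) {
    auto it = cache.find(html_url);
    if (it != cache.end()) indices_list_ = it->second;
  }

  // True if the image tag index is above the fold and the image is larger than X.
  bool IsCritical(int image_tag_index, int size_in_bytes) const {
    assert(size_in_bytes >= 0);
    const bool above_fold =
        std::find(indices_list_.begin(), indices_list_.end(), image_tag_index) !=
        indices_list_.end();
    return above_fold && size_in_bytes > kSizeThresholdBytes;
  }

 private:
  std::vector<int> indices_list_;
};
```

On a cache miss every image is reported as non-critical, so no low quality previews are generated until the cache has been populated.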

Change in image_rewrite_filter.cc

A call to isCritical() is added to check whether an image is critical enough to warrant generating the low-res version.

Points to think about:

It is very costly to compute the critical image indices for every HTML page via the headless browser technique. First, we need a large cache to store data for every HTML page, and second, this process is quite CPU intensive. Also, we will increase the load on the backend servers (more fetches etc.). Instead, can we use an approximation, such as doing this once per site and storing the result in the cache for a shorter duration?
Another approximation: always generate low res for the first A (~5) images in the DOM whose size is larger than Y (~20KB).
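
The second approximation can be sketched without any headless browser. The sketch reads "first A images whose size is larger than Y" as: walk the images in DOM order and pick the first A that exceed Y. That reading is an assumption, and so are the function name and the input format.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// image_sizes_bytes[i] is the size of the i-th image tag in DOM order.
// Returns the DOM indices of the first max_images images larger than min_size_bytes.
std::vector<int> FirstLargeImages(const std::vector<int>& image_sizes_bytes,
                                  int max_images, int min_size_bytes) {
  assert(max_images >= 0);
  std::vector<int> selected;
  for (std::size_t i = 0; i < image_sizes_bytes.size(); ++i) {
    if (static_cast<int>(selected.size()) >= max_images) break;
    if (image_sizes_bytes[i] > min_size_bytes) {
      selected.push_back(static_cast<int>(i));
    }
  }
  return selected;
}
```

This trades accuracy for cost: it needs only the image sizes, but it may pick large images that are below the fold.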

Inline images which are below a certain threshold

Inlining smaller images is a good way to save the extra round trip. However, it makes the image uncacheable (because the HTML is typically uncacheable), and also bloats the HTML, possibly delaying the HTML content that follows the inlined data. Matt’s analysis shows that inlining below a certain threshold is advantageous. But displaying images below the fold right away is not really useful, since the user is unlikely to notice the improvement, and we have the lazyload filter to defer such images.

The plan is to use the headless browser to determine the images which are above the fold and whose size is less than some threshold, and inline them.

There are some corner cases which have to be considered:

If above-the-fold information is not present, then we will skip this check and inline all the images that would be inlined today.
In CSS, we will inline all the images which are below a certain threshold, because CSS is usually more cacheable than HTML.
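
The inlining rule and its two corner cases can be put together in one decision function, sketched below. The names, the byte-based threshold parameter, and the use of a null pointer to mean "no above-the-fold information" are assumptions made for illustration.

```cpp
#include <cassert>
#include <set>

enum class Context { kHtml, kCss };

// Decides whether an image should be inlined.
// above_fold_indices is nullptr when no above-the-fold information is available.
// image_tag_index is ignored for images referenced from CSS.
bool ShouldInline(Context context, int image_tag_index, int size_in_bytes,
                  int max_inline_bytes,
                  const std::set<int>* above_fold_indices) {
  assert(size_in_bytes >= 0);
  // Only images below the size threshold are ever inlined.
  if (size_in_bytes >= max_inline_bytes) return false;
  // CSS is usually more cacheable than HTML, so skip the above-the-fold check.
  if (context == Context::kCss) return true;
  // No above-the-fold information: fall back to the size check alone.
  if (above_fold_indices == nullptr) return true;
  return above_fold_indices->count(image_tag_index) > 0;
}
```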
