Matthias Nehlsen

Software, Data, and Stuff

Weekly Update: PageSpeed Insights, optimizing Octopress & more Clojure

08 September, 2014

In this weekly update, I will discuss how I turned the load times for this Octopress-powered blog from terrible to pretty decent. In PageSpeed Insights numbers: before the optimization 58/100 for mobile and 77/100 for desktop; after the optimization 94/100 for mobile and 96/100 for desktop. More concretely: on a lousy mobile connection, the load time improved from 32 seconds to a mere 5 seconds. Now we're talking. You would presumably not have waited for 32 seconds, and neither would I. Also, I have a status update on the Clojure version of BirdWatch.

Making this page load fast, even on a pre-3G mobile connection

Some time ago I attempted to open my blog on my smart phone and, to my dismay, it took like forever to load. I noticed that I did not have a 3G connection at the time but come on, you should be able to open the page even if you only have an Edge connection with decent signal strength at your disposal. I was sad. Then I ran Google's PageSpeed Insights and that tool confirmed that things weren't rosy:

Red for mobile. That's exactly how I would describe my previous experience. After a couple of simple changes, here is how things look now:

Not only does that look substantially better, it also makes all the difference in terms of user experience. I subjected a friend of mine to a tiny experiment involving his smart phone. The blog had never been loaded on it before, so certainly nothing was cached. We switched off the Wifi connection and disabled 3G so all that remained was four bars of an Edge connection. Initially, we loaded the new and optimized version, and it took a mere 5 seconds until the page was visible and properly styled, except for the right web font. His reaction was "wow, that was fast" considering that we were on a really sluggish network connection. Next, we opened the old version with none of the optimizations, and that took a prohibitive 32 seconds. From half a minute to 5 seconds, that is a hugely desirable improvement. Let's now have a look at what was necessary for this triumph over the intricacies and pitfalls of speedy web page delivery.

Inlining the CSS / above-the-fold content

One of the complaints that PageSpeed Insights uttered was that the above-the-fold CSS was in a separate file. Above-the-fold is the portion of the page that is visible without scrolling, and the CSS it depends on needs to be fully loaded before any rendering whatsoever can happen. You want this above-the-fold portion to load as swiftly as possible because any delay here will keep the browser from rendering the page altogether, which of course means that most people will leave rather than stare at a blank page for ten seconds or longer.

Funny enough, I think I read somewhere that people tend to be even more impatient on mobile devices, despite the slower network connection to begin with. And that makes sense. On desktop, I typically have twenty or more tabs open anyway. If something doesn't load immediately, my attention will either move to another application like mail or to another tab. Good for a page if I divert my attention to checking email; then, at least, I will see the page once I come back to the browser. Another tab is worse as I probably won't come back in a timely manner or ever. But at least, there's a chance. On mobile, though, once I'm gone, I'm typically gone for good.

In order to not hold up page loading by fetching the screen.css and being penalized with an additional round trip, I embedded the entire CSS in the head of each HTML file. While that added an extra 39KB to each file, in the compressed files the difference was a mere 7KB. This extra amount of data certainly loads faster than the extra round trip would take. This is particularly true for pre-LTE mobile networks, which are notorious for long ping times. However, embedding all the CSS only works up to a certain size. While I don't know the threshold, there comes a certain size where PageSpeed Insights starts complaining. But I suppose I can consider myself lucky that the resulting file size fell within the range that is deemed acceptable. Otherwise, one would have to figure out which parts of the CSS are essential to the initial rendering and then only embed that, with the rest loaded at the bottom of the HTML body.
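As a rough illustration of the inlining step, here is a minimal Python sketch that swaps a stylesheet link for an inline style block. It is not the mechanism Octopress uses; the function name `inline_stylesheet` and the path `/stylesheets/screen.css` are assumptions made up for this example, and the regular expression only handles a simple link tag.

```python
import re


def inline_stylesheet(html: str, css: str, href: str) -> str:
    """Replace the <link> tag that references `href` with an inline <style> block.

    Only a plain <link ... href="..."> tag is handled; this is a sketch,
    not a general HTML parser.
    """
    link_tag = re.compile(
        r'<link\b[^>]*\bhref="' + re.escape(href) + r'"[^>]*>',
        flags=re.IGNORECASE,
    )
    style_tag = "<style>\n" + css.strip() + "\n</style>"
    # A function is used as the replacement so that backslashes in the CSS
    # are inserted literally instead of being read as regex escapes.
    return link_tag.sub(lambda _match: style_tag, html, count=1)


# Hypothetical page and stylesheet path, for illustration only.
page = '<head><link rel="stylesheet" href="/stylesheets/screen.css"></head>'
print(inline_stylesheet(page, "body { margin: 0; }", "/stylesheets/screen.css"))
```

Running something like this over every generated HTML file trades a slightly larger document for one fewer round trip, which is the trade-off described above.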

Nginx instead of hosted page

Before, I was using a hosted web page where I had no real influence over how the files were served. Specifically, I had no control over HTTP compression settings, ETags or HTTP caching. In addition, it was also increasingly annoying to update the content because the only available method was FTP. When I got started with the blog, that was bearable, but with an increasing number of files, specifically images, it started to take a few minutes. What I really wanted instead was to use either rsync or git. I had a server already (the one used, for example, for serving BirdWatch) with nginx running, so the first thing I did was move my blog over there and reconfigure the domain's DNS settings. Here is the section of the nginx.conf that is now responsible for serving the blog:

user www-data;
worker_processes 4;
pid /var/run/nginx.pid;

events {
  worker_connections 15000;
}

http {
  include       mime.types;
  default_type  application/octet-stream;
  charset UTF-8;

  gzip_static on;
  gzip on;
  gzip_proxied any;
  gzip_types text/plain text/html text/css application/json application/javascript application/xml application/xml+rss text/javascript;
  gzip_vary on;

  server {
    listen       80;
    server_name  www.matthiasnehlsen.com;
    return       301 http://matthiasnehlsen.com$request_uri;
  }

  server {
    listen       80;
    server_name  matthiasnehlsen.com;
    root /home/bw/octopress-blog/public;

    location / {
      autoindex on;
    }

    # Media: images, icons, video, audio, HTC
    location ~* \.(?:jpg|jpeg|gif|png|ico|cur|gz|svg|svgz|mp4|ogg|ogv|webm|htc)$ {
      expires 1M;
      access_log off;
      add_header Cache-Control "public";
    }

    # CSS and Javascript
    location ~* \.(?:css|js)$ {
      expires 1y;
      access_log off;
      add_header Cache-Control "public";
    }
  }
}

I'm no expert in the subject of nginx configuration, but the above seems to be working well for what I am trying to do. If you are more knowledgeable and spot any nonsense in there, please let me know. Note that www.matthiasnehlsen.com is forwarded to the same host but without www. in front of all URLs. This is to make Google happy as it would otherwise index both versions as separate entities and thus potentially dilute the rank at which the page appears in search results.

Expiration settings

For the configuration of nginx's expiration settings, I used html5-boilerplate's server-configs-nginx project as a template. The suggested settings in there worked well and I got no further complaints from PageSpeed Insights about caching of any resources that are under control of my nginx server. Obviously, there is little I can do about resources served from elsewhere.
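To make the expiration values in the configuration above a little more tangible: nginx counts a month (M) as 30 days and a year (y) as 365 days, and for a positive time the expires directive also emits a matching Cache-Control max-age in seconds. The small Python sketch below only does that arithmetic. The helper name `expires_to_max_age` is made up for this example, and it handles just a single number followed by one unit, not combined values.

```python
# Seconds per nginx time unit (M is 30 days, y is 365 days).
UNIT_SECONDS = {
    "s": 1,
    "m": 60,
    "h": 60 * 60,
    "d": 24 * 60 * 60,
    "w": 7 * 24 * 60 * 60,
    "M": 30 * 24 * 60 * 60,
    "y": 365 * 24 * 60 * 60,
}


def expires_to_max_age(value: str) -> int:
    """Convert a simple nginx time such as '1M' or '1y' into seconds."""
    number, unit = value[:-1], value[-1]
    return int(number) * UNIT_SECONDS[unit]


print(expires_to_max_age("1M"))  # media files in the config above
print(expires_to_max_age("1y"))  # CSS and JavaScript in the config above
```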

Gzipping the content

The configuration above enables both static content compression and on-the-fly compression. With static compression, nginx serves a pre-compressed version of a file if one exists under the same name with an appended .gz; this is preferable under heavy load. If this file is not available, nginx instead compresses the content on-the-fly, obviously resulting in somewhat higher CPU utilization as that work needs to be performed time and again. I have only pre-gzipped some resources while others are compressed on-the-fly. I have never really seen high CPU utilization from nginx on my server, so for me at this point, high nginx load due to compression is a luxury problem for which I would probably need to increase the number of visitors by an order of magnitude or two. But when that happens, I may look once more into pre-compression of more of the files. Part of the reason I don't run into issues here is probably that the server is bare metal and has a powerful Xeon CPU. If this were a virtual machine sharing the CPU with other guest VMs, the effect would probably be measurable already, despite the modest number of concurrent users that the server is handling at the moment.
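Pre-compressing more of the files could look roughly like the following Python sketch, which writes a .gz copy next to each text-like file so that gzip_static can pick it up. The function name `precompress` and the list of extensions are assumptions chosen for this example, not something the blog's build already does.

```python
import gzip
import shutil
from pathlib import Path
from typing import List

# Extensions worth pre-compressing; an assumption of this sketch, adjust as needed.
COMPRESSIBLE = {".html", ".css", ".js", ".xml", ".json", ".txt", ".svg"}


def precompress(root: Path) -> List[Path]:
    """Write `<name>.gz` next to every compressible file below `root`.

    Returns the .gz files that were written in this run.
    """
    written = []
    # sorted() materializes the listing first, so files created during the
    # loop do not affect the iteration.
    for source in sorted(root.rglob("*")):
        if not source.is_file() or source.suffix.lower() not in COMPRESSIBLE:
            continue
        target = source.with_name(source.name + ".gz")
        # Skip files whose compressed copy is at least as new as the source.
        if target.exists() and target.stat().st_mtime >= source.stat().st_mtime:
            continue
        with source.open("rb") as src, gzip.open(target, "wb", compresslevel=9) as dst:
            shutil.copyfileobj(src, dst)
        written.append(target)
    return written
```

Calling `precompress(Path("public"))` after each site generation would be one way to use it, where `public` stands for whatever directory nginx serves as the root.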

Move webfonts out of above-the-fold content

For the blog, I am using a non-standard web font named Tablet Gothic from an independent type foundry named TypeTogether. I like this font family a lot for many reasons, not least because there is a vast range of styles available (84 altogether). The narrow versions for headlines work really well with the body text. I also think that this font family is really pretty. I don't have to pay extra for the font as it is included in the TypeKit service of my Creative Cloud subscription. However, there is a downside to web fonts when it comes to page render times. At least if you load the font above-the-fold, which I previously did. That will hold up the page rendering until both the TypeKit script and the actual files are loaded. But after thinking about it, I decided that showing the page in Helvetica Neue / SansSerif first is better than not rendering anything at all for a long time. If your connection is fast, you'll hardly notice and if it is not, you will probably still not leave in disgust just because you were subjected to another perfectly fine font for a few seconds. Your mileage may vary, of course, but personally, I don't think I'd use web fonts - unless showing a built-in font first would be okay - because loading the files related to the web fonts alone can take over ten seconds on a slow connection.

What else could be done?

Short answer in my case: nothing really. With these changes in place, PageSpeed Insights now only complains about items that are outside of my sphere of influence:

I could remove the GitHub buttons, the analytics script and the web font altogether just to get an even higher score, but I won't. I am happy with the results and I am not willing to forgo any of them. I also find it somewhat odd that one Google tool (PageSpeed Insights) complains about the script of another Google tool (Google Analytics) - as if I could do anything about that! In addition, I think that the complaint about leveraging longer cache times for the GitHub API calls is plain wrong. Those are JSONP calls rather than static content. Arguably, the resource need not be cached at all if we want the result to be accurate.

I also ran YSlow, which seemed pretty happy with the optimizations as well:

Grade A (94/100) sounds much better than the Grade C (78/100) that YSlow previously gave this blog.

Useful links

Here are a handful of articles that I found useful while squeezing the last bit of performance out of this blog. Google has a few great resources available, for example on Optimizing CSS Delivery, HTTP Caching and Optimizing Performance in general. I consider them a must-read if you are serious about delivering a speedy user experience. Also really helpful: the YouTube channels of Ilya Grigorik and Addy Osmani.

BirdWatch in Clojure, ClojureScript and Om

I did a lot of refactoring of the new version of BirdWatch this past week. The application architecture still feels like clay in my hands, but the sculpture is getting into a decent shape. I made the interesting discovery that there really weren't any performance issues introduced by the Clojure rewrite. Rather, the problem was sitting in front of the screen. My initial version triggered a re-render of the word cloud a few orders of magnitude more often than what a reasonable and sane person would have done.

Considering that the word cloud layout is probably the most expensive operation in the entire client-side application, it is no wonder that the application did not respond in the way I would have hoped. It is kind of spectacular that it worked at all...
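One common way to keep an expensive operation like this from running too often is to throttle it, so that it runs at most once per time interval no matter how many updates arrive. The sketch below shows the idea in Python rather than ClojureScript; it is not the code BirdWatch uses, and the class name `Throttle` and its parameters are made up for this illustration.

```python
import time
from typing import Callable, Optional


class Throttle:
    """Call `action` at most once per `interval` seconds; extra calls are dropped."""

    def __init__(
        self,
        action: Callable[[], None],
        interval: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._action = action
        self._interval = interval
        self._clock = clock  # injectable so the behavior can be tested
        self._last_run: Optional[float] = None  # time of the last call that ran

    def __call__(self) -> bool:
        """Run the action if the interval has elapsed; return whether it ran."""
        now = self._clock()
        if self._last_run is not None and now - self._last_run < self._interval:
            return False
        self._last_run = now
        self._action()
        return True


# Hypothetical usage: re-render the word cloud at most every five seconds.
render_word_cloud = Throttle(lambda: print("rendering word cloud"), interval=5.0)
render_word_cloud()  # runs
render_word_cloud()  # dropped, less than five seconds have passed
```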

I also did some preparations for moving the aggregation of previous tweets to the server. More precisely, the client side can now request missing tweets via WebSocket command messages and subsequently render the full tweet once it is back from the server. This isn't terribly useful yet, but it will be at a later stage. Once server-side aggregation is in place, it will no longer be necessary to transmit all of the thousands of analyzed tweets to the client. This should either reduce the memory footprint by a lot when analyzing the same number of tweets or enable a much higher number of tweets for the same memory utilization. It should also reduce page load times, potentially by a lot.

Here's the current version as a live demo.

My Clojure Resources List

While working on the Clojure application described above, I constantly added fresh links to this list of Clojure resources on GitHub. This week, I added several articles I discovered and found useful and sometimes outright entertaining. Maybe you'll find enlightening stuff in there as well. Or you may have a link that you believe belongs in there, too. Just let me know or, better yet, submit a pull request with the link and a short comment.

Conclusion

I still want to redesign the blog. But at least the load times have improved tremendously so redesigning it isn't that urgent. Unlike before, the load times even on mobile are such that visitors should only leave because the content of this blog is irrelevant to them, not because the page doesn't load. By the way, back in December 2013 I put a little work into a fast AngularJS-based blog engine. I have not worked on it since, but I thought at least I could open source it. There is no good reason for it to sit in a private repository, after all. I am now curious about some feedback. The cool features include client-side rendering from markdown, configurable and animated code blocks (see the bottom of the live demo) and a live preview while authoring. Here's the ng-blog repository on GitHub and here's a live demo. I am just putting this out there to see if anyone is interested. If so, I would probably put more work into it.

Then, coding in Clojure was once again exciting and productive last week, with like 35 commits so far this month. Things are finally settling down, which means that I will soon be able to start a series of articles about this application. In that regard, please let me know if you have any ideas for features that would make the application more useful for you. It is already great that this little toy application of mine has received so much love (if love can be counted in GitHub stars) and I appreciate that a lot, but it would be even more awesome if the application solved an actual problem. I would love to start a conversation (or two or three) here.

Thanks and until next week, Matthias
