Tracking events in a web page is commonplace nowadays. There’s rarely a page that won’t drop that sneaky pixel every time you buy a pair of headphones or click on that glowing call to action that will change your life. And why wouldn’t you? It helps you understand your visitors, how they react to your content, and why your page might not be as effective as you expected; in short, it helps you improve your visitors’ experience. As the traffic on your website grows (or websites, if you maintain a third-party library) you’ll want to track all these relevant events without degrading the user’s experience.

A very standard practice is to send a request to the server and log it, disregarding any confirmation of whether the tracking was successful, or at least not requiring a successful log entry for the journey to continue. That sounds sane and should give very satisfactory results. But once your traffic climbs to thousands of requests per minute, you might notice a substantial number of events being lost. Given that 100% accuracy might not be feasible due to the nature of client-side-only tracking, I’ll touch on a few points that should improve the accuracy of your tracking… small changes can have amazingly positive effects on the end result.

This post focuses on third-party libraries that depend mostly or solely on client-side operations to function, where traffic grows exponentially as more websites adopt them, although these techniques can be applied to any client-side tracking. I’ll start with “tricks” that apply to most trackers and become more specific as we go.

Before starting… Tracking Servers (or Events Collector)

Before getting into the client-side improvements, I’d like to start with a small recommendation for the server side which might be quite obvious. Regardless of whether you want to process the data offline or in real time, I recommend a very simple, stateless collector that can scale horizontally. You can choose to simply receive the request and write it to a disk log (then rotate and process it), throw it as-is (with minimal or no validation) into a stream (e.g. Kinesis or Kafka), or even combine both.
Keeping the events-collection layer thin will help you scale and keep response times fast as your traffic grows.

1. Pixels vs Ajax

It is very common to fall into the temptation of using Ajax for tracking your events. You can use POST and send all the data you want to track as nicely formatted JSON in the body of the message. After all, sending and receiving data from the server is what Ajax is for; it also looks well organized, and it’s very clear what you are aiming to do.
The fun fact is that, as your traffic grows, a large share of the events you are losing could be saved by simply switching to a tracking pixel.
Basically, a tracking pixel (widely used by analytics platforms and affiliate networks) is a 1×1 blank image (a GIF, for example) whose URL carries all the tracking data in the query string.
So, how would using a pixel improve tracking performance?

This is how a cross-domain POST request flows:

  1. DNS lookup
  2. Establish connection with server
  3. Send OPTIONS (preflight) request
  4. If the server confirms the CORS headers, send the POST headers and body

So, if the connection gets interrupted at any of these stages (and it will), your data will never arrive at your events collector, and it will be lost. That’s the nature of browser-side event tracking: the browser can be closed halfway through, or the internet connection might suddenly vanish. The server might receive the headers while the body of the message never arrives (which causes your server or load balancer to hang waiting for the body until it times out). There’s not much we can do about those issues, but we can reduce the number of stages and decrease the chances of losing the connection before the data is collected.

And that’s exactly what using a tracking pixel will do… this is how it’d flow:

  1. DNS lookup
  2. Establish connection with server
  3. Request pixel (message included in the headers)

We are basically eliminating one round trip to the server, and since the message is sent in the headers, if the request reaches your servers the whole message will be there (no more waiting for a message body that never shows up).

Implementation Example

Images can be loaded using JavaScript without actually appending them to the page:

function trackEvent(data) {
  var endpoint = "//beacon.analytics.com/i.gif";
  // Base64-encode the payload, then URL-encode it so the
  // "+", "/" and "=" characters survive the query string.
  var encoded = encodeURIComponent(btoa(JSON.stringify(data)));
  var pixel = new Image();
  pixel.src = endpoint + "?data=" + encoded; // the image is never appended to the DOM
}

trackEvent({ event: 'pageView' });

Tracking pixels even work for basic tracking with JavaScript disabled:

<img src="//beacon.analytics.com/i.gif?data=eyBldmVudDogInBhZ2VWaWV3IiB9"> <!-- encoded { event: 'pageView' } -->

On the not-so-bright side, pixels won’t let you send a nice response back. But honestly, I believe most high-performance trackers just want to log the event and respond as fast as possible.

2. DNS Caching

This trick might not matter for a lot of implementations: if you’re only tracking an event when the page loads, there’s not much to be done here and you’d be better off without it. But if your event might happen at any time in the user journey (e.g. tracking when the visitor clicks a CTA button), then caching the DNS lookup can reduce your latency very nicely.

So, how does this work? It is a very simple trick: when the page loads, request a pixel without any data from your collector. This forces the browser to do a DNS lookup, and the result should be cached afterwards, so when the real event (the one you care about and don’t want to lose) happens, that step is skipped:

  1. Establish connection with server
  2. Request pixel (message included in headers)

Now your tracking is reduced to two steps and your failure surface is much smaller. On the not-so-bright side, the previous improvement cut your server load roughly in half by eliminating the OPTIONS step, and now you might be increasing it by much more than that. Still, the gains should be worth the extra load and, since you don’t care about the first request’s data, you can delegate serving the pixel to your load balancer (nginx handles static files very well).

Implementation Example

<body>
  <!-- Warm the DNS cache as early as possible -->
  <img src="//beacon.analytics.com/ping.gif">
  <a id="signin" href="#signin">Sign in</a>
  <script>
    $("#signin").click(function() {
        trackEvent({  // DNS lookup should already be cached
          event: 'click',
          target: 'signin'
        });
    });
  </script>
</body>

3. Open Connection (Keep-alive)

The final tweak you can apply to reduce the number of stages when the event happens is to keep the connection with the server open. Keep-alive is negotiated through headers: the browser sends Connection: keep-alive by default, and the server confirms it and advertises how long it will hold an idle connection with a header such as Keep-Alive: timeout=30 (you can’t set headers on an image request from a script). You can then tune that timeout until you get what you want, or simply re-ping before it expires, every 30 seconds for example.

Stages using all the improvements:

  1. Request pixel (message included in headers)

With this last change, we manage to reduce the number of stages down to only one at the cost of using more resources on the backend.

Implementation Example

function establishConnection(timeout) {
  // Ping the collector so the browser opens (or re-opens) the connection.
  var ping = new Image();
  ping.src = "//beacon.analytics.com/i.gif";
  // Re-ping periodically so the keep-alive window never expires.
  if (typeof timeout !== 'undefined')
    setTimeout(establishConnection.bind(null, timeout), timeout);
}

establishConnection(30000);

$("#signin").click(function() {
    trackEvent({ // Connection should be established
      event: 'click',
      target: 'signin'
    });
});

Summary

Small tweaks like moving to a tracking pixel, caching the DNS lookup, and keeping the connection open might look like overkill, but they can make a huge difference in the amount of data you end up tracking as you grow into millions of events a day (the effect is less noticeable with small-scale tracking). Even adopting just one of these tricks should affect your results in a noticeable way.
There are other ways of tracking and improving accuracy, even using Ajax, but this post should summarize why tracking pixels are so popular and why you might want to consider using one.

@AlexCorreia
