Give Meaning to 100 billion Events a Day - The Analytics Pipeline at Teads

This is a guest post by Alban Perillat-Merceroz, Software Engineer at Teads.tv.

In this article, we describe how we orchestrate Kafka, Dataflow, and BigQuery together to ingest and transform a large stream of events. Adding scale and latency constraints makes reconciling and reordering them a challenge; here is how we tackle it.


[Figure: Teads for Publisher, one of the webapps powered by Analytics]

In digital advertising, day-to-day operations generate a lot of events we need to track in order to transparently report campaigns’ performance. These events come from:

  • Users’ interactions with the ads, sent by the browser. These events are called tracking events and can be standard (start, complete, pause, resume, etc.) or custom events coming from interactive creatives built with Teads Studio. We receive about 10 billion tracking events a day (an illustrative event shape is sketched after this list).
  • Events coming from our back-ends, mostly regarding ad auction details (real-time bidding processes). We generate more than 60 billion of these events daily, before sampling, and expect this number to double in 2018.
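
To make these tracking events more concrete, below is a minimal sketch of the shape such an event might have. The field names and types are illustrative assumptions, not Teads’ actual schema.

    // Hypothetical shape of a tracking event; fields are illustrative,
    // not Teads' actual schema.
    public final class TrackingEvent {

        // Standard event types named above, plus CUSTOM for events
        // from Teads Studio interactive creatives.
        public enum Type { START, COMPLETE, PAUSE, RESUME, CUSTOM }

        public final Type type;
        public final String adId;        // creative that fired the event
        public final String customName;  // only set when type == CUSTOM
        public final long timestampMs;   // client-side emission time

        public TrackingEvent(Type type, String adId,
                             String customName, long timestampMs) {
            this.type = type;
            this.adId = adId;
            this.customName = customName;
            this.timestampMs = timestampMs;
        }
    }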

In this article, we focus on tracking events, as they are on the most critical path of our business.

[Figure: Simplified overview of our technical context with the two main event sources]

Tracking events are sent by the browser over HTTP to a dedicated component that, amongst other things, enqueues them in a Kafka topic. Analytics is one of the consumers of these events (more on that below).
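
As a rough sketch of that last step, the collecting component could enqueue each event with the plain Kafka producer API. The topic name ("tracking-events"), the String serialization, and the class layout are assumptions for illustration; Teads’ actual component does more than this.

    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.Producer;
    import org.apache.kafka.clients.producer.ProducerRecord;

    import java.util.Properties;

    public class TrackingEventCollector {

        private final Producer<String, String> producer;

        public TrackingEventCollector(String bootstrapServers) {
            Properties props = new Properties();
            props.put("bootstrap.servers", bootstrapServers);
            props.put("key.serializer",
                      "org.apache.kafka.common.serialization.StringSerializer");
            props.put("value.serializer",
                      "org.apache.kafka.common.serialization.StringSerializer");
            this.producer = new KafkaProducer<>(props);
        }

        // Called by the HTTP handler once a tracking request is
        // parsed and validated.
        public void enqueue(String adId, String eventJson) {
            // Keying by ad id keeps events for the same ad on the same
            // partition, preserving their relative order there.
            producer.send(new ProducerRecord<>("tracking-events", adId, eventJson));
        }
    }

The keying choice is also an assumption; any key that groups related events together would preserve their per-partition ordering.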

We have an Analytics team whose mission is to take care of these events; it is defined as follows:

  • We ingest the growing amount of logs,
  • We transform them into business-oriented data,
  • Which we serve efficiently and tailored for each audience.

To fulfill this mission, we build and maintain a set of processing tools and pipelines. Due to the organic growth of the company and new product requirements, we regularly challenge our architecture.

Why we moved to BigQuery

