Entries in Strategy (358)

Wednesday
Jun 29, 2016

Scaling Hotjar's Architecture: 9 Lessons Learned

Hotjar offers free website analytics, so they have a challenging mission: handle hundreds of millions of requests per day from mostly free users. Marc von Brockdorff, Co-Founder & Director of Engineering at Hotjar, summarized the lessons they've learned in: 9 Lessons Learned Scaling Hotjar's Tech Architecture To Handle 21,875,000 Requests Per Hour.

In response to the criticism that their architecture looks like a hot mess, Erik Näslund, Chief Architect at Hotjar, gives the highlights of their architecture:

  • We use nginx + lua for the really hot code paths where python doesn't quite cut it. No language is perfect and you might have to break out of your comfort zone and use something different every now and then.
  • Redis, Memcached, Postgres, Elasticsearch and S3 are all suitable for different kinds of data storage and we eventually needed them all to be able to query and store data in a cost effective way. We didn't start out using 5 different data-stores though...it's something that we "grew into".
  • Each application server is a (majestic) monolith. Micro-services are one way of architecting things, monoliths are another - I'm still waiting to be convinced that one way is superior to the other when it comes to a smaller team of developers.

What have they learned?

Click to read more ...

Monday
Jun 20, 2016

The Technology Behind Apple Photos and the Future of Deep Learning and Privacy

There’s a war between two visions of how the ubiquitous AI assisted future will be rendered: on the cloud or on the device. And as with any great drama it helps the story along if we have two archetypal antagonists. On the cloud side we have Google. On the device side we have Apple. Who will win? Both? Neither? Or do we all win?

If you had asked me a week ago, I would have said the cloud would win. Definitely. If you read an article like Jeff Dean On Large-Scale Deep Learning At Google you can’t help but be amazed at what Google is accomplishing. Impressive. Wide ranging. Smart. Systematic. Dominant.

Apple has been largely absent from the trend of sprinkling deep learning fairy dust on their products. This should not be all that surprising. Apple moves at their own pace. Apple doesn’t reach for early adopters, they release a technology when it’s a win for the mass consumer market.

There’s an idea that because Apple is so secretive, they might have vast deep learning chops hidden away that we don’t even know about yet. We, of course, have no way of knowing.

What may prove more true is that Apple is going about deep learning in a different way: differential privacy + powerful on-device processors + offline training with downloadable models + a commitment to really, really not knowing anything personal about you + the deep learning equivalent of perfect forward secrecy.
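To make "differential privacy" concrete, here's a minimal sketch of randomized response, the classic mechanism behind many differentially private systems (illustrative only; Apple has not published its implementation):

```python
import random

def randomized_response(truth: bool, p: float = 0.5) -> bool:
    """With probability p report the truth; otherwise report a fair coin
    flip. Any individual answer is plausibly deniable, yet aggregate
    statistics remain recoverable."""
    if random.random() < p:
        return truth
    return random.random() < 0.5

# Recovering the aggregate: if r is the observed rate of True answers and
# t the true rate, then r = p*t + (1 - p)*0.5, so t = (r - (1 - p)*0.5) / p.
answers = [randomized_response(True) for _ in range(100_000)]
r = sum(answers) / len(answers)
print(f"estimated true rate: {(r - 0.25) / 0.5:.3f}")  # with p = 0.5
```

The collector learns population-level trends without ever holding a trustworthy record of any single user's answer, which is exactly the trade-off described above.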

Photos vs Photos

Click to read more ...

Monday
Apr 4, 2016

How to Remove Duplicates in a Large Dataset Reducing Memory Requirements by 99%

This is a guest repost by Suresh Kondamudi from CleverTap.

Dealing with large datasets is often daunting. With limited computing resources, particularly memory, it can be challenging to perform even basic tasks like counting distinct elements, checking membership, filtering duplicates, finding minimum, maximum, or top-n elements, or computing set operations like union, intersection, and similarity.

Probabilistic Data Structures to the Rescue

Probabilistic data structures can come in pretty handy in these cases, in that they dramatically reduce memory requirements while still providing acceptable accuracy. Moreover, you get time efficiencies, as lookups (and adds) rely on multiple independent hash functions, which can be parallelized. We use structures like Bloom filters, MinHash, Count-min sketch, and HyperLogLog extensively to solve a variety of problems. One fairly straightforward example is presented below.

The Problem

We at CleverTap manage mobile push notifications for our customers, and one of the things we need to guard against is sending multiple notifications to the same user for the same campaign. Push notifications are routed to individual devices/users based on push notification tokens generated by the mobile platforms. Because of their size (anywhere from 32 bytes to 4 KB), it’s non-performant for us to index push tokens or use them as the primary user key.

On certain mobile platforms, when a user uninstalls and subsequently re-installs the same app, we lose our primary user key and create a new user profile for that device. Typically, in that case, the mobile platform will generate a new push notification token for that user on the reinstall. However, that is not always guaranteed. So, in a small number of cases we can end up with multiple user records in our system having the same push notification token.

As a result, to prevent sending multiple notifications to the same user for the same campaign, we need to filter out a relatively small number of duplicate push tokens from a total dataset that runs from hundreds of millions to billions of records. To give you a sense of proportion, the memory required to filter just 100 million push tokens is 100M × 256 bytes ≈ 25 GB!

The Solution – Bloom filter
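A minimal sketch of the idea in Python (illustrative, not CleverTap's production code): with the standard sizing formulas, a 1% false-positive rate costs about 9.6 bits per element, so 100 million tokens fit in roughly 120 MB instead of ~25 GB.

```python
import hashlib
import math

class BloomFilter:
    """Minimal Bloom filter: k hash positions over an m-bit array.
    False positives are possible; false negatives are not."""

    def __init__(self, capacity, error_rate=0.01):
        # Standard sizing: m = -n*ln(p)/ln(2)^2 bits, k = (m/n)*ln(2) hashes.
        self.m = math.ceil(-capacity * math.log(error_rate) / math.log(2) ** 2)
        self.k = max(1, round(self.m / capacity * math.log(2)))
        self.bits = bytearray((self.m + 7) // 8)

    def _positions(self, item):
        # Derive k indexes from two 64-bit halves of one SHA-256 digest
        # (the Kirsch-Mitzenmacher double-hashing trick).
        digest = hashlib.sha256(item.encode()).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big")
        return [(h1 + i * h2) % self.m for i in range(self.k)]

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item):
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(item))

# Usage: skip (or double-check) tokens the filter says it has already seen.
seen = BloomFilter(capacity=1_000_000, error_rate=0.01)
token = "example-push-token"  # hypothetical token value
if token in seen:
    print("possible duplicate - verify before sending")
else:
    seen.add(token)
```

Because a Bloom filter can answer "maybe seen" for a token that was never added, a hit only needs to trigger a cheap exact check; the common case, a genuinely new token, stays fast and small.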

Click to read more ...

Monday
Mar 21, 2016

To Compress or Not to Compress, that was Uber's Question

Uber faced a challenge. They store a lot of trip data. A trip is represented as a 20K blob of JSON. It doesn't sound like much, but at Uber's growth rate saving several KB per trip across hundreds of millions of trips per year would save a lot of space. Even Uber cares about being efficient with disk space, as long as performance doesn't suffer. 

This highlights a key difference between linear growth and hypergrowth. Growing linearly means the storage needs would remain manageable. At hypergrowth, Uber calculated that when storing raw JSON, 32 TB of storage would last less than 3 years at 1 million trips per day, less than 1 year at 3 million trips per day, and less than 4 months at 10 million trips per day.

Uber went about solving their problem in a very measured and methodical fashion: they tested the hell out of it. The goal of all their benchmarking was to find a solution that both yielded a small size and a short time to encode and decode.

The whole experience is described in loving detail in the article: How Uber Engineering Evaluated JSON Encoding and Compression Algorithms to Put the Squeeze on Trip Data. They came up with a matrix of 10 encoding protocols (Thrift, Protocol Buffers, Avro, MessagePack, etc.) and 3 compression libraries (Snappy, zlib, Bzip2). The target environment was Python. Uber went with an IDL approach to define and verify their JSON protocol, so they ended up only considering IDL solutions.

The conclusion: MessagePack with zlib.  Encoding time: 4231 ms. Decoding: 715 ms. There was a 78% reduction in size relative to the JSON zlib combination.

The result: a 1 TB disk will now last almost a year (347 days), compared to a month (30 days) without compression. Uber now has enough space to last over 30 years, compared to just under 1 year before. That's a huge win for a relatively simple change. Hopefully there's a common library handling all the messaging, so this change could be completely transparent to all the developers. Uber also noted that smaller packet sizes mean less data transiting through the system, which means less processing time, which means less hardware is needed. Another big win.

Something to consider: don't use JSON for messaging. Even compressed, the encode/decode times are still dog slow. If you are going to use an IDL, which every grown-up project eventually adopts for reliability and security reasons, go for a binary protocol from the start. The performance savings can be dramatic. Much of the performance drain comes from serialization/deserialization churning through memory, and that can be reduced greatly by avoiding text-based protocols like JSON. JSON is convenient, but it's hugely wasteful at scale.
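To get a feel for how such a comparison is run, here's a minimal benchmark sketch in Python (Uber's target environment). The trip record is a made-up stand-in, not Uber's actual schema, and msgpack refers to the third-party msgpack-python package:

```python
import json
import time
import zlib

import msgpack  # third-party: pip install msgpack

# Hypothetical trip-like record; Uber's real blobs are ~20 KB of JSON.
trip = {"trip_id": "abc123", "distance_km": 12.4,
        "waypoints": [{"lat": 37.77 + i * 1e-4, "lng": -122.42}
                      for i in range(500)]}

def bench(name, encode, decode, n=1000):
    start = time.perf_counter()
    for _ in range(n):
        blob = zlib.compress(encode(trip))
    encode_ms = (time.perf_counter() - start) * 1000
    start = time.perf_counter()
    for _ in range(n):
        decode(zlib.decompress(blob))
    decode_ms = (time.perf_counter() - start) * 1000
    print(f"{name}: {len(blob)} bytes, "
          f"encode {encode_ms:.0f} ms, decode {decode_ms:.0f} ms")

bench("json+zlib", lambda o: json.dumps(o).encode(), json.loads)
bench("msgpack+zlib", msgpack.packb, msgpack.unpackb)
```

The same harness extends naturally to schema-based IDL encoders like Thrift, Protocol Buffers, or Avro, which is essentially the matrix Uber swept.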

Monday
Mar 14, 2016

Snuggling Up to Papers We Love - What's Your Favorite Paper?


From a talk by @aysylu22 at QCon London on modern computer science applied to distributed systems in practice.

There has been a renaissance in the appreciation of computer science papers as a relevant source of wisdom for building today's complex systems. If you're having a problem, there's likely some obscure paper written by a researcher twenty years ago that just might help. Which isn't to say there aren't problems with papers, but there's no doubt much of the technology we take for granted today had its start in a research paper. If you want to push the edge, it helps to learn from the primary research that helped define it.

If you would like to share your love of papers, be proud, you are not alone:

What's Your Favorite Paper? 

If you ask your average person, they'll have a favorite movie, book, song, or Marvel Universe character, but it's unlikely they'll have a favorite paper. If you've made it this far, that's probably not you.

My favorite paper of all time is without a doubt SEDA: An Architecture for Well-Conditioned, Scalable Internet Services. After programming real-time distributed systems for a long time, I was looking to solve a complex work scheduling problem in a resource-constrained embedded system. I stumbled upon this paper and it blew my mind. While I determined that SEDA's task scheduling latency wouldn't be appropriate for my problem, the paper gave me a whole new way to look at how programs could be structured, and I used those insights on many later projects.

If you have another source of papers or a favorite paper, please feel free to share.

Wednesday
Mar 9, 2016

The Simple Leads to the Spectacular

Steve Kerr, head coach of the record-setting Golden State Warriors (my local Bay Area NBA basketball team), has this to say about what the team needs to do to get back on track (paraphrased):

What we have to get back to is simple, simple, simple. That's good enough. The simple leads to the spectacular. You can't try the spectacular without doing the simple first. Make the simple pass. Our guys are trying to make the spectacular plays when we just have to make the easy ones. If we don't get that cleaned up we're in big trouble. 

If you play the software game, doesn't this resonate somewhere deep down in your git repository?

If you don't like basketball or despise sports metaphors, this is a good place to stop reading. The idea that "the simple leads to the spectacular" is probably the best TL;DR of Keep It Simple, Stupid I've ever heard.

Software development is fundamentally a team sport. It usually takes a while for this lesson to pound itself into the typical lone-wolf developer brain. Looking back at a stack of failed projects, I know it took me an embarrassingly long time to notice the pattern. It's one of those truths that gradually reveals itself over time...

Click to read more ...

Tuesday
Mar 8, 2016

Performance Tuning Apache Storm at Keen IO


Hi, I'm Manu Mahajan and I'm a software engineer with Keen IO's Platform team. Over the past year I've focused on improving our query performance and scalability. I wanted to share some things we've learned from this experience in a series of posts.

Today, I'll describe how we're working to guarantee consistent performance in a multi-tenant environment built on top of Apache Storm.

tl;dr: we were able to make query response times significantly more consistent and improve high-percentile query duration by 6x by making incremental changes, including isolating heterogeneous workloads, making I/O operations asynchronous, and using Storm’s queueing more efficiently.
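One of those changes, making I/O asynchronous, is easy to sketch. The Python below is illustrative only (Keen's Storm topology is JVM code); fetch_rows and emit_result are hypothetical stand-ins for the storage read and the downstream emit:

```python
import time
from concurrent.futures import ThreadPoolExecutor

# Bounded pool: slow reads queue up here instead of stalling the worker.
pool = ThreadPoolExecutor(max_workers=16)

def fetch_rows(event_key):
    time.sleep(0.05)          # simulate a blocking read from storage
    return [f"row-for-{event_key}"]

def emit_result(event_key, rows):
    print(event_key, rows)    # hand off to the next stage of the pipeline

def on_tuple(event_key):
    # Submit the blocking read to the pool and attach a completion
    # callback; the worker returns immediately to take the next tuple.
    future = pool.submit(fetch_rows, event_key)
    future.add_done_callback(lambda f: emit_result(event_key, f.result()))

for key in ("q1", "q2", "q3"):
    on_tuple(key)
pool.shutdown(wait=True)      # drain in-flight I/O before exiting
```

The effect is that one expensive read no longer blocks the cheap queries queued behind it, which is the kind of head-of-line blocking these changes target.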

High Query Performance Variability

Keen IO is an analytics API that allows customers to track and send event data to us and then query it in interesting ways. We have thousands of customers with varying data volumes that can range from a handful of events a day to upwards of 500 million events per day. We also support different analysis types like counts, percentiles, select-uniques, funnels, and more, some of which are more expensive to compute than others. All of this leads to a spectrum of query response times ranging from a few milliseconds to a few minutes.

The software stack that processes these queries is built on top of Apache Storm (and Cassandra and many other layers). Queries run on a shared Storm cluster and compete for CPU, memory, I/O, and network resources. An expensive query can easily hog physical resources and slow down simpler queries that would otherwise be quick.

If a simple query takes a long time to execute, it creates a really bad experience for our customers. Many of them use our service to power real-time dashboards for their teams and customers, and nobody likes staring at a page while it loads.

Measuring Query Performance Variability

Click to read more ...

Wednesday
Mar 2, 2016

Malice or Stupidity or Inattention? Using Code Reviews to Find Backdoors

The temptation to put a backdoor into a product is almost overwhelming. It’s just so dang convenient. You can go into any office, any lab, any customer site and get your work done. No hassles with getting passwords or clearances. You can just solve problems. You can log into any machine and look at logs, probe the box, issue commands, and debug any problem. This is very attractive to programmers.

I’ve been involved in several command-line interfaces to embedded products, and though the temptation to put in a backdoor was great, I never did it. But I understand those who have.

There’s another source of backdoors: infiltration by an attacker.

We’ve seen a number of backdoors hidden in code bases you would not expect. Juniper Networks found two backdoors in its firewalls. Here’s Some Analysis of the Backdoored Backdoor. Here’s more information to reaffirm your lack of faith in humanity: NSA Helped British Spies Find Security Holes In Juniper Firewalls. And here are A Few Thoughts on Cryptographic Engineering.

Juniper is not alone. There’s a backdoor in AMX AV equipment. A secret SSH backdoor in Fortinet hardware was found in more products. Backdoors were found in Barracuda Networks gear. And there are the 12 biggest, baddest, boldest software backdoors of all time. Who knows how many backdoors are embedded in chips? A security backdoor was found in a China-made US military chip. And so on.

By now we can pretty much assume backdoors are the rule, not the exception.

Backdoors are a Cheap Form of Attack

Click to read more ...

Monday
Jan 18, 2016

Use Google For Throughput, Amazon And Azure For Low Latency

Which cloud should you use? It may depend on what you need to do with it. What Zach Bjornson needs to do is process large amounts of scientific data as fast as possible, which means reading data into memory as fast as possible. So he ran benchmarks using Google's new multi-cloud PerfKitBenchmarker to figure out which cloud was best for the job.

The results are in a very detailed article: AWS S3 vs Google Cloud vs Azure: Cloud Storage Performance. Feel free to datamine the results for more insights, but overall his conclusions are:

Click to read more ...

Monday
Jan 4, 2016

Server-Side Architecture. Front-End Servers and Client-Side Random Load Balancing

Chapter by chapter, Sergey Ignatchenko is putting together a wonderful book on the Development and Deployment of Massively Multiplayer Games, though it has much broader applicability than games. Here's a recent chapter from his book.

Enter Front-End Servers

[Enter Juliet]
Hamlet:
Thou art as sweet as the sum of the sum of Romeo and his horse and his black cat! Speak thy mind!
[Exit Juliet]

— a sample program in Shakespeare Programming Language

Front-End Servers as an Offensive Line

Our Classical Deployment Architecture (especially if you do use FSMs) is not bad, and it will work, but there is still quite a bit of room for improvement for most of the games out there. More specifically, we can add another row of servers in front of the Game Servers, as shown in Fig VI.8:
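The chapter title's other idea is easy to make concrete. Here's a minimal sketch of client-side random load balancing in Python (illustrative only; the host list and port are hypothetical, not from the book):

```python
import random
import socket

# Hypothetical front-end fleet; in practice the client would get this
# list from DNS or a downloaded config.
FRONT_END_SERVERS = [
    ("fe1.example-game.com", 9000),
    ("fe2.example-game.com", 9000),
    ("fe3.example-game.com", 9000),
]

def connect_to_front_end(timeout=3.0):
    # Each client shuffles the list independently, so load spreads across
    # the fleet without any central load balancer in the picture.
    for host, port in random.sample(FRONT_END_SERVERS, len(FRONT_END_SERVERS)):
        try:
            return socket.create_connection((host, port), timeout=timeout)
        except OSError:
            continue  # this front-end is down or unreachable; try the next
    raise ConnectionError("no front-end server reachable")
```

A failed connection simply falls through to another randomly chosen server, which is what makes the scheme attractive for game clients: failover logic lives in the client and needs no extra infrastructure.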

Click to read more ...