Building Highly Scalable V6 Only Cloud Hosting
Wednesday, July 20, 2016 at 8:56AM
HighScalability Team in ipv6

This is a guest repost by Donatas Abraitis, Lead Systems Engineer at Hostinger International.

This article is about how we built a new, highly scalable cloud hosting solution using IPv6-only communication between commodity servers, what problems we faced with the IPv6 protocol, and how we tackled them to handle more than ten million active users.

Why did we decide to run an IPv6-only network?

At Hostinger we care a lot about innovative technologies, so we decided to run a new project named Awex that is based on this protocol. If we can, why not start today? Only frontend (user-facing) services run in a dual-stack environment; everything else is IPv6-only for east-west traffic.

Architecture

We are using pods. A pod is a cluster whose nodes share the same VIP (Virtual IP) addresses as anycast and can handle HTTP/HTTPS requests in parallel. Hundreds of nodes per pod can handle users' requests simultaneously without saturating any single node. Parallelization is done using BGP and ECMP with resilient hashing to avoid traffic scattering. Hence every edge node runs a BGP daemon that announces VIPs to the ToR switch. As the BGP daemon we run ExaBGP, using a single IPv6 session to announce both protocols (IPv4/IPv6). The BGP session is configured automatically during the server bootstrap step. Announcements differ depending on the server's role and include a /64 prefix per node plus many VIPs for north-south traffic. The /64 prefix is delegated specifically for containers. Every edge node runs plenty of containers, which communicate with containers on other nodes and with internal services.
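To illustrate why resilient hashing avoids traffic scattering, here is a minimal Python sketch of the general idea. It is not how any particular switch implements it. The class name, the bucket count, and the documentation-range addresses are assumptions made for the example. Flows hash into a fixed-size bucket table, and when a node disappears only the buckets that pointed at it are reassigned, so flows on the surviving nodes keep their next hop.

```python
import hashlib


class ResilientEcmpTable:
    """Fixed-size bucket table mapping flow hashes to next hops (illustrative)."""

    def __init__(self, next_hops, buckets=64):
        if not next_hops:
            raise ValueError("at least one next hop is required")
        # Spread the next hops round-robin over a fixed number of buckets.
        self.buckets = [next_hops[i % len(next_hops)] for i in range(buckets)]

    @staticmethod
    def _flow_hash(flow):
        # Stable hash of the flow tuple (src, sport, dst, dport).
        digest = hashlib.sha256(repr(flow).encode()).digest()
        return int.from_bytes(digest[:8], "big")

    def lookup(self, flow):
        """Return the next hop for a flow."""
        return self.buckets[self._flow_hash(flow) % len(self.buckets)]

    def remove(self, dead):
        """Reassign only the buckets owned by the dead next hop."""
        survivors = sorted({hop for hop in self.buckets if hop != dead})
        if not survivors:
            raise ValueError("cannot remove the last next hop")
        refill = 0
        for i, hop in enumerate(self.buckets):
            if hop == dead:
                self.buckets[i] = survivors[refill % len(survivors)]
                refill += 1
```

With plain modulo hashing over the list of live nodes, removing one node would change the divisor and remap most flows. Here the table size stays constant, so only flows that were already on the failed node move.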

Every edge node runs Redis as a slave replica to look up the upstream for a particular application; every upstream is therefore a list of thousands of containers (IPv6 addresses) spread across the nodes in the pod. These huge lists are generated in real time using consul-template. Each edge node has many public IPv4 (512) and global IPv6 (512) addresses. Wondering why? To cope with DDoS attacks. We use DNS to randomize the A/AAAA records in the response to the client. The client points their domain to our CNAME record named route, which in turn is randomized by our custom service named Razor. We will talk about Razor in future posts.
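The randomization idea can be sketched in a few lines of Python. This is not Razor, whose internals are not described here; the function names, the number of records per answer, and the documentation-range prefixes are all assumptions for illustration. The sketch builds an address pool per protocol and draws a random subset for each DNS response, so different clients are pointed at different addresses.

```python
import ipaddress
import itertools
import random


def build_pool(network, size):
    """Return the first `size` host addresses of `network` as strings."""
    net = ipaddress.ip_network(network)
    return [str(addr) for addr in itertools.islice(net.hosts(), size)]


def pick_records(pool_v4, pool_v6, count=2, rng=random):
    """Pick `count` distinct A and AAAA addresses for one DNS response."""
    return {
        "A": rng.sample(pool_v4, count),
        "AAAA": rng.sample(pool_v6, count),
    }


# Hypothetical pools; the IPv4 documentation range is smaller than the
# 512 addresses mentioned in the article, so a smaller pool is used here.
POOL_V4 = build_pool("198.51.100.0/24", 200)
POOL_V6 = build_pool("2001:db8:100::/64", 512)
```

Calling pick_records(POOL_V4, POOL_V6) once per query returns a fresh random subset each time. Passing a seeded random.Random instance as rng makes the selection reproducible for testing.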

Network gear

At first, for the ToR switches we decided to use OpenSwitch, which is a quite young but interesting and promising community project. We tested this OS in our lab for a few months and even contributed some changes to OpenSwitch, like this patch. We reported a number of bugs; most of them were eventually fixed, but not as fast as we needed, so we postponed experimenting with OpenSwitch for a while and gave Cumulus a try. By the way, we are still testing OpenSwitch in our lab because we plan to use it in the near future.

Cumulus allows us to have a fully automated network, where we reconfigure the network, including BGP neighbors, upstreams, firewall, bridges, etc., on changes. For instance, when we add a new node, Ansible automatically sees the change in the Chef inventory by looking at LLDP attributes and regenerates the network configuration for the particular switch. If we want to add a new BGP upstream or firewall rule, we just create a pull request to our GitHub repo and everything is done automatically, including checking syntax and deploying the changes to production. Every node is connected with a single 10GE interface using a Clos topology. Here are a few examples of pull requests:

Add IPv6 for internal ceph bridge
Remove IPv4 network for Ceph
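The inventory-driven regeneration described above can be approximated with a small Python sketch. It does not reproduce our Ansible or Chef setup: the inventory layout (name, asn, lldp.switch, lldp.port) and the rendered line format are hypothetical and are not real Cumulus syntax. It only shows the general pattern of grouping nodes by the switch and port reported in their LLDP attributes and rendering one configuration per switch.

```python
from collections import defaultdict


def neighbors_by_switch(nodes):
    """Group nodes by the ToR switch seen in their LLDP attributes."""
    grouped = defaultdict(list)
    for node in nodes:
        lldp = node["lldp"]
        grouped[lldp["switch"]].append((lldp["port"], node["name"], node["asn"]))
    # Sort so the rendered output is stable between runs.
    return {switch: sorted(ports) for switch, ports in grouped.items()}


def render_switch_config(switch, ports):
    """Render an illustrative (made-up format) neighbor list for one switch."""
    lines = [f"# generated for {switch}"]
    for port, name, asn in ports:
        lines.append(f"bgp-neighbor port={port} node={name} remote-as={asn}")
    return "\n".join(lines)
```

Because the output is deterministic for a given inventory, adding a node changes only the configuration of the switch it is cabled to, which keeps the generated diffs small and easy to review.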

Problems we tackled during the process

Lessons learned

Article originally appeared on (http://highscalability.com/).
See website for complete article licensing information.