Tuesday, September 13, 2016

The Dollar Shave Club Architecture Unilever Bought for $1 Billion

This is a guest post by Jason Bosco, the Dollar Shave Club’s Director of Engineering, Core Platform & Infrastructure, on the infrastructure behind its ecommerce technology.

With more than 3 million members, Dollar Shave Club will do over $200 million in revenue this year. Although most people know the company for its marketing, this immense growth in just a few years since launch is largely the work of its team of 45 engineers.

Dollar Shave Club engineering by the numbers:

Core Stats

  • Super Bowl Ads served with no downtime: 1

  • Monthly Traffic Bandwidth: 9 TB

  • Orders processed via Arm: 38 million

  • Total Bugs Found: 4,566

  • Automation Tests Run: 312,000

  • Emails sent via Voice: 195 million

  • Analytics data points processed and stored in Hippocampus: 534 million

  • Size of dataset in Hippocampus: 1.5 TB

  • Currently Deployed Apps / Services: 22

  • Number of servers: 325

Technology Stack

  • Ember as the front-end framework

  • Primarily Ruby on Rails on the backend

  • Node.js for high-throughput background processing needs (e.g., in Voice)

  • Golang for infrastructure software

  • Python for infrastructure & data science

  • Elixir for one internal app

  • Ruby for Test Automation

  • Swift and Objective-C for the native iOS app

Infrastructure

  • Fully Hosted on AWS

  • Ubuntu & CoreOS

  • Ansible & Terraform for Configuration Management

  • Transitioning to Docker-based deployments

  • Jenkins for deployment coordination

  • Nginx & Varnish

  • Fastly for application delivery

  • Sumo Logic for log aggregation

  • CloudPassage for security monitoring

  • Vault by HashiCorp for secrets storage & provisioning

Data Stores

  • Primarily MySQL hosted on RDS

  • Memcached hosted on Elasticache for caching

  • Self-hosted Redis servers primarily for queuing

  • A dash of Kinesis for handling orders from spiky traffic (see the sketch after this list)

  • Amazon Redshift for a data warehouse
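
To make the Kinesis bullet concrete: a stream sits in front of order processing as a buffer, so traffic spikes are absorbed and drained at a steady pace. A minimal sketch of that pattern using the aws-sdk-kinesis gem; the stream name and payload shape here are hypothetical, not from the post:

    require 'aws-sdk-kinesis'
    require 'json'

    kinesis = Aws::Kinesis::Client.new(region: 'us-east-1')

    # Hypothetical order payload.
    order = { order_id: 12_345, sku: 'executive-razor', quantity: 1 }

    # put_record appends to a shard chosen by partition_key; consumers
    # drain the stream at their own pace, smoothing out traffic spikes.
    kinesis.put_record(
      stream_name:   'orders', # hypothetical stream name
      data:          order.to_json,
      partition_key: order[:order_id].to_s
    )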

Messaging & Queuing

  • Resque and Sidekiq for async job processing & messaging

  • RabbitMQ for messaging (a publish sketch follows this list)

  • Kafka for stream processing
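
As a concrete sketch of the RabbitMQ piece, here is a minimal publish using the Bunny gem; the exchange name and event payload are hypothetical:

    require 'bunny'
    require 'json'

    conn = Bunny.new # defaults to amqp://guest:guest@localhost:5672
    conn.start

    channel = conn.create_channel

    # A fanout exchange broadcasts each event to every bound queue,
    # which is the shape a common message bus usually takes.
    exchange = channel.fanout('dsc.events') # hypothetical exchange name

    exchange.publish({ event: 'subscription.renewed', member_id: 42 }.to_json)

    conn.close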

Analytics & Business Intelligence

  • Snowplow & Adobe Analytics for web/mobile analytics

  • AWS Elastic MapReduce

  • FlyData to ETL data from MySQL into Redshift

  • Databricks (Hosted Spark)

  • Looker as the BI front-end

  • Near-realtime data availability for reporting

Monitoring

  • Rollbar, Sentry & Crashlytics for exception tracking

  • DataDog for custom application metrics & metrics aggregation (see the sketch after this list)

  • SysDig for infrastructure metrics & monitoring

  • NewRelic for application performance monitoring

  • Site24x7 for availability monitoring

  • PagerDuty for on-call alerting
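
For a sense of what custom application metrics look like in practice, a minimal sketch with the dogstatsd-ruby gem; the metric names and tags are made up for illustration:

    require 'datadog/statsd'

    # The DogStatsD agent normally listens on localhost:8125.
    statsd = Datadog::Statsd.new('localhost', 8125)

    # Counters and histograms, tagged so DataDog can slice by dimension.
    statsd.increment('orders.processed', tags: ['service:arm'])
    statsd.histogram('checkout.duration_ms', 187, tags: ['endpoint:checkout'])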

QA and Test Automation

  • CircleCI for running unit tests

  • Jenkins + TestUnit + Selenium + SauceLabs for browser-based automated tests (a minimal sketch follows this list)

  • Jenkins + TestUnit + Selenium + SauceLabs for Brain automated tests

  • Jenkins + TestUnit for API functional tests

  • Jenkins + TestUnit + Appium + SauceLabs for native Android automated tests

  • Jenkins + TestUnit + Appium + SauceLabs for native iOS automated tests

  • Jenkins + TestUnit + Selenium + SauceLabs + proxy server for BI test automation

  • SOASTA + regex scripts for stress, soak, load, and performance testing
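
A minimal sketch of the TestUnit + Selenium combination, run here against a local browser rather than SauceLabs; the target page and assertion are illustrative:

    require 'test/unit'
    require 'selenium-webdriver'

    class HomepageTest < Test::Unit::TestCase
      def setup
        # On CI this would be a remote SauceLabs session; locally, Chrome.
        @driver = Selenium::WebDriver.for :chrome
      end

      def test_homepage_shows_brand
        @driver.get 'https://www.dollarshaveclub.com'
        assert_match(/Dollar Shave Club/i, @driver.title)
      end

      def teardown
        @driver.quit
      end
    end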

Engineering Workflow

  • Slack for cross-team communication

  • Trello for task tracking

  • Hubot with custom plugins as our chat bot

  • GitHub as our code repository

  • ReviewNinja integrated with the GitHub Status API for code reviews

  • Continuous deployment: typically multiple deployments per day

  • Moving to continuous delivery

  • On-the-fly sandbox environments for feature development

  • Currently, single-button push deployment using Jenkins

  • A Vagrant box running Docker containers gives new engineers a fully functioning development environment on their first day (a minimal sketch follows this list)
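
A minimal Vagrantfile sketch of that day-one environment; the base box and the containers chosen here are assumptions, not the actual setup:

    # Vagrantfile (a Ruby DSL): boot a VM, then run app dependencies as
    # Docker containers so a fresh laptop is productive immediately.
    Vagrant.configure('2') do |config|
      config.vm.box = 'ubuntu/trusty64' # hypothetical base box

      config.vm.provision 'docker' do |d|
        d.run 'mysql',     image: 'mysql:5.6', args: '-p 3306:3306'
        d.run 'memcached', image: 'memcached', args: '-p 11211:11211'
        d.run 'redis',     image: 'redis',     args: '-p 6379:6379'
      end
    end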

Architecture

  • Event-driven architecture

  • Moving from a monolithic architecture to “medium” services interacting through a common message bus

  • VCL-based routing at the CDN edge, deployed just like any other app

  • Web and Mobile frontends talk to an API layer

  • API layer talks to services, aggregates data and formats it for clients

  • Services talk to the data stores and message bus

  • Scheduled tasks run as one master job that breaks itself up into smaller jobs in Resque/Sidekiq (see the sketch after this list)

  • Technology components include internal tools for customer service (Brain), a marketing automation platform (Voice), a fulfillment system (Arm), a subscription billing system (Baby Boy), and our data infrastructure (Hippocampus).
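
A minimal sketch of that master-job pattern with Sidekiq; the model and service names (Member, BillingService) are hypothetical stand-ins:

    require 'sidekiq'

    # Master job: enumerate the work once, then fan out one small,
    # independently retryable job per member.
    class RenewalMasterJob
      include Sidekiq::Worker

      def perform
        Member.due_for_renewal.pluck(:id).each do |member_id|
          RenewalJob.perform_async(member_id)
        end
      end
    end

    class RenewalJob
      include Sidekiq::Worker

      def perform(member_id)
        BillingService.charge(member_id) # a failure here retries alone
      end
    end

Splitting the work this way keeps each unit small: one bad record fails and retries on its own instead of restarting the whole batch.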

Team

  • 45 top-notch, entrepreneurial, highly skilled engineers working out of our Marina del Rey, CA headquarters

  • Engineers participate in cross-functional teams called squads along with product managers, designers, UX and stakeholders to deliver end-to-end features.

  • Teams are divided vertically by domain into Frontend, Backend, QA & IT.

  • The front-end team owns the web UI for DSC.com & internal tools, plus our iOS & Android apps.

  • The backend team owns the web backends for DSC.com & internal tools, internal services (billing and fulfillment), and the data platform & infrastructure.

  • The QA team owns testing and automation infrastructure for all digital products.

  • The IT team owns office & warehouse IT.

  • Engineers get to attend one company-sponsored conference every year.

  • Engineers get to buy as many books / learning resources as they need.

  • Standing desks for all. One treadmill desk currently available as a pilot.

  • Weekly engineering team lunches.

  • Tech Belly events every other week where engineers present talks on technology topics over lunch.

  • Engineers are encouraged to experiment with bleeding-edge technology and create proposals through Requests for Comments (RFCs).

  • Engineers are encouraged to open source tools and libraries where it makes sense

  • Every engineer gets a standard issue of a 15” MacBook Pro, a 27” Mac display, and a 24” monitor.

  • One 3D printer available to print props (and more 3D printers).

Lessons Learned

  • Scaling becomes easier when the components you’re trying to scale are composed of small, simple services.

  • Documentation & knowledge sharing are important for fast-growing teams.

  • A well-nurtured test-suite is critical to fast-evolving systems.

  • Redis uses an approximate LRU eviction algorithm, so it’s not suitable if you have precise LRU requirements for caching (see the config sketch after this list)

  • Web performance is critical, especially on mobile: every millisecond costs us revenue

  • Usability & User Experience are important even for internal tools: efficient tools = more productive teams
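
On the Redis point: eviction is governed by maxmemory-policy, and the LRU approximation can be tightened (at extra CPU cost) by raising maxmemory-samples. A sketch using the redis-rb gem, assuming a local Redis on the default port:

    require 'redis'

    redis = Redis.new # assumes Redis on localhost:6379

    # Evict the approximately least-recently-used key across all keys
    # once maxmemory is reached; this is sampled, not exact LRU.
    redis.config(:set, 'maxmemory-policy', 'allkeys-lru')

    # More samples per eviction gets closer to true LRU, at CPU cost.
    redis.config(:set, 'maxmemory-samples', 10)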

On HackerNews

Reader Comments (6)

Why did you decide to host Redis yourselves? I would assume you'd run Redis on ElastiCache (with Replication Groups for better availability). Also, why host Memcached when it's also available on ElastiCache?

September 14, 2016 | Unregistered Commenter Hugo Lopes Tavares

Seems like a pretty standard modern stack. Glad it's working out for them! The real question is what their transition into Unilever's corporate technology fortress will look like.

September 14, 2016 | Unregistered Commenter Carlos Nunez

I guess my question is why do you need 45 engineers for what is basically a tiny catalog with a subscription option?

September 22, 2016 | Unregistered Commenter Brad

Documentation makes its first appearance as something important. What infrastructure and process do you use to keep it relevant?

September 25, 2016 | Unregistered Commenter Peter Schaafsma

Why does this company need such a complex system to run? It is not some tech company where hundreds or thousands of requests are arriving in real time. It just seems like an over-engineered solution to sound cool. Maybe they are doing something big and complex behind the scenes that I don't see. So I am just curious.

December 8, 2016 | Unregistered Commenter ARK

Yeah, this is more of a "Life at Dollar Shave Club" piece than a software architecture breakdown. Just one question for the author: why do you need 45 engineers and such a bloated solution? You could easily move your site to Shopify.

October 11, 2020 | Unregistered Commenter Mark
