Entries in Strategy (358)

Wednesday
Nov 26, 2014

Make Any Framework Suck Less With These 10 Insightful Lessons

Alexey Migutsky, in 2 years with Angular, has a lot to say about Angular, which I can't comment on at all, not being an Angular user. But buried in his article are some lessons for building better frameworks that obviously come from deep experience. Frameworks will always suck, but if you follow these lessons, will your frameworks suck less? Yes, I think they will.

Here are Alexey's Lessons for framework (and metaframework) developers:

  1. You should have as small a number of abstractions as possible.
  2. You should name things consistent with your "thought domain".
  3. Do not mix several responsibilities in your components. Make fine-grained abstractions with well-defined roles.
  4. Always describe the intention for your decisions and tradeoffs in your documentation.
  5. Have a curated and updated reference project/examples.
  6. Your abstractions should scale "from bottom up". Start with small items and then fit them to a Composite pattern. Do not start with the question "How do we override it globally?".
  7. Global state is pure evil. It's like darkness in the horror films - you never know what problems you will have when you tread into it...
  8. The dataflow and data changes should be granular and localized to a single component.
  9. Do not make things easy to use, make your components and abstractions simple to understand. People should learn how to do stuff in a new and effective way, do not ADAPT to their comfort zone.
  10. Do not encode all good things you know in the framework.
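
Lessons 3, 6, and 8 can be pictured with a tiny sketch. This is only an illustration under my own assumptions, not anything from Alexey's article: the `Label` and `Panel` names are hypothetical, and the point is just that small single-purpose pieces with local state compose bottom-up into a Composite, with no global state anywhere.

```python
from dataclasses import dataclass, field
from typing import List, Protocol


class Component(Protocol):
    """The one small abstraction every piece shares."""

    def render(self) -> str: ...


@dataclass
class Label:
    """Leaf component: a single responsibility, state held locally."""

    text: str

    def render(self) -> str:
        return self.text


@dataclass
class Panel:
    """Composite: built from smaller components, exposes the same interface."""

    children: List[Component] = field(default_factory=list)

    def add(self, child: Component) -> "Panel":
        self.children.append(child)
        return self

    def render(self) -> str:
        # Each child renders itself; the panel only combines the results.
        return "[" + " | ".join(child.render() for child in self.children) + "]"


# Built bottom-up: labels first, then panels that contain them.
page = Panel().add(Label("title")).add(Panel().add(Label("a")).add(Label("b")))
print(page.render())  # prints "[title | [a | b]]"
```

Nothing here needs a global override hook: changing how one label renders touches only that label.
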
Monday
Nov 24, 2014

A Flock of Tasty Sources on How to Start Learning High Scalability

This is a guest repost by Leandro Moreira.

When we're interested in scalability, we usually look for links, explanations, books, and references. This mini article links to the references I think might help you on this journey.

DISCLAIMER:

You don’t need N physical machines to build or test a cluster or a highly scalable system. These days you can use Vagrant to bring up N virtual machines easily.

THE REFERENCES:

Now that you know you can empower yourself with virtual servers, I challenge you to not only read these links but put them into practice.

Good questions to test your knowledge:

Click to read more ...

Wednesday
Nov 19, 2014

We are leaving 3x-4x performance on the table just because of configuration.

Performance guru Martin Thompson gave a great talk at Strangeloop, Aeron: Open-source high-performance messaging, and one of the many interesting points he made was how much performance is being lost because we aren't configuring machines properly.

This point comes from the observation that "Loss, throughput, and buffer size are all strongly related."

Here's a gloss of Martin's reasoning. It's a problem that keeps happening and people aren't aware that it's happening because most people are not aware of how to tune network parameters in the OS.

The separation of programmers and system admins has become an anti-pattern. Developers don’t talk to the people who have root access on machines who don’t talk to the people that have network access. Which means machines are never configured right, which leads to a lot of loss. We are leaving 3x-4x performance on the table just because of configuration.

We need to work out how to bridge that gap, know what the parameters are, and how to fix them.

So know your OS network parameters and how to tune them.
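
To make the buffer size and throughput relationship concrete, here is a rough sketch. It is not from Martin's talk. It uses the standard bandwidth-delay product as a simplified model for a windowed protocol, and the link speed, round-trip time, and 208 KiB buffer are illustrative assumptions. The last function only reads whatever default receive buffer your OS hands a new UDP socket.

```python
import socket


def bandwidth_delay_product_bytes(bandwidth_bits_per_sec: float, rtt_sec: float) -> float:
    """Bytes that must be in flight to keep a link of this speed full."""
    return bandwidth_bits_per_sec * rtt_sec / 8


def max_throughput_bits_per_sec(buffer_bytes: float, rtt_sec: float) -> float:
    """Upper bound on throughput when at most buffer_bytes can be in flight."""
    return buffer_bytes * 8 / rtt_sec


def default_udp_receive_buffer_bytes() -> int:
    """Ask the OS what receive buffer a freshly created UDP socket gets."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        return sock.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF)


# Assumed numbers: a 10 Gbit/s link with a 1 ms round trip.
needed = bandwidth_delay_product_bytes(10e9, 0.001)
print(f"buffer needed to fill the link: {needed / 1e6:.2f} MB")  # about 1.25 MB

# Assumed numbers: a 208 KiB buffer on the same path.
cap = max_throughput_bits_per_sec(208 * 1024, 0.001)
print(f"throughput cap with a 208 KiB buffer: {cap / 1e9:.2f} Gbit/s")  # about 1.7 Gbit/s

print("this machine's default UDP receive buffer:", default_udp_receive_buffer_bytes(), "bytes")
```

Under these assumptions an undersized buffer caps a 10 Gbit/s link at a fraction of its capacity, which is the kind of gap the talk is pointing at.
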

Monday
Nov 3, 2014

Improve small job completion times by 47% by running full clones.

The idea is that most jobs are small. Researchers found 82% of jobs on Facebook's cluster had fewer than 10 tasks. Clusters have a median utilization of under 20%. And since small jobs are particularly sensitive to stragglers, the audacious solution is to proactively launch clones of each job as it is submitted and pick the result from the earliest clone. The result: the average completion time of all the small jobs improved by 47% using cloning, at the cost of just 3% extra resources.
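
The core trick, launch several clones and take whichever finishes first, fits in a few lines. This is a toy, in-process sketch using Python threads, not the Dolly system from the paper. The `run_with_clones` helper, the `small_task` function, and the delays are all made up for illustration.

```python
import time
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait


def run_with_clones(task, clones, executor):
    """Submit `clones` copies of task(clone_id) and return the earliest result."""
    futures = [executor.submit(task, clone_id) for clone_id in range(clones)]
    done, pending = wait(futures, return_when=FIRST_COMPLETED)
    for future in pending:
        # Best effort only: a clone that is already running keeps running.
        future.cancel()
    return next(iter(done)).result()


# Hypothetical per-clone delays in seconds; clone 0 plays the straggler.
DELAYS = [0.5, 0.05, 0.3]


def small_task(clone_id):
    time.sleep(DELAYS[clone_id])
    return clone_id


with ThreadPoolExecutor(max_workers=3) as pool:
    winner = run_with_clones(small_task, clones=3, executor=pool)

print("earliest clone:", winner)
```

The job finishes as fast as its quickest clone, so one straggler no longer decides the completion time. The price is the extra work done by the clones that lose.
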

For more details take a look at the very interesting Why Let Resources Idle? Aggressive Cloning of Jobs with Dolly.

Monday
Oct 27, 2014

Microservices in Production - the Good, the Bad, the it Works

This is a guest repost written by Andrew Harmel-Law on his real world experiences with Microservices. The original article can be found here.

It’s reached the point where it’s even a cliche to state “there’s a lot written about Microservices these days.” But despite this, here’s another post on the topic. Why does the internet need another? Please bear with me…

We’re doing Microservices. We’re doing it based on a mash-up of some “Netflix Cloud” (as it seems to be becoming known - we just call it “Archaius / Hystrix”), a gloop of Codahale Metrics, a splash of Spring Boot, and a lot of Camel, gluing everything together. We’ve even found time to make a bit of Open Source ourselves - archaius-spring-adapter - and also contribute some stuff back.

Let’s be clear: when I say we’re “doing Microservices”, I mean we’ve got some running; today; under load; in our Production environment. And they’re running nicely. We’ve also got a lot more coming down the dev-pipe.

All the time we’ve been crafting these we’ve been doing our homework. We’ve followed the great debate, some contributions of which came from within Capgemini itself, and other less-high-profile contributions from our very own manager. It’s been clear for a while that, while there is a lot of heat and light generated in this debate, there are also a lot of valid inputs that we should be bearing in mind.

Despite this, the Microservices architectural style is still definitely in the honeymoon period, which translates personally into the following: whenever I see a new post on the topic from a Developer I respect, my heart sinks a little as I open it and read… Have they discovered the fatal flaw in all of this that everyone else has so far missed? Have they put their finger on the unique aspect that means 99% of us will never realise the benefits of this new approach and that we’re all off on a wild goose chase? Have they proven that Netflix really are unicorns and that the rest of us are just dreaming?

Despite all this we’re persisting. Despite always questioning every decision we make in this area far more than we normally would, Microservices still feel right to us for a whole host of reasons. In the rest of this post I hope I’ll be able to point out some of the subtleties which might have eluded you as you’ve researched and fiddled, and also, I’ve aimed to highlight some of the old “givens” which might not be “givens” any more.

The Good

Click to read more ...

Monday
Oct 20, 2014

Facebook Mobile Drops Pull For Push-based Snapshot + Delta Model

We've learned mobile is different. In If You're Programming A Cell Phone Like A Server You're Doing It Wrong we learned programming for a mobile platform is its own specialty. In How Facebook Makes Mobile Work At Scale For All Phones, On All Screens, On All Networks we learned bandwidth on mobile networks is a precious resource. 

Given all that, how do you design a protocol to sync state (think messages, comments, etc.) between mobile nodes and the global state holding servers located in a datacenter?

Facebook recently wrote about their new solution to this problem in Building Mobile-First Infrastructure for Messenger. They were able to reduce bandwidth usage by 40% and reduce the terror of hitting send on a phone by 20%.

That's a big win...that came from a protocol change.
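
To get a feel for what a snapshot + delta model means, here is a generic sketch. It is emphatically not Facebook's protocol. The `Delta` and `Client` classes and the sequence-number scheme are my own assumptions about how such a model is commonly built: the client loads one full snapshot, then applies small pushed deltas in order, and only asks for a new snapshot when it detects a gap.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class Delta:
    """One pushed change, ordered by a sequence number (hypothetical format)."""

    seq: int
    key: str
    value: str


@dataclass
class Client:
    seq: int = 0
    state: Dict[str, str] = field(default_factory=dict)

    def load_snapshot(self, seq: int, state: Dict[str, str]) -> None:
        """Replace local state with a full snapshot taken at sequence `seq`."""
        self.seq = seq
        self.state = dict(state)

    def apply(self, delta: Delta) -> bool:
        """Apply a pushed delta. Return False if a gap means a new snapshot is needed."""
        if delta.seq <= self.seq:
            return True  # duplicate or stale push, safe to ignore
        if delta.seq != self.seq + 1:
            return False  # missed at least one delta
        self.state[delta.key] = delta.value
        self.seq = delta.seq
        return True


client = Client()
client.load_snapshot(10, {"m1": "hi"})
print(client.apply(Delta(11, "m2", "hello")))  # True: in order, applied
print(client.apply(Delta(13, "m4", "lost?")))  # False: delta 12 never arrived
print(client.state)
```

The bandwidth saving comes from the common case: most of the time only the small deltas cross the network, not the full state.
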

Facebook Messenger went from a traditional notification-triggered full state pull:

Click to read more ...

Wednesday
Oct 15, 2014

Using an SSD Cache in Front of EBS Boosted Throughput by 50%, for Free

Using EBS has lots of advantages--reliability, snapshotting, resizing--but overcoming the performance problems by using Provisioned IOPS is expensive. 

Swrve, an integrated marketing and A/B testing and optimization platform for mobile apps, did something clever. They use the two 40GB SSD devices that come with each c3.xlarge EC2 instance as a cache.

Through testing, they found that RAID-0 striping using a 4-way stripe, along with enhanceio, effectively increased throughput by over 50%, for free, with no filesystem corruption problems.

How is it free? "We were planning on upgrading to the C3 class of instance anyway, and sticking with EBS as the backing store. Once you’re using an instance which has SSD ephemeral storage, there are no additional fees to use that hardware."

For great analysis, lots of juicy details, graphs, and configuration commands, please take a look at How we increased our EC2 event throughput by 50%, for free.

Wednesday
Oct 8, 2014

That's Not My Problem - I'm Renting Them

Scott Hanselman gives a hilarious and insightful talk in Virtual Machines, JavaScript and Assembler, a keynote at Velocity Santa Clara 2014. The topic of his talk is an intuitive understanding of the cloud and why it's the best thing ever. 

At about 6:30 into the video Scott is at his standup comic best when he recounts a story of a talk Adrian Cockcroft gave on Netflix’s move to SSDs. An audience member energetically questioned the move, saying SSDs had high failure rates and that moving to them was a stupid idea.

To which Mr. Cockcroft replies:

That's not my problem, I'm renting them.

Scott selected the ideal illustration of the high level of abstraction the cloud provides. If you are new to the cloud that's a very hard idea to grasp. "That's not my problem, I'm renting them" is the perfect mantra when you find yourself worried about things you don't need to be worried about anymore.

Wednesday
Sep 24, 2014

5 Tips for Scaling NoSQL Databases: Don’t Trust Assumptions—Test, Test, Test!

Alex Bordei, product manager for Bigstep’s Full Metal Cloud, in Scaling NoSQL databases: 5 tips for increasing performance, shares a nice set of lessons he's learned about how NoSQL databases scale:

  • Never assume linearity in scaling. Hardware prices grow exponentially as the specs increase, but not all software can take full advantage of all that power. So you may be paying for hardware your database can't use. Find the sweet spot for price and hardware capabilities.
  • Tests speak louder than specs. Don't trust vendor documentation. It's cheap to spin up new instances so test the specs for yourself.
  • Mind the details: Memory & CPU numbers matter. For in-memory databases the specs on your memory modules matter. Faster memory means faster performance. Same for CPU frequencies. Pay attention to what your money is buying.
  • Do not neglect network latency. Paying for fast memory and fast CPU won't do a lot of good if your network is slow. 
  • Avoid virtualization with NoSQL databases. Virtualization can exact a 20-200% performance penalty. Noisy neighbors also help ruin the neighborhood. Up to 400% performance gains can be seen by switching away from virtualization and adopting bare metal clouds.
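
The first two tips boil down to: measure, then divide by price. Here is a minimal sketch of that arithmetic. The configuration names and every number are invented placeholders, not benchmark results. Swap in throughput you measured yourself and the prices you actually pay.

```python
def ops_per_dollar(config):
    """Measured operations per second for each dollar spent per hour."""
    return config["ops_per_sec"] / config["price_per_hour"]


def best_value(configs):
    """Return the configuration with the highest measured throughput per dollar."""
    return max(configs, key=ops_per_dollar)


# Placeholder figures for illustration only; replace with your own test results.
measured = [
    {"name": "small", "ops_per_sec": 20_000, "price_per_hour": 0.50},
    {"name": "medium", "ops_per_sec": 50_000, "price_per_hour": 1.00},
    {"name": "large", "ops_per_sec": 70_000, "price_per_hour": 2.50},
]

for config in measured:
    print(config["name"], round(ops_per_dollar(config)))

print("sweet spot:", best_value(measured)["name"])
```

With these made-up numbers the biggest box is the fastest but not the best buy, which is the non-linearity the first tip warns about.
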

Lots of good advice. Each of these points is discussed in more detail in the original article, which is well worth reading.

 

Monday
Sep 22, 2014

How Facebook Makes Mobile Work at Scale for All Phones, on All Screens, on All Networks

Update: Instagram Improved Their App's Performance. Here's How.

When you find your mobile application that ran fine in the US is slow in other countries, how do you fix it? That’s a problem Facebook talks about in a couple of enlightening videos from the @scale conference. Since mobile is eating the world, this is the sort of thing you need to consider with your own apps.

In the US we may complain about our mobile networks, but that’s more #firstworldproblems talk than reality. Mobile networks in other countries can be much slower and cost a lot more. This is the conclusion from Chris Marra, Project Manager at Facebook, in a really interesting talk titled Developing Android Apps for Emerging Market.

Facebook found in the US there’s 70.6% 3G penetration with 280ms average latency. In India there’s 6.9% 3G penetration with 500ms latency. In Brazil there’s 38.6% 3G penetration with more than 850ms average latency.

Chris also talked about Facebook’s comprehensive research on who uses Facebook and what kind of phones they use. In summary they found not everyone is on a fast phone, not everyone has a large screen, and not everyone is on a fast network.

It turns out the typical phone used by Facebook users is from circa 2011, dual core, with less than 1GB of RAM. By designing for a high end phone, Facebook found that its low end users, who are the typical users, had poor user experiences.

For the slow phone problem Facebook created a separate application that used lighter weight animations and other strategies to work on lower end phones. For the small screen problem Facebook designers made sure applications were functional at different screen sizes.

Facebook has moved to a product organization. A single vertical group is responsible for producing a particular product rather than having, for example, an Android team try to create all Android products. There’s also a horizontally focussed Android team trying to figure out best practices for Android, delving deep into the details of what makes a platform tick.

Each team is responsible for the end-to-end performance and reliability for their product. There are also core teams looking at and analyzing general performance problems and helping where needed to improve performance.

Both core teams and product teams are needed. The core team is really good at instrumentation, identifying problems, and working with product teams to fix them. For mobile it’s important that each team owns their full product end-to-end. That means owning core engagement, reliability, and performance metrics, including daily usage, cold start times, and reliability, while also knowing how to fix problems.

To solve the slow network problem there’s a whole other talk. This time the talk is given by Andrew Rogers, Engineering Manager at Facebook, and it’s titled Tuning Facebook for Constrained Networks. Andrew talks about three methods to help deal with network problems: Image Download Sizes, Network Quality Detection, Prefetching Content.
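
Here is how network quality detection and image download sizes might fit together. This is a sketch under my own assumptions, not Facebook's implementation: the latency thresholds, the quality labels, and the scale factors are all hypothetical. The sample latencies are the US, India, and Brazil averages quoted above.

```python
def classify_network(rtt_ms: float) -> str:
    """Bucket a measured round-trip time into a coarse quality label.

    The 300 ms and 600 ms thresholds are illustrative guesses.
    """
    if rtt_ms < 300:
        return "good"
    if rtt_ms < 600:
        return "moderate"
    return "poor"


# Hypothetical scale factors: request smaller images on worse networks.
SCALE_BY_QUALITY = {"good": 1.0, "moderate": 0.5, "poor": 0.25}


def pick_image_width(rtt_ms: float, screen_width_px: int) -> int:
    """Choose the image width to request for this screen and network."""
    quality = classify_network(rtt_ms)
    return int(screen_width_px * SCALE_BY_QUALITY[quality])


for country, rtt in [("US", 280), ("India", 500), ("Brazil", 850)]:
    print(country, classify_network(rtt), pick_image_width(rtt, 1080))
```

The same quality label could also gate prefetching, for example by prefetching only when the network is "good".
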

Overall, please note the immense effort that is required to operate at Facebook scale. Not only do you have different phones like Android and iOS, you have different segments within each type of phone you must code and design for. This is crazy hard to do.

Reducing Image Sizes - WebP saved over 30% versus JPEG and 80% versus PNG

Click to read more ...
