Entries by HighScalability Team (1576)

Friday
Jun 10, 2011

Stuff The Internet Says On Scalability For June 10, 2011

Submitted for your scaling pleasure: 

  • Achievements:
  • Quotes of a quotable essence:
    • robinduckett: FACT: You are not a web developer if you need third party services which handle scalability so you can "focus on the programming".
    • Twitter’s Bain: Facebook May Have More Scale, We Have More Engagement
    • shervin: Fallibility without malleability sheds scalability.
    • uisdans: Fat client/server is over. We're moving from #apps #social web #iaas to a #nui #richapp #bigdata #paas spanning the private/public cloud
  • Ex-Google Engineer Says the Company's Software Infrastructure is Obsolete. The arguments don't follow, IMHO. Creating a global infrastructure in the large is a very different goal than following the latest trends for personal projects, though it is no doubt limiting to have to use this infrastructure for everything.
  • Datacenters are becoming their own technological niche. Why use Internet tech in the datacenter? TCP becomes Data Center TCP (DCTCP), an enhancement to the TCP congestion control algorithm for data center networks.
To see much more of Stuff the Internet says, please read more below...

Click to read more ...

Wednesday
Jun 8, 2011

Stuff to Watch from Google IO 2011

With the Google IO Developer Conference completed, there are dozens and dozens of information-packed videos now available. While you won't get any of the nifty free swag the attendees rake in, it's great of Google to make these videos available so quickly after the conference.

Let's say you don't want to watch all the videos, on the pretense you have a life. Here are just a dozen scalability- and architecture-related videos you might find interesting:

Click to read more ...

Monday
Jun 6, 2011

Apple iCloud: Syncing and Distributed Storage Over Streaming and Centralized Storage

There has been a lot of speculation over how Apple's iCloud would work. With the Apple Worldwide Developers Conference keynote having just completed, we finally learned the truth. We can handle it. They made some interesting and cost-effective architecture choices that preserved the value of their devices and the basic model of how their existing services work.

A lot of pundits foretold that with all the datacenters Apple was building we would get a streaming music solution. Only one copy of music would be stored and then streamed on demand to everyone. Or they could go with the Google brute-force method and copy up all of a user's music and play it on demand.

Apple did neither. They chose an interesting middle path that's not Google, Amazon, or even MobileMe.

The key idea is you no longer need a PC. Device content is now synced over the air and is managed by the cloud, not your legacy computer. Your data may not even be stored in the cloud, but the whole management, syncing, and control of content is done by the cloud instead of the PC. PCs are now just another device on par with the iPhone and iPad.

What happens to your data depends on the type of data. Apple gives you 5GB of free storage, which doesn't sound like a lot at all. The twist here is that purchased music, apps, books, and photos will not count against your free storage because these are stored on your devices. Photos hit the cloud for a maximum of 30 days, which allows your devices 30 days to contact the cloud and download the photos; after that I guess they are lost. All the big data is stored on your devices.

Some smaller content is stored in the cloud. This content is mail, documents, Camera Roll, account information, settings, and other app data. This data is much smaller than photos, videos, and music, so it's a manageable amount of storage per user. It wasn't talked about, but I'd imagine storage could be increased for a price, so any increased storage usage would be funded.
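
The storage rules described so far can be sketched in a few lines of Python. This is only an illustration of the accounting as reported in this post, not Apple's implementation: the `Item` type, the category names, and the choice of binary gigabytes for the 5GB tier are all assumptions made for the example.

```python
from dataclasses import dataclass

# Figures taken from the post; treating 5GB as 5 * 1024**3 bytes is an assumption.
FREE_QUOTA_BYTES = 5 * 1024**3
PHOTO_RETENTION_DAYS = 30

# Hypothetical category names for content the post says lives in the cloud.
CLOUD_STORED_KINDS = {
    "mail", "document", "camera_roll", "account_info", "settings", "app_data",
}


@dataclass(frozen=True)
class Item:
    kind: str          # hypothetical category label, e.g. "music" or "mail"
    size_bytes: int
    age_days: int = 0  # days since the item first reached the cloud


def counts_against_quota(item: Item) -> bool:
    """Only cloud-stored content uses the free quota; music, apps, books,
    and photos are described as living on the devices."""
    return item.kind in CLOUD_STORED_KINDS


def quota_used(items) -> int:
    """Total bytes charged against the free tier."""
    return sum(i.size_bytes for i in items if counts_against_quota(i))


def photo_still_in_cloud(item: Item) -> bool:
    """Photos are held in the cloud for at most 30 days."""
    return item.kind == "photo" and item.age_days <= PHOTO_RETENTION_DAYS


library = [
    Item("music", 4 * 1024**3),             # large, but not charged
    Item("mail", 100 * 1024**2),            # charged
    Item("photo", 2 * 1024**2, age_days=31),
]
print(quota_used(library), "of", FREE_QUOTA_BYTES, "bytes used")
```

Under these assumptions a 4GB music library costs nothing against the quota, while 100MB of mail does, which is why 5GB goes further than it first sounds.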

What Apple ended up creating is a syncing model where large content is synced between devices, smaller meta-data type content is stored in the cloud, and shared changeable data like mail is stored in the cloud. The advantages of this approach are: 

Click to read more ...

Friday
Jun 3, 2011

Stuff The Internet Says On Scalability For June 3, 2011

Submitted for your scaling pleasure: 

  • Twitter indexes an average of 2,200 TPS (peak is 4x that) while serving 18,000 QPS (1.6B queries per day). eBay serves 2 billion page views every day, requiring more than 75 billion database requests.
  • Quotable Quotes:
    • Infrastructure is adaptation --Kenneth Wright, referencing reservoir building by the Anasazi
    • MattTGrant: You say: "Infinite scalability" - I say: "fractal infrastructure"
  • Like the rich, more is different, says Zillionics. Large quantities of something can transform the nature of those somethings. Zillionics is a new realm, and our new home. The scale of so many moving parts requires new tools, new mathematics, new mind shifts. Amen.
  • Data mine yourself, says the Quantified Self. All that jazz about monitoring and measuring services to continually improve them: that works for you too! You may not be a number, but self-numbers are a path towards being all you can be. Motivated by this same spirit, some time ago I published an empirical process control method for weight control centered on creating and using a feedback system. More at: The 10 Designer Principles for Controlling Your Weight and The Designer Way
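
As a quick sanity check on the Twitter and eBay figures in the first bullet, the arithmetic can be worked through in Python. The variable names are made up for this sketch, and the eBay ratio is a lower bound, since the quoted figure is "more than" 75 billion.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def per_day(rate_per_second: int) -> int:
    """Convert a sustained per-second rate into a daily total."""
    return rate_per_second * SECONDS_PER_DAY


# Twitter: 18,000 queries per second, sustained over a day.
twitter_qps = 18_000
twitter_queries_per_day = per_day(twitter_qps)  # roughly the 1.6B quoted

# Twitter: average indexing rate and the quoted 4x peak.
twitter_index_tps = 2_200
twitter_peak_index_tps = 4 * twitter_index_tps

# eBay: database requests per page view (at least this many).
ebay_page_views_per_day = 2_000_000_000
ebay_db_requests_per_day = 75_000_000_000
db_requests_per_page_view = ebay_db_requests_per_day / ebay_page_views_per_day

print(f"Twitter queries/day: {twitter_queries_per_day:,}")
print(f"Twitter peak indexing TPS: {twitter_peak_index_tps:,}")
print(f"eBay DB requests per page view: {db_requests_per_page_view}")
```

So the 1.6B figure is 18,000 QPS rounded up from about 1.56 billion, and each eBay page view fans out into dozens of database requests.
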
To see much more of Stuff the Internet says, please read more below...

Click to read more ...

Wednesday
Jun 1, 2011

Why is your network so slow? Your switch should tell you.

Who hasn't cursed their network for being slow while waiting for that annoying little hourglass of pain to release all its grains of sand? But what's really going on? Is your network really slow? PacketPushers Show 45 – Arista – EOS Network Software Architecture has a good explanation of what may really be at fault (paraphrased):

Click to read more ...

Tuesday
May 31, 2011

Awesome List of Advanced Distributed Systems Papers

As part of Dr. Indranil Gupta's CS 525 Spring 2011 Advanced Distributed Systems class, he has collected an incredible list of resources on distributed systems. His research group is also doing some interesting work.

The various topics include: Before there Were Clouds, Cloud Computing, P2P Systems, Basic Distributed Computing Concepts, Sensor Networks, Overlays and DHTs, Cloud Programming, Cloud Scheduling, Key-Value Stores, Storage, Sensor Net Routing, Geo-Distribution, P2P Apps, In-network processing, Epidemics, Probabilistic Membership Protocols, Distributed Monitoring and  Management, Publish-Subscribe/CDNs, Measurement Studies, Old Wine: Stale or Vintage?, In Byzantium, Cloud Pricing, Other Industrial Systems, Structure of Networks, Completing the Circle, Green Clouds, Distributed Debugging, Flash!, The Middle or the End?, Availability-Aware Systems, Design Methodologies, Handling Stress, Sources of unreliability in networks, Handling Stress, Selfish algorithms, Security, Economic Theory, The future of sensor nets?, The End-to-End Approach, Automatic Computing and Inference, Caching, Classical Algorithms, Topology and Naming, Practical theory perspectives, Modular Systems.

That's just the list of topics! For every topic there's the slide deck used to teach the class, a main list of papers and a second list of optional papers. So there's a lot to choose from. Happy reading! If any of the papers really stand out for you, please share.

Click to read more ...

Tuesday
May 31, 2011

Sponsored Post: Animoto, deviantART, Hadapt, Clustrix, Percona, Mathworks, AppDynamics, ScaleOut, Cloudkick, Membase, CloudSigma, ManageEngine, Site24x7

Who's Hiring?

  • Animoto is building a Systems (DevOps) Team. Please apply here.
  • Clustrix Inc. - Learn more about Clustrix's interpretation of NewSQL. Please apply here.
  • deviantART is looking for a Network and Systems Operations Engineer. Please apply here.
  • Hadapt brings high-performance SQL to Hadoop, and is looking for a systems engineer to join this fast-growing company. Please apply at http://www.hadapt.com/jobs.
  • MathWorks is looking for multiple full-time scaling experts. Apply now: http://matlab.my/lVmunb

Fun and Informative Events

  • Percona is running an intensive one-day MySQL conference in New York City on May 26th. High Scalability readers save $50 with the code PLNY-HiSc. Learn more and register at percona.com/live/.
  • CouchDB Developer Training coming to Washington, D.C., Portland, San Francisco and Chicago! Membase Server Ops Training coming to New York City and San Francisco!

Cool Products and Services

For a longer description of each sponsor, please read more below...

Click to read more ...

Friday
May 27, 2011

Stuff The Internet Says On Scalability For May 27, 2011

Submitted for your scaling pleasure: 

Read much more of Stuff the Internet Says by clicking below...

Click to read more ...

Wednesday
May 25, 2011

Stuff to Watch from Surge 2010

Surge is a conference put on by OmniTI targeting practical scalability matters. OmniTI specializes in helping people solve their scalability problems, which is only natural, as it was founded by Theo Schlossnagle, author of the canonical Scalable Internet Architectures.

Now that Surge 2011 is on the horizon, they've generously made available nearly all the videos from the Surge 2010 conference. That's a pattern hopefully every conference will follow (only don't wait a year, please). We lose a lot of collective wisdom from events not being available online in a timely manner.

In truth, nearly all the talks are on topic and are worth watching, but here are a few that seem especially relevant:

Click to read more ...

Monday
May 23, 2011

Evernote Architecture - 9 Million Users and 150 Million Requests a Day

The folks at Evernote were kind enough to write up an overview of their architecture in a post titled Architectural Digest. Dave Engberg describes their approach to networking, sharding, user storage, search, and some other custom services.

Evernote is a cool application, partially realizing Vannevar Bush's amazing vision of a memex. Wikipedia describes Evernote's features succinctly: 

Evernote is a suite of software and services designed for notetaking and archiving. A "note" can be a piece of formattable text, a full webpage or webpage excerpt, a photograph, a voice memo, or a handwritten "ink" note. Notes can also have file attachments. Notes can then be sorted into folders, tagged, annotated, edited, given comments, and searched. Evernote supports a number of operating system platforms (including Android, Mac OS X, iOS, Microsoft Windows and WebOS), and also offers online synchronization and backup services.

Key here is that Evernote stores a lot of data that must be searched and synced through their cloud to any device you use.

Another key is the effect of Evernote's business model and cost structure. Evernote is notable for their pioneering of the freemium model, based on the idea from their CEO: The easiest way to get 1 million people paying is to get 1 billion people using. Evernote is designed to become profitable at a 1% conversion rate. The free online service limits users to a hefty 60 MB/month while premium users pay $45 per year for 1,000 MB/month. To be profitable they must store a lot of data without spending a lot of money. There's not a lot of room for extras, which accounts for the simple practicality of their architecture.
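
To get a feel for the scale these numbers imply, here is a back-of-the-envelope calculation in Python. The user and request counts come from the title of this post. The paying-user and revenue figures are purely hypothetical: they assume exactly 1% of users convert, which is the stated design target, not a reported result.

```python
SECONDS_PER_DAY = 24 * 60 * 60

# From the post title.
USERS = 9_000_000
REQUESTS_PER_DAY = 150_000_000

# From the post body.
PREMIUM_PRICE_PER_YEAR_USD = 45
TARGET_CONVERSION_PERCENT = 1  # design target, assumed here for illustration

# Average load, ignoring daily peaks and troughs.
avg_requests_per_second = REQUESTS_PER_DAY / SECONDS_PER_DAY
requests_per_user_per_day = REQUESTS_PER_DAY / USERS

# Hypothetical: what a 1% conversion rate would mean at 9 million users.
hypothetical_paying_users = USERS * TARGET_CONVERSION_PERCENT // 100
hypothetical_revenue_per_year_usd = (
    hypothetical_paying_users * PREMIUM_PRICE_PER_YEAR_USD
)

print(f"Average requests/second: {avg_requests_per_second:.0f}")
print(f"Requests per user per day: {requests_per_user_per_day:.1f}")
print(f"Hypothetical paying users at 1%: {hypothetical_paying_users:,}")
print(f"Hypothetical revenue/year: ${hypothetical_revenue_per_year_usd:,}")
```

Under that assumption, a few million dollars a year has to cover storage and serving for all nine million users, which illustrates why there is so little room for extras.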

The article is short and succinct, so definitely read it for details. Some takeaways:  

Click to read more ...