Entries in AskHighScalability (8)

Monday
Aug 24, 2015

Ask HighScalability: Choose an Async App Server or Multiple Blocking Servers?

Jonathan Willis, software developer by day and superhero by night, asked, via Twitter, an interesting question that he posted on StackOverflow:

tl;dr Many Rails apps or one Vertx/Play! app?


I've been having discussions with other members of my team on the pros and cons of using an async app server such as the Play! Framework (built on Netty) versus spinning up multiple instances of a Rails app server. I know that Netty is asynchronous/non-blocking, meaning during a database query, network request, or something similar an async call will allow the event loop thread to switch from the blocked request to another request ready to be processed/served. This will keep the CPUs busy instead of blocking and waiting.

I'm arguing in favor of using something such as the Play! Framework or Vertx.io, something that is non-blocking... Scalable. My team members, on the other hand, are saying that you can get the same benefit by using multiple instances of a Rails app, which out of the box only comes with one thread and doesn't have true concurrency as do apps on the JVM. They are saying just use enough app instances to match the performance of one Play! application (or however many Play! apps we use), and when a Rails app blocks the OS will switch processes to a different Rails app. In the end, they are saying that the CPUs will be doing the same amount of work and we will get the same performance.

What do you think? The marketplace has seemingly moved, in the form of node.js, Golang, Akka, and even Java, to the async server model. Does that mean it's the only right way?
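As a rough illustration of the event-loop behavior described in the question, here is a minimal Python sketch that uses asyncio rather than Play! or Vert.x. The latency and request count are made-up numbers, and `asyncio.sleep` merely stands in for a database or network call. It is a toy, not a benchmark of either approach.

```python
import asyncio
import time

IO_WAIT_SECONDS = 0.05   # hypothetical latency of one database query
REQUESTS = 100           # hypothetical number of in-flight requests


async def handle_request(request_id):
    # "await" hands control back to the event loop while the simulated
    # query is pending, so the single thread can start other requests.
    await asyncio.sleep(IO_WAIT_SECONDS)
    return request_id


async def serve_all():
    # Start every request on the same event loop and wait for all of them.
    return await asyncio.gather(*(handle_request(i) for i in range(REQUESTS)))


def run_demo():
    start = time.perf_counter()
    results = asyncio.run(serve_all())
    elapsed = time.perf_counter() - start
    return len(results), elapsed


if __name__ == "__main__":
    served, elapsed = run_demo()
    print(f"served {served} requests in {elapsed:.3f}s on one thread")
    print(f"one blocking worker would need about {REQUESTS * IO_WAIT_SECONDS:.1f}s")
```

On one thread the simulated waits overlap, so the total time is close to one wait rather than 100 waits. A pool of blocking processes can overlap the same waits too, but only with roughly one process per in-flight request, which is the trade-off the two camps are debating.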

Here's my attempt at a response:

Click to read more ...

Tuesday
Jan 14, 2014

Ask HS: Design and Implementation of scalable services?

We have written agents that are deployed/distributed across the network. The agents send data every 15 seconds, maybe even every 5 seconds. We are working on a service/system to which all agents can post data/tuples with a marginal payload. A drop rate of up to 5% is acceptable. Ultimately the data will be segregated and stored in a DBMS (currently we are using MySQL).

Question(s) I am looking for answers to:

1. Client/Server Communication: Agent(s) can post data. The delivery status of the data is not that important, but there is a remote case where agent(s) must be notified if the server-side system generates an event based on the data sent.

- A lot of advice on the internet suggests using a message bus (ActiveMQ) for async communication. Multicast and UDP are the alternatives.

2. Persistence: After some evaluation, the data is to be stored in a DBMS.

- The end result of processing is an aggregated record, for which MySQL looks scalable. But the volume of data grows exponentially, so we are considering HBase as an option.

We are looking for any alternatives for the above two scenarios and would welcome expert advice.
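To make the UDP alternative from question 1 concrete, here is a minimal fire-and-forget sketch in Python. The JSON message shape, the function names, and the loopback address are all hypothetical; nothing here comes from the asker's system. UDP gives no delivery guarantee, which is tolerable only because a drop rate of up to 5% is acceptable, and the sketch does not cover the server-to-agent notification path.

```python
import json
import socket


def make_collector(host="127.0.0.1", port=0):
    # Bind a UDP socket; port 0 asks the OS for any free port.
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind((host, port))
    sock.settimeout(1.0)
    return sock


def send_reading(collector_addr, agent_id, payload):
    # Fire and forget: no acknowledgement is requested or awaited,
    # so a lost datagram is simply a dropped reading.
    message = json.dumps({"agent": agent_id, "data": payload}).encode("utf-8")
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(message, collector_addr)


def receive_reading(collector):
    # Return one decoded reading, or None if nothing arrives in time.
    try:
        data, _sender = collector.recvfrom(65535)
    except socket.timeout:
        return None
    return json.loads(data.decode("utf-8"))


if __name__ == "__main__":
    collector = make_collector()
    send_reading(collector.getsockname(), "agent-1", {"cpu": 0.42})
    print(receive_reading(collector))
    collector.close()
```

A message bus such as ActiveMQ trades this simplicity for buffering and delivery guarantees, so the choice largely depends on whether the rare notification back to agents needs to be reliable.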

Monday
Nov 11, 2013

Ask HS: What is a good OLAP database choice with node.js?

This question was asked over email and I thought a larger audience might want to take a whack at it.

With a business associate, I am trying to develop financial software that handles the financial reports of listed companies. We managed to create a database with all the data necessary to do financial analysis. My associate is a Business Intelligence specialist, so he is keen to use OLAP databases like Microsoft Analysis Services or Jedox Palo, which enable in-memory calculations and very fast aggregation, slicing and dicing of data, and write-backs.

At the same time I did an online course (MOOC) from Stanford CS184 called Startup Engineering which promoted/talked a lot about javascript and especially node.js as the language of the future for servers.

As I am keen to use open-source technologies (and would like to avoid MS SSAS) for the development of a website to access this financial data, and there are so many choices for databases out there (PostgreSQL, MongoDB, MySQL, etc., but I don't think they have OLAP features), do you know of resources, blogs, or people knowledgeable on the matter that discuss combining node.js with OLAP databases? Or the best use of a particular system with node.js?

Thanks for your input.

Monday
Oct 07, 2013

Ask HS: Is Microsoft the Right Technology for a Scalable Web-based System?

This question was asked over email and I thought a larger audience might want to take a whack at it.

I have a problem I’d like to have your view on. I’ve looked around a lot, and I haven’t found a definite answer. The question is this:

Is it true that for a scalable web-based system targeting millions of users (hopefully), using Microsoft technology (.Net/SQL Server) over open source technologies like python/ruby/php and mysql (mariadb) / postgresql will cost you more? Is there any justification for paying up for Microsoft licenses (OS, SQL Server, …)?

I am in charge of selecting the technology toolbox for a startup which is going to build a scalable public web platform. I’ve worked as a developer and database developer/admin (mainly as a DBA) using different platforms and technologies, but my main focus is on Microsoft technology. I’ve considered all other important factors for this decision, and in the end, I always come back to the question of money. When I finish developing the first stage of the system, and present it to possible investors to raise money and expand it, will it be a negative point (or even a deal breaker) to have a system developed on top of the Microsoft technology stack?

Every time I decide to go with Microsoft, I ask myself “why no other major web-based system (other than stackoverflow) is built on Microsoft technology?”, and I’m back to square one.

Thanks for your time.

Monday
Jul 15, 2013

Ask HS: What's Wrong with Twitter, Why Isn't One Machine Enough?

Can anyone convincingly explain why properties sporting traffic statistics that may seem in line with the capabilities of a single big-iron machine need so many machines in their architecture?

This is a common reaction to architecture profiles on High Scalability: I could do all that on a few machines so they must be doing something really stupid. 

Lo and behold, this same reaction also occurred in response to the article The Architecture Twitter Uses to Deal with 150M Active Users. On Hacker News papsosouid voiced what a lot of people may have been thinking:

I really question the current trend of creating big, complex, fragile architectures to "be able to scale". These numbers are a great example of why, the entire thing could run on a single server, in a very straight forward setup. When you are creating a cluster for scalability, and it has less CPU, RAM and IO than a single server, what are you gaining? They are only doing 6k writes a second for crying out loud.

This is a surprisingly hard reaction to counter convincingly, but nostrademons has a triple great response:

They create big, complex, fragile architectures because they started with simple, off-the-shelf architectures that completely fell over at scale.

 

I dunno how long you've been on HN, but around 2007-2008 there were a bunch of HighScalability articles about Twitter's architecture. Back then it was a pretty standard Rails app where when a Tweet came in, it would do an insert into a (replicated) MySQL database, then at read time it would look up your followers (which I think was cached in memcached) and issue a SELECT for each of their recent tweets (possibly also with some caching). Twitter was down about half the time with the Fail Whale, and there was continuous armchair architecting about "Why can't they just do this simple solution and fix it?" The simple solution most often proposed was write-time fanout, basically what this article describes.
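The two designs contrasted in that paragraph can be sketched in a few lines of Python. This is an in-memory toy with hypothetical class and method names, not Twitter's code; the 800-entry timeline cap simply borrows the "800 tweets saved/user" figure quoted in the thread.

```python
from collections import defaultdict, deque


class ReadTimeAssembly:
    """Store each tweet once; build the timeline when it is requested."""

    def __init__(self):
        self.following = defaultdict(set)   # reader -> authors they follow
        self.tweets = defaultdict(list)     # author -> [(sequence, text)]
        self.sequence = 0

    def follow(self, reader, author):
        self.following[reader].add(author)

    def post(self, author, text):
        # Cheap write: one append.
        self.sequence += 1
        self.tweets[author].append((self.sequence, text))

    def timeline(self, reader, limit=20):
        # Expensive read: one lookup per followed author, then a merge.
        merged = [t for a in self.following.get(reader, ()) for t in self.tweets[a]]
        merged.sort(reverse=True)
        return [text for _, text in merged[:limit]]


class WriteTimeFanout:
    """Copy each tweet into every follower's timeline when it is posted."""

    def __init__(self, max_timeline=800):
        self.followers = defaultdict(set)   # author -> readers
        self.timelines = defaultdict(lambda: deque(maxlen=max_timeline))

    def follow(self, reader, author):
        self.followers[author].add(reader)

    def post(self, author, text):
        # Expensive write: one insert per follower.
        for reader in self.followers[author]:
            self.timelines[reader].appendleft(text)

    def timeline(self, reader, limit=20):
        # Cheap read: the timeline is already materialized.
        return list(self.timelines[reader])[:limit]
```

Both classes return the same timeline for the same sequence of posts; they differ only in whether the per-follower work is paid at write time or at read time.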

Do the math on what a single-server Twitter would require. 150M active users * 800 tweets saved/user * 300 bytes for a tweet = 36T of tweet data. Then you have 300K QPS for timelines, and let's estimate the average user follows 100 people. Say that you represent a user as a pointer to their tweet queue. So when a pageview comes in, you do 100 random-access reads. It's 100 ns per read, you're doing 300K * 100 = 30M reads, and so already you're falling behind by a factor of 3:1. And that's without any computation spent on business logic, generating HTML, sending SMSes, pushing to the firehose, archiving tweets, preventing DOSses, logging, mixing in sponsored tweets, or any of the other activities that Twitter does.

(BTW, estimation interview questions like "How many gas stations are in the U.S?" are routinely mocked on HN, but this comment is a great example why they're important. I just spent 15 minutes taking some numbers from an article and then making reasonable-but-generous estimates of numbers I don't know, to show that a proposed architectural solution won't work. That's opposed to maybe 15 man-months building it. That sort of problem shows up all the time in actual software engineering.)
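The arithmetic in that estimate can be replayed as a short Python script. Every constant is taken from the comment above; the function names are made up for illustration, and the model (one serial stream of random reads at 100 ns each) is the comment's own simplifying assumption.

```python
ACTIVE_USERS = 150_000_000
TWEETS_SAVED_PER_USER = 800
BYTES_PER_TWEET = 300
TIMELINE_QPS = 300_000
FOLLOWED_PER_USER = 100
SECONDS_PER_RANDOM_READ = 100e-9   # 100 ns


def storage_terabytes():
    # Total tweet data kept for all active users, in decimal terabytes.
    total_bytes = ACTIVE_USERS * TWEETS_SAVED_PER_USER * BYTES_PER_TWEET
    return total_bytes / 1e12


def random_reads_per_second():
    # Each timeline request touches one tweet queue per followed user.
    return TIMELINE_QPS * FOLLOWED_PER_USER


def read_seconds_needed_per_second():
    # Seconds of serial read time required for every second of traffic.
    return random_reads_per_second() * SECONDS_PER_RANDOM_READ


if __name__ == "__main__":
    print(f"tweet storage: {storage_terabytes():.0f} TB")
    print(f"random reads per second: {random_reads_per_second():,}")
    print(f"read time needed per wall-clock second: "
          f"{read_seconds_needed_per_second():.1f} s")
```

The last figure is the 3:1 shortfall from the comment: about three seconds of memory reads are needed for each second of incoming timeline requests, before any other work is counted.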

And the thread goes on with a lot of enlightening details. (Just as an aside, in an interview the question "How many gas stations are in the US" is worse than useless. If someone asked for a Twitter back-of-the-napkin analysis like nostrademons produced, now we are getting somewhere.)

Do you have an answer? Are these kinds of architectures evidence of incompetence or is there a method to the madness?

Thursday
Feb 07, 2013

Ask HighScalability: Web asset server concept - 3rd party software available?

This article describes the idea of a website asset service for the author's dynamic websites. It serves as a request for comments and also as a request for pointers to existing 3rd party software.

Click to read more ...

Thursday
Aug 02, 2012

Ask DuckDuckGo: Is there Anything you Want to Know About DDG?

Next week I'm going to have the pleasure of interviewing Gabriel Weinberg, founder of rebel search engine DuckDuckGo. Is there anything you would like to know about DuckDuckGo that I can ask Gabe? Please contact me or comment on this thread with your deepest desires.

Monday
Jul 23, 2012

Ask HighScalability: How Do I Build My MegaUpload + Itunes + YouTube Startup?

This question was sent in by Val, who is asking for a little help in creating the next big thing. Any ideas?

I'm planning to run my own, first startup website and have been surfing the webs for relevant info to plan the technology I will use for it (the frontend and the backend, including the software and the hardware). The website will be something like a combination of:

  • MegaUpload (users will upload their files)
  • iTunes (users will be paid for their uploads)
  • and YouTube (in the future I'm planning to let users watch/listen to the content online, without downloading).

I don't have any investors yet, nor the budget - I'm still preparing the idea and I'm going to create a first implementation (an "alpha version") before I show it to potential investors. Hence the initial technologies have to be extremely cheap *but* also highly scalable in the future so that I don't have to redo anything when the website grows.

Unfortunately I don't have much experience in running big websites but, on the other hand, I hope my website will grow extremely big (of course).

My questions are:

Click to read more ...