Entries by HighScalability Team (1576)

Monday, Oct 15, 2007

Olympic Site Architecture

Hello everybody, I'm planning to build a new website like 2008.sina.com.cn and 2008.sohu.com. The site content will include picture news, text news, video news, user blogs and so on. I have a question for everybody and hope to get some useful information here.

Status: 100,000,000 people per day; 50,000 people at peak time; more than 200 servers; OpenBSD/openSUSE; Apache with FastCGI modules; lighttpd for pictures; MySQL; Varnish; LVS; Lucene search.

Do you have any good ideas for this? Thanks, everybody!
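
Before choosing components, it can help to turn the quoted figures into per-second and per-server numbers. The Python sketch below is only back-of-envelope arithmetic, and it rests on an assumption the post does not state: each of the "people per day" is counted as one visit spread evenly over 24 hours. Real traffic peaks well above that mean.

```python
# Back-of-envelope sizing from the figures quoted in the post.
# ASSUMPTION: each of the "100,000,000 people per day" is counted as one
# visit, spread evenly over 24 hours; real traffic peaks well above this mean.

SECONDS_PER_DAY = 24 * 60 * 60


def average_visits_per_second(visits_per_day: int) -> float:
    """Mean arrival rate if the daily total were spread evenly over the day."""
    return visits_per_day / SECONDS_PER_DAY


def peak_users_per_server(peak_users: int, servers: int) -> float:
    """Concurrent users per server if the load balancer spreads load evenly."""
    return peak_users / servers


if __name__ == "__main__":
    print(f"{average_visits_per_second(100_000_000):.0f} visits/s on average")  # 1157
    print(f"{peak_users_per_server(50_000, 200):.0f} peak users per server")  # 250
```

Because the post says "more than 200" servers, 250 concurrent users per server is an upper bound; the more useful next step is measuring what a single page view costs on each tier.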

Tuesday, Oct 9, 2007

High Load on Production Web Servers after Source Code Sync

Hi everybody :)

We have a bunch of web servers (about 14 at this time) running Apache. As our application framework we're using PHP with the APC cache installed to improve performance. For load balancing we're using a big F5 system with dynamic ratio (SNMP-driven).

To sync new or updated source code, we use Subversion to "automatically" update these servers with our latest software releases. After the new source is updated on these production servers, the load on the machines goes through the roof. While updating, the servers are still in "production", serving web pages to users; otherwise the update process would take ages. Because of this issue, we mostly update in the morning hours, when fewer users are online.

My guess is that the load rises that high because APC needs to recompile a bunch of new files each time, and before and during compilation performance is simply "bad".

My goal is to find a better solution. We want to be able to sync code no matter how many users are online (in case of emergency) without taking the whole site down. How are you handling this issue? What do you think about the process above? Can you spot the "problem"? Do you have similar issues? Feedback is highly welcome :)

Greetings,
Stephan Tijink
Head of Web Development | fotocommunity GmbH & Co. KG | Rheinwerkallee 2 | 53227 Bonn
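
A common way to avoid serving from a half-updated tree is to never update the live directory in place. Instead, export each release into its own directory and then switch the document root with an atomic symlink swap. The Python sketch below assumes a POSIX filesystem and an Apache DocumentRoot that points at a symlink. The `current` and `release-N` names are hypothetical. This only addresses the half-synced-tree problem. Whether APC still recompiles everything after the switch depends on its configuration, so the load spike described above may still need a separate fix, such as draining servers from the F5 pool one at a time while their caches warm up.

```python
import os
import tempfile


def activate_release(release_dir: str, current_link: str) -> None:
    """Atomically repoint the symlink `current_link` at `release_dir`.

    A temporary symlink is created beside the live one and renamed over it.
    On POSIX filesystems rename() replaces the old link in one step, so a
    request resolves either the old tree or the new one, never a mix.
    """
    link_dir = os.path.dirname(os.path.abspath(current_link))
    tmp_link = os.path.join(link_dir, f".current-tmp-{os.getpid()}")
    if os.path.lexists(tmp_link):  # leftover from an interrupted run
        os.remove(tmp_link)
    os.symlink(release_dir, tmp_link)
    os.replace(tmp_link, current_link)


if __name__ == "__main__":
    # Hypothetical layout: each svn export goes into its own release
    # directory, and Apache's DocumentRoot points at the "current" symlink.
    base = tempfile.mkdtemp()
    for name in ("release-1", "release-2"):
        os.makedirs(os.path.join(base, name))
    current = os.path.join(base, "current")
    activate_release(os.path.join(base, "release-1"), current)
    activate_release(os.path.join(base, "release-2"), current)
    print(os.readlink(current))  # .../release-2
```

Rolling back is the same call pointed at the previous release directory.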

Wednesday, Oct 3, 2007

Why most large-scale Web sites are not written in Java

There is a lot of information in the blogosphere describing the architecture of many popular sites, such as Google, Amazon, eBay, LinkedIn, TypePad, Wikipedia and others. I've summarized this issue in a blog post here. I would really appreciate your opinion on this matter.

Monday, Oct 1, 2007

Statistics Logging Scalability

My company is developing a centralized web platform to serve our clients. We currently use about 3 Mb/s on our uplink at our ISP, serving web pages for about 100 clients. We'd like to offer them statistics that mean something to their businesses and have been contemplating writing our own statistics code to handle the task. All statistics would be gathered at the page-view level, and we're implementing an HttpModule in ASP.NET 2.0 to handle gathering the data.

That said, I'm curious to hear comments on writing this data (~500 bytes of log data per page request). We need to write it somewhere and then build a process to aggregate it into a warehouse application used by our reporting system. Google Analytics is out of the question because we do not want our hosting infrastructure to depend on a remote server. WebTrends et al. are too expensive for our clients. I'm thinking of a few options:

1) Write log data directly to a SQL Server 2000 database and have a Windows Service periodically summarize and aggregate the data to the reporting server. I'm not sure this will scale under higher load, and I worry that the aggregation process will time out because of the number of inserts being sent to the table.

2) Write the log data to an in-memory structure on the web server and periodically flush it to the database. The fear here is that if the web server goes down, we lose all the data in memory. Another fear is that the IIS processes and worker threads might mangle one another when contending for that shared memory.

3) Don't use memory; write to a file instead. Save the file handle as an application variable and use it for all accesses to the file. I'm not sure about threading issues here either, and I'm reluctant to use anything that might corrupt a log file under load.

4) Add comment data to the IIS logs. In theory this should remove the threading issues, but I suspect the data would not be terribly useful once it's in the IIS logs.

The major driver here is that we do not want to use any of the websites and canned reports built into 90% of all statistics platforms. Our users shouldn't have to "leave" the customer care portal we're creating just to see stats for their sites. IFrames are not an option. I'm looking for a solution that is neither overly complex nor overly expensive, and that gives me access to the data we need to record on page views. It has to scale with volume. Thoughts are appreciated.

Derek
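
Option 2 is essentially a batched writer. Here is a minimal sketch in Python (the post's own code would be C# in an ASP.NET HttpModule; `BufferedLogWriter`, `sink` and `batch_size` are hypothetical names). A lock keeps concurrent request threads from corrupting the buffer, and the slow database write happens outside the lock, one whole batch at a time. It does not solve the durability concern: anything still buffered is lost if the process dies. Calling `flush()` from a timer keeps that window small.

```python
import threading
from typing import Callable, List


class BufferedLogWriter:
    """Buffers page-view records in memory and hands them to `sink` in batches.

    `sink` might do one bulk INSERT per batch. It is called outside the lock,
    so it must itself be safe to call from several threads at once.
    Records still in the buffer are lost if the process crashes.
    """

    def __init__(self, sink: Callable[[List[str]], None], batch_size: int = 100):
        self._sink = sink
        self._batch_size = batch_size
        self._buffer: List[str] = []
        self._lock = threading.Lock()

    def write(self, record: str) -> None:
        batch: List[str] = []
        with self._lock:  # serialise access from concurrent request threads
            self._buffer.append(record)
            if len(self._buffer) >= self._batch_size:
                batch, self._buffer = self._buffer, []
        if batch:
            self._sink(batch)  # slow I/O happens without holding the lock

    def flush(self) -> None:
        """Call periodically (e.g. from a timer) and at shutdown."""
        with self._lock:
            batch, self._buffer = self._buffer, []
        if batch:
            self._sink(batch)


if __name__ == "__main__":
    batches: List[List[str]] = []
    log = BufferedLogWriter(batches.append, batch_size=3)
    threads = [threading.Thread(target=log.write, args=(f"view-{i}",))
               for i in range(10)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    log.flush()
    print(sum(len(b) for b in batches))  # 10
```

Batching also eases the concern in option 1, because the database sees one bulk insert per batch instead of one insert per page view.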

Sunday, Sep 23, 2007

HA for switches

Hi, can someone explain how you implement network switch failover? We are paranoid about single points of failure. For example, suppose you have a dozen web servers -> switch -> DB cluster; that switch is a SPOF. How does one implement dual switches in a failover fashion?
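
A usual building block is to give each server two NICs, cable each NIC to a different switch, link the two switches together, and bond the NICs so that one carries traffic while the other stands by. On Linux this is the bonding driver's `active-backup` mode, with `miimon` setting how often link state is checked. The toy Python model below only illustrates the selection rule for the case where a primary interface is configured, so traffic fails back to it when its link returns. The interface names are hypothetical, and the real work happens in the kernel driver and the switch configuration.

```python
from typing import Dict, List, Optional


def select_active(links: List[str], link_up: Dict[str, bool]) -> Optional[str]:
    """Return the highest-priority interface whose link is up, else None.

    `links` is in priority order, e.g. eth0 (cabled to switch A) before
    eth1 (cabled to switch B). Hypothetical names; toy model only.
    """
    for link in links:
        if link_up.get(link, False):
            return link
    return None


if __name__ == "__main__":
    nics = ["eth0", "eth1"]
    print(select_active(nics, {"eth0": True, "eth1": True}))    # eth0
    print(select_active(nics, {"eth0": False, "eth1": True}))   # eth1 (switch A died)
    print(select_active(nics, {"eth0": False, "eth1": False}))  # None
```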

Sunday, Sep 16, 2007

What software runs on this site?

It's pretty slick! olla

Saturday, Sep 15, 2007

The Role of Memory within Web 2.0 Architectures and Deployments

Although I have a basic working knowledge of memory, SSDs and the like, I am not technical; I have never developed or deployed a system. I was exposed to RAM disks years ago, when their expense limited their use to very small files or DB applications. I am looking to "get current" on the role memory plays in current Web 2.0 designs and deployments. How is memory commonly used to remove latency and accelerate performance in typical Web 2.0 architectures? What role can memory play in massive scale-out implementations? Is there such a thing as memory "best practices"? If memory were cheap, would that significantly change the way systems are designed and deployed? Which commercial and open source products that use memory are in use, and what are their benefits and trade-offs? Can anyone suggest sources (people, books, papers, products) I might look into to gain a practical understanding of this topic?
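
One of the most common uses of memory in these architectures is a cache in front of a slower store, usually in the cache-aside pattern. The application looks in memory first, loads from the database on a miss, and evicts the least recently used entry when memory fills up. Distributed caches such as memcached apply the same evict-least-recently-used idea across many machines. Below is a minimal in-process Python sketch, where `loader` stands in for a hypothetical database query.

```python
from collections import OrderedDict
from typing import Callable, Hashable


class LRUCache:
    """Cache-aside wrapper: serve from memory, fall back to `loader` on a miss."""

    def __init__(self, capacity: int, loader: Callable[[Hashable], object]):
        self._capacity = capacity
        self._loader = loader
        self._data: "OrderedDict[Hashable, object]" = OrderedDict()
        self.misses = 0

    def get(self, key: Hashable) -> object:
        if key in self._data:
            self._data.move_to_end(key)  # mark as most recently used
            return self._data[key]
        self.misses += 1
        value = self._loader(key)  # the slow path, e.g. a database query
        self._data[key] = value
        if len(self._data) > self._capacity:
            self._data.popitem(last=False)  # evict least recently used
        return value


if __name__ == "__main__":
    cache = LRUCache(capacity=2, loader=lambda k: k.upper())
    for key in ["a", "b", "a", "c", "b"]:
        cache.get(key)
    print(cache.misses)  # 4: "a" hits once; "c" evicts "b", so "b" misses again
```

With more memory, `capacity` grows and the miss rate (and therefore database load) falls. That is one concrete way cheaper memory changes how systems are designed.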

Thursday, Sep 13, 2007

Design Preparations for Scaling

Hi there, what do you think is crucial in the code design of a scalable site? How does one prepare for web farms and clusters (e.g. in PHP)? Thanks, Stephan
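
A large part of preparing code for a web farm is never assuming that one server holds a user's state. Sessions and cached objects live in a shared tier, and the application spreads their keys across those servers. Consistent hashing is a common technique for that spreading. The sketch below is in Python rather than PHP, and the server names and `replicas` count are hypothetical. Compared with `hash(key) % n`, adding a server remaps only the keys that land on the new server instead of reshuffling nearly all of them.

```python
import bisect
import hashlib
from typing import List, Tuple


def _hash(value: str) -> int:
    return int(hashlib.md5(value.encode("utf-8")).hexdigest(), 16)


class HashRing:
    """Consistent hashing: each server owns many points on a ring of hashes."""

    def __init__(self, servers: List[str], replicas: int = 100):
        points: List[Tuple[int, str]] = []
        for server in servers:
            for i in range(replicas):  # virtual nodes smooth out the spread
                points.append((_hash(f"{server}#{i}"), server))
        points.sort()
        self._hashes = [h for h, _ in points]
        self._servers = [s for _, s in points]

    def server_for(self, key: str) -> str:
        # First point clockwise from the key's hash, wrapping past the end.
        idx = bisect.bisect(self._hashes, _hash(key)) % len(self._hashes)
        return self._servers[idx]


if __name__ == "__main__":
    keys = [f"session:{n}" for n in range(1000)]
    old = HashRing(["web-cache-1", "web-cache-2", "web-cache-3"])
    new = HashRing(["web-cache-1", "web-cache-2", "web-cache-3", "web-cache-4"])
    moved = [k for k in keys if old.server_for(k) != new.server_for(k)]
    # Every moved key now lives on the new server; the rest stay put.
    print(len(moved), {new.server_for(k) for k in moved})
```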

Wednesday, Sep 12, 2007

Technology behind the mediatemple grid service

Does anyone know what's behind this service? http://www.mediatemple.net/webhosting/gs/ Thanks!

Sunday, Sep 9, 2007

Clustering Solution

Hi, I'm interested in people's thoughts on the best choice for a database clustering solution. I have a database that is mostly varchars and numbers and doesn't store any binary data at all. Its workload is about 70% reads and 30% writes, though we're using memcached at the moment, so it's not really hit that hard. We're currently using MySQL with m/cluster, but are interested in a new solution. Possible candidates so far are unicluster (which doesn't seem mature yet) or DRBD. Has anyone had a similar experience and can make any suggestions? Thanks
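
With a 70/30 read/write mix, one commonly used alternative to synchronous clustering is MySQL's built-in replication. One primary takes every write while replicas serve reads, and the application (or a proxy) routes each statement. Here is a minimal Python sketch with hypothetical server names, which classifies statements by their leading keyword.

```python
import itertools
from typing import List


class ReadWriteRouter:
    """Route writes to the primary and rotate reads across replicas.

    Toy model: transactions and replication lag (a read right after a write
    may not see it on a replica yet) are ignored.
    """

    def __init__(self, primary: str, replicas: List[str]):
        self.primary = primary
        self._replicas = itertools.cycle(replicas or [primary])

    def route(self, sql: str) -> str:
        statement = sql.strip().upper()
        # Locking reads must see the primary's current data, so they count as writes.
        is_read = statement.startswith(("SELECT", "SHOW")) and "FOR UPDATE" not in statement
        return next(self._replicas) if is_read else self.primary


if __name__ == "__main__":
    router = ReadWriteRouter("db-primary", ["db-replica-1", "db-replica-2"])
    for sql in ["SELECT * FROM users WHERE id = 7",         # db-replica-1
                "UPDATE users SET name = 'x' WHERE id = 7",  # db-primary
                "select count(*) from users"]:               # db-replica-2
        print(router.route(sql))
```

A real router would also send reads inside a transaction, or immediately after a user's own write, to the primary.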
