Entries in Paper (127)

Wednesday
Oct032007

Paper: Brewer's Conjecture and the Feasibility of Consistent Available Partition-Tolerant Web Services

Abstract: When designing distributed web services, there are three properties that are commonly desired: consistency, availability, and partition tolerance. It is impossible to achieve all three. In this note, we prove this conjecture in the asynchronous network model, and then discuss solutions to this dilemma in the partially synchronous model.

Click to read more ...

Wednesday
Jul252007

Paper: Designing Disaster Tolerant High Availability Clusters

A very detailed (339 pages) paper on how to use HP products to create a highly available cluster. It's somewhat dated and obviously concentrates on HP products, but it is still good information. Table of contents: 1. Disaster Tolerance and Recovery in a Serviceguard Cluster 2. Building an Extended Distance Cluster Using ServiceGuard 3. Designing a Metropolitan Cluster 4. Designing a Continental Cluster 5. Building Disaster-Tolerant Serviceguard Solutions Using Metrocluster with Continuous Access XP 6. Building Disaster Tolerant Serviceguard Solutions Using Metrocluster with EMC SRDF 7. Cascading Failover in a Continental Cluster Evaluating the Need for Disaster Tolerance What is a Disaster Tolerant Architecture? Types of Disaster Tolerant Clusters Extended Distance Clusters Metropolitan Cluster Continental Cluster Continental Cluster With Cascading Failover Disaster Tolerant Architecture Guidelines Protecting Nodes through Geographic Dispersion Protecting Data through Replication Using Alternative Power Sources Creating Highly Available Networking Disaster Tolerant Cluster Limitations Managing a Disaster Tolerant Environment Using this Guide with Your Disaster Tolerant Cluster Products 2. Building an Extended Distance Cluster Using ServiceGuard Types of Data Link for Storage and Networking Two Data Center Architecture Two Data Center FibreChannel Implementations Advantages and Disadvantages of a Two-Data-Center Architecture Three Data Center Architectures Rules for Separate Network and Data Links Guidelines on DWDM Links for Network and Data 3. Designing a Metropolitan Cluster Designing a Disaster Tolerant Architecture for use with Metrocluster Products Single Data Center Two Data Centers and Third Location with Arbitrator(s) Additional EMC SRDF Configurations Setting up Hardware for 1 by 1 Configurations Setting up Hardware for M by N Configurations Worksheets Disaster Tolerant Checklist Cluster Configuration Worksheet Package Configuration Worksheet Next Steps 4. Designing a Continental Cluster Understanding Continental Cluster Concepts Mutual Recovery Configuration Application Recovery in a Continental Cluster Monitoring over a Wide Area Network Cluster Events Interpreting the Significance of Cluster Events How Notifications Work Alerts Alarms Creating Notifications for Failure Events Creating Notifications for Events that Indicate a Return of Service Performing Cluster Recovery Notes on Packages in a Continental Cluster How Serviceguard commands work in a Continentalcluster Designing a Disaster Tolerant Architecture for use with Continentalclusters Mutual Recovery Serviceguard Clusters Data Replication Highly Available Wide Area Networking Data Center Processes Continentalclusters Worksheets Preparing the Clusters Setting up and Testing Data Replication Configuring a Cluster without Recovery Packages Configuring a Cluster with Recovery Packages Building the Continentalclusters Configuration Preparing Security Files Creating the Monitor Package Editing the Continentalclusters Configuration File Checking and Applying the Continentalclusters Configuration Starting the Continentalclusters Monitor Package Validating the Configuration Documenting the Recovery Procedure Reviewing the Recovery Procedure Testing the Continental Cluster Testing Individual Packages Testing Continentalclusters Operations Switching to the Recovery Packages in Case of Disaster Receiving Notification Verifying that Recovery is Needed Using the Recovery Command to Switch All Packages How the cmrecovercl Command Works Forcing a Package to Start Restoring Disaster Tolerance Restore Clusters to their Original Roles Primary Packages Remain on the Surviving Cluster Primary Packages Remain on the Surviving Cluster using cmswitchconcl Newly Created Cluster Will Run Primary Packages Newly Created Cluster Will Function as Recovery Cluster for All Recovery Groups Maintaining a Continental Cluster Adding a Node to a Cluster or Removing a Node from a Cluster Adding a Package to the Continental Cluster Removing a Package from the Continental Cluster Changing Monitoring Definitions Checking the Status of Clusters, Nodes, and Packages Reviewing Messages and Log Files Deleting a Continental Cluster Configuration Renaming a Continental Cluster Checking Java File Versions Next Steps Support for Oracle RAC Instances in a Continentalclusters Environment Configuring the Environment for Continentalclusters to Support Oracle RAC Initial Startup of Oracle RAC Instance in a Continentalclusters Environment Failover of Oracle RAC Instances to the Recovery Site Failback of Oracle RAC Instances After a Failover 5. Building Disaster-Tolerant Serviceguard Solutions Using Metrocluster with Continuous Access XP Files for Integrating XP Disk Arrays with Serviceguard Clusters Overview of Continuous Access XP Concepts PVOLs and SVOLs Device Groups and Fence Levels Creating the Cluster Preparing the Cluster for Data Replication Creating the RAID Manager Configuration Defining Storage Units Configuring Packages for Disaster Recovery Completing and Running a Metrocluster Solution with Continuous Access XP Maintaining a Cluster that uses Metrocluster/CA XP/CA Device Group Monitor Completing and Running a Continental Cluster Solution with Continuous Access XP Setting up a Primary Package on the Primary Cluster Setting up a Recovery Package on the Recovery Cluster Setting up the Continental Cluster Configuration Switching to the Recovery Cluster in Case of Disaster Failback Scenarios Maintaining the Continuous Access XP Data Replication Environment 6. Building Disaster Tolerant Serviceguard Solutions Using Metrocluster with EMC SRDF Files for Integrating ServiceGuard with EMC SRDF Overview of EMC and SRDF Concepts Preparing the Cluster for Data Replication Installing the Necessary Software Building the Symmetrix CLI Database Determining Symmetrix Device Names on Each Node Building a Metrocluster Solution with EMC SRDF Setting up 1 by 1 Configurations Grouping the Symmetrix Devices at Each Data Center Setting up M by N Configurations Configuring Serviceguard Packages for Automatic Disaster Recovery Maintaining a Cluster that Uses Metrocluster/SRDF Managing Business Continuity Volumes R1/R2 Swapping Building a Continental Cluster Solution with EMC SRDF Setting up a Primary Package on the Primary Cluster Setting up a Recovery Package on the Recovery Cluster Setting up the Continental Cluster Configuration Switching to the Recovery Cluster in Case of Disaster Failback Scenarios Maintaining the EMC SRDF Data Replication Environment R1/R2 Swapping 7. Cascading Failover in a Continental Cluster Overview Symmetrix Configuration Using Template Files Data Storage Setup Setting Up Symmetrix Device Groups Setting up Volume Groups Testing the Volume Groups Primary Cluster Package Setup Recovery Cluster Package Setup Continental Cluster Configuration Data Replication Procedures Data Initialization Procedures Data Refresh Procedures in the Steady State Data Replication in Failover and Failback Scenarios

Click to read more ...

Wednesday
Jul252007

Paper: Lightweight Web servers

This paper is a great overview of different lightweight web servers. A lot of websites use lightweight web servers to serve images and static content. YouTube is one example: http://highscalability.com/youtube-architecture. So if you need to improve performance consider changing over a different web server for some types of content. Overview: Recent years have enjoyed a florescence of interesting implementations of Web servers, including lighttpd, litespeed, and mongrel, among others. These Web servers boast different combinations of performance, ease of administration, portability, security, and related values. The following engineering study surveys the field of lightweight Web servers to help you find one likely to meet the technical requirements of your next project. "Lightweight" Web servers like lighttpd, litespeed, and mongrel can offer dramatic benefits for your projects. This article surveys the possibilities and shows how they apply to you. Important dimensions for evaluation of a Web server include: * Performance: How fast does it respond to requests? * Scalability: Does the server continue to behave reliably when many users simultaneously access it? * Security: Does the server do only the operations it should? What support does it offer for authenticating users and encrypting its traffic? Does its use make nearby applications or hosts more vulnerable? * Availability: What are the failure modes and incidences of the server? * Compliance to standards: Does the server respect the pertinent RFCs? * Flexibility: Can the server be tuned to accommodate heavy request loads, or computationally demanding dynamic pages, or expensive authentication, or ...? * Platform requirements: On what range of platforms is the server available? Does it have specific hardware needs? * Manageability: Is the server easy to set up and maintain? Is it compatible with organizational standards for logging, auditing, costing, and so on?

Click to read more ...

Monday
Jul162007

Paper: Guide to Cost-effective Database Scale-Out using MySQL

This paper is behind a registration-wall, you can't do anything on the MySQL site without filling out a form of some kind, but it's a short, decent introduction to using MySQL for a good sized website.

A Quick Hit of What's Inside

Scale-out vs. Scale Up, Customers using MySQL, Scale-Out Reference Architecture

Click to read more ...

Monday
Jul162007

Paper: MySQL Scale-Out by application partitioning

Eventually every database system hit its limits. Especially on the Internet, where you have millions of users which theoretically access your database simultaneously, eventually your IO system will be a bottleneck. [A] promising but more complex solution with nearly no scale-out limits is application partitioning. If and when you get into the top-1000 rank on alexa [1], you have to think about such solutions.

A Quick Hit of What's Inside

Horizontal application partitioning, Vertical application partitioning, Disk IO calculations, How to partition an entity

Click to read more ...

Monday
Jul162007

Paper: The Clustered Storage Revolution

If the clustered file system, clustered storage system, storage virtualization movement is new to you then this is a good intro paper. I's a both vendor puff piece and informative, so it might be worth your time.

A Quick Hit of What's Inside

Clustered storage architectures have the ability to pull together two or more storage devices to behave as a single entity. Clustered storage can be broken down into three types: * 2-way simple failover clustering * Namespace aggregation * Clustered storage with a distributed file systems (DFS)

Click to read more ...

Monday
Jul162007

Paper: Replication Under Scalable Hashing

Replication Under Scalable Hashing: A Family of Algorithms for Scalable Decentralized Data Distribution Typical algorithms for decentralized data distribution work best in a system that is fully built before it first used; adding or removing components results in either extensive reorganization of data or load imbalance in the system. We have developed a family of decentralized algorithms, RUSH (Replication Under Scalable Hashing), that maps replicated objects to a scalable collection of storage servers or disks. RUSH algorithms distribute objects to servers according to user-specified server weighting. While all RUSH variants support addition of servers to the system, different variants have different characteristics with respect to lookup time in petabyte-scale systems, performance with mirroring (as opposed to redundancy codes), and storage server removal. All RUSH variants redistribute as few objects as possible when new servers are added or existing servers are removed, and all variants guarantee that no two replicas of a particular object are ever placed on the same server. Because there is no central directory, clients can compute data locations in parallel, allowing thousands of clients to access objects on thousands of servers simultaneously.

Click to read more ...

Page 1 ... 9 10 11 12 13