Highly scalable systems (for example, websites such as Google, Facebook, and Amazon) need a highly scalable storage system that can handle huge amounts of data with high availability and reliability. Building large systems on top of a traditional RDBMS data storage layer is no longer good enough. This presentation explores the landscape of new technologies available today to augment your data layer and improve performance and reliability.

Remember: all of my presentation content is open source, so please feel free to use it, copy it, and redistribute it as you want.

Download the presentation

HFadeel

Scalability, key-value store, scaling storage

Against all the odds

Friday, July 17, 2009 at 3:18AM

This article is not about Mariah Carey or her song. It is about storage systems and databases.

First, let's describe what I mean by the odds: in my social network, I found that 93% of mainstream developers revere the database, or at least consider it the ultimate, superhero, undefeatable solution to any data persistence challenge.

I think this problem comes from education and from personal habit, and I think some companies are also involved in it.

To start fixing this bad thinking, we should all agree on the following points:

Every challenge has its own solutions, so whatever you want to save or persist, there are always many options. For example, web search engines such as Google, Kngine, Yahoo, and Bing do not use a database at all; instead, we use indexes (index files) for better performance.

A database, whatever the vendor, is generally slow compared with other solutions such as key-value stores, index files, and DHTs (distributed hash tables).

Databases currently employ the relational or object-relational data model, so do not talk yourself into saving non-relational data in a relational store such as a database.

The database system architecture has not changed very much in the last 30 years, and it has many limits and shortcomings in performance and scalability. If you do not believe me, check out these papers:

The End of an Architectural Era (It's Time for a Complete Rewrite)

Dryad: Distributed Data-Parallel Programs from Sequential Building Blocks

I hope you agree with me on the previous points. So the question is: do we really need a database in every application?

There are many scenarios where a database should not be used, such as web search engines, caching, file-sharing systems, and DNS. On the other hand, there are many scenarios where a database should be used, such as customer databases, address books, and ERP systems.

Tiny URL services, for example, should not use a database at all because their needs are very simple: just map a small (tiny) URL to the real (long) URL. If you are starting to agree with me, you will likely want to ask: what can we use beside or instead of databases?
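Coming back to the tiny URL example, the whole service is essentially one key-value mapping. The following is a minimal sketch under stated assumptions, not how any real service is built: the class name TinyUrlStore, the base-62 key scheme, and the in-memory dictionaries are all hypothetical choices made for illustration. A real deployment would swap the dictionaries for a key-value store.

```python
import string

# 62 symbols: digits, then lowercase, then uppercase letters.
ALPHABET = string.digits + string.ascii_letters


def encode_base62(n: int) -> str:
    """Turn a non-negative integer into a short base-62 string."""
    if n == 0:
        return ALPHABET[0]
    digits = []
    while n > 0:
        n, rem = divmod(n, 62)
        digits.append(ALPHABET[rem])
    return "".join(reversed(digits))


class TinyUrlStore:
    """Hypothetical in-memory store: short key -> long URL."""

    def __init__(self):
        self._by_key = {}   # short key -> long URL
        self._by_url = {}   # long URL -> short key (avoids duplicate keys)
        self._next_id = 0

    def shorten(self, long_url: str) -> str:
        # Reuse the existing key if this URL was already shortened.
        if long_url in self._by_url:
            return self._by_url[long_url]
        key = encode_base62(self._next_id)
        self._next_id += 1
        self._by_key[key] = long_url
        self._by_url[long_url] = key
        return key

    def resolve(self, key: str):
        # A single lookup by key; no joins, no query planner.
        return self._by_key.get(key)


store = TinyUrlStore()
key = store.shorten("http://example.com/a/very/long/path?with=query")
original = store.resolve(key)
```

The only operations needed are a put and a get by key, which is exactly what the key-value systems listed later in this post provide.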

There are a lot of tools that follow the CAP and BASE models instead of the ACID model. But first, let's describe ACID:

Atomicity: A transaction is all or nothing

Consistency: Only valid data is written to the database

Isolation: Pretend all transactions are happening serially and the data is correct

Durability: Once a write is committed it stays written; what you write is what you get
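To illustrate the "all or nothing" property from the list above, here is a small sketch using Python's built-in sqlite3 module. The accounts table, the transfer function, and the names alice and bob are assumptions made up for this example; the point is only that a failed transaction leaves no partial changes behind.

```python
import sqlite3


def transfer(conn, src, dst, amount):
    """Move `amount` from src to dst; either both updates apply or neither."""
    try:
        # Used as a context manager, the connection commits on success
        # and rolls back if an exception escapes the block.
        with conn:
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
            row = conn.execute(
                "SELECT balance FROM accounts WHERE name = ?", (src,)
            ).fetchone()
            if row is None or row[0] < 0:
                raise ValueError("insufficient funds")
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
        return True
    except ValueError:
        return False


def balance(conn, name):
    return conn.execute(
        "SELECT balance FROM accounts WHERE name = ?", (name,)
    ).fetchone()[0]


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER NOT NULL)"
)
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 0)])
conn.commit()

ok = transfer(conn, "alice", "bob", 30)        # succeeds, both rows change
failed = transfer(conn, "alice", "bob", 500)   # fails, the debit is rolled back
```

This works well on a single machine. The difficulty described next comes from trying to keep the same guarantee when the two rows live on different nodes.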

The problem with ACID is that it gives you too much; it trips you up when you are trying to scale a system across multiple nodes.

Downtime is unacceptable, so your system needs to be reliable. Reliability requires multiple nodes to handle machine failures.

To make scalable systems that can handle lots and lots of reads and writes you need many more nodes.

Once you try to scale ACID across many machines you hit problems with network failures and delays. The algorithms don't work in a distributed environment at any acceptable speed.

On the other hand, the CAP model is about:

Consistency: Your data is correct all the time. What you write is what you read.

Availability: You can read and write your data all the time.

Partition Tolerance: If one or more nodes fail or become unreachable, the system still works, and it becomes consistent again when those nodes come back online.

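Systems designed around CAP are easy to scale and distribute. According to the CAP theorem, a distributed system cannot fully guarantee all three properties at the same time, so these systems typically relax strict consistency (the BASE approach) in exchange for availability and partition tolerance. That trade-off is what makes them scalable by nature.

Everyone who builds big applications builds them on CAP. Who uses CAP: Google, Yahoo, Facebook, Kngine, Amazon, eBay, etc.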
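To make the availability and partition-tolerance side of CAP more concrete, here is a toy simulation. It is a sketch under heavy assumptions: the Replica class, the integer version numbers, and the "highest version wins" rule in sync are all invented for illustration and are not how any of the systems named in this post actually reconcile data.

```python
class Replica:
    """One node holding a full copy of the data as key -> (version, value)."""

    def __init__(self, name):
        self.name = name
        self.data = {}

    def write(self, key, value, version):
        # Accept the write locally, without waiting for other replicas.
        current = self.data.get(key)
        if current is None or version > current[0]:
            self.data[key] = (version, value)

    def read(self, key):
        entry = self.data.get(key)
        return None if entry is None else entry[1]


def sync(a, b):
    """Reconcile two replicas: for each key, the higher version wins."""
    for key in set(a.data) | set(b.data):
        candidates = [r.data[key] for r in (a, b) if key in r.data]
        winner = max(candidates, key=lambda entry: entry[0])
        a.data[key] = winner
        b.data[key] = winner


east, west = Replica("east"), Replica("west")
east.write("profile:42", "old bio", version=1)
sync(east, west)                                 # both replicas agree

# Network partition: west accepts a write that east cannot see yet.
west.write("profile:42", "new bio", version=2)
stale = east.read("profile:42")                  # east still answers, with old data

sync(east, west)                                 # the partition heals
fresh = east.read("profile:42")
```

During the partition both replicas keep answering reads and writes, at the cost of east serving an old value for a while. Once the replicas can talk again they converge on the same data.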

For example, in any in-memory or on-disk caching system you will never need all the database features. You just need a CAP-like system. Today there are a lot of column-oriented and key-value-oriented systems. But first, let's describe column-oriented systems:

A column-oriented DBMS is a database management system that stores its content by column rather than by row. This has advantages for databases such as data warehouses and library catalogues, where aggregates are computed over large numbers of similar data items. This approach is contrasted with row-oriented databases and with correlation databases, which use a value-based storage structure. For more information, check the Wikipedia page.
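The difference between the two layouts can be sketched in a few lines. The sample catalogue records and the to_columns helper below are made up for illustration; real column stores add compression, on-disk formats, and much more.

```python
# Row-oriented layout: each record is stored together.
rows = [
    {"id": 1, "title": "Dune", "year": 1965, "copies": 4},
    {"id": 2, "title": "Neuromancer", "year": 1984, "copies": 2},
    {"id": 3, "title": "Snow Crash", "year": 1992, "copies": 5},
]


def to_columns(records):
    """Pivot a list of records into one list of values per column."""
    columns = {}
    for record in records:
        for name, value in record.items():
            columns.setdefault(name, []).append(value)
    return columns


# Column-oriented layout: all values of one column are stored together.
columns = to_columns(rows)

# Aggregating by row has to touch every field of every record...
total_from_rows = sum(record["copies"] for record in rows)

# ...while the column layout only has to scan the one column it needs.
total_from_columns = sum(columns["copies"])
```

Both totals are the same, but the column layout reads a single contiguous list, which is why it suits aggregate-heavy workloads such as data warehouses.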

Distributed key-value stores:

Voldemort

Dynomite

Redis

Distributed column stores (Bigtable-like systems):

Cassandra

HBase

Hypertable

Something a little different:

CouchDB

Resources:

Yahoo! Developer Network Blog - Notes from the NoSQL Meetup

Anti-RDBMS: A list of distributed key-value stores

Database optimization pattern

Facebook's Cassandra - A Massive Distributed Store

Google Code for Cassandra

Architecture Summit 2008

HFadeel

databases, key-value store

Server Components - How to choose and build the perfect server

Wednesday, July 8, 2009 at 11:19PM

There are a lot of questions about server components and about how to build the perfect server while considering power consumption. Today I will discuss server components and how to choose better ones with power consumption, efficiency, performance, and price in mind.

Key points:

What kind of components do servers need?
Green computing and server components
How much power does a server consume?
Choosing the right components:
- Processor
- Hard Disk Drive
- Memory
- Operating system
Build a server, or buy one?

HFadeel

server architecture

Art of Parallelism presentation

Wednesday, July 8, 2009 at 11:12PM

This presentation is about parallel computing, and it covers the following topics:

What is parallelism?

Why now?

How does it work?

What are the current options?

Parallel Runtime Library (for more information, go there)

Note: All of my presentations are open source, so feel free to copy them, use them, and redistribute them.
Download

HFadeel

Parallelism

Parallel Programming for the real world

Sunday, May 31, 2009 at 3:42PM

Multicore computers shift the burden of software performance from chip designers and architects to software developers.

What is parallel computing? What is the difference between multithreading, concurrency, and parallelism? What are the differences between task parallelism and data parallelism? And how can we use them?

A fundamental article on parallel programming...

HFadeel

The Future of Parallelism and its Challenges

Wednesday, May 27, 2009 at 6:53PM

Research and education in parallel computing technologies are more important than ever. Here I present a perspective on the past contributions, current status, and future direction of parallelism technologies. While machine power will grow impressively, increased parallelism, rather than clock rate, will be the driving force in computing for the foreseeable future. This ongoing shift toward parallel architectural paradigms is one of the greatest challenges for the microprocessor and software industries. In 2005, Justin Rattner, chief technology officer of Intel Corporation, said ‘We are at the cusp of a transition to multicore, multithreaded architectures, and we still have not demonstrated the ease of programming the move will require…’

Key points:

A little history
Parallelism challenges
Under the hood: parallelism challenges
- Synchronization problems
- CAS problems
The future of parallelism

Click to read more ...

HFadeel

Parallelism, parallel

Database Optimization patterns

Tuesday, May 26, 2009 at 1:17AM

Most websites and enterprise applications rely on a database to store application and customer data. At some point, the database can become the main performance and scalability bottleneck of your system, so I am here today to cure this! Key points:

Database supporters and resisters:
- Database supporters: MySQL, SQL Server, and PostgreSQL
- Database resisters: HBase, MongoDB, Redis, and others
Database optimization patterns:
- What to store in the database?
- Field data types
- The primary key and the indexes
- Data retrieval, stored procedures (SPs), and ad hoc queries
- Caching

Click to read more ...

HFadeel

Database, optimization

Art of Distributed

Wednesday, May 6, 2009 at 8:24PM

Part 1: Rethinking distributed computing models

I have been getting a lot of questions lately about distributed computing, especially distributed computing models and MapReduce, such as: What is MapReduce? Can MapReduce fit all situations? How does it compare with other technologies such as grid computing? And what is the best solution for our situation? So I decided to write an article about distributed computing in two parts. The first part is about distributed computing models and the differences between them. In the second part I will discuss reliability and distributed storage systems. Download the article in PDF format. Download the article in MS Word format. I look forward to your comments and questions, and I will answer them in part two.

Click to read more ...

HFadeel

How to choose and build the perfect server

Wednesday, April 29, 2009 at 7:07PM

There are a lot of questions about server components and about how to choose and/or build the perfect server while considering power consumption. So I decided to write about this topic.

Key Points: