Entries by Nati Shalom (27)

Wednesday
Jul092014

Using SSD as a Foundation for New Generations of Flash Databases - Nati Shalom

“You just can't have it all” is a phrase that most of us are accustomed to hearing and that many still believe to be true when discussing the speed, scale and cost of processing data. To reach high speed data processing, it is necessary to utilize more memory resources which increases cost. This occurs because price increases as memory, on average, tends to be more expensive than commodity disk drive. The idea of data systems being unable to reliably provide you with both memory and fast access—not to mention at the right cost—has long been debated, though the idea of such limitations was cemented by computer scientist, Eric Brewer, who introduced us to the CAP theorem.

The CAP Theorem and Limitations for Distributed Computer Systems

Click to read more ...

Monday
Nov052012

Are we seeing the renaissance of enterprises in the cloud?

A series of recent surveys on the subject seems to indicate that this is indeed the case:

Research conducted by HPclip_image001 found that the majority of businesses in the EMEA region are planning to move their mission-critical apps to the cloud. Of the 940 respondents, 80 percent revealed plans to move mission-critical apps at some point over the next two to five years.

A more recent survey, by research firm MeriTalkclip_image001[1] and sponsored by VMware and EMC (NYSE:EMCclip_image001[2]), showed that one-third of respondents say they plan to move some mission-critical applications to the cloud in the next year. Within two years, the IT managers said they will move 26 percent of their mission-critical apps to the cloud, and in five years, they expect 44 percent of their mission-critical apps to run in the cloud.

The Challenge - How to Bring Hundreds of Enterprise Apps to the Cloud

The reality is that cloud economics only start making sense when there are true workloads that utilize the cloud infrastructure.

If the large majority of your apps fall outside of this category, then you’re not going to benefit much from the cloud. In fact, you’re probably going to lose money, rather than save money.

The Current Approach

  • Focus on building IaaS - Current cloud strategies of many enterprises has been centered on making the infrastructure cloud ready. This basically means ensuring that they are able to spawn machines more easily than they were before. A quick look at many initiatives of this nature shows that there is still only a small portion of enterprises whose applications run on such new systems.
  • Build a new PaaS - PaaS has been taught as the answer to run apps on the cloud. The reality however, is that most of the existing PaaS solutions only cater to new apps and quite often the small, and “non” mission-critical share of our enterprise applications, which still leaves the majority of our enterprise workload outside of our cloud infrastructure.
  • App Migration as a One Off Project - The other approach for migrating applications to the cloud has been to select a small group of applications, and then migrate these one by one to the cloud. Quite often the thought behind this approach has been that application migration is a one-off project. The reality is that applications are more of a living organism – things fail, are moved, or need to be added and removed over time. Therefore it’s not enough to move apps to the cloud using some sort of virtualization technique, it’s critical that the way they’re run and maintained will also fit the dynamic nature of the cloud.

Why is This not Going to Work?

Simple math shows that if you apply this model to the rest of your apps, it’s probably going to take years of effort to migrate all your apps to the cloud. The cost of doing so is going to be extremely high, not to mention the time to market issue which can be even an even greater risk in the end, as it will reflect on cost of operation, profit margins and even the ability to survive in this an extremely competitive market, if it is too long.

What's missing?

What we’re missing is a simple and systematic way to brings all these hundreds and thousands of apps to the cloud.

Moving Enterprise Workloads to the Cloud at a Massive Scale

Instead of thinking of cloud migration as a one-off thing, we need to think of cloud migration on a massive scale.

Thinking in such terms drives a fairly different approach.

In this post, I outlined what i believe should be the main principles for moving enterprise application at such a scale.

Read full post: http://www.cloudifysource.org/2012/10/30/moving_enterprise_workloads_to_the_cloud_on_a_massive_scale.html

Tuesday
Aug282012

Making Hadoop Run Faster

Making Hadoop Run Faster

One of the challenges in processing data is that the speed at which we can input data is quite often much faster than the speed at which we can process it. This problem becomes even more pronounced in the context of Big Data, where the volume of data keeps on growing, along with a corresponding need for more insights, and thus the need for more complex processing also increases.

Batch Processing to the Rescue

Hadoop was designed to deal with this challenge in the following ways:

1. Use a distributed file system: This enables us to spread the load and grow our system as needed.

2. Optimize for write speed: To enable fast writes the Hadoop architecture was designed so that writes are first logged, and then processed. This enables fairly fast write speeds.

3. Use batch processing (Map/Reduce) to balance the speed for the data feeds with the processing speed.

Batch Processing Challenges

Click to read more ...

Friday
Jun152012

Cloud Bursting between AWS and Rackspace

Cloud bursting is an application deployment model in which an application runs in a private cloud or data center and bursts into a public cloud when the demand for computing capacity spikes. The advantage of such a hybrid cloud deployment is that an organization only pays for extra compute resources when they are needed. ([Definition by SearchCloudComputing])

Neal Sample the former CTO - X.commerce at eBay gave an interesting talk last year on the economic benefit of Cloud Bursting. Neal pointed out eBay traffic statistics and showed some real numbers of the business impact of bursting peak load activities using on-demand cloud resources as presented in the diagram below.

In order to use cloud bursting effectively we need to address the following set of challenges:

Click to read more ...

Thursday
May242012

Build your own twitter like real time analytics - a step by step guide

Major social networking platforms like Facebook and Twitter have developed their own architectures for handling the need for real-time analytics on huge amounts of data. However, not every company has the need or resources to build their own Twitter-like solution.

In this example we have taken the same Twitter/Facebook-like blueprint, and made it simple enough for developers to implement. We have taken the following approach in our implementation: 

  1. Use In Memory Data Grid (XAP) for handling the real time stream data-processing.
  2. BigData data-base (Cassandra) for storing the historical data and manage the trend analytics 
  3. Use Cloudify (cloudifysource.org)  for managing and automating the deployment on private or pubic cloud

The example demonstrate a simple case of word count analytics. It uses Spring Social to plug-in to real twitter feeds. The solution is designed to efficiently cope with getting and processing the large volume of tweets. First, we partition the tweets so that we can process them in parallel, but we have to decide on how to partition them efficiently. Partitioning by user might not be sufficiently balanced, therefore we decided to partition by the tweet ID, which we assume to be globally unique. Then we need persist and process the data with low latency, and for this we store the tweets in memory.

Click to read more ...

Tuesday
Mar272012

Big Data In the Cloud Using Cloudify

Edd Dumbill wrote an interesting article on O’Reilly Radar covering the current solutions for running Big Data in the Cloud

Big data and cloud technology go hand-in-hand. Big data needs clusters of servers for processing, which clouds can readily provide.

Big PaaS

Edd touched briefly on the role of PaaS for delivering Big Data applications in the cloud

Beyond IaaS, several cloud services provide application layer support for big data work. Sometimes referred to as managed solutions, or platform as a service (PaaS), these services remove the need to ucale things such as databases or MapReduce, reducing your workload and maintenance burden. Additionally, PaaS providers can realize great efficiencies by hosting at the application level, and pass those savings on to the customer.

Click to read more ...

Thursday
Nov172011

Five Misconceptions on Cloud Portability

The term "cloud portability" is often considered a synonym for "Cloud API portability," which implies a series of misconceptions.

If we break away from dogma, we can find that what we really looking for in cloud portability is Application portability between clouds which can be a vastly simpler requirement, as we can achieve application portability without settling on a common Cloud API.

In this post i'll be covering five common misconceptions people have WRT to cloud portability.

  1. Cloud portability = Cloud API portability. API portability is easy; cloud API portability is not.
  2. The main incentive for Cloud Portability is - Avoiding Vendor lock-in.Cloud portability is more about business agility than it is about vendor lock-in.
  3. Cloud portability isn’t for startups. Every startup that is expecting rapid growth should re-examine their deployments and plan for cloud portability rather than wait to be forced to make the switch when you are least prepared to do so.
  4. Cloud portability = Compromising on the least common denominator.Application portability doesn't require compromise on the least common denominator as most of the interaction with the cloud API happens outside of our application code anyway, to handle things like provisioning, setup, installation, scaling, monitoring, etc.
  5. The effort for achieving cloud portability far exceed the value. The effort to achieve cloud portability is far less than it used to be, in most cases, making it a greater and more valuable priority (with less investment) than it used to be.

Click to ready more..

Tuesday
Sep062011

Big Data Application Platform

It's time to think of the architecture and application platforms surrounding "Big Data" databases. Big Data is often centered around new database technologies mostly from the emerging NoSQL world. The main challenge that these databases solve is how to handle massive amount of data at a reasonable cost and without poor performanc - distributed databases emerged to address this challenge and today we're seeing high adoption rate and quite impressive success stories such as the Netflix use of Cassandra/DataStax solution. All that indicate the speed in which this market evolves.

The need for a Big Data Application Platform

Click to read more ...

Monday
Jul182011

Building your own Facebook Realtime Analytics System  

Recently, I was reading Todd Hoff's write-up on FaceBook real time analytics system. As usual, Todd did an excellent job in summarizing this video from Engineering Manager at Facebook Alex Himel.

In the first post, I’d like to summarize the case study, and consider some things that weren't mentioned in the summaries. This will lead to an architecture for building your own Realtime Time Analytics for Big-Data that might be easier to implement, using Facebook's experience as a starting point and guide as well as the experience gathered through a recent work with few of GigaSpaces customers. The second post provide a summary of that new approach as well as a pattern and a demo for building your own Real Time Analytics system..

Click to read more ...

Monday
Jun062011

NoSQL Pain? Learn How to Read/write Scale Without a Complete Re-write

Lately I've been reading more cases were different people have started to realize the limitations of the NoSQL promise to database scalability. Note the references below:

Take MongoDB for example. It's damn fast, but it doesn't really know how to save data reliably to disk. I've had it set up in a replica pair to mitigate that risk. Guess what - both servers in the pair failed and corrupted their data files at the same day.

It appears that for many, the switch to NoSQL can be rather painful. IMO that doesn't necessarily mean that NoSQL is wrong in general, but it's a combination of 1) lack of maturity 2) not the right tool for the job.

That brings the question of what's the alternative solution?

In the following post I tried to summarize the lessons from  Ronnie Bodinger (Head of IT at Avanza Bank AB) presentation on how they turned their current read-mostly scale architecture into a complete read/write scale without a complete re-writing of their existing application and while keeping the database as-is.

The lessons learned:

  • Minimize the change by clearly Identifying the scalability hotspots
  • Keep the database as is
  • Put an In Memory Data Grid as a front end to the database
  • Use write-behind to reduce the synchronization overhead
  • Use O/R mapping to map the data back into its original format
  • Use standard Java API and framework to leverage existing skillset
  • Use two parallel (old/new) sites to enable gradual transition
  • Use RAM for high performance access and disk for long term storage
  • Use commodity Database and HW

For a more detailed explanation read more here.