Entries in Java (39)

Tuesday
Feb 19, 2019

Intro to Redis Cluster Sharding – Advantages, Limitations, Deploying & Client Connections

Redis Cluster is the native sharding implementation available within Redis that lets you automatically distribute your data across multiple nodes without relying on external tools and utilities. At ScaleGrid, we recently added support for Redis Clusters on our platform through our fully managed Redis hosting plans. In this post, we introduce the advanced sharding opportunities Redis Cluster offers, discuss its advantages and limitations, explain when you should deploy it, and show how to connect to your Redis Cluster.
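As a taste of the client-connection topic, here is a minimal sketch using the Jedis Java client (hostnames, port, and key are hypothetical placeholders). The client learns the full cluster topology from the seed nodes and routes each command to the node owning the key's hash slot:

```java
import java.util.HashSet;
import java.util.Set;
import redis.clients.jedis.HostAndPort;
import redis.clients.jedis.JedisCluster;

public class RedisClusterDemo {
    public static void main(String[] args) {
        // Seed nodes: the client discovers the rest of the cluster from these.
        Set<HostAndPort> nodes = new HashSet<>();
        nodes.add(new HostAndPort("redis-node-1.example.com", 6379)); // hypothetical hosts
        nodes.add(new HostAndPort("redis-node-2.example.com", 6379));

        try (JedisCluster cluster = new JedisCluster(nodes)) {
            // The key is hashed to one of 16384 slots; the command is routed
            // to whichever node owns that slot.
            cluster.set("user:1001", "alice");
            System.out.println(cluster.get("user:1001"));
        }
    }
}
```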

Sharding with Redis Cluster

Click to read more ...

Tuesday
Jan 14, 2014

Ask HS: Design and Implementation of scalable services?

We have written agents that are deployed and distributed across the network. Agents send data every 15 seconds, maybe even every 5 seconds. We are working on a service/system to which all agents can post data/tuples with a marginal payload. Up to a 5% drop rate is acceptable. Ultimately the data will be segregated and stored in a DBMS (currently we are using MySQL).

Questions I am looking to answer:

1. Client/Server Communication: Agents can post data, and the status of sending data is not that important. But there is a requirement that agents be notified if the server-side system generates an event based on the data sent.

- Much of the advice on the internet suggests using a message bus (ActiveMQ) for async communication, with multicast and UDP as the alternatives (see the sketch below).

2. Persistence: After some evaluation, the data is to be stored in a DBMS.

- The end product of processing is an aggregated record, for which MySQL looks scalable. But the volume of data is growing exponentially, so we are considering HBase as an option.

We are looking for alternatives for the above two scenarios and would appreciate expert advice.
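On question 1, a minimal fire-and-forget producer against ActiveMQ's JMS API might look like the sketch below (broker URL, queue name, and payload are hypothetical). Non-persistent delivery matches the tolerable drop rate:

```java
import javax.jms.Connection;
import javax.jms.ConnectionFactory;
import javax.jms.DeliveryMode;
import javax.jms.MessageProducer;
import javax.jms.Session;
import javax.jms.TextMessage;
import org.apache.activemq.ActiveMQConnectionFactory;

public class AgentPublisher {
    public static void main(String[] args) throws Exception {
        ConnectionFactory factory =
                new ActiveMQConnectionFactory("tcp://broker.example.com:61616"); // hypothetical broker
        Connection connection = factory.createConnection();
        connection.start();
        try {
            // Non-transacted, auto-ack session: fire-and-forget suits a tolerable drop rate.
            Session session = connection.createSession(false, Session.AUTO_ACKNOWLEDGE);
            MessageProducer producer =
                    session.createProducer(session.createQueue("agent.metrics"));
            producer.setDeliveryMode(DeliveryMode.NON_PERSISTENT); // cheaper; loss is acceptable
            TextMessage msg = session.createTextMessage("{\"agentId\":42,\"cpu\":0.73}");
            producer.send(msg);
        } finally {
            connection.close();
        }
    }
}
```

For agent notifications, the same broker can carry events back the other way on a topic the agents subscribe to.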

Tuesday
Oct 9, 2012

Batoo JPA - The new JPA Implementation that runs over 15 times faster...

This post is by Hasan Ceylan, an Open Source software enthusiast from Istanbul.

I loved JPA 1.0 back in the early 2000s. I started using it together with EJB 3.0 even before the stable releases. I loved it so much that I contributed bits and pieces to the JBoss 3.x implementations.

Click to read more ...

Thursday
Jul 2, 2009

Product: Project Voldemort - A Distributed Database

Update: Presentation from the NoSQL conference: slides, video 1, video 2.

Project Voldemort is an open source implementation of the basic parts of Dynamo (Amazon’s Highly Available Key-value Store) distributed key-value storage system. LinkedIn is using it in their production environment for "certain high-scalability storage problems where simple functional partitioning is not sufficient."

From their website:

  • Data is automatically replicated over multiple servers.
  • Data is automatically partitioned so each server contains only a subset of the total data.
  • Server failure is handled transparently.
  • Pluggable serialization is supported to allow rich keys and values, including lists and tuples with named fields, and to integrate with common serialization frameworks like Protocol Buffers, Thrift, and Java Serialization.
  • Data items are versioned to maximize data integrity in failure scenarios without compromising availability of the system.
  • Each node is independent of the other nodes, with no central point of failure or coordination.
  • Good single-node performance: you can expect 10-20k operations per second depending on the machines, the network, and the replication factor.
  • Support for pluggable data placement strategies, such as distribution across data centers that are geographically far apart.

    They also have a nice design page going over some of their architectural choices: key-value store only, no complex queries or joins; consistent hashing is used to assign data to nodes; JSON is used for schema definition; versioning and read-repair for distributed consistency; a strict layered architecture with put, get, and delete as the interface between layers.
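That put/get/delete interface maps directly onto the Java client. A minimal sketch using Voldemort's client API (the bootstrap URL and store name follow the project's sample single-node config; adjust for a real cluster):

```java
import voldemort.client.ClientConfig;
import voldemort.client.SocketStoreClientFactory;
import voldemort.client.StoreClient;
import voldemort.client.StoreClientFactory;
import voldemort.versioning.Versioned;

public class VoldemortDemo {
    public static void main(String[] args) {
        // Bootstrap from one node; the client learns the cluster layout from it.
        StoreClientFactory factory = new SocketStoreClientFactory(
                new ClientConfig().setBootstrapUrls("tcp://localhost:6666"));
        StoreClient<String, String> client = factory.getStoreClient("my_store");

        client.put("some_key", "some_value");      // versioned write
        Versioned<String> value = client.get("some_key");
        System.out.println(value.getValue());      // the stored value, plus its vector clock
    }
}
```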

    Just a hint when naming a project: don't name it after one of the most popular key words in muggledom. The only way someone will find your genius via search is with a dark spell. As I am a Good Witch I couldn't find much on Voldemort in the real world. But the idea is great and is very much in line with current thinking on scalable database design. Worth a look.

    Related Articles

  • The CouchDB Project

Monday
Jun 29, 2009

    Google App Engine plus Amazon AWS: Best of both worlds

    Google App Engine (GAE) is focused on making development easy, but limits your options. Amazon Web Services is focused on making development flexible, but complicates the development process. Real enterprise applications require both of these paradigms to achieve success… What we really want is the flexibility of AWS and the simplicity of GAE.

    For the rest of the post see http://natishalom.typepad.com/nati_shaloms_blog/2009/06/google-app-engine-plus-amazon-aws-best-of-both-worlds.html

    Tuesday
Jun 23, 2009

    Learn How to Exploit Multiple Cores for Better Performance and Scalability

InfoQ has this excellent talk by Brian Goetz on the new features being added to Java SE 7 that will allow programmers to fully exploit our massively multi-processor future. While the talk is about Java, it's really more general than that, and there's a lot to learn here for everyone.

    Brian starts with a short, coherent, and compelling explanation of why programmers can't expect to be saved by ever faster CPUs and why we must learn to exploit the strengths of multiple core computers to make our software go faster.

Some techniques for exploiting multiple cores are given in an equally short, coherent, and compelling explanation covering why divide and conquer is the secret to multi-core bliss, fork-join, how the Java approach differs from map-reduce, and lots of other juicy topics.

    The multi-core "problem" is only going to get worse. Tilera founder Anant Agarwal estimates by 2017 embedded processors could have 4,096 cores, server CPUs might have 512 cores and desktop chips could use 128 cores. Some disagree saying this is too optimistic, but Agarwal maintains the number of cores will double every 18 months.

    An abstract of the talk follows though I would highly recommend watching the whole thing. Brian does a great job.

    Why is Parallelism More Important Now?

  • Coarse-grained concurrency was all the rage for Java 5. The hardware reality has changed: the number of cores is increasing, so applications must now search for fine-grained parallelism (fork-join).
  • As hardware becomes more parallel, with more and more cores, software has to find techniques that expose more and more parallelism to keep the hardware busy.
  • Clock rates increased exponentially over the last 30 years or so. That allowed programmers to be lazy, because a faster processor would be released to save your butt. There wasn't a need to tune programs.
  • That wait-for-a-faster-processor game is up. Around 2003 clock rates stopped increasing; we hit the power wall. Faster processors require more power, which requires thinner chip conductor lines, and the thinner lines can't dissipate the increased power without overheating, which affects the resistance characteristics of the conductors. So you can't keep increasing the clock rate.
  • The fastest Intel CPU 4 or 5 years ago was 3.2 GHz. Today it's about the same or even slower.
  • It's easier to build 2.6 GHz or 2.8 GHz chips. Moore's law wasn't repealed, so we can still cram more transistors onto each wafer. More processing power can be put on a chip, which leads to putting more and more processing cores on a chip. This is multicore.
  • Multicore systems are the trend. The number of cores will grow at an exponential rate for the next 10 years: 4 cores at the low end, with 256-core (Sun) and 800-core (Azul) systems at the high end.
  • More cores per chip instead of faster chips. Moore's law has been redirected to multicore.
  • The problem is that it's harder to make a program go faster on a multicore system. A faster chip will run your program faster, but with 100 cores your program won't go faster unless you explicitly design it to take advantage of those cores.
  • There's no free lunch anymore. You must now be able to partition your program so it can run faster on multiple cores, and you must be able to keep doing that as the number of cores keeps growing.
  • We need a way to specify programs so they can be made parallel as topologies change by adding more cores.
  • As hardware evolves, platforms must evolve to take advantage of the new hardware. We started off with coarse-grained tasks, which was sufficient given the number of cores. That approach won't work as the number of cores increases.
  • We must find finer-grained parallelism. Example: sorting and searching data. The opportunities are in the data: data to be sorted can be chunked, the chunks sorted, and the results brought together with a merge sort. Searching can be done in parallel by searching subregions of the data and merging the results.
  • Parallel solutions use more CPU in aggregate because of the coordination needed and because data has to be handled more than once (the merge). But the result comes back sooner because the work is done in parallel, and that adds business value: faster is better for humans.

    What has Java 7 Added to Support Parallelism?

  • The example problem is to find the max number in a list.
  • The coarse-grained threading approach is to use a thread pool, divide up the numbers, and let the task pool compute the subproblems. A shared task pool gets slow as the task count increases, which forces the work to be more coarse-grained. There's no way to load balance, the code is ugly and doesn't match the problem well, the runtime is dominated by how long the longest subtask takes, and you have to decide up front how many pieces to divide the problem into.
  • The solution uses divide and conquer: divide the set into pieces recursively until the problem is so small that the sequential solution is more efficient, process the pieces, then merge the results. O(n log n), but the problem is parallelizable, scales well, and can keep many CPUs busy. (See the fork-join sketch after this list.)
  • Divide and conquer uses fork-join to fork off subtasks, wait for them to complete, and then join the results. A typical thread pool solution is not efficient here: it creates too many threads, and creating threads is expensive and uses a lot of memory.
  • This approach is portable because it's abstract: it doesn't know how many processors are available and is independent of the topology.
  • The fork-join pool is optimized for fine-grained operations, whereas the thread pool is optimized for coarse-grained operations. It's best used for problems without IO: pure CPU computations that tend to fork off subproblems. It allows data to be shared read-only across different computations without copying.
  • This approach scales nearly linearly with the number of hardware threads.
  • The goals for fork-join: avoid context switches; have as many threads as hardware threads and keep them all busy; minimize queue lock contention for data structures; avoid a common task queue.
  • The implementation uses work-stealing. Each thread has a work queue that is a double-ended queue. Each thread pulls work from the head of its own queue and processes it; when there's nothing to do, it steals work from the tail of another queue. There's no contention for the head because only one thread accesses it, and contention on the tail is rare because stealing is infrequent: the stolen work is large, which takes the thief time to process. The process starts with one task, which breaks up the work; other threads steal pieces and repeat the process. It load balances without central coordination, with few context switches and little coordination.
  • The same approach also works for graph traversal, matrix operations, linear algebra, modeling, and move generation and evaluation. Latent parallelism can be found in a lot of places once you start looking.
  • Higher-level operations like ParallelArray are supported, with filtering, transformation, and aggregation options. It's not a generalized in-memory database, but it has a very transparent cost model: it's clear how many parallel operations are happening, so you can look at the code, quickly see what's a parallel operation, and know the cost.
  • It looks like map-reduce, except this scales across a multicore system in a single JVM, whereas map-reduce scales across a cluster. The strategy is the same: divide and conquer.
  • The idea is to make specifying parallel operations so easy that you wouldn't even think of the serial approach.
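A minimal sketch of the divide-and-conquer max problem using the fork-join framework (array size and threshold are arbitrary illustrations): split recursively until a chunk is small enough to scan sequentially, fork one half, compute the other in the current thread, then join.

```java
import java.util.Random;
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

public class MaxFinder extends RecursiveTask<Integer> {
    private static final int THRESHOLD = 10_000; // below this, a sequential scan wins
    private final int[] data;
    private final int lo, hi;

    MaxFinder(int[] data, int lo, int hi) {
        this.data = data;
        this.lo = lo;
        this.hi = hi;
    }

    @Override
    protected Integer compute() {
        if (hi - lo <= THRESHOLD) {
            // Small enough: scan sequentially.
            int max = Integer.MIN_VALUE;
            for (int i = lo; i < hi; i++) {
                if (data[i] > max) max = data[i];
            }
            return max;
        }
        int mid = (lo + hi) >>> 1;
        MaxFinder left = new MaxFinder(data, lo, mid);
        MaxFinder right = new MaxFinder(data, mid, hi);
        left.fork();                     // queue the left half so idle threads can steal it
        int rightMax = right.compute();  // work on the right half in this thread
        return Math.max(left.join(), rightMax);
    }

    public static void main(String[] args) {
        int[] data = new int[1_000_000];
        Random rnd = new Random(42);
        for (int i = 0; i < data.length; i++) data[i] = rnd.nextInt();
        int max = new ForkJoinPool().invoke(new MaxFinder(data, 0, data.length));
        System.out.println("max = " + max);
    }
}
```

Note how the code never asks how many cores exist: the pool sizes itself to the hardware and work-stealing does the load balancing.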

    Related Articles

  • The Free Lunch Is Over - A Fundamental Turn Toward Concurrency in Software By Herb Sutter
  • Intuition, Performance, and Scale by Dan Pritchett
  • "Multi-core Mania": A Rebuttal by Ted Neward
  • CPU designers debate multi-core future by Rick Merritt
  • Multicore puts screws to parallel-programming models by Rick Merritt
  • Challenges in Multi-Core Era – Part 1 and Part 2 by Gaston Hillar.
  • Running multiple processes to understand multicore CPUs power.
  • Learning to Program all Over Again by Vineet Gupta

Thursday
Jun 11, 2009

    Yahoo! Distribution of Hadoop

    Many people in the Apache Hadoop community have asked Yahoo! to publish the version of Apache Hadoop they test and deploy across their large Hadoop clusters. As a service to the Hadoop community, Yahoo is releasing the Yahoo! Distribution of Hadoop -- a source code distribution that is based entirely on code found in the Apache Hadoop project.

    This source distribution includes code patches that they have added to improve the stability and performance of their clusters. In all cases, these patches have already been contributed back to Apache, but they may not yet be available in an Apache release of Hadoop.

    Read more and get the Hadoop distribution from Yahoo

    Monday
Mar 30, 2009

eBay history and architecture

eBay[1] started in 1995 under the initial name AuctionWeb (V1):

- very simple architecture
- based on Perl
- no database; for data persistence they used plain files

Because of rapid growth they needed to improve their architecture, and so V2 (clever name) was born:

- replaced Perl with C/C++
- started using a database in a master-slave configuration
- C++ back-end
- XSLT front-end

Any request would lead to an XML file being created in C++, and the XSLT processor would transform that into HTML. (A pretty sophisticated architecture for the 90s; XSLT was cutting-edge back then.) [image: ebay v2]

That held up pretty well for a while, but in the late 90s eBay experienced exponential growth. They started having trouble with outages and needed improvements, so V3 was developed:

- based on Java
- search engine still used C++
- proof that relational databases can scale (with aggressive caching)
- developed a messaging layer for making lots of asynchronous calls; they actually ended up being sued because of the delay between an item being posted and its images appearing on the site :-)
- although they switched to Java, the basic principle of generating XML files for each request was still used

[image: ebay v3]

Combining the need for a multilingual website with the boom of AJAX technologies and Flash applications, they started to doubt their XSL system and moved on to what became, in 2006, V4 of eBay:

- remove everything that could be replaced with Java; eBay loves Java

Everything is Java (code):

- Image - Java class
- Link - Java class
- Javascript - Java classes
- Content - Java classes

=> lots of code to write; they're using Eclipse for development. [image: ebay v4]

For more details check: Eclipse at eBay, Tailoring Eclipse to the eBay architecture

[1] Images and ideas/info are from the links above.

    Click to read more ...

    Tuesday
Mar 17, 2009

    IBM WebSphere eXtreme Scale (IMDG)

IBM WebSphere eXtreme Scale is IBM's in-memory data grid (IMDG) product. It can be used as a key-value store that partitions the keys (using a form of consistent hashing) over a set of servers, such that each server is responsible for a subset of the keys. It automatically handles replication, which can be either synchronous or asynchronous, and handles advanced placement so that replicas can be placed in different physical zones than the primary. Think buildings, racks, floors, data centers. It is fully elastic in that servers can be added and removed, and it automatically redistributes the partition primaries and backups. It can be scaled from one server to hundreds, if not thousands, of JVMs in a single grid. Each additional server provides more CPU, memory capacity, and network bandwidth, and it scales linearly with grid growth.

It also has a key-graph mode where a graph of objects can be associated with a single key, allowing fine-grained modification of that graph. The object graph and key are stored in tuple form in this mode. This allows clients using different object representations of some subset of the IMDG schema to share data stored in the IMDG.

It comes with automatic integration with databases, so that values are automatically pulled from a database if not present and are written to the database when they change. Write-behind logic makes writes to the database much more efficient and allows the grid to run with the database down. It comes with an HTTP session filter to provide HTTP session management for servlet containers. It has a flexible deployment model allowing a lot of customization by customers.

We do a weekly video podcast on iTunes (search for "extreme scale" in iTunes) and make it available on YouTube as well for customer education. We answer customer questions and forum topics from the week in a casual two-person chat format.
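As a rough illustration of the key-value usage described above, here is a minimal local-grid sketch against the eXtreme Scale ObjectMap API (grid and map names are made up, and a production client would connect to a catalog server and partitioned containers rather than create a local in-JVM grid):

```java
import com.ibm.websphere.objectgrid.ObjectGrid;
import com.ibm.websphere.objectgrid.ObjectGridManagerFactory;
import com.ibm.websphere.objectgrid.ObjectMap;
import com.ibm.websphere.objectgrid.Session;

public class LocalGridSketch {
    public static void main(String[] args) throws Exception {
        // Local, in-JVM grid for illustration only.
        ObjectGrid grid = ObjectGridManagerFactory.getObjectGridManager()
                .createObjectGrid("demoGrid");
        grid.defineMap("accounts");
        grid.initialize();

        Session session = grid.getSession();
        ObjectMap accounts = session.getMap("accounts");
        accounts.insert("acct:1", "balance=100"); // keyed write; in a distributed grid this
                                                  // lands on the partition owning the key
        System.out.println(accounts.get("acct:1"));
    }
}
```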

    Click to read more ...

    Wednesday
Mar 11, 2009

    Sharding and Connection Pools

Hi, we are looking at sharding our existing Java/Oracle based application. We want to make the app servers able to process requests for multiple (any?) shards. The concern that has come up is the amount of memory that would be consumed by having so many connection pools on one app server. Additionally, there is concern about having so many physical connections to the database server coming from all the various app servers that may talk to a particular shard. I was wondering if anyone else has dealt with this issue and how you resolved it? Thanks, Scott
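Not an answer from the post, just one common shape for the fix: lazily create one small, tightly capped pool per shard so an app server only pays for the shards it actually touches, and evict idle physical connections quickly. A hypothetical sketch using Commons DBCP (URLs, credentials, and limits are placeholders):

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import javax.sql.DataSource;
import org.apache.commons.dbcp.BasicDataSource;

public class ShardRouter {
    private final Map<Integer, DataSource> pools =
            new ConcurrentHashMap<Integer, DataSource>();

    public DataSource forShard(int shardId) {
        DataSource ds = pools.get(shardId);
        if (ds == null) {
            BasicDataSource pool = new BasicDataSource();
            // Hypothetical per-shard URL scheme.
            pool.setUrl("jdbc:oracle:thin:@shard-" + shardId + ".example.com:1521:ORCL");
            pool.setUsername("app");
            pool.setPassword("secret");
            pool.setMaxActive(5);                        // small cap bounds total connections
            pool.setMinEvictableIdleTimeMillis(60000);   // release idle connections quickly
            pool.setTimeBetweenEvictionRunsMillis(30000);
            DataSource existing = pools.putIfAbsent(shardId, pool);
            ds = existing != null ? existing : pool;     // keep whichever pool won the race
        }
        return ds;
    }
}
```

The trade-off is pool-creation latency on the first request per shard, in exchange for memory and database connection counts that track actual traffic rather than the total shard count.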

    Click to read more ...