High Scalability -

Entries by HighScalability Team (1576)

Tuesday

Jan 26, 2010

Product: HyperGraphDB - A Graph Database

Tuesday, January 26, 2010 at 11:30AM

With the success of Neo4j as a graph database in the NoSQL revolution, it's interesting to see another graph database, HyperGraphDB, in the mix. Their quick blurb on HyperGraphDB says it is a general purpose, extensible, portable, distributed, embeddable, open-source data storage mechanism. It is a graph database designed specifically for artificial intelligence and semantic web projects; it can also be used as an embedded object-oriented database for projects of all sizes.

From the NoSQL Archive the summary on HyperGraphDB is: API: Java (and Java Langs), Written in: Java, Query Method: Java or P2P, Replication: P2P, Concurrency: STM, Misc: Open-Source, Especially for AI and Semantic Web.

So it has some interesting features, like software transactional memory and P2P for data distribution, but I found that my first and most obvious question was not answered: what the heck is a hypergraph and why do I care? Buried in the tutorial was:

A HyperGraphDB database is a generalized graph of entities. The generalization is two-fold:

Links/edges "point to" an arbitrary number of elements instead of just two as in regular graphs

Links can be pointed to by other links as well.
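To make those two generalizations concrete, here's a minimal in-memory sketch in Python. It is not HyperGraphDB's API (that is Java); the HyperGraph class, its method names, and the sample data are all made up for illustration. The only idea it borrows from the tutorial is that a link is an atom like any other, so it can point at any number of atoms and can itself be pointed at.

```python
class HyperGraph:
    """Toy hypergraph: every node and every link is an 'atom' with a handle."""

    def __init__(self):
        self._atoms = {}   # handle -> (value, tuple of target handles)
        self._next = 0

    def add(self, value, *targets):
        """Add an atom. With no targets it is a plain node; with targets it is a link."""
        for t in targets:
            if t not in self._atoms:
                raise KeyError(f"unknown handle: {t}")
        handle = self._next
        self._next += 1
        self._atoms[handle] = (value, tuple(targets))
        return handle

    def value(self, handle):
        return self._atoms[handle][0]

    def targets(self, handle):
        """Handles this atom points to (empty tuple for a plain node)."""
        return self._atoms[handle][1]

    def incident_links(self, handle):
        """Handles of all links that point at the given atom."""
        return [h for h, (_, ts) in self._atoms.items() if handle in ts]


g = HyperGraph()
alice = g.add("Alice")
bob = g.add("Bob")
carol = g.add("Carol")

# Generalization 1: one link points to three atoms, not just two.
meeting = g.add("met-together", alice, bob, carol)

# Generalization 2: a link points at another link (and at a node).
report = g.add("reported-by", meeting, carol)
```

In a regular graph the three-way meeting would need an extra "meeting" node plus three edges, and a statement about the meeting would have to hang off that extra node. Here both are just links.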

OK, but I wish there was some explanation of why this is valuable. What can I do with it that I can't do with normal graphs? Given that there have been concerns over the complexity of the API this would seem a natural topic to cover. I assume it's cool, it sounds cool, but I would like to know why :-)

In any case it looks like an interesting product to take a look at. Database options are expanding fast.

Click to read more ...

HighScalability Team |

10 Comments |

Product, graph

Monday

Jan 25, 2010

Let's Welcome our Neo-Feudal Overlords

Monday, January 25, 2010 at 8:23AM

This is an excerpt from my article Building Super Scalable Systems: Blade Runner Meets Autonomic Computing in the Ambient Cloud.

There's a pattern, already begun, that has been accelerated by the need for applications to scale and by their increasing complexity. The end result will be that applications give up their independence and enter a kind of feudal relationship with their platform provider.

To understand how this process works, like a glacier slowly and inevitably carving out a deep river valley, here's the type of question I get quite a lot:

I've learned PHP and MySQL and I've built a web app that I HOPE will receive traffic comparable to eBay's with a similar database structure. I see all these different opinions and different techniques and languages being recommended and it's so confusing. All I want is perhaps one book or one website that focuses on PHP and MySQL and building a large database web app like eBay's. Does something like this exist?

I'm always at a loss for words with these questions. What can I possibly say?

Click to read more ...

HighScalability Team |

8 Comments |

Friday

Jan 22, 2010

How BuddyPoke Scales on Facebook Using Google App Engine

Friday, January 22, 2010 at 7:47AM

How do you scale a viral Facebook app that has skyrocketed to a mind-boggling 65 million installs (the population of France)? That's the fortunate problem BuddyPoke co-founder Dave Westwood has, and he talked about his solution at Wednesday's Facebook Meetup. Slides for the complete talk are here. For those not quite sure what BuddyPoke is, it's a social network application that lets users show their mood, hug, kiss, and poke their friends through on-line avatars.

In many ways BuddyPoke is the quintessentially modern web application. It thrives off the energy of social network driven ecosystems. Game play mechanics, viral loops, and creative monetization strategies are all part of its everyday conceptualization. It mashes together different technologies, not in a dark Frankensteining sort of way, but in a smart way that gets the most bang for the buck. Part of it runs on Facebook servers (free). Part of it runs on Flash in a browser (free). Part of it runs on a storage cloud (higher cost). And part of it runs on a Platform as a Service environment (that's GAE) (low cost). It also integrates tightly with other services like PayPal (a slice). Real $$$ are made selling virtual goods like gold coins redeemable in pokes. Users can also have their avatars made into dolls, t-shirts, and a whole army of other Zazzle powered gifts.

Click to read more ...

HighScalability Team |

9 Comments |

Example, GAE, facebook, google

Sunday

Jan 17, 2010

Applications Become Black Boxes Using Markets to Scale and Control Costs

Sunday, January 17, 2010 at 8:56AM

This is an excerpt from my article Building Super Scalable Systems: Blade Runner Meets Autonomic Computing in the Ambient Cloud.

We tend to think of compute resources as residing primarily in datacenters. Given the fast pace of innovation we will likely see compute resources become pervasive. Some will reside in datacenters, but compute resources can be anywhere. In fact, we'll see the bulk of compute resources live outside of datacenters in the future.

Given the diversity of compute resources it's reasonable to assume they won't be homogeneous or conform to a standard API. They will specialize by service. Programmers will have to use those specialized service interfaces to build applications that are adaptive enough to take advantage of whatever leverage they can find, whenever and wherever they can find it. Once found, the application will have to reorganize on the fly to use whatever new resources it has found and let go of whatever resources it doesn't have access to anymore.

If, for example, high security is required for a certain operation then that computation will need to flow to a specialized security cloud. If memory has gone on auction and a good deal was negotiated then the software will have to adapt to take advantage. If the application has a large computation it needs to carry out then it will need to find and make use of the cheapest CPU units it can find at that time. If the latency on certain network routes has reached a threshold then the application must reconfigure itself to use a more reliable, lower latency setup. If a new cheap storage cloud has come on line then a calculation will need to be made about whether it's worth redirecting new storage to that site. If a new calendar service offers an advantage then move over to that. If a new Smart Meter service promises to be a little smarter then go with the higher IQ. If a new Personal Car Navigation Service offers better, safer routes for less money then redirect. And so on.
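As a rough illustration of the cheapest-CPU and latency-threshold cases, here's a minimal Python sketch of an application picking from whatever offers are currently on the market. Everything in it is an assumption made up for illustration: the Offer record, the provider names, the prices, and the idea that offers arrive as a simple list.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Offer:
    """One hypothetical market offer for CPU time."""
    provider: str
    price_per_cpu_hour: float
    latency_ms: float


def choose_offer(offers, max_latency_ms):
    """Return the cheapest offer that meets the latency threshold, or None."""
    eligible = [o for o in offers if o.latency_ms <= max_latency_ms]
    if not eligible:
        return None
    return min(eligible, key=lambda o: o.price_per_cpu_hour)


# Made-up snapshot of the market at one moment.
market = [
    Offer("datacenter-a", price_per_cpu_hour=0.10, latency_ms=20.0),
    Offer("ambient-pool", price_per_cpu_hour=0.03, latency_ms=180.0),
    Offer("datacenter-b", price_per_cpu_hour=0.08, latency_ms=35.0),
]

batch_job = choose_offer(market, max_latency_ms=500.0)    # price is all that matters
interactive = choose_offer(market, max_latency_ms=50.0)   # latency rules out the cheapest
```

The application would rerun this choice whenever prices or latencies change, which is the "reorganize on the fly" part.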

In short, it's a market-driven approach, mediated by service APIs, controlled by applications. Currently this is not how the world works at all. Currently applications resemble a top-down, hierarchically driven economy. Applications are built for a specific environment: platform, infrastructure, network, APIs, management, upgrade, job scheduling, queuing, backup, high availability, monitoring, billing, etc. Moving an application outside that relatively fixed relationship is very difficult and rarely done. For that reason talking about more fluid applications may seem a bit like crazy talk.

Click to read more ...

HighScalability Team |

3 Comments |

Wednesday

Jan 13, 2010

10 Hot Scalability Links for January 13, 2010

Wednesday, January 13, 2010 at 7:31AM

Has Amazon EC2 become over subscribed? by Alan Williamson. Systemic problems hit AWS as users experience problems across Amazon's infrastructure. It seems the strange attractor of a cloud may be the same as for a shared hosting service.

Understanding Infrastructure 2.0 by James Urquhart. We need to take a systems view of our entire infrastructure, and build our automation around the end-to-end architecture of that system.

Hey You, Get Off of My Cloud: Exploring Information Leakage in Third-Party Compute Clouds. We show that it is possible to map the internal cloud infrastructure.

Hadoop World: Building Data Intensive Apps with Hadoop and EC2 by Pete Skomoroch. Dives into detail about how he built TrendingTopics.org using Hadoop and EC2.

A Crash Course in Modern Hardware by Cliff Click. Yes, your mind will hurt after watching this. And no, you probably don't know what your microprocessor is doing anymore.

Click to read more ...

HighScalability Team |

Strategy: Don't Use Polling for Real-time Feeds

Monday, January 11, 2010 at 10:48AM

Ivan Zuzak wrote a fascinating article on Real-time feed processing and filtering using Google App Engine to build Feed-buster, a service that inserts MediaRSS tags into feeds that don't have them. He talks about using polling and PubSubHubbub (real-time) to process FriendFeed feeds. Ivan is trying to devise a separate filtering service where:

filtering services should be applied as close to the publisher as possible so notifications that nobody wants don’t waste network resource.
processing services should be applied as close to the subscriber so that the original update may be transported through the network as a single notification for as long as possible.

Besides being a generally interesting article, Ivan makes an insightful observation on the nature of using polling services in combination with metered Infrastructure/Platform services:

Polling is bad because AppEngine applications have a fixed free daily quota for consumed resources, when the number of feeds the service processed increased - the daily quota was exhausted before the end of the day because FF polls the service for each feed every 45 minutes.
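To put rough numbers on that quote, here's a small Python sketch. The 45 minute interval comes from the quote; the quota figure is a made-up placeholder rather than App Engine's real free quota, and the sketch assumes each poll costs exactly one unit of quota.

```python
MINUTES_PER_DAY = 24 * 60


def polls_per_day(num_feeds, interval_minutes=45):
    """Requests the poller sends in one day across all feeds."""
    return num_feeds * MINUTES_PER_DAY / interval_minutes


def hours_until_quota_exhausted(num_feeds, daily_quota, interval_minutes=45):
    """Hours into the day at which the quota runs out, capped at 24."""
    hours = daily_quota * interval_minutes / (num_feeds * 60)
    return min(hours, 24.0)


FREE_DAILY_QUOTA = 2000  # hypothetical figure, not App Engine's real quota

for feeds in (10, 100, 1000):
    print(feeds,
          polls_per_day(feeds),
          hours_until_quota_exhausted(feeds, FREE_DAILY_QUOTA))
```

At one poll every 45 minutes, each feed costs 32 requests a day whether or not anything was published, so the bill grows linearly with the number of feeds while the quota stays fixed.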

This fits directly in with the ideas in Cloud Programming Directly Feeds Cost Allocation Back into Software Design. My general preference is to poll a distributed queue for work items. It's robust and allows your system to control its own resource usage by determining when to poll. Otherwise you can easily be overwhelmed by fast pushers. Here the overwhelming is going the other way. Your budget is being overwhelmed by the polling requests. And the more you try to approximate real-time with frequent polling requests the more your budget is busted.
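Here's a minimal sketch of what polling a queue for work items looks like from the consumer's side. It uses Python's in-process queue.Queue as a stand-in for a distributed queue, and the drain function and its budget parameter are names invented for this example.

```python
import queue


def drain(work_queue, budget, handle):
    """Pull at most `budget` items; the consumer, not the producer, sets the pace."""
    done = 0
    while done < budget:
        try:
            item = work_queue.get_nowait()
        except queue.Empty:
            break          # nothing left, stop early
        handle(item)
        done += 1
    return done


# A fast producer dumps ten items at once.
q = queue.Queue()
for i in range(10):
    q.put(i)

processed = []
first_pass = drain(q, budget=4, handle=processed.append)   # take only what we can afford
backlog = q.qsize()                                         # the rest waits in the queue
```

A fast pusher can fill the queue as quickly as it likes; the worker still does only four items per pass and the backlog sits in the queue rather than on the worker.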

It's a cool example of how costs, algorithm, and platform choices all feed into and shape product architectures.

HighScalability Team |

2 Comments |

Strategy, google, real time

Monday

Jan 11, 2010

Have We Reached the End of Scaling?

Monday, January 11, 2010 at 9:31AM

This is an excerpt from my article Building Super Scalable Systems: Blade Runner Meets Autonomic Computing in the Ambient Cloud.

Have we reached the end of scaling? That's what I asked myself one day after noticing a bunch of "The End of" headlines. We've reached The End of History because Western liberal democracy is the "end point of humanity's sociocultural evolution and the final form of human government." We've reached The End of Science because of the "fact that there aren't going to be any obvious, cataclysmic revolutions." We've even reached The End of Theory because all answers can be found in the continuous stream of data we're collecting. And doesn't it always seem like we're at The End of the World?

Motivated by the prospect of everything ending, I began to wonder: have we really reached The End of Scaling?

Click to read more ...

HighScalability Team |

3 Comments |

Monday

Jan 4, 2010

11 Strategies to Rock Your Startup’s Scalability in 2010

Monday, January 4, 2010 at 6:19AM

This is a guest posting by Marty Abbott and Michael Fisher, authors of The Art of Scalability. I'm still reading their book and will have an interview with them a little later.

If 2010 is the year that you’ve decided to kick off your startup or if you’ve already got something off the ground and are expecting double- or triple-digit growth, this list is for you. We all want the attention of users to achieve viral growth but, as many can attest, too much attention can bring a startup to its knees. If you’ve used Twitter for any amount of time you’re sure to have seen the “Fail Whale”, which is so often seen that it has its own fan club. Take a look at the graph below from Compete.com showing Twitter’s unique visitors. One can argue that limitations in the product offering have as much to do with the flattening of growth over the past six months as does the availability, but it’s hard to believe the inability of users to actually use the service has not hindered growth.

What should you do if you want your startup to scale with double- and triple-digit growth? We’ve put together a list of 11 strategies that will aid in your quest for scalability. In our recently released book “The Art of Scalability” you will find more details about these and other strategies.

Click to read more ...

HighScalability Team |

4 Comments |

Monday

Dec 28, 2009

Zynga Needs a Server-side Systems Engineer

Monday, December 28, 2009 at 9:06AM

Ashleigh Anderson from Zynga let me know that they have an opening for a Systems Engineer working on some new games they are developing. Given the state of the job market I thought it worth posting. Here are more details...

Click to read more ...

HighScalability Team |

10 Comments |

jobs

Monday

Dec 21, 2009

Hot Holiday Scalability Links for 2009

Monday, December 21, 2009 at 7:25AM

Building Super Scalable Systems: Blade Runner Meets Autonomic Computing in the Ambient Cloud. The only independent platform most of us will have access to capable of hosting planet-scale applications is the Ambient Cloud. It forms a sort of digital potluck where everyone contributes memory, network, and other compute resources from whatever they happen to have available.
Top 10 Internet Startup Scalability Killers. Strategies taken from The Art of Scalability. 1. Thinking Scalability Is Just About Technology; 2. Overuse of Synchronous Calls; 3. Failure to Weed or Seed Soon Enough; 4. Inappropriate Use of Databases; 5. Cesspools Instead of Swim Lanes; 6. Reliance on Vertical Scale; 7. Failure to Learn from History; 8. Changing Development Methodologies to Fix Problems; 9. Too Little Caching, Too Late; 10. Overreliance on Third Parties to Scale.
The New Google: Internet Giant Opens Up About Real-Time and Local Search, Cloud Computing, and Data Liberation. In four separate interviews, Google delved into some of the most important topics of the day, from its advances in real-time and local search to cloud computing and a “data liberation” effort to help consumers export their files and digital information from Google products.
Ask HN: What are the best technologies you've worked with this year? Quite a nice variety, no clear winner. Some honorable mentions: Django, Redis, Clojure, XMPP, Node.js, AMQP, Rails, jQuery, Solr, Hadoop.
Why I think Mongo is to Databases what Rails was to Frameworks. We have been amazed at how much code we cut out of Harmony with the switch from MySQL to Mongo.
A Deluge of Data Shapes a New Era in Computing. Dr. Gray called the shift a “fourth paradigm.” The first three paradigms were experimental, theoretical and, more recently, computational science. He explained this paradigm as an evolving era in which an “exaflood” of observational data was threatening to overwhelm scientists.
MySpace Replaces Storage with Solid-State Drive Technology in 150 Standard Load Servers. Using SSD reduced the headcount of their heavy load servers from 80 to 30.
Amazon's CloudFront Now Offers Flash Streaming, This Will Disrupt The Market. Amazon does have the potential to take many of the mid-sized customers who spend between $3-5k a month on video.
How would you design an AppEngine datastore for a social site like Twitter? Using Jaiku's reimplementation on Google App Engine is a good reference.
Query Processing for NOSQL DB. It seems to me that the responsibility of building an indexing and query mechanism lands on the NoSQL user.
Persistent Trees in git, Clojure and CouchDB. There are some really neat software projects emerging at the moment, and as a developer I always find it interesting to take a look at the implementation details, because there is often a lot to be learned.
Transcendent Memory. Transcendent memory is a new memory-management technique which, it is hoped, will improve the system's use of scarce RAM, regardless of whether virtualization is being used.
Trading Shares in Milliseconds. By the end of the day, his computers will have bought and sold about 60 million to 80 million shares.

HighScalability Team |