Thursday, Oct 30, 2008
The case for functional decomposition

Hi all,
I'm a big fan of http://highscalability.com/ and, in my current development work, have been looking to decompose my application along functional boundaries as a route to scaling out the server side, specifically the database layer. The problem comes when there are links between the data in different components, i.e. one component holds all the user data, but another component needs to reference a user as the owner of some piece of data. I'm currently doing this by holding the primary key information for each side of the link (as you would if they all lived in a single database), but this link table needs to exist in both components to allow lookups in either direction, i.e. 'get the things a specific user owns' and 'get the owners of this specific thing' would each use a different component. The alternative would be to store the link data in only one of the components, but then the reverse lookups would require two calls instead of just one.
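To make the trade-off concrete, here is a minimal sketch of both options using in-memory stand-ins for the two components. All names (`UserComponent`, `ThingComponent`, `add_link`, `owners_of_two_calls`) are hypothetical, and the reading of "two calls" as "fetch the ids from one component, then the rows from the other" is an assumption about the design described above, not a definitive implementation.

```python
class UserComponent:
    """Holds user records and (in the duplicated option) its own copy of the links."""

    def __init__(self):
        self.users = {}     # user_id -> user name
        self.links = set()  # (user_id, thing_id) pairs, duplicated copy

    def get_users(self, user_ids):
        return sorted(self.users[u] for u in user_ids)

    def owners_of(self, thing_id):
        # One call: the local link copy and the user rows live in the same component.
        return self.get_users(u for u, t in self.links if t == thing_id)


class ThingComponent:
    """Holds thing records and its own copy of the links."""

    def __init__(self):
        self.things = {}    # thing_id -> thing name
        self.links = set()  # (user_id, thing_id) pairs

    def owner_ids(self, thing_id):
        return {u for u, t in self.links if t == thing_id}

    def things_owned_by(self, user_id):
        # One call: link copy and thing rows are co-located.
        return sorted(self.things[t] for u, t in self.links if u == user_id)


def add_link(users, things, user_id, thing_id):
    # Duplicated option: every write has to reach both components.
    users.links.add((user_id, thing_id))
    things.links.add((user_id, thing_id))


def owners_of_two_calls(users, things, thing_id):
    # Single-copy option: call 1 gets ids from the link holder,
    # call 2 gets the user rows from the user component.
    return users.get_users(things.owner_ids(thing_id))


users, things = UserComponent(), ThingComponent()
users.users = {1: "robin", 2: "sam"}
things.things = {10: "photo", 11: "post"}
add_link(users, things, 1, 10)
add_link(users, things, 2, 10)
add_link(users, things, 1, 11)

print(users.owners_of(10))                      # served by the user component alone
print(things.things_owned_by(1))                # served by the thing component alone
print(owners_of_two_calls(users, things, 10))   # same answer, but two hops
```

The duplicated option buys single-call reads in both directions at the price of a two-component write in `add_link`, which is where the two copies can drift apart if one write fails.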
My question is this: is the duplication of these link tables some kind of code smell I should be avoiding, or is this just the way things go when you split your app along functional lines like this?
Is this sort of approach really applicable to anyone other than the eBays of this world? Should the rest of us just keep putting more functionality into the same back end?
Cheers,
Robin
Reader Comments (4)
Hi Robin, I think from the experience of eBay (http://highscalability.com/ebay-architecture) and Flickr (http://highscalability.com/flickr-architecture) that the increased complexity is just the cost of partitioning to scale. If you don't need to partition yet, I would still scale up, cache, and perform other optimizations for as long as possible. The next step is a doozy.
I don't know what kind of advice I can give you. The only thing I can assert is that the Internet is like alcohol in some sense. It accentuates what you would do anyway. If you want to be a loner, you can be more alone. If you want to connect, it makes it easier to connect.
Don't spend all your time here. Think about anything else!
Really, the question is, what's the change rate on these tables?
What are the potential consequences if they aren't synchronized?
What are the costs of synchronization?
Is one the master and the other a cache? Or do they both have changes that need propagation?
OK, that was more than one question. But they are all still important to know.
I believe the question you need to think hard and long about when designing distributed systems is who actually owns the data.
If a user owns a piece of data, chances are that you want that data to reside on the same partition as the user. At least when working on systems consisting of several separate subsystems, I've found this to be a rule to keep in the back of one's head.
I've never seen an actual sharded system in practice, but I'd assume that the sharding strategy for finding this data would be to look up which shard the user resides on, then look up the actual data from that shard. Maybe someone more experienced in sharding could confirm or deny my claims.
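The two-step lookup described above can be sketched roughly as follows. This is only an illustration under stated assumptions: `Shard`, `ShardDirectory`, and the modulo placement rule are hypothetical, and a real system would keep the directory and the shards in separate databases rather than in Python dictionaries.

```python
class Shard:
    """Stand-in for one database partition."""

    def __init__(self, name):
        self.name = name
        self.rows = {}  # user_id -> list of items owned by that user


class ShardDirectory:
    """Maps each user to the shard that holds the user and the user's data."""

    def __init__(self, shards):
        self.shards = shards
        self.user_to_shard = {}  # user_id -> index into self.shards

    def place_user(self, user_id):
        # Hypothetical placement rule: simple modulo over the shard count.
        idx = user_id % len(self.shards)
        self.user_to_shard[user_id] = idx
        self.shards[idx].rows.setdefault(user_id, [])
        return self.shards[idx]

    def shard_for(self, user_id):
        # Step 1: find which shard the user resides on.
        return self.shards[self.user_to_shard[user_id]]

    def add_item(self, user_id, item):
        # Owned data is written to the owner's shard.
        self.shard_for(user_id).rows[user_id].append(item)

    def items_for(self, user_id):
        # Step 2: read the user's data from that shard only.
        return list(self.shard_for(user_id).rows[user_id])


directory = ShardDirectory([Shard("s0"), Shard("s1")])
directory.place_user(7)
directory.add_item(7, "photo")
print(directory.shard_for(7).name, directory.items_for(7))
```

Because the data follows its owner, "get the things a specific user owns" touches a single shard; the reverse question, "who owns this thing", is the one that still needs either an extra index or a fan-out.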