Tuesday, January 14, 2014

Ask HS: Design and Implementation of scalable services?

We have written agents that are deployed across the network. Each agent sends data every 15 seconds, possibly as often as every 5 seconds. We are working on a service to which all agents can post data (tuples) with a small payload. A drop rate of up to 5% is acceptable. Ultimately the data will be segregated and stored in a DBMS (currently we are using MySQL).

Questions I am looking for answers to:

1. Client/Server Communication: Agents post data. The delivery status of each send is not that important, but in rare cases an agent must be notified when the server-side system generates an event based on the data sent.

- A lot of advice on the internet suggests using a message bus (ActiveMQ) for async communication. Multicast and UDP are the alternatives.

2. Persistence: After some evaluation, the data is to be stored in a DBMS.

- The end result of processing is an aggregated record, for which MySQL looks scalable. But the volume of data grows exponentially, so we are considering HBase as an option.

We would like to know whether there are alternatives for the two scenarios above, and to get expert advice.
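To make the UDP alternative from question 1 concrete, here is a minimal Python sketch of a fire-and-forget agent. The JSON wire format, the field names, and the function names (encode_tuple, send_tuple, run_agent) are all assumptions made for illustration; nothing in the question specifies them. The sketch only covers the agent-to-server direction.

```python
import json
import socket
import time


def encode_tuple(agent_id: str, metric: str, value: float, ts: float) -> bytes:
    """Serialize one small tuple as compact JSON (hypothetical wire format)."""
    return json.dumps(
        {"agent": agent_id, "metric": metric, "value": value, "ts": ts},
        separators=(",", ":"),
    ).encode("utf-8")


def send_tuple(sock: socket.socket, addr: tuple, payload: bytes) -> bool:
    """Fire-and-forget send: no ack, no retry. Returns False on a local error."""
    try:
        sock.sendto(payload, addr)
        return True
    except OSError:
        # A local failure is treated as just another dropped datagram.
        return False


def run_agent(addr: tuple, agent_id: str, interval_s: float, rounds: int) -> int:
    """Send `rounds` tuples, one every `interval_s` seconds.

    Returns how many datagrams were handed to the OS, which is not the
    same as how many the server received.
    """
    sent = 0
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        for _ in range(rounds):
            payload = encode_tuple(agent_id, "heartbeat", 1.0, time.time())
            if send_tuple(sock, addr, payload):
                sent += 1
            time.sleep(interval_s)
    return sent
```

Because UDP gives no delivery feedback, the only way to check the 5% drop budget with a design like this is to count on the server side, for example by comparing received tuples per agent against the expected send interval. The rare server-to-agent notification would need its own channel, since this sketch never reads from the socket.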

Reader Comments (4)

You might want to run a Flume agent on every client that sends data, and then use Flume agent to Flume agent communication with Avro serialization so that overhead stays minimal... We have this in production and it is working fine without any issues.

An important thing to keep in mind is that Flume will be listening to a log file...

January 14, 2014 | Unregistered Commenter Andy

I think this type of question falls into the category of premature optimization. You are looking for advice on technology without even knowing your bottlenecks. It reminds me of a quote from a book Carlos Bueno wrote for developers at Facebook:


In the early 2000s, I helped build a system for search advertising. We didn’t have a lot of money so we were constantly tweaking the system for more throughput. The former CTO of one of our competitors, looking over our work, noted that we were handling ten times the traffic per server than he had. Unfortunately, we had spent so much time worrying about performance that we didn’t pay enough attention to credit card fraud. Fraud and chargebacks got very bad very quickly, and soon after our company went bankrupt. On one hand, we had pulled off a remarkable engineering feat. On the other hand, we were fixing the wrong problem.[1]

[1] https://www.facebook.com/notes/facebook-engineering/the-mature-optimization-handbook/10151784131623920

January 14, 2014 | Registered Commenter Terrance Shepherd

If you are looking at HBase, then you should also consider Cassandra. That is what we used for web scale persistent notifications and it turned out to scale very well. Choose Cassandra if you are going to be doing more writes than reads.

January 17, 2014 | Unregistered Commenter Glenn

First off, Quora is a great forum for these types of questions, and based on the number of replies you have gotten, it might even get you better feedback.

Second, I would strongly suggest column-oriented databases for data storage. Vertica is the leader in the space IMHO, but you can also look at Infobright, MonetDB, LucidDB, InfiniDB, ParAccel, or ParStream.

Third, you will certainly want some queuing system for buffering. ActiveMQ is a strong solution in the Java space. If you are in another language, RabbitMQ tends to be a favorite. That said, Kafka is worth checking out. It doesn't have all the features of JMS or AMQP, but it has performance in spades.
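As a rough illustration of what buffering buys you, here is a small Python sketch of a bounded in-memory buffer that drops new tuples when full and hands them to storage in batches. It is an in-process stand-in, not ActiveMQ, RabbitMQ, or Kafka, and the DropBuffer class and its method names are hypothetical. The capacity and batch size are placeholders you would tune against the asker's 5% drop budget.

```python
import queue


class DropBuffer:
    """Bounded in-memory buffer that drops new tuples when full."""

    def __init__(self, capacity: int) -> None:
        self._q = queue.Queue(maxsize=capacity)
        self.accepted = 0
        self.dropped = 0

    def offer(self, item) -> bool:
        """Try to enqueue without blocking; count a drop if the buffer is full."""
        try:
            self._q.put_nowait(item)
        except queue.Full:
            self.dropped += 1
            return False
        self.accepted += 1
        return True

    def drain(self, max_batch: int) -> list:
        """Remove and return up to max_batch items, oldest first."""
        batch = []
        while len(batch) < max_batch:
            try:
                batch.append(self._q.get_nowait())
            except queue.Empty:
                break
        return batch

    def drop_rate(self) -> float:
        """Fraction of offered tuples that were dropped so far."""
        total = self.accepted + self.dropped
        return self.dropped / total if total else 0.0
```

A receiver thread would call offer() for each incoming tuple while a writer thread calls drain() and issues one bulk insert per batch. Watching drop_rate() tells you when the writer is falling behind before the drop budget is exceeded.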

Finally, I assume you are not running this in the cloud. If you are, though, you should probably be using SQS and Redshift in AWS, potentially with Kinesis. Even if you are not in the cloud, you can look to these services for inspiration. You can also look at stream processing platforms like Storm for inspiration.

January 17, 2014 | Unregistered Commenter Ben Darfler
