High Scalability -

Entries in General Discussion (161)


Updating distributed web applications

Wednesday, August 27, 2008 at 2:09AM

Hi, we've got a web application that runs on an embedded Jetty server rather than on a standalone application server such as Tomcat or JBoss. We are now planning to run instances of this application on multiple machines, with a load balancer distributing the requests. The big question is: is there a common approach to updating these applications? Let's think of 10 instances on 10 machines (one instance per machine), where we want to update the version of each instance. The brute-force approach would be to stop all instances, update them, and then restart them. This is a lot of manual work ;) Another problem is downtime: to avoid it, one would have to shut down only one server after another, but then multiple application versions are running at the same time. Can someone please give us a hint for this problem? Perhaps papers, tools or something like that? Thanks a lot :)
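One common answer to the downtime problem described above is a rolling update: take one instance out of the load balancer, update it, check it, put it back, and move on to the next. A minimal Python sketch, assuming four hypothetical hooks (`drain`, `deploy`, `healthy`, `enable`) that you would implement against your own load balancer and servers; none of these names come from the post:

```python
from typing import Callable, Iterable, List


def rolling_update(
    hosts: Iterable[str],
    drain: Callable[[str], None],
    deploy: Callable[[str], None],
    healthy: Callable[[str], bool],
    enable: Callable[[str], None],
) -> List[str]:
    """Update hosts one at a time; stop at the first failed health check.

    All four callables are hypothetical hooks supplied by the caller,
    e.g. wrappers around a load balancer API and an SSH deploy script.
    Returns the hosts that were updated and put back into rotation.
    """
    updated = []
    for host in hosts:
        drain(host)    # take the instance out of the load balancer pool
        deploy(host)   # stop the app, install the new version, restart
        if not healthy(host):
            # Leave the broken host drained; the remaining hosts keep
            # serving the old version.
            break
        enable(host)   # put the instance back into rotation
        updated.append(host)
    return updated
```

While the rollout is in progress both versions serve traffic, so the old and new versions have to be compatible with each other (sessions, database schema) for this to work.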

Mork0075 | 3 Comments

General Discussion


Forum sort order

Monday, August 18, 2008 at 7:10PM

G'day, I noticed the default sort order for the forum is to show the posts with the most replies first. That seems a bit odd for a forum. Would it not make sense to show the posts with the most recent replies first? It is possible to re-sort the forum threads that way by clicking on the "Last post" header (twice). It would seem like a more sensible default. I've checked and I see the same behaviour as both a registered (logged in) and anonymous user. Cheers - Callum.

chmac | 3 Comments

General Discussion


Code deployment tools

Monday, August 18, 2008 at 7:02PM

G'day, I'm building an application to manage WordPress PHP code on many servers. Our application will push down code updates to each server, as well as performing backups and testing. I'm considering different methods of pushing updated code onto the individual servers. I'm considering something like Capistrano (I've no experience in Ruby though). I've also considered using subversion and then remotely calling svn commands via SSH. Are there any other tools specifically for this purpose? The servers will have persistent data (the WordPress databases) so I don't want to re-image them every update. Plus, they will each have a different set of plugins / themes, so building many images would be too complex. If there are any papers on code deployment, or other recommended reading, please point the links my way. Likewise, if anyone has any suggestions, or would like more details, just let me know. Cheers - Callum.

chmac | 4 Comments

General Discussion, code deployment, deployment


Many updates against MySQL

Sunday, August 17, 2008 at 7:27AM

Hello! My first post here, so be patient please. I am developing a site with lots of static content, but many pages run a query to update a view count. I suspect this may cause lots of problems, and I am interested in another solution, like storing these counts somewhere else. As my knowledge in this area is a bit limited, I am asking you. I understand PHP (OOP, of course) and MySQL, and nowadays I am getting into servers. My other question: I read about making lots of things static (in the Flickr Architecture article), and I am interested in how they build static pages. Let's say they make the photo page static; do they rebuild it when a tag or comment is added? I am interested in this because I want to learn Smarty better (I'm a newbie) and learn about serving content. Moreover, how about PHP? I have read many books about PHP in theory, but I would love to see some real-life examples of using objects and exceptions (mainly this, as I don't completely understand it) to learn some good programming habits. So if you can help me with an example or resource, please do :) I know I've covered a huge range of topics, but these are the things that drive me mad every day. So please be patient :) Greetings.
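One way to avoid an UPDATE on every page view is to buffer the increments and write them to the database in batches. The sketch below is in Python rather than PHP and only illustrates the idea; `ViewCounter` and its `flush` callback are hypothetical names, and the callback is where something like a single `UPDATE ... SET views = views + n` per page would go:

```python
from collections import Counter
from typing import Callable, Dict


class ViewCounter:
    """Buffers page-view increments in memory and flushes them in batches."""

    def __init__(self, flush: Callable[[Dict[int, int]], None], batch_size: int = 100):
        self._flush = flush            # hypothetical callback that writes to the database
        self._batch_size = batch_size  # number of hits to collect before writing
        self._pending: Counter = Counter()
        self._buffered = 0

    def hit(self, page_id: int) -> None:
        """Record one view; write to the database only when the batch is full."""
        self._pending[page_id] += 1
        self._buffered += 1
        if self._buffered >= self._batch_size:
            self.flush()

    def flush(self) -> None:
        """Hand all pending counts to the callback and reset the buffer."""
        if self._pending:
            self._flush(dict(self._pending))
            self._pending.clear()
            self._buffered = 0
```

The trade-off is that counts buffered in memory are lost if the process dies before a flush, which is usually acceptable for view counters but should be a conscious choice.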

JAM3SoN | 7 Comments

General Discussion, mysql update frequent


Separation into read/write only databases

Friday, August 8, 2008 at 3:00PM

At least in the articles on Plenty of Fish and Slashdot it was mentioned that one can achieve higher performance by creating read-only and write-only databases where possible. I have read the comments and tried unsuccessfully to find more information on the net about this. I still do not understand the concept. Can someone explain it in more detail, as well as recommend resources for further investigation? (Are there books written specifically about this technique?) I think it is a very important issue, because databases are oftentimes the bottleneck.
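One common reading of this advice (an assumption here, since the post does not say which setup the articles meant) is replication: all writes go to one primary database, the changes are copied to one or more replicas, and reads are spread across the replicas. A minimal Python sketch of the routing decision, with `ReadWriteRouter` and the server names being purely illustrative:

```python
import itertools
from typing import Sequence


class ReadWriteRouter:
    """Picks a database for a SQL statement: the primary for writes, a replica for reads."""

    WRITE_VERBS = ("INSERT", "UPDATE", "DELETE", "REPLACE")

    def __init__(self, primary: str, replicas: Sequence[str]):
        self.primary = primary
        # Hand out replicas round-robin; fall back to the primary if there are none.
        self._replicas = itertools.cycle(list(replicas) or [primary])

    def route(self, sql: str) -> str:
        """Return the name of the server that should run this statement."""
        parts = sql.split(None, 1)
        verb = parts[0].upper() if parts else ""
        if verb in self.WRITE_VERBS:
            return self.primary
        return next(self._replicas)
```

Because replicas apply changes with some delay, a read issued right after a write may not see it yet; code that needs to read its own writes would have to go to the primary.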

mofey | 2 Comments


ID generation schemes

Saturday, June 28, 2008 at 5:08PM

Hi, Generating unique IDs is a common requirement in many projects. Generally, this responsibility is given to the database layer, using sequences or some other technique. This is a problem for horizontal scalability. What GUID generation schemes are generally used in highly scalable web sites? I have seen Java's SecureRandom class used to generate GUIDs. What other methods are generally used? Thanks, Unmesh
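Two schemes that avoid a central database sequence can be sketched as follows, in Python rather than Java: a random version 4 UUID, and an ID composed of a timestamp, a per-server node number and a per-millisecond sequence. `NodeIdGenerator` and its bit widths (10 bits of node id, 12 bits of sequence) are illustrative assumptions, not something taken from the post:

```python
import threading
import time
import uuid


def random_id() -> str:
    """A random 128-bit UUID (version 4); needs no coordination between servers."""
    return str(uuid.uuid4())


class NodeIdGenerator:
    """Integer IDs built from (milliseconds, node id, per-millisecond sequence)."""

    def __init__(self, node_id: int, clock=lambda: int(time.time() * 1000)):
        if not 0 <= node_id < 1024:
            raise ValueError("node_id must fit in 10 bits")
        self._node_id = node_id
        self._clock = clock            # injectable so the generator can be tested
        self._lock = threading.Lock()
        self._last_ms = -1
        self._seq = 0

    def next_id(self) -> int:
        with self._lock:
            now = self._clock()
            if now < self._last_ms:
                now = self._last_ms    # clock went backwards: reuse the last timestamp
            if now == self._last_ms:
                self._seq += 1
                if self._seq >= 4096:  # sequence exhausted: borrow the next millisecond
                    now += 1
                    self._seq = 0
            else:
                self._seq = 0
            self._last_ms = now
            return (now << 22) | (self._node_id << 12) | self._seq
```

Random UUIDs need no configuration but are large and unordered; the composed IDs sort roughly by creation time but require each server to be given a distinct node id.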

unmesh | 3 Comments

General Discussion


Search fast in million rows

Sunday, June 8, 2008 at 5:10PM

I have a table with many columns, but searches are performed on only one column. The table can have more than a million rows. The data in this column looks like: funny,new york,hollywood. A user can search with parameters such as: funny hollywood. I need to take these two words and search whether the column contains them, and how many times. It is not possible to use an index here. If the query returns, say, 1,200 results, I can't determine the number of results without comparing each and every column value; I need to compare every one of them. This query is very frequent. How can I approach this problem? What type of architecture and tools would be helpful? I only know that this can be accomplished with a distributed system, but how would I build such a system? I also see on this website that LinkedIn uses Lucene for search. Would Lucene be helpful in my case? My table also has lots of insertions; however, updates are not very frequent.
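Full-text engines such as Lucene answer this kind of query with an inverted index: a map from each word to the rows that contain it, so a search only touches the rows listed for the query words instead of scanning the whole table. The following is a toy Python illustration of that idea, not Lucene's API; `build_index` and `search` are hypothetical helper names:

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple


def build_index(rows: Iterable[Tuple[int, str]]) -> Dict[str, Set[int]]:
    """Map each lowercase word to the set of row ids whose column contains it."""
    index: Dict[str, Set[int]] = defaultdict(set)
    for row_id, text in rows:
        # Treat commas like spaces so "new york" contributes "new" and "york".
        for word in text.replace(",", " ").lower().split():
            index[word].add(row_id)
    return index


def search(index: Dict[str, Set[int]], query: str) -> List[int]:
    """Return the sorted ids of rows that contain every word in the query."""
    words = query.lower().split()
    if not words:
        return []
    postings = [index.get(word, set()) for word in words]
    return sorted(set.intersection(*postings))
```

With rows `(1, "funny,new york,hollywood")`, `(2, "funny,london")` and `(3, "hollywood,drama")`, the query `funny hollywood` intersects the sets for the two words and the result count is simply the length of the returned list.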

d17may | 6 Comments

General Discussion,

search,

Total Cost of Ownership for different web development frameworks

Monday, June 2, 2008 at 6:19AM

I would like to compile a comparison matrix of the total cost of ownership for .NET, Java, LAMP and Rails. Where should I start? Has anyone seen, or does anyone know of, a recent study on this subject?


Biff |

memcached and Storage of Friend list

Saturday, May 31, 2008 at 1:49PM

My first post, please be gentle. I know it is long. You are all like doctors: the more info, the better the diagnosis.

-----------

What is the best way to store a list of all of your friends in the memcached cache (a simple boolean saying “yes this user is your friend”, or “no”)? Think Robert Scoble (26,000+ “friends”) on Twitter.com. He views a list of ALL existing users, and in this list, his friends are highlighted.

I came up with 4 possible methods:
-- store in memcache as an array, a list of all the "yes" friend IDs
-- store your friend IDs as individual elements
-- store as a hash of arrays based on the last 3 digits of the friend's ID, so you have up to 1000 arrays
-- comma-delimited string of IDs as one element

I'm using the second one because I think it is faster to update. The single array or hash of arrays feels like too much overhead for calculating and updating (and even just loading) to check for the existence of a friend.

The key is FRIEND[small ID#]_[big ID#]. The value is 1. This way there are no dupes. (I add u as friend, it always adds me as ur friend...I remove u, u remove me.) Stored with it are 2 additional flags: one denotes the start of the entries, one denotes the end. As friends are added, the end flag's position relative to new friends will become meaningless, but that is ok (I think).

To see if someone is your friend, the system checks whether both the start and end flags exist. If both exist, it can check for the existence of the friend ID: if it exists, then friend. The start flag is required. If the start flag is pushed out of the cache, we must assume some friends were also pushed out.

Currently, the system loads from the DB in a background daemon after you log in (if the two flags are not already set). Until the two flags are set, it does DB lookups. There is no timeout on the data in the cache. Adding/removing friends to your account adds/removes to/from memcache, so, theoretically, it might never have to pre-load anything.

The downside of my method is that if the elements span multiple servers and one dies, you lose some of your friends (avoiding that is the upside of using arrays). I don't know how to resolve the case where the lost box didn't contain either of the flags: in that case, the user's info will NEVER get refreshed. This is my concern. Any ideas? Thanks so much!!!
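The scheme described in this post can be sketched roughly as follows. A plain Python dict stands in for the memcached client, and the flag key names (`FRIENDS_START_<id>`, `FRIENDS_END_<id>`) are made up for illustration, since the post does not give them:

```python
from typing import Dict, Iterable, Optional


def friend_key(a: int, b: int) -> str:
    """One key per pair, smaller id first, so (a, b) and (b, a) never duplicate."""
    small, big = sorted((a, b))
    return f"FRIEND{small}_{big}"


def load_friends(cache: Dict[str, int], user: int, friends: Iterable[int]) -> None:
    """Write the start flag, then the friend entries, then the end flag."""
    cache[f"FRIENDS_START_{user}"] = 1   # hypothetical flag key names
    for friend in friends:
        cache[friend_key(user, friend)] = 1
    cache[f"FRIENDS_END_{user}"] = 1


def is_friend(cache: Dict[str, int], user: int, other: int) -> Optional[bool]:
    """True or False when both flags are present; None means 'ask the database'."""
    if f"FRIENDS_START_{user}" not in cache or f"FRIENDS_END_{user}" not in cache:
        return None
    return friend_key(user, other) in cache
```

The sketch also makes the stated concern concrete: if a single `FRIEND..` entry disappears while both flags survive, `is_friend` returns `False` for a real friend, and nothing triggers a reload.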

FredMeyers | 5 Comments

General Discussion, Memcached


Job queue and search engine

Wednesday, May 28, 2008 at 7:15PM

Hi, I want to implement a search engine with Lucene. To be scalable, I would like to execute search jobs asynchronously (with a job queuing system). But I don't know if it is a good design... Why? Search results can be large! (e.g. 100+ pages with 25 documents per page.) With an asynchronous system, I need to store the results for each search job. I can set a short expiration time (~5 min) for each search result, but it's still large. What do you think about it? Which design would you use for that? Thanks, Mat
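One way to keep the stored results small is to keep only the matching document ids per job and fetch the documents for a page when that page is requested. For the example in the post that is about 100 x 25 = 2,500 ids per job. A minimal Python sketch of such a store with the ~5 minute expiry; `ResultStore` is a hypothetical name and an in-process dict stands in for whatever shared cache would be used:

```python
import time
from typing import Callable, Dict, List, Optional, Tuple


class ResultStore:
    """Keeps only document ids per search job and expires them after ttl seconds."""

    def __init__(self, ttl: float = 300.0, clock: Callable[[], float] = time.monotonic):
        self._ttl = ttl
        self._clock = clock  # injectable so expiry can be tested without sleeping
        self._jobs: Dict[str, Tuple[float, List[int]]] = {}

    def put(self, job_id: str, doc_ids: List[int]) -> None:
        """Store the ids for a finished job together with their expiry time."""
        self._jobs[job_id] = (self._clock() + self._ttl, list(doc_ids))

    def page(self, job_id: str, number: int, size: int = 25) -> Optional[List[int]]:
        """Return one page of ids (1-based), or None if the job is unknown or expired."""
        entry = self._jobs.get(job_id)
        if entry is None or entry[0] <= self._clock():
            self._jobs.pop(job_id, None)
            return None
        start = (number - 1) * size
        return entry[1][start:start + size]
```

A caller that gets `None` back would resubmit the search job, so the short expiry trades memory for the occasional repeated search.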

mat | 3 Comments