Entries in General Discussion (161)

Thursday
Aug 23, 2007

PostgreSQL on high-availability websites?

I was looking at the Pingdom infrastructure matrix (http://royal.pingdom.com/royalfiles/0702_infrastructure_matrix.pdf) and saw that none of the sites listed use PostgreSQL, and then I searched through highscalability.com and found very few mentions of PostgreSQL either. Are there any examples of high-traffic sites that use PostgreSQL? Does anyone have experience with it? I'm having trouble finding good, recent studies of Postgres (or of Postgres compared with MySQL) online.

Wednesday
Aug 22, 2007

Profiling web applications

Hi. Some of the articles on this site claim that profiling is essential. Is there an established approach to profiling web apps, or does it depend too much on the technologies used?
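
One broadly applicable starting point, regardless of stack, is to profile at the request boundary and then drill into the hot functions. A minimal sketch in Python, assuming a WSGI application (the middleware below is illustrative, not tied to any particular framework):

    import cProfile
    import io
    import pstats

    def profiling_middleware(app):
        # Wrap a WSGI app so each request runs under cProfile.
        def wrapped(environ, start_response):
            profiler = cProfile.Profile()
            result = profiler.runcall(app, environ, start_response)
            out = io.StringIO()
            stats = pstats.Stats(profiler, stream=out)
            stats.sort_stats("cumulative").print_stats(10)  # top 10 by cumulative time
            print(out.getvalue())  # in practice, write to a log instead
            return result
        return wrapped

The same idea carries over to other stacks: measure whole requests first, then profile the individual functions that dominate.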

Friday
Aug 17, 2007

What is the best hosting option?

This question was extracted from: http://highscalability.com/plentyoffish-architecture#comment-126 For a startup like Markus's, what is the best hosting option (with room to grow later): hosting your own servers, or an ISP co-location option? Either way, he still has to pay a lot for bandwidth at that load, right?

Friday
Aug 10, 2007

How do we make a large real-time search engine?

We're implementing a content-oriented website that will see heavy public traffic, and we need a search engine to index the content (stored in a database, most likely MySQL InnoDB or Oracle) and run queries against those indexes. The solution we found is to implement a separate service that re-indexes the database contents at regular intervals. This is a complex and suboptimal solution, though, since we would like content to be indexed and searchable in real time. Could you point me to some examples or articles I could review to design a solution for this context?
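
For reference, the interval-indexing service described above often reduces to a polling loop like this minimal sketch (assumptions: the content table has a last_modified column stored as a Unix timestamp, and the search backend exposes an index_document call — adapt both to whatever engine you choose):

    import time

    POLL_INTERVAL = 5  # seconds; small intervals approximate "real time"

    def run_indexer(db_conn, search_backend):
        last_seen = 0  # high-water mark of last_modified already indexed
        while True:
            cur = db_conn.cursor()
            cur.execute(
                "SELECT id, title, body, last_modified FROM contents "
                "WHERE last_modified > %s ORDER BY last_modified",
                (last_seen,),
            )
            for doc_id, title, body, modified in cur.fetchall():
                search_backend.index_document(doc_id, title, body)
                last_seen = max(last_seen, modified)
            time.sleep(POLL_INTERVAL)

Shrinking the poll interval approximates real time; truly real-time indexing usually means pushing updates to the indexer from the application's write path instead of polling.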

Thursday
Aug 9, 2007

Lots of questions for high scalability / high availability

Hey, I have a website that I would like to scale. Right now we have 10 servers, but this setup does not scale well. I know how to deal with my Apache web servers but am having problems with the SQL servers. I would like to "scale out" and add servers as we need them. We have over 100 GB of data in MySQL and aim for around 20 GB per server. It works well, except that if a server goes down, 1/5 of the users can't access the website. We could use replication, but we would need to at least double the number of SQL servers to replicate each one, and maybe in the future that won't be enough; we might need three slaves per master, and I don't really like that idea. I would prefer eight servers that all handle the data currently on our five servers, so we could then add new servers as needed. I looked at NFS, but that does not seem to be a good idea for SQL servers. Can you confirm?
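
The "all eight servers share the data" layout described above is roughly what consistent hashing with replicas gives you. A minimal Python sketch of the routing side (server names, virtual-node count, and the replica count of 2 are all assumptions):

    import bisect
    import hashlib

    class HashRing:
        def __init__(self, servers, vnodes=100, replicas=2):
            self.replicas = replicas
            # Each server gets many points ("virtual nodes") on the ring
            # so keys spread evenly.
            self.ring = sorted(
                (self._hash("%s:%d" % (s, i)), s)
                for s in servers
                for i in range(vnodes)
            )
            self.points = [h for h, _ in self.ring]

        @staticmethod
        def _hash(key):
            return int(hashlib.md5(key.encode()).hexdigest(), 16)

        def servers_for(self, key):
            # Return `replicas` distinct servers responsible for `key`.
            idx = bisect.bisect(self.points, self._hash(key)) % len(self.ring)
            found = []
            while len(found) < self.replicas:
                server = self.ring[idx % len(self.ring)][1]
                if server not in found:
                    found.append(server)
                idx += 1
            return found

    ring = HashRing(["db%d" % i for i in range(1, 9)])
    print(ring.servers_for("user:12345"))  # e.g. ['db3', 'db7']

Each key lives on two servers, so one machine going down costs no data availability, and adding a ninth server remaps only a fraction of the keys instead of reshuffling everything.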

Wednesday
Aug 8, 2007

Partial String Matching

Is there any alternative to LIKE '%...%' OR LIKE '%...%' in MySQL if you have to offer partial string matching on a large dataset?
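
One commonly used alternative is n-gram indexing (MySQL's FULLTEXT index handles word-level matching but not arbitrary substrings). A toy in-memory trigram sketch of the idea in Python, purely illustrative — a production version would live in the database or a dedicated search engine:

    from collections import defaultdict

    def trigrams(s):
        s = s.lower()
        return {s[i:i + 3] for i in range(len(s) - 2)}

    class TrigramIndex:
        def __init__(self):
            self.index = defaultdict(set)  # trigram -> set of row ids
            self.rows = {}

        def add(self, row_id, text):
            self.rows[row_id] = text
            for g in trigrams(text):
                self.index[g].add(row_id)

        def search(self, query):
            grams = trigrams(query)
            if not grams:
                return []  # queries shorter than 3 chars need another path
            # Candidates must contain every trigram of the query...
            candidates = set.intersection(*(self.index[g] for g in grams))
            # ...then a real substring check drops false positives.
            return [r for r in candidates
                    if query.lower() in self.rows[r].lower()]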

Tuesday
Aug 7, 2007

What qps should we design for in building a MySpace-like site?

We are currently building a high-traffic portal like MySpace. What qps do we have to keep in mind while developing the site so that it can scale as the traffic grows?
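
There is no single right number; the usual back-of-the-envelope works from projected page views down to peak qps. Every figure below is a hypothetical assumption to replace with your own projections:

    page_views_per_day = 10_000_000   # assumed daily page views
    requests_per_page = 10            # HTML plus dynamic fragments, assumed
    peak_factor = 3                   # peak hour vs. the daily average, assumed

    avg_qps = page_views_per_day * requests_per_page / 86_400
    peak_qps = avg_qps * peak_factor
    print(f"average ~{avg_qps:.0f} qps, design for ~{peak_qps:.0f} qps peak")
    # average ~1157 qps, design for ~3472 qps peak

Designing for the peak (with headroom), not the average, is what keeps the site responsive as traffic grows.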

Friday
Aug 3, 2007

Scaling IMAP and POP3

Just thought I'd drop a brief suggestion to anyone building a large mail system. Our solution for scaling mail pickup was to develop a sharded architecture whereby accounts are spread across a cluster of servers, each with IMAP/POP3 capability. We then use a cluster of reverse proxies (Perdition) speaking to the backend IMAP/POP3 servers. The benefit of this approach is that you can simply use round-robin or HA load balancing on the Perdition servers that end users connect to, so admins can easily move accounts around on the backend storage servers without affecting end users. Perdition manages routing users to the appropriate backend servers and has MySQL support. What we also liked about this approach was that it has no dependency on a distributed or networked filesystem, so there is less chance of corruption or data-consistency issues. When an individual server reaches capacity, we just offload users to a less-used server. If any server goes offline, it only affects the fraction of users assigned to that server.
Best, Erik Osterman
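
For readers unfamiliar with the setup: the proxy layer's job boils down to a per-account routing lookup, which Perdition performs itself via its MySQL support. A minimal sketch of the equivalent lookup (table, column, and host names are hypothetical):

    # Resolve which backend IMAP/POP3 server holds a given account.
    def backend_for(username, db_conn):
        cur = db_conn.cursor()
        cur.execute(
            "SELECT backend_host FROM mail_routes WHERE username = %s",
            (username,),
        )
        row = cur.fetchone()
        return row[0] if row else "imap-default.example.com"

Moving an account to a less-loaded server is then just an update to that routing table plus a mailbox copy; end users keep connecting to the same proxy address throughout.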

Thursday
Aug 2, 2007

Multilanguage Website

Hi, can someone point me to some good resources on how to build a multilanguage website? The only resource I have found is this: http://www.indiawebdevelopers.com/technology/multilanguage_support.asp Thanks! P.S. Great site ;)
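
A standard building block is gettext-style message catalogs, one per language, selected per request. A minimal Python sketch using the standard library (catalog name, directory layout, and language choice are assumptions):

    import gettext

    def translator_for(lang):
        return gettext.translation(
            "messages",            # catalog name (messages.mo)
            localedir="locale",    # locale/<lang>/LC_MESSAGES/messages.mo
            languages=[lang],
            fallback=True,         # fall back to the original strings
        )

    _ = translator_for("it").gettext
    print(_("Welcome back"))  # prints the Italian translation if available

The same catalog format is supported in most web stacks, so translators work on .po files while the application just wraps user-visible strings.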

Tuesday
Jul 31, 2007

BerkeleyDB & other distributed high-performance key/value databases

I currently use BerkeleyDB as an embedded database (http://www.oracle.com/database/berkeley-db/), a decision initially prompted by learning that Google used BerkeleyDB for their universal sign-on feature. Lustre looks impressive, but their white paper cites 800 file creations per second as a good number, whereas BerkeleyDB on my Mac mini does 200,000 row creations per second, and it can be used as a distributed file system.

I'm having I/O scalability issues with BerkeleyDB on one machine and am about to implement its distributed replication feature (and go multi-machine), which in effect makes it work like a distributed file system but with local access speeds. That's why I was looking at Lustre. The key difference between BerkeleyDB and Lustre is that BerkeleyDB keeps a complete copy of all the data on each computer, making it unviable for massive databases. However, if you have < 1 TB (i.e., one disk) of total possible data, it seems to me that a replicated local key/value database is the fastest solution.

I haven't found much discussion of people using this kind of technology for highly scalable web sites. Over the years I've had extremely good performance results with dbm files, and have found that nothing beats local data, access through C APIs, and btree or hash-table implementations. I have never tried replicated/redundant versions of this approach, and I'm curious whether others have, and what your experience has been.
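
For anyone who hasn't used it, local BerkeleyDB access looks like this minimal sketch, with Python's bsddb3 bindings standing in for the C API (replication setup is a separate, more involved topic and is omitted here):

    from bsddb3 import db

    d = db.DB()
    # args: filename, optional logical db name, access method, flags
    d.open("users.db", None, db.DB_BTREE, db.DB_CREATE)

    d.put(b"user:42", b"erik@example.com")  # keys and values are raw bytes
    print(d.get(b"user:42"))                # b'erik@example.com'

    d.close()

Everything stays in-process — no network round trip — which is where the large gap versus networked stores comes from.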
