Entries in General Discussion (161)

Friday
Mar 28, 2008

How to Get DNS Names of a Web Server

For a somewhat unusual reason, I'm trying to make a web server able to discover all the DNS names mapped to its IP. Let me explain: I'm creating a website that will run in a web farm, and every web server in the farm will have some subdomains mapped to its IP. What I want is for my application, whenever it starts on a web server, to be able to get all the subdomains mapped/assigned to that server, e.g. sub1.mydomain.com, sub2.mydomain.com. I understand that I have to use a reverse DNS lookup (i.e. given the IP, get the domain name), but I want to get all the subdomains, not just the first one that maps to that IP. I've been reading about DNS on the internet, but I can't find any information on how to achieve this; normally you use DNS to get the IP of a domain, and I'm not sure that all servers enable reverse lookup.

The problem is that I'm still not sure whether I'll host my own DNS server or use the services of some company (many companies offer DNS hosting services), so my questions are:

- If I host my own DNS server, will it be possible to get all the subdomains using reverse lookup? And if I enable reverse lookup on my DNS server, can this have any negative side effects, security-related or otherwise? Is there any way I can allow only my web servers to do reverse lookups while preventing anybody else on the internet from doing them?
- If I use the DNS hosting services of some company, will I be able to do what I want, i.e. get the subdomains mapped to the IP address of a web server?

Unfortunately I don't have much experience working with web farms, so I would also like to ask whether every web server in a web farm gets its own static IP, or how that works. With the firewall and so on in the picture, I don't know how IP assignment works in a web farm scenario.

Thanks a million in advance, and sorry for my really long post.

Wal
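For reference, pulling the PTR records for an address might look like the sketch below, using the dnspython library; the IP is a placeholder. Note that a reverse zone can publish several PTR records for one address, but in practice most hosts get only one, so reverse lookup alone usually won't enumerate every subdomain pointing at a server; keeping your own name-to-server mapping (or pulling the zone with AXFR from your own nameserver) is the more common approach.

```python
# Minimal reverse-lookup sketch using dnspython (pip install dnspython).
# Returns every PTR record the reverse zone publishes for the IP.
import dns.resolver
import dns.reversename

def names_for_ip(ip):
    """Return all PTR names mapped to this IP, or [] if none exist."""
    reverse_name = dns.reversename.from_address(ip)  # e.g. 4.3.2.1.in-addr.arpa.
    try:
        answer = dns.resolver.resolve(reverse_name, "PTR")
    except (dns.resolver.NXDOMAIN, dns.resolver.NoAnswer):
        return []  # no reverse entry published for this address
    return [str(record) for record in answer]

print(names_for_ip("1.2.3.4"))  # placeholder IP; use the server's own
```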


Tuesday
Mar 18, 2008

Shared filesystem on EC2

Hi. I'm looking for a way to share files between EC2 nodes. Currently we are using GlusterFS to do this. It has been reliable recently, but in the past it has crashed under high load and we've had trouble starting it up again. We've only been able to restart it by removing the files, restarting the cluster, and filling it up again with our files from backup. This takes ages, and will take even longer the more files we get.

What worries me is that it seems to make each node a point of failure for the entire system: one node crashes and soon the entire cluster has crashed. The other problem is adding another node. It seems you have to take down the whole thing, reconfigure it to include the new node, and restart, which rather defeats the horizontal scaling strategy.

We are using 2 EC2 instances as web servers, 1 as a DB master, and 1 as a slave. GlusterFS is installed on the web server machines as well as the DB slave machine (we back up files to S3 from this machine). The files are mostly thumbnails, but also some larger images and media files.

Does anyone have a good solution for sharing files between EC2 nodes? I like the ThruDB [http://trac.thrudb.org/] concept of using the local filesystem as a cache for S3, but I'm not sure ThruDB is mature enough yet. Or maybe some kind of distributed filesystem built on top of git would work? Any ideas? Thanks!

~rvr
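For what it's worth, the ThruDB-style idea of treating the local disk as a cache in front of S3 is simple enough to sketch directly; here is a rough version using boto3, with the bucket name and cache directory as placeholders. Each node serves from its own cache and falls back to S3 on a miss, so no node depends on any other.

```python
# Local-filesystem-as-cache-for-S3 sketch (bucket and paths are placeholders).
import os
import shutil
import boto3

s3 = boto3.client("s3")
BUCKET = "my-media-bucket"   # hypothetical bucket
CACHE_DIR = "/var/cache/s3"  # hypothetical cache root

def get_file(key):
    """Return a local path for `key`, fetching from S3 on a cache miss."""
    local_path = os.path.join(CACHE_DIR, key)
    if not os.path.exists(local_path):
        os.makedirs(os.path.dirname(local_path), exist_ok=True)
        s3.download_file(BUCKET, key, local_path)  # miss: pull from S3
    return local_path

def put_file(key, source_path):
    """Write-through: upload to S3, then keep a local copy as the cached version."""
    s3.upload_file(source_path, BUCKET, key)
    cached = os.path.join(CACHE_DIR, key)
    os.makedirs(os.path.dirname(cached), exist_ok=True)
    shutil.copyfile(source_path, cached)
```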


Tuesday
Mar 18, 2008

Database Design 101

I am working on the design for my database and can't seem to settle on a firm schema. I am torn between normalizing the data and dealing with the overhead of joins, or denormalizing it for easy sharding. The data is essentially music information per user: UserID, Artist, Album, Song. This lends itself nicely to being normalized, with separate User, Artist, Album and Song tables and a table full of INTs to tie them together.

This will be a mostly read-based environment, with about 80% of the load being searches of the data by artist, album or song. By the time I begin the query for artist, album or song, I will already have a list of UserIDs to limit the search by. The problem is that the tables can get unmanageably large pretty quickly, and my plan was to shard off users once they did.

Given this simple data relationship, what are the pros and cons of normalizing the data vs. denormalizing it? Should I go with four separate, normalized tables or one four-column table? Perhaps it might be best to write the data in both formats at first and see what query speed is like once the tables fill up.

Another potential issue is that inserts will come in batches of about 500-2000+ per user at a time, which will be pretty intensive to pull off with the normalized schema: each row needs quite a few SELECTs first, because the artist, album or song may or may not already be in the database. What do you all think?
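On the insert question: the usual trick for the normalized schema is to let a UNIQUE constraint absorb the "is it already there?" check instead of issuing a SELECT per row. A sketch of the idea, using sqlite3 only so it runs standalone (in MySQL, INSERT IGNORE plays the same role); table and column names are illustrative.

```python
# Batch "get or create" inserts against a normalized music schema.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE artist (id INTEGER PRIMARY KEY, name TEXT UNIQUE);
CREATE TABLE album  (id INTEGER PRIMARY KEY, name TEXT UNIQUE);
CREATE TABLE song   (id INTEGER PRIMARY KEY, name TEXT UNIQUE);
CREATE TABLE user_song (user_id INT, artist_id INT, album_id INT, song_id INT);
""")

def get_or_create(table, name):
    # The UNIQUE constraint makes duplicates a no-op, so no SELECT-first.
    conn.execute(f"INSERT OR IGNORE INTO {table} (name) VALUES (?)", (name,))
    return conn.execute(f"SELECT id FROM {table} WHERE name = ?",
                        (name,)).fetchone()[0]

def add_batch(user_id, rows):
    """rows: iterable of (artist, album, song) strings, 500-2000 per batch."""
    for artist, album, song in rows:
        conn.execute("INSERT INTO user_song VALUES (?, ?, ?, ?)",
                     (user_id,
                      get_or_create("artist", artist),
                      get_or_create("album", album),
                      get_or_create("song", song)))
    conn.commit()  # one commit per batch keeps the inserts cheap

add_batch(42, [("Radiohead", "OK Computer", "Airbag")])
```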


Saturday
Mar 15, 2008

New Website Design Considerations

I am in the design phase of getting a website up and running that will have scalability as a main concern, and I am looking for opinions on architecture and the like for this endeavor. The site has a few characteristics that make scalability difficult. Users will all have a pretty large amount of data that other users will be able to search, and the site will be entirely based around search. The catch is that users will always be searching with a stipulation of 'n miles from me', and I imagine that fact will kill the possibility of query caching for most searches.

I have extensive experience with PHP and MySQL, some experience with ASP.NET/C#, and some experience with Perl, but I can learn anything fast. The site will start out on a single server, but I want to be 100% certain that I architect the code and databases so that scaling will be simple.

What language should I code the site in? What DB would you use: Postgres, MySQL, MSSQL, BerkeleyDB? Should we shard the database by location? By user? Not at all? What does everyone think are possible architectures for this?
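On the 'n miles from me' point: the standard way to keep such a query index-friendly is a bounding-box prefilter that a plain B-tree index on lat/lon columns can serve, followed by an exact great-circle check on the survivors. A small sketch of the math; the column names in the commented SQL are illustrative.

```python
# Bounding-box prefilter plus haversine refinement for radius search.
import math

EARTH_RADIUS_MILES = 3959.0

def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, in miles."""
    lat1, lon1, lat2, lon2 = map(math.radians, (lat1, lon1, lat2, lon2))
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))

def bounding_box(lat, lon, radius_miles):
    """A box guaranteed to contain the radius (breaks down near the poles)."""
    dlat = math.degrees(radius_miles / EARTH_RADIUS_MILES)
    dlon = dlat / math.cos(math.radians(lat))
    return (lat - dlat, lat + dlat, lon - dlon, lon + dlon)

# Illustrative SQL prefilter, then refine in application code:
#   SELECT * FROM users WHERE lat BETWEEN ? AND ? AND lon BETWEEN ? AND ?
# and drop rows where haversine_miles(...) exceeds the radius.
```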


Saturday
Mar 8, 2008

DNS record TTLs in worst-case scenarios

I haven't found a really good solution to this problem yet. Imagine you're responsible for a small CDN (static images) with two different datacenters. Balancing between the two DCs is done with an anycast nameservice (a nameserver in each DC, so users reach the nearest location). One failure scenario is that one of the datacenters goes down completely. You can monitor from the nameservers and route only to the DC that is still alive, no problem. But what about the TTL on the DNS records? Tiny TTLs like 2 minutes are often ignored by several ISPs (e.g. AOL), so those clients never get the IP of the other datacenter. What could be a solution in this scenario?
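One thing worth doing before designing around misbehaving resolvers is measuring them: ask the suspect resolver for the record a few times and watch whether the answer's TTL actually counts down and expires. A small probe using dnspython; the hostname and resolver address are placeholders.

```python
# Probe whether a resolver honors a short TTL on a record.
import time
import dns.resolver

resolver = dns.resolver.Resolver(configure=False)
resolver.nameservers = ["198.51.100.53"]  # placeholder: the ISP resolver to test

for _ in range(5):
    answer = resolver.resolve("img.example-cdn.com", "A")  # placeholder name
    print(answer.rrset.ttl, [r.address for r in answer])
    time.sleep(60)  # a compliant resolver re-fetches once the 2 min TTL is up
```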


Tuesday
Feb 26, 2008

Architecture to Allow High Availability File Upload

Hi, I was wondering if anyone has found any information on how to architect a system to support high-availability file uploads. My scenario: I have an Apache server proxying requests to a bunch of Tomcat Java application servers. When I need to upgrade my site, I stop and upgrade each of the Tomcat servers one at a time. This works well, as Apache automatically routes subsequent requests for the stopped app server to the remaining app servers that are up.

The problem is that if a user is uploading a file when the app server is stopped, the upload fails and the user has to upload the file again. This is problematic, as uploading files is an integral feature of the site, and it's frustrating for users to have to restart their uploads every time I upgrade the site (which I want to be able to do frequently).

Has anyone seen any information on how this can be done, or have ideas on how this can be architected? I imagine sites like Flickr must have a solution to this problem, as I have seen presentations in which they say they are able to upgrade their site several times a day without users noticing. Thanks!

Tuyen
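One pattern that sidesteps the problem entirely is to make the upload resumable: the client sends the file in small chunks, the app servers append each chunk to shared storage, and a chunk that fails (because its app server was stopped mid-upgrade) is simply retried and lands on another server. A client-side sketch; the /upload-chunk endpoint and its parameters are hypothetical.

```python
# Chunked, retryable upload so a rolling restart only costs one chunk.
import requests

CHUNK_SIZE = 1024 * 1024  # 1 MB

def upload(path, upload_id):
    with open(path, "rb") as f:
        offset = 0
        while True:
            chunk = f.read(CHUNK_SIZE)
            if not chunk:
                break
            for _ in range(3):  # a retry lands on a live app server
                resp = requests.post("http://example.com/upload-chunk",
                                     params={"id": upload_id, "offset": offset},
                                     data=chunk)
                if resp.ok:
                    break
            else:
                raise RuntimeError(f"chunk at offset {offset} failed")
            offset += len(chunk)
    requests.post("http://example.com/upload-complete", params={"id": upload_id})
```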


Thursday
Feb 21, 2008

Tracking usage of public resources - throttling accesses per hour

Hi, we have an application that allows the user to define a publicly available resource with an ID. The resource can then be accessed via an HTTP call, passing the ID. While we're not a picture site, thinking of the resource as something like a picture may help you understand what is going on.

We need to be able to stop access to the resource if it is accessed x times in an hour, regardless of who is requesting it. We see two options:

- Go to the database for each request to see if the number returned in the last hour is within the limit.
- Keep a counter in each of the application servers and sync the counters every few minutes (or every so many requests) to determine whether we've passed the limit. The sync point would be the database.

Going to the database (and updating it!) on every request isn't very attractive. We also have a load-balanced farm of servers, so we know x is going to have to be a soft limit if we count in the app servers. (We know there will be a period between syncs of the counts in the app servers where we'll overshoot the limit. That is okay, since we'll catch the limit violation then and stop the requests.)

Any other thoughts on how to do this?

Thanks, Chris
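The second option might look roughly like the sketch below: each app server counts locally, and a background thread flushes the deltas to the shared store every couple of minutes, marking a resource blocked once the shared total crosses the limit. The store.add_and_get call is a placeholder for one atomic UPDATE that adds the delta to the current hour's count and returns the new total.

```python
# Per-app-server counters with periodic sync to a shared store (soft limit).
import threading
import time
from collections import Counter

LIMIT_PER_HOUR = 1000  # the 'x' from the post
SYNC_INTERVAL = 120    # seconds between syncs

local_counts = Counter()  # accesses seen here since the last sync
blocked = set()           # resources already over the hourly limit
lock = threading.Lock()

def record_access(resource_id):
    """Called per request; False means reply with a 'limit reached' error."""
    with lock:
        if resource_id in blocked:
            return False
        local_counts[resource_id] += 1
        return True

def sync_loop(store):
    while True:
        time.sleep(SYNC_INTERVAL)
        with lock:
            deltas = dict(local_counts)
            local_counts.clear()
        for resource_id, delta in deltas.items():
            if store.add_and_get(resource_id, delta) > LIMIT_PER_HOUR:
                with lock:
                    blocked.add(resource_id)
        # clearing `blocked` and the shared counts at the top of each
        # hour is omitted for brevity
```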


Tuesday
Feb 19, 2008

Building an email communication system

Hi, the website I work for is looking to build an email system that can handle a fair few emails (up to a hundred thousand a day). These comprise registration emails, newsletters, lots of user-triggered emails and overnight emails. At present we queue them in SQL and feed them into an SMTP server on one of our web servers when the queue drops below a certain level. This has caused our mail system to crash, as well as hammering our (shared!!!) DB server.

We have an architecture in mind for what we want to build, but thought there might be something we could buy off the shelf that would let us keep templated emails and lists of recipients, schedule sends, and report on it all. We can't find anything. What do big websites like Amazon use, or sites a little smaller that still send loads of mail (Flickr, Ebuyer, or other ecommerce sites)?

Cheers, tarqs
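Even without an off-the-shelf product, decoupling the send rate from the queue depth is the part that matters most; something along these lines on the consumer side keeps the SMTP server from being flooded in bursts. The fetch_batch/mark_sent calls are placeholders for whatever SQL queue table holds the outbound mail.

```python
# Drain the mail queue at a fixed rate instead of in bursts.
import smtplib
import time
from email.message import EmailMessage

RATE_PER_SECOND = 5  # ~430k/day ceiling; tune to what the relay tolerates

def send_loop(queue, smtp_host="localhost"):
    with smtplib.SMTP(smtp_host) as smtp:
        for job in queue.fetch_batch(100):   # placeholder queue API
            msg = EmailMessage()
            msg["From"] = job.sender
            msg["To"] = job.recipient
            msg["Subject"] = job.subject
            msg.set_content(job.body)        # body rendered from a template
            smtp.send_message(msg)
            queue.mark_sent(job.id)          # placeholder
            time.sleep(1.0 / RATE_PER_SECOND)
```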


Monday
Feb 18, 2008

How to deal with an I/O bottleneck to disk?

A site I'm working with has an I/O bottleneck. They're using a single static server to deliver all of the pictures, video content, zip downloads, et cetera, but now that the bandwidth out of that server is approaching 50 Mbit/second, the latency of serving small files has increased to become unacceptable. I'm curious how other people have dealt with this situation. Separating the content onto two different servers would require a significant change to the site's architecture (because the premise is that all uploads go to one server, all subdirectories are created in one directory, etc.) and may not really solve the problem.
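If the upload path can tolerate a one-line change, hashing each file path to a server spreads both the writes and the reads without touching the directory layout on any single box. A sketch; the host names are placeholders, and note that plain modulo hashing remaps some existing paths whenever a server is added (consistent hashing, or recording the chosen server at upload time, avoids that).

```python
# Deterministically map a file path to one of several static servers.
import hashlib

STATIC_SERVERS = ["static1.example.com",  # placeholder hosts
                  "static2.example.com",
                  "static3.example.com"]

def server_for(path):
    digest = hashlib.md5(path.encode()).hexdigest()
    return STATIC_SERVERS[int(digest, 16) % len(STATIC_SERVERS)]

# Page templates build image URLs with the same rule the upload code used:
print(server_for("/images/2008/02/photo123.jpg"))
```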


Monday
Feb 18, 2008

limit on the number of databases open

I have a few doubts; here are the questions: 1) Is there any limit on the number of databases that can be accessed simultaneously in MySQL? 2) Will it be a problem to scale in the future if there is a large number of small databases (2-5 MB each)?
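As far as I know, MySQL places no meaningful cap on the number of databases itself (each database is just a directory under the data dir); the practical ceilings are OS file descriptors and the server's table cache, both of which you can inspect directly. A quick probe using the PyMySQL driver, with placeholder credentials; the variable name differs across MySQL versions (table_cache vs. table_open_cache), which the LIKE pattern below papers over.

```python
# Inspect the server limits that actually constrain many small databases.
import pymysql

conn = pymysql.connect(host="localhost", user="root", password="...")  # placeholders
with conn.cursor() as cur:
    cur.execute("SHOW VARIABLES LIKE 'open_files_limit'")
    print(cur.fetchall())
    cur.execute("SHOW VARIABLES LIKE 'table%cache'")  # table_cache / table_open_cache
    print(cur.fetchall())
```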

